{"channel":"cities","content":"5/5\r\n\r\nit has been a drudge to get work done.\r\n\r\n----\r\n\r\nthe end-goal is within sight.  one more round of prompt-tuning, a $1 gpt-4.1-nano run, and some rounds of \"LLM consensus checking\".\r\n\r\nthe goal then becomes *applications*.\r\n\r\n# LLM benchmarks. (<xantham> use the *machine* to feed the *machine*) (<red> \"which word has this definition\" questions will be possible at some point.  but not yet.)\r\n# Elementary education. (<red> which words should a 3rd/5th grader know?  be studying?) (<green> so far no useful progress on \"how easy/hard is it to spell this word\")\r\n# Second-language learning. (<red> a \"which is the Chinese for this word in this sentence\" app.)\r\n# Text difficulty.  A smarter metric than Flesch-Kincaid.\r\n\r\n----\r\n\r\nThe \"cosine similarity\" question of \"how similar are these word definitions\" is not yet solved.  I'm not sure I can solve it.\r\n\r\nI can test it; I have several ways of generating embeddings.  And (probably) these can include a sentence as context.\r\n\r\n----\r\n\r\nI also have no solutions for the \"group different word-forms with the same meaning\".  For << jump >>, << jumps >>, << jumped >>, for example.\r\n\r\nThis would be a much more substantial problem for more highly-conjugated languages.  With English, it is almost avoidable.\r\n\r\n----\r\n\r\nAround word 5000, i am seeing << raft >>, << yield >>, << algebra >>, and << pizza >>.\r\n\r\nThis seems correct enough?  \"Algebra\" is more common in encyclopedic contexts, and \"pizza\" doesn't show up in the 19th century corpus at all.\r\n\r\n----\r\n\r\nBut, as far as \"exploration\" is concerned, I am reaching diminishing returns.\r\n\r\nI have one more list of \"see if Claude can do this quickly\".  After that, \"glenora\" will become an inactive project.","created_at":"2025-04-16T18:57:28.551568","id":351,"llm_annotations":{},"parent_id":349,"processed_content":"<p>5/5\r</p>\n<p>it has been a drudge to get work done.\r</p> <hr class=\"section-break\" /> <p>the end-goal is within sight.  one more round of prompt-tuning, a $1 gpt-4.1-nano run, and some rounds of \"LLM consensus checking\".\r</p>\n<p>the goal then becomes <em>applications</em>.\r</p>\n<ul>\n<li class=\"number-list\"> LLM benchmarks. <span class=\"colorblock color-xantham\">\n    <span class=\"sigil\">\ud83d\udd25</span>\n    <span class=\"colortext-content\">( use the <em>machine</em> to feed the <em>machine</em>)</span>\n  </span> <span class=\"colorblock color-red\">\n    <span class=\"sigil\">\ud83d\udca1</span>\n    <span class=\"colortext-content\">( \"which word has this definition\" questions will be possible at some point.  but not yet.)</span>\n  </span>\r</li>\n<li class=\"number-list\"> Elementary education. <span class=\"colorblock color-red\">\n    <span class=\"sigil\">\ud83d\udca1</span>\n    <span class=\"colortext-content\">( which words should a 3rd/5th grader know?  be studying?)</span>\n  </span> <span class=\"colorblock color-green\">\n    <span class=\"sigil\">\u2699\ufe0f</span>\n    <span class=\"colortext-content\">( so far no useful progress on \"how easy/hard is it to spell this word\")</span>\n  </span>\r</li>\n<li class=\"number-list\"> Second-language learning. <span class=\"colorblock color-red\">\n    <span class=\"sigil\">\ud83d\udca1</span>\n    <span class=\"colortext-content\">( a \"which is the Chinese for this word in this sentence\" app.)</span>\n  </span>\r</li>\n<li class=\"number-list\"> Text difficulty.  A smarter metric than Flesch-Kincaid.\r</li>\n</ul> <hr class=\"section-break\" /> <p>The \"cosine similarity\" question of \"how similar are these word definitions\" is not yet solved.  I'm not sure I can solve it.\r</p>\n<p>I can test it; I have several ways of generating embeddings.  And (probably) these can include a sentence as context.\r</p> <hr class=\"section-break\" /> <p>I also have no solutions for the \"group different word-forms with the same meaning\".  For <span class=\"literal-text\">jump</span>, <span class=\"literal-text\">jumps</span>, <span class=\"literal-text\">jumped</span>, for example.\r</p>\n<p>This would be a much more substantial problem for more highly-conjugated languages.  With English, it is almost avoidable.\r</p> <hr class=\"section-break\" /> <p>Around word 5000, i am seeing <span class=\"literal-text\">raft</span>, <span class=\"literal-text\">yield</span>, <span class=\"literal-text\">algebra</span>, and <span class=\"literal-text\">pizza</span>.\r</p>\n<p>This seems correct enough?  \"Algebra\" is more common in encyclopedic contexts, and \"pizza\" doesn't show up in the 19th century corpus at all.\r</p> <hr class=\"section-break\" /> <p>But, as far as \"exploration\" is concerned, I am reaching diminishing returns.\r</p>\n<p>I have one more list of \"see if Claude can do this quickly\".  After that, \"glenora\" will become an inactive project.</p>","quotes":[],"subject":"jamestown (part 5)"}
