{"channel":"cities","content":"tasks for the next 24 hours:\r\n\r\n# come up with a \"cookbook\" corpus (<green> consisting of recipes and descriptions of foods) for wordfreq\r\n# get an LLM script to take a list of 2000 words and return the LONDON (<green> LONDON is a placeholder; it could be << color >> or << positive adjective >> or << words like devil >>) words.\r\n# write a script to print the \"top 2000 words by part-of-speech\". (<orange> well, actually, I already *have* these lists ... from the previous version of the database) (<red> maybe \"find the list\" is more accurate)\r\n# write a script that will populate a few of the \"sub-dictionaries\" in the new format (<red> Countries, Nationalities, Numbers, and Colors will be the first 4, as they are fairly easy to check for completeness)\r\n\r\n(<red> will the \"Trakaido sub-dictionaries\" be the canonical source-of-truth for what the GUIDs are?  A flat-file is more cumbersome than a database for adding languages, linking to << derivative forms >>, etc.  But, it is easier for humans to read, and to put in Git repos.)\r\n\r\n----\r\n\r\ntasks for the 48 hours after that:\r\n# ensure the categories are stored in the << Lemma >> table.\r\n# re-assess the \"GrammaticalForm\" enum, because it doesn't work across languages.  Maybe it needs to be \"EnglishGrammaticalForm\", \"LithuanianGrammaticalForm\", etc.\r\n# generate \"verb forms\" - which requires some form of \"aggregation\" of WordToken entries\r\n# generate all the \"Level 1-5\" entries from wordfreq (<red> right now, colors like << orange >> are excluded because they are recent borrowings in Lithuanian.  \r\nthis is a very language-specific choice.)\r\n# consider how to handle \"phrases\" (<red> if you learn << Malonu susipa\u017einti >> before << Malonu >>, it's a phrase) and sentences","created_at":"2025-07-25T15:57:07.455679","id":638,"llm_annotations":{},"parent_id":636,"processed_content":"<p>tasks for the next 24 hours:\r</p>\n<ul>\n<li class=\"number-list\"> come up with a \"cookbook\" corpus <span class=\"colorblock color-green\"><span class=\"sigil\">\u2699\ufe0f</span><span class=\"colortext-content\"> consisting of recipes and descriptions of foods</span></span> for wordfreq\r</li>\n<li class=\"number-list\"> get an LLM script to take a list of 2000 words and return the LONDON <span class=\"colorblock color-green\"><span class=\"sigil\">\u2699\ufe0f</span><span class=\"colortext-content\"> LONDON is a placeholder; it could be <span class=\"literal-text\">color</span> or <span class=\"literal-text\">positive adjective</span> or <span class=\"literal-text\">words like devil</span></span></span> words.\r</li>\n<li class=\"number-list\"> write a script to print the \"top 2000 words by part-of-speech\". <span class=\"colorblock color-orange\"><span class=\"sigil\">\u2694\ufe0f</span><span class=\"colortext-content\"> well, actually, I already <em>have</em> these lists ... from the previous version of the database</span></span> <span class=\"colorblock color-red\"><span class=\"sigil\">\ud83d\udca1</span><span class=\"colortext-content\"> maybe \"find the list\" is more accurate</span></span>\r</li>\n<li class=\"number-list\"> write a script that will populate a few of the \"sub-dictionaries\" in the new format <span class=\"colorblock color-red\"><span class=\"sigil\">\ud83d\udca1</span><span class=\"colortext-content\"> Countries, Nationalities, Numbers, and Colors will be the first 4, as they are fairly easy to check for completeness</span></span>\r</li>\n</ul>\n<p><span class=\"colorblock color-red\"><span class=\"sigil\">\ud83d\udca1</span><span class=\"colortext-content\"> will the \"Trakaido sub-dictionaries\" be the canonical source-of-truth for what the GUIDs are?  A flat-file is more cumbersome than a database for adding languages, linking to <span class=\"literal-text\">derivative forms</span>, etc.  But, it is easier for humans to read, and to put in Git repos.</span></span>\r</p>\n<hr class=\"section-break\" />\n<p>tasks for the 48 hours after that:\r</p>\n<ul>\n<li class=\"number-list\"> ensure the categories are stored in the <span class=\"literal-text\">Lemma</span> table.\r</li>\n<li class=\"number-list\"> re-assess the \"GrammaticalForm\" enum, because it doesn't work across languages.  Maybe it needs to be \"EnglishGrammaticalForm\", \"LithuanianGrammaticalForm\", etc.\r</li>\n<li class=\"number-list\"> generate \"verb forms\" - which requires some form of \"aggregation\" of WordToken entries\r</li>\n<li class=\"number-list\"> generate all the \"Level 1-5\" entries from wordfreq <span class=\"colorblock color-red\"><span class=\"sigil\">\ud83d\udca1</span><span class=\"colortext-content\"> right now, colors like <span class=\"literal-text\">orange</span> are excluded because they are recent borrowings in Lithuanian.  \nthis is a very language-specific choice.</span></span>\r</li>\n<li class=\"number-list\"> consider how to handle \"phrases\" <span class=\"colorblock color-red\"><span class=\"sigil\">\ud83d\udca1</span><span class=\"colortext-content\"> if you learn <span class=\"literal-text\">Malonu susipa\u017einti</span> before <span class=\"literal-text\">Malonu</span>, it's a phrase</span></span> and sentences</li>\n</ul>","quotes":[],"subject":"lake andes, part 3"}
