{"channel":"cities","content":"Another round of \"chainsaw coding\" with Claude. (<xantham> the generators now use generators!) (<red> Previously, Claude's approach was to just make one question without history.  But we do want history.  Use all the canned questions from text files (once), then generate files from local data (without repeating), then ask the LLM.)\r\n\r\nIt mostly works.  The LLM generation is slightly flaky.  It gave both \"sad\" and \"melancholy\" as candidate opposites for \"happy\".  It is including the pinyin for Chinese translations *some* of the time.\r\n\r\nBut at least it is (mostly) using the correct interfaces now. (<red> there are still some useless parameters.  The \"tags\" for questions are meaningless.  The difficulty is arbitrary.  The evaluation criteria are often silly.)\r\n\r\n----\r\n\r\nThe various \"general knowledge\" benchmarks should be quick to write (<xantham> tomorrow).  Right now I see six categories for the initial tests:\r\n> History\r\n> Geography\r\n> Chemistry\r\n> Biology\r\n> Sports (<xantham> American athletes, mostly)\r\n> Music (<red> 20th century English-language music, mostly)\r\n\r\nThese will be \"easy\" questions.  Or, at least, multiple-choice.","created_at":"2025-03-31T22:05:35.391383","id":332,"llm_annotations":{},"parent_id":null,"processed_content":"<p>Another round of \"chainsaw coding\" with Claude. <span class=\"colorblock color-xantham\">\n    <span class=\"sigil\">\ud83d\udd25</span>\n    <span class=\"colortext-content\">( the generators now use generators!)</span>\n  </span> <span class=\"colorblock color-red\">\n    <span class=\"sigil\">\ud83d\udca1</span>\n    <span class=\"colortext-content\">( Previously, Claude's approach was to just make one question without history.  But we do want history.  Use all the canned questions from text files (once), then generate files from local data (without repeating), then ask the LLM.)</span>\n  </span>\r</p>\n<p>It mostly works.  The LLM generation is slightly flaky.  It gave both \"sad\" and \"melancholy\" as candidate opposites for \"happy\".  It is including the pinyin for Chinese translations <em>some</em> of the time.\r</p>\n<p>But at least it is (mostly) using the correct interfaces now. <span class=\"colorblock color-red\">\n    <span class=\"sigil\">\ud83d\udca1</span>\n    <span class=\"colortext-content\">( there are still some useless parameters.  The \"tags\" for questions are meaningless.  The difficulty is arbitrary.  The evaluation criteria are often silly.)</span>\n  </span>\r</p> <hr class=\"section-break\" /> <p>The various \"general knowledge\" benchmarks should be quick to write <span class=\"colorblock color-xantham\">\n    <span class=\"sigil\">\ud83d\udd25</span>\n    <span class=\"colortext-content\">( tomorrow)</span>\n  </span>.  Right now I see six categories for the initial tests:\r</p>\n<ul>\n<li class=\"arrow-list\"> History\r</li>\n<li class=\"arrow-list\"> Geography\r</li>\n<li class=\"arrow-list\"> Chemistry\r</li>\n<li class=\"arrow-list\"> Biology\r</li>\n<li class=\"arrow-list\"> Sports <span class=\"colorblock color-xantham\">\n    <span class=\"sigil\">\ud83d\udd25</span>\n    <span class=\"colortext-content\">( American athletes, mostly)</span>\n  </span>\r</li>\n<li class=\"arrow-list\"> Music <span class=\"colorblock color-red\">\n    <span class=\"sigil\">\ud83d\udca1</span>\n    <span class=\"colortext-content\">( 20th century English-language music, mostly)</span>\n  </span>\r</li>\n</ul>\n<p>These will be \"easy\" questions.  Or, at least, multiple-choice.</p>","quotes":[],"subject":"minot (part 5)"}