{"channel":"cities","content":"I'm not sure yet what evaluations I want to do for Claude 3.7 and/or ChatGPT 4.5.\r\n\r\nSome questions I have been pondering asking it to evaluate:\r\n> write the short-story << novelization of chess >>. (<red> the stories generated by the current models have all been fairly similar. two kingdoms, one of ebony and one of ivory. rather than fight a war, they use magical powers.  the explanations for why each piece moves the way it does are entirely post-hoc.)\r\n> answer some questions about << High Physics >>. (<red> Does it even make sense to talk about \"the inside\" of a black hole, considering it would take an infinite amount of time to reach it?)\r\n> write a lexer/parser based on a BNF spec. (<red> these continue to have a few issues)\r\n> improve the graphic design of this website\r\n\r\nOne of the difficulties is that a truly \"frontier\" task cannot be effectively graded by the *machine*, or indeed by non-experts in the field.\r\n\r\n----\r\n\r\neducation thoughts for the week:\r\n> \"reader\" texts.  designed to << train >> elocution.  (<xantham> unique new york.  how now brown cow.)\r\n> is reading \"comprehension\" different from short-term memory?\r\n> the differences between << primary education >> and << secondary education >>.  roughly: secondary (<green> middle school and high school) students should be expected to do a certain amount of \"independent learning\".  students who cannot do so, should be in an alternative program. (<xantham> the \"Least Restrictive Environment\" clause of the IDEA was unavailable for comment.)\r\n> using \"word lists\" to generate << grade level >> texts; also, a better terminology than << grade level >>\r\n\r\n----\r\n\r\nother tasks will be minimal.  it is a short week; I fly out very early on Thursday.","created_at":"2025-03-03T17:00:54.108242","id":272,"llm_annotations":{},"parent_id":265,"processed_content":"<p>I'm not sure yet what evaluations I want to do for Claude 3.7 and/or ChatGPT 4.5.\r</p>\n<p>Some questions I have been pondering asking it to evaluate:\r</p>\n<ul>\n<li class=\"arrow-list\"> write the short-story <span class=\"literal-text\">novelization of chess</span>. <span class=\"colorblock color-red\">\n    <span class=\"sigil\">\ud83d\udca1</span>\n    <span class=\"colortext-content\">( the stories generated by the current models have all been fairly similar. two kingdoms, one of ebony and one of ivory. rather than fight a war, they use magical powers.  the explanations for why each piece moves the way it does are entirely post-hoc.)</span>\n  </span>\r</li>\n<li class=\"arrow-list\"> answer some questions about <span class=\"literal-text\">High Physics</span>. <span class=\"colorblock color-red\">\n    <span class=\"sigil\">\ud83d\udca1</span>\n    <span class=\"colortext-content\">( Does it even make sense to talk about \"the inside\" of a black hole, considering it would take an infinite amount of time to reach it?)</span>\n  </span>\r</li>\n<li class=\"arrow-list\"> write a lexer/parser based on a BNF spec. <span class=\"colorblock color-red\">\n    <span class=\"sigil\">\ud83d\udca1</span>\n    <span class=\"colortext-content\">( these continue to have a few issues)</span>\n  </span>\r</li>\n<li class=\"arrow-list\"> improve the graphic design of this website\r</li>\n</ul>\n<p>One of the difficulties is that a truly \"frontier\" task cannot be effectively graded by the <em>machine</em>, or indeed by non-experts in the field.\r</p> <hr class=\"section-break\" /> <p>education thoughts for the week:\r</p>\n<ul>\n<li class=\"arrow-list\"> \"reader\" texts.  designed to <span class=\"literal-text\">train</span> elocution.  <span class=\"colorblock color-xantham\">\n    <span class=\"sigil\">\ud83d\udd25</span>\n    <span class=\"colortext-content\">( unique new york.  how now brown cow.)</span>\n  </span>\r</li>\n<li class=\"arrow-list\"> is reading \"comprehension\" different from short-term memory?\r</li>\n<li class=\"arrow-list\"> the differences between <span class=\"literal-text\">primary education</span> and <span class=\"literal-text\">secondary education</span>.  roughly: secondary <span class=\"colorblock color-green\">\n    <span class=\"sigil\">\u2699\ufe0f</span>\n    <span class=\"colortext-content\">( middle school and high school)</span>\n  </span> students should be expected to do a certain amount of \"independent learning\".  students who cannot do so, should be in an alternative program. <span class=\"colorblock color-xantham\">\n    <span class=\"sigil\">\ud83d\udd25</span>\n    <span class=\"colortext-content\">( the \"Least Restrictive Environment\" clause of the IDEA was unavailable for comment.)</span>\n  </span>\r</li>\n<li class=\"arrow-list\"> using \"word lists\" to generate <span class=\"literal-text\">grade level</span> texts; also, a better terminology than <span class=\"literal-text\">grade level</span>\r</li>\n</ul> <hr class=\"section-break\" /> <p>other tasks will be minimal.  it is a short week; I fly out very early on Thursday.</p>","quotes":[],"subject":"grenora (part 2)"}