grenora (part 2)

Replies:

I'm not sure yet what evaluations I want to do for Claude 3.7 and/or ChatGPT 4.5.

Some tasks I have been pondering giving them:

  • write the short-story novelization of chess. 💡 ( the stories generated by the current models have all been fairly similar. two kingdoms, one of ebony and one of ivory. rather than fight a war, they use magical powers. the explanations for why each piece moves the way it does are entirely post-hoc.)
  • answer some questions about High Physics. 💡 ( Does it even make sense to talk about "the inside" of a black hole, considering it would take an infinite amount of time to reach it?)
  • write a lexer/parser based on a BNF spec. 💡 ( these continue to have a few issues)
  • improve the graphic design of this website

One of the difficulties is that a truly "frontier" task cannot be effectively graded by the machine, or indeed by non-experts in the field.


education thoughts for the week:

  • "reader" texts. designed to train elocution. 🔥 ( unique new york. how now brown cow.)
  • is reading "comprehension" different from short-term memory?
  • the differences between primary education and secondary education. roughly: secondary ⚙️ ( middle school and high school) students should be expected to do a certain amount of "independent learning". students who cannot do so should be in an alternative program. 🔥 ( the "Least Restrictive Environment" clause of the IDEA was unavailable for comment.)
  • using "word lists" to generate grade-level texts; also, a better term than "grade level"

other tasks will be minimal. it is a short week; I fly out very early on Thursday.