
5/5

it has been a drudge to get work done.


the end-goal is within sight. one more round of prompt-tuning, a $1 gpt-4.1-nano run, and some rounds of "LLM consensus checking".

the goal then becomes applications.

  • LLM benchmarks. 🔥 (use the machine to feed the machine) 💡 ("which word has this definition" questions will be possible at some point, but not yet.)
  • Elementary education. 💡 (which words should a 3rd/5th grader know? which should they be studying?) ⚙️ (so far no useful progress on "how easy/hard is it to spell this word".)
  • Second-language learning. 💡 (a "what is the Chinese for this word in this sentence" app.)
  • Text difficulty. A smarter metric than Flesch-Kincaid.
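
For reference, the baseline that a smarter text-difficulty metric would have to beat is the standard Flesch-Kincaid grade-level formula, which only looks at sentence length and syllables per word. A minimal sketch, assuming the word, sentence, and syllable counts are already available (the function name and its arguments are illustrative, not part of glenora):

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid grade level from raw counts.

    The counts are assumed to be computed elsewhere; counting syllables
    reliably is its own problem and is deliberately left out here.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Example: 100 words, 10 sentences, 150 syllables
grade = flesch_kincaid_grade(100, 10, 150)
```

Nothing in the formula knows how common or rare a word is, which is the gap a word-frequency list could fill.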

The "cosine similarity" question of "how similar are these word definitions" is not yet solved. I'm not sure I can solve it.

I can test it; I have several ways of generating embeddings. And (probably) these can include a sentence as context.
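
The comparison step itself is the easy part. A minimal sketch of cosine similarity over two embedding vectors, assuming they come from the same model and have the same length (the vectors below are made-up placeholders, not real embeddings):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as "not similar".
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for two definition embeddings.
definition_a = [0.2, 0.7, 0.1]
definition_b = [0.25, 0.6, 0.2]
score = cosine_similarity(definition_a, definition_b)
```

The open question is whether a high score actually means "same definition", and that depends on the embeddings, not on this arithmetic.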


I also have no solution for the "group different word-forms with the same meaning" problem. For jump, jumps, jumped, for example.

This would be a much more substantial problem for more heavily inflected languages. With English, it is almost avoidable.
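
To illustrate why English makes this almost avoidable, here is a deliberately crude sketch that groups forms by stripping a few common suffixes. This is an assumption-laden toy, not a real lemmatizer and not something glenora does: the suffix list and the minimum stem length of 3 are arbitrary choices.

```python
from collections import defaultdict

# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common English suffix, keeping at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def group_forms(words):
    """Group words that share the same crude stem."""
    groups = defaultdict(list)
    for w in words:
        groups[crude_stem(w)].append(w)
    return dict(groups)


groups = group_forms(["jump", "jumps", "jumped", "pizza"])
```

It handles jump, jumps, jumped, but it mangles "makes" into "mak" and never connects "ran" with "run", so anything serious would need a dictionary-backed lemmatizer.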


Around word 5000, I am seeing raft, yield, algebra, and pizza.

This seems correct enough? "Algebra" is more common in encyclopedic contexts, and "pizza" doesn't show up in the 19th-century corpus at all.


But, as far as "exploration" is concerned, I am reaching diminishing returns.

I have one more list of "see if Claude can do this quickly". After that, "glenora" will become an inactive project.