minot (part 4)

yesterday:

  • three new "greenland" benchmarks: letter count, unit conversion, and part-of-speech detection. 💡 (it won't surprise the contemporary reader that the models struggle most with "letter count": how many "r"s are in "strawberry"?)
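for illustration, here is a minimal sketch of what a letter-count item and its scorer could look like. it assumes a simple prompt/expected-answer format and a scorer that takes the first integer in the model's reply; the function names are hypothetical and this is not minot's actual code.

```python
# hypothetical sketch of a "letter count" benchmark item; not minot's actual code.
import re


def make_letter_count_item(word: str, letter: str) -> dict:
    """build one prompt/answer pair, e.g. how many 'r's are in 'strawberry'."""
    prompt = f'How many "{letter}"s are in the word "{word}"? Answer with a number.'
    # ground truth is computed directly, case-insensitively
    expected = word.lower().count(letter.lower())
    return {"prompt": prompt, "expected": expected}


def score_letter_count(item: dict, model_reply: str) -> bool:
    """assumption: the first integer in the reply is the model's answer."""
    match = re.search(r"\d+", model_reply)
    return match is not None and int(match.group()) == item["expected"]


item = make_letter_count_item("strawberry", "r")
print(item["expected"])  # 3
print(score_letter_count(item, "There are 2 r's in strawberry."))  # False
print(score_letter_count(item, "3"))  # True
```

computing the expected answer with `str.count` keeps the ground truth trivially correct, so any failure is the model's, not the benchmark's.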

still to do:

  • code cleanup 💡 (the "run" method is written seven times, each in a slightly different form)
  • more benchmarks
  • dashboard UI improvements