minot (part 3)
Channel: Cities - Project Journal
In reply to: minot (part 2)
today: https://spaceship.computer/greenland/model_summary.html
These are "proficiency" metrics (although every time I use the word "proficiency" I want to change it).
The tasks are currently simple: translate a word, choose a definition, choose an antonym, find the misspelled word. For the >4B models, as long as the model knows the language, it does fairly well. The 1B models do have some difficulties.
The timing data is interesting. Run time is roughly linear in model size. The 9B models are about 4 times slower than the 1B models. Phi-4 (the largest model tested) is also very clearly the slowest model.
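A quick way to check the "roughly linear" claim is to fit a straight line to (model size, time) pairs. The sketch below is a minimal least-squares fit in plain Python. The function name `fit_line` is an assumption, and the two sample points only encode the ratio quoted above (9B about 4 times slower than 1B) in relative units; they are not measured timings.

```python
def fit_line(sizes, times):
    """Ordinary least-squares fit of: time = slope * size + intercept."""
    n = len(sizes)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (size, time) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder points in relative units (NOT measurements):
# a 1B model takes 1.0 unit of time, a 9B model takes 4.0 units.
sizes_b = [1.0, 9.0]
relative_times = [1.0, 4.0]
slope, intercept = fit_line(sizes_b, relative_times)
print(f"slope={slope:.3f} per B params, intercept={intercept:.3f}")
```

With only these two points, the fitted line has a non-zero intercept. That would be one way for a model with 9 times the parameters to be only about 4 times slower while the relation stays linear: a fixed per-request cost plus a per-parameter cost. Feeding in the real per-model timings would show whether that reading holds.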
Some of the models I was looking at before (Granite, EXAONE, Hermes, Tulu, Mistral) did not make this round of tests. For Mistral, the 12B model is too old, and their newest release, at 24B, is too large. The others didn't distinguish themselves enough from similar Llama models to be worth my time (and hard-drive space).
remaining todo:
- standardize the logging of prompts and responses. the full text (that is, including the system prompt) should be stored.
- fix the benchmarks. some of the definitions are too similar (previously we had kingdom and realm as choices; now the closest is honest and sincere). some of the translations are still a bit rough (the translation of "beautiful" into French is beau/belle, and the LLMs are very reasonably just returning "beau" as the translation).
- fix the model warming. Just calling the "warm model" function correctly doesn't do enough warming.
- add additional tests. hopefully now it will take less than 1 hour to make new tests.
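For the logging item in the list above, one possible shape is a JSON Lines file with one record per model call, storing the system prompt, user prompt, and response verbatim. This is a minimal sketch, not the project's actual logger: `log_exchange`, `read_log`, the field names, the file name, and the sample values are all assumptions.

```python
import json
import os
import tempfile
import time


def log_exchange(path, model, system_prompt, user_prompt, response, elapsed_s):
    """Append one prompt/response record to a JSON Lines file."""
    record = {
        "timestamp": time.time(),
        "model": model,
        "system_prompt": system_prompt,  # stored in full, not summarized
        "user_prompt": user_prompt,
        "response": response,
        "elapsed_s": elapsed_s,
    }
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English text readable in the log
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_log(path):
    """Load every record back as a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage with placeholder values, written to a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "exchanges.jsonl")
log_exchange(
    log_path,
    model="example-model",
    system_prompt="You are a translator.",
    user_prompt="Translate 'beautiful' into French.",
    response="beau",
    elapsed_s=0.42,
)
records = read_log(log_path)
```

One record per line keeps the file appendable from several test runs and easy to filter later by model or by task.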
some of the suggestions regarding new tests:
Part of Speech Tagging - Present a sentence and ask the model to identify the part of speech (noun, verb, adjective, etc.) for a specific word.
Unit Conversion - Test ability to convert between simple units (kilometers to miles, pounds to kilograms).
Analogies - Simple analogies like "day is to night as hot is to ___".
Tense Transformation - Provide a sentence in one tense and ask the model to convert it to another tense.
Active/Passive Voice Conversion - Convert sentences between active and passive voice.
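As an illustration of how one of these suggestions might become a test, here is a minimal sketch of a unit-conversion item with a tolerance-based grader. The names (`make_conversion_item`, `grade_numeric`), the prompt wording, and the 2% tolerance are assumptions; the conversion factors are the standard values.

```python
import re

# Standard conversion factors (source unit, target unit) -> multiplier.
FACTORS = {
    ("kilometers", "miles"): 0.621371,
    ("pounds", "kilograms"): 0.45359237,
}

NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def make_conversion_item(value, src, dst):
    """Build one test item: a prompt plus the expected numeric answer."""
    expected = value * FACTORS[(src, dst)]
    prompt = f"Convert {value} {src} to {dst}. Reply with the number only."
    return {"prompt": prompt, "expected": expected}


def grade_numeric(response, expected, rel_tol=0.02):
    """Return True if the last number in the response is within rel_tol.

    Heuristic: commas are treated as thousands separators and removed,
    and the last number is used so that replies like
    "10 kilometers is about 6.21 miles" still grade correctly.
    """
    numbers = NUMBER_RE.findall(response.replace(",", ""))
    if not numbers:
        return False
    answer = float(numbers[-1])
    return abs(answer - expected) <= rel_tol * abs(expected)


item = make_conversion_item(10, "kilometers", "miles")
print(item["prompt"])
print(grade_numeric("6.2", item["expected"]))
```

A relative tolerance avoids failing a model for rounding differently than the answer key, which is the numeric cousin of the beau/belle problem noted in the todo list.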