Proficiencies and quals

In my testing, I am starting to make a distinction between two types of "tests" for LLMs.


A proficiency test covers simple tasks. Some examples:

  • Repeat the misspelled word in this sentence.
  • Translate this word from English to French. 💡 (How much linguistic knowledge a model should have is an unresolved question. Should it know 5 languages, or 40, or 400? In the specific case of English/French, it is plausible to claim that one cannot truly know English without knowing French, so the LLM should know French as well.)
  • Choose the definition of this word.

Accuracy on these tasks is often surprisingly poor compared to performance on other tasks. This may be due to a lack of training on them.
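Because proficiency tasks have one short expected answer, they can be scored by simple matching. Here is a minimal sketch, assuming each item is a (prompt, expected answer) pair and the model is any function from prompt to text. The names `normalize`, `proficiency_accuracy`, `ITEMS`, and `toy_model` are hypothetical, and the canned answers are made up for illustration, not real results.

```python
from typing import Callable, List, Tuple

def normalize(text: str) -> str:
    """Trim whitespace and surrounding punctuation/quotes, then lowercase."""
    return text.strip().strip(".!?\"'").lower()

def proficiency_accuracy(
    items: List[Tuple[str, str]],
    model: Callable[[str], str],
) -> float:
    """Fraction of items where the normalized answer equals the expected one."""
    correct = sum(
        normalize(model(prompt)) == normalize(expected)
        for prompt, expected in items
    )
    return correct / len(items)

# Illustrative items modeled on the task types listed above.
ITEMS = [
    ("Repeat the misspelled word in this sentence: I will recieve it.", "recieve"),
    ("Translate this word from English to French: cat", "chat"),
    ("Choose the definition of 'terse': (a) brief (b) lengthy", "a"),
]

def toy_model(prompt: str) -> str:
    """Stand-in for a real model call; returns canned answers only."""
    canned = {
        ITEMS[0][0]: "recieve",
        ITEMS[1][0]: "Chat.",
        ITEMS[2][0]: "b",  # deliberately wrong
    }
    return canned.get(prompt, "")

if __name__ == "__main__":
    print(f"accuracy: {proficiency_accuracy(ITEMS, toy_model):.2f}")
```

Exact matching after light normalization is a design choice: it is strict, so a model that answers correctly but adds extra words is counted as wrong.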


On the other hand, a qual test ⚙️ (possibly short for "qualification") is more difficult. Some examples:

  • Write two paragraphs about the city of Toulouse.
  • Explain the theory of relativity to a nine-year-old.
  • Answer these questions from the GRE Verbal Reasoning section.

From a technical perspective, many of these tasks call for free-form responses, which are scored by a larger LLM.
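Scoring by a larger LLM could look roughly like the sketch below. It assumes the judge is any function from prompt to text and that it is asked for an integer from 1 to 5. The prompt template, the 1-to-5 scale, and the names `JUDGE_TEMPLATE`, `parse_score`, `judge_response`, and `fake_judge` are all assumptions made for illustration, not a description of a particular setup.

```python
import re
from typing import Callable, Optional

# Hypothetical grading prompt; the wording and scale are assumptions.
JUDGE_TEMPLATE = (
    "You are grading a response.\n"
    "Task: {task}\n"
    "Response: {response}\n"
    "Reply with a single integer from 1 (poor) to 5 (excellent)."
)

def parse_score(judge_reply: str) -> Optional[int]:
    """Return the first standalone digit 1-5 in the reply, or None if absent."""
    match = re.search(r"\b([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None

def judge_response(
    task: str,
    response: str,
    judge: Callable[[str], str],
) -> Optional[int]:
    """Build the grading prompt, call the judge, and parse its score."""
    prompt = JUDGE_TEMPLATE.format(task=task, response=response)
    return parse_score(judge(prompt))

def fake_judge(prompt: str) -> str:
    """Stand-in for a call to a larger model; always returns the same grade."""
    return "Score: 4"

if __name__ == "__main__":
    score = judge_response(
        "Write two paragraphs about the city of Toulouse.",
        "Toulouse is a city in southwestern France...",
        fake_judge,
    )
    print(score)
```

Returning `None` when no score can be parsed keeps malformed judge replies visible instead of silently counting them as a low grade.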

The interesting question is not whether any LLM can answer these, but whether an LLM under 16 GB in size can do so.
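As a rough back-of-the-envelope check on that limit, the sketch below estimates the size of a model's weights from its parameter count and precision. It assumes the 16 GB figure refers to the stored weights alone, with 1 GB taken as 10^9 bytes, and it ignores file-format overhead and runtime memory. The function names and the parameter counts used are illustrative and do not refer to specific models.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes): params * bits / 8."""
    return n_params * bits_per_weight / 8 / 1e9

def fits_budget(
    n_params: float,
    bits_per_weight: float,
    budget_gb: float = 16.0,
) -> bool:
    """True if the estimated weight size is under the budget."""
    return weight_size_gb(n_params, bits_per_weight) < budget_gb

if __name__ == "__main__":
    # Illustrative parameter counts and precisions, not specific models.
    for n_params, bits in [(7e9, 16), (13e9, 16), (13e9, 4)]:
        size = weight_size_gb(n_params, bits)
        print(f"{n_params:.0e} params @ {bits}-bit: "
              f"{size:.1f} GB, fits: {fits_budget(n_params, bits)}")
```

Under these assumptions, a 7-billion-parameter model at 16 bits per weight comes to about 14 GB, while a 13-billion-parameter model only fits once quantized to fewer bits per weight.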


The Frontier tests are not particularly interesting.