another generation
Channel: LLM - Large Language Model discussion
Gemma 3 is out: https://blog.google/technology/developers/gemma-3/
Also out since my last evaluations: Claude 3.7 Sonnet, GPT-4.5, QwQ-32B.
There are a few "smoke tests" I want to run, but beyond that I'm not certain I will have the time or interest to do any deep evaluations.
I already know that "8B" models can do some tasks at a reasonable speed and can't do others. It is very unlikely that the new small models will move the needle.
As for the new very large models: my initial impressions are that they are not a substantial improvement. There is more "DeepSeek"-style internal narrative, but the answers are often worse for it. 💡 (Was it a bad test? Do I need to change the prompts? Or are they privileging "results that make stupid people think the machine is smart" over accurate results?)