phi4 preliminary thoughts

Initial indications are that phi4:14b is substantially slower than gemma2:9b on a 24GB MacBook Pro, with no noticeable difference in output quality. 💡(Part of the slowness is the "two-phase response" code I added to work around Ollama's inability to return structured JSON responses reliably. Even adjusting for that, it is twice as slow as other models.)
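
For context, the two-phase approach looks roughly like this. This is a simplified sketch using the `ollama` Python client, not my production code; the prompt wording and the "answer"/"confidence" keys are illustrative:

```python
import json

import ollama

MODEL = "phi4:14b"  # the model under test

def two_phase_json(question: str) -> dict:
    """Phase 1: get a free-form answer. Phase 2: a second round trip
    asks the model to re-emit that answer as JSON, since requesting
    structured JSON in a single call proved unreliable."""
    answer = ollama.chat(
        model=MODEL,
        messages=[{"role": "user", "content": question}],
    )["message"]["content"]

    prompt = (
        "Reformat the following as a JSON object with the keys "
        '"answer" and "confidence":\n\n' + answer
    )
    raw = ollama.chat(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        format="json",  # constrain decoding on the reformatting pass only
    )["message"]["content"]
    return json.loads(raw)
```

The second round trip is what the timing adjustment above accounts for: every answer costs two generations instead of one.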

It also still shows some of the "excessively long response" problems that made phi3.5 nearly unusable on benchmark tasks and completely unreliable for production tasks. At least phi4's long outputs seem to be sensible responses rather than end-of-message-token bugs.
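
One blunt mitigation for the run-on responses would be to hard-cap generation length with Ollama's `num_predict` option. A sketch, untested; the 512-token cap and the prompt are arbitrary choices:

```python
import ollama

# Cap generation length so a runaway answer can't blow up benchmark
# times. 512 tokens is an arbitrary, untested cutoff.
response = ollama.chat(
    model="phi4:14b",
    messages=[{"role": "user", "content": "Explain TCP slow start briefly."}],
    options={"num_predict": 512},
)
print(response["message"]["content"])
```

This trades truncated answers for bounded latency, so it helps benchmarking more than production use.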

Some of the issues may be related to memory pressure, but without Safari running there should be plenty of RAM: a 14B model quantized to 4 bits needs roughly 9GB for weights, well within 24GB of unified memory.

I see no reason to use this model over gemma2:9b or qwen2.5:7b locally.