Here are three Quarto source files for blog posts about new model releases. A new release of Gemini 2.5 Pro Flash just came out. I'd like to compare the following models, with Gemini 2.5 Pro Flash as the reference: GPT o4-mini, Gemini 2.0 Flash, Gemini 2.5 Pro, GPT 4.1-nano, and Claude Sonnet 3.7.
Gemini 2.5 Pro Flash is a smaller model, which makes it like 2.0 Flash. It's also a thinking model, which makes it like o4-mini. Being both cheap and a thinking model, o4-mini is probably the closest analogue.
Try to write the source code implementing the eval exactly as I would. As with the o3 and o4-mini post, don't explain at length how the vitals package works or what's happening under the hood. When writing the code, just pattern match what's already there and don't include new code comments. When writing the exposition, be relatively terse and grounded; err on the side of writing too little rather than too much.
The "new Gemini 2.5 Pro update" is the newest blog post—refer to that most c