Prompt: https://t3.chat/share/ptaadpg5n8
Claude 4.5 Haiku (Reasoning High): 178.98 tokens/sec, 1691 tokens, Time-to-First: 0.69 sec
As a comparison, here is Grok 4 Fast, which is one of the worst offenders I have encountered: it does very well with the Pelican Bicycle prompt, yet not with other comparable requests: https://imgur.com/tXgAAkb
Prompt: https://t3.chat/share/dcm787gcd3
Grok 4 Fast (Reasoning High): 171.49 tokens/sec, 1291 tokens, Time-to-First: 4.5 sec
And GPT-5 for good measure: https://imgur.com/fhn76Pb
Prompt: https://t3.chat/share/ijf1ujpmur
GPT-5 (Reasoning High): 115.11 tokens/sec, 4598 tokens, Time-to-First: 4.5 sec
These are very subjective, naturally, but I personally find Haiku's spots on the mushroom rather impressive overall. In any case, the delta between publicly known benchmarks and modified scenarios evaluating the same basic concepts continues to be smallest with Anthropic models. Heck, sometimes I've seen their models outperform what public benchmarks indicated. Also, Time-to-First on Haiku seems to be another notable advantage.