
Gotta love how Claude is always conveniently left out of all of these benchmark lists. Anthropic really is in a league of their own right now.


I think it's actually because Claude isn't available in China, so they wouldn't be able to (legally) replicate how they evaluated the other LLMs (assuming they didn't just use the numbers reported by each model provider).


I'm actually finding Claude 3.7 to be a huge step down from 3.5. I dislike it so much that I've stopped using Claude altogether...


Er, I love Claude, but it's only topping one or two benchmarks right now. o3 and Gemini 2.5 are more capable (more "intelligent"); Claude's strengths are in its personality and general workhorse nature.


Yeah, just a shame their API is consistently overloaded to the point of being useless most of the time (from about midday till late for me).


Gemini Pro 2.5 usually beats Sonnet 3.7 at coding.


Agreed, the pricing is just outrageous at the moment. Really hoping Claude 3.8 is on the horizon soon; they just need to match the 1M context size to keep up. Actual code quality seems to be equal between them.



