
Gotta love how Claude is always conveniently left out of all of these benchmark lists. Anthropic really is in a league of their own right now.


I think it's actually because Claude isn't available in China, so they wouldn't be able to (legally) replicate how they evaluated the other LLMs (assuming they didn't just use the numbers reported by each model provider).


I'm actually finding Claude 3.7 to be a huge step down from 3.5. I dislike it so much that I've stopped using Claude altogether...


Er, I love Claude, but it's only topping one or two benchmarks right now. o3 and Gemini 2.5 are more capable (more "intelligent"); Claude's strengths are in its personality and general workhorse nature.


Yeah, just a shame their API is consistently overloaded to the point of being useless most of the time (from about midday till late for me).


Gemini Pro 2.5 usually beats Sonnet 3.7 at coding.


Agreed, the pricing is just outrageous at the moment. Really hoping Claude 3.8 is on the horizon soon; they just need to match the 1M context size to keep up. Actual code quality seems to be equal between them.



