I think it's actually because Claude isn't available in China, so they wouldn't be able to (legally) replicate how they evaluated the other LLMs (assuming they didn't just use the numbers reported by each model provider).
Er, I love Claude, but it's only topping one or two benchmarks right now. o3 and Gemini 2.5 are more capable (more "intelligent"); Claude's strengths are in its personality and general workhorse nature.
Agreed, the pricing is just outrageous at the moment. Really hoping Claude 3.8 is on the horizon soon; they just need to match the 1M context size to keep up. Actual code quality seems to be about equal between Claude and the others.