
GPT4 is much stronger, though. You're comparing apples to oranges.


https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...

1250 vs. 1253 Elo. Is that really "much stronger"?
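For a sense of scale, here is a rough sketch of what a 3-point gap would mean under the textbook Elo expected-score formula. It assumes the standard logistic curve with a 400-point scale; the leaderboard may fit its ratings differently, so treat this as an illustration and not as the leaderboard's own method. The function name `elo_expected_score` is made up for this example.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the textbook Elo model.

    Assumption: standard logistic curve with a 400-point scale. The score is
    the win probability, with a draw counted as half a win.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# The two ratings quoted above; which model holds which rating is not
# stated in the thread, so they are just labelled higher and lower here.
higher, lower = 1253.0, 1250.0

p = elo_expected_score(higher, lower)
print(f"{p:.4f}")  # prints 0.5043
```

Under that assumption, the higher-rated model would be preferred in about 50.4% of head-to-head comparisons, which is close to a coin flip.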


Yes, in my personal experience it is much stronger on real-world queries: far fewer hallucinations, more ability to answer nontrivial questions, and a lot more coverage of the tail. That is not surprising for a much larger model, but it is unlikely to make much of a difference in largely superficial evals. Source: I use both, as well as Anthropic models, multiple times daily, and I use them in batch use cases as well.



