Hacker News

Seemingly this didn't make frontier models (gpt-o4, gemini-2.5-pro, etc.) more likely to give a wrong answer (no failure-rate stats are reported for these models, though slowdown rates are reported for similar ones); however, it does sometimes make them think longer.

https://arxiv.org/pdf/2503.01781



