
Even though their models are used in hiring, business, education, etc., this multibillion-dollar company uses a single benchmark with very artificial questions (BBQ) to evaluate how fair their model is. I am a little bit disappointed.


It's because these industries don't create their own benchmarks. The only ones creating evals are the AI companies themselves or open source software engineers.



