This article was published a long time ago, in March. | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		cubefox 58 days ago \| parent \| context \| favorite \| on: Measuring AI Ability to Complete Long Tasks This article was published a long time ago, in March.

yismail 58 days ago [–]

That's true, but it looks like it's been updated since then because the benchmarks include Claude Opus 4.5

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact