
These OCR improvements will almost certainly be brought to Google Books, which is great. Long term, they could enable compressing all non-digital rare books into a manageable size that can be stored for less than $5,000.[0] It would also be great for archive.org to move to this from Tesseract. I wonder what that would cost, both in raw compute to run it and via a paid API.

[0] https://annas-archive.org/blog/critical-window.html





This is a really interesting "data flywheel" -- better model >> more usable data >> even better model

Surely there's an upper limit to this, though, with models literally eating themselves.

We can wait for that to start appearing in tests or benchmarks first.

When a human student learns to read more carefully, we don't consider that a negative.

More Data for the Data Gods!


