
croemer on July 4, 2024 | on: Shipt’s algorithm squeezed gig workers, who fought...

> The system used optical character recognition—the same technology that lets you search for a word in a PDF file

That's not correct, at least for "digitally-born PDFs" that were made on a computer and haven't been scanned. In that case, the PDF can be parsed directly, without OCR, to get text. That's what a tool like PyPDF2 does, for example.
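To make the point above concrete, here is a minimal sketch using only the Python standard library. It is a toy, not how PyPDF2 is implemented: `build_toy_stream_object` and `extract_text` are hypothetical helpers, and the sketch assumes a single Flate-compressed content stream whose text is shown with plain `(...) Tj` operators and contains no parentheses or backslashes. It only illustrates that a digitally-born PDF stores its text as character data that can be read back without any image recognition.

```python
import re
import zlib


def build_toy_stream_object(text: str) -> bytes:
    """Hypothetical helper: emit one PDF-style stream object containing
    a text-showing operator, Flate-compressed as PDF writers commonly do."""
    content = f"BT /F1 12 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")
    compressed = zlib.compress(content)
    header = (
        f"4 0 obj\n<< /Length {len(compressed)} /Filter /FlateDecode >>\nstream\n"
    ).encode("latin-1")
    return header + compressed + b"\nendstream\nendobj\n"


# Toy patterns: a real parser would honor /Length, other filters,
# font encodings, hex strings, and TJ arrays.
STREAM_RE = re.compile(rb"stream\n(.*?)\nendstream", re.DOTALL)
SHOW_TEXT_RE = re.compile(rb"\((.*?)\)\s*Tj")


def extract_text(pdf_bytes: bytes) -> list[str]:
    """Decompress each stream and collect the string literals passed to Tj."""
    found = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = zlib.decompress(match.group(1))
        for literal in SHOW_TEXT_RE.findall(data):
            found.append(literal.decode("latin-1"))
    return found


pdf = build_toy_stream_object("Hello gig workers")
print(extract_text(pdf))
```

No pixels are involved at any step: the text is recovered by decompressing and parsing, which is why OCR is only needed when the page content is an image, such as a scan or a screenshot.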

alwa on July 4, 2024

It sounds like they were parsing screenshots that workers submitted by SMS.

croemer on July 4, 2024

I'm not disputing that they used OCR. What's wrong is that searching text in PDFs doesn't usually involve OCR.
