> The system used optical character recognition—the same technology that lets you search for a word in a PDF file
That's not correct, at least for "born-digital" PDFs that were made on a computer and haven't been scanned. In that case the text is already stored in the file, so the PDF can be parsed directly, without OCR, to get it. That's what a tool like PyPDF2 does, for example.
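To make the "parsed directly" part concrete, here is a rough stdlib-only sketch of pulling text out of a PDF page content stream. It is a toy, not how PyPDF2 is implemented: the `extract_text` helper and the `sample` stream are made up for illustration, and it only handles literal strings shown with the `Tj` operator (no `TJ` arrays, hex strings, escaped parentheses, or font encodings).

```python
import re
import zlib

# Matches a literal string operand followed by the Tj ("show text") operator.
# Simplification: ignores escaped parentheses, TJ arrays, hex strings
# and font-specific encodings, all of which real PDFs commonly use.
SHOW_TEXT = re.compile(rb"\(([^()]*)\)\s*Tj")


def extract_text(content_stream: bytes, flate_compressed: bool = False) -> list:
    """Return the literal strings drawn by Tj operators in one content stream."""
    if flate_compressed:
        # Content streams are often stored with the FlateDecode (zlib) filter.
        content_stream = zlib.decompress(content_stream)
    return [m.group(1).decode("latin-1") for m in SHOW_TEXT.finditer(content_stream)]


# Hypothetical page content, roughly as a PDF writer might emit it.
sample = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj 0 -14 Td (no OCR needed) Tj ET"

print(extract_text(sample))
print(extract_text(zlib.compress(sample), flate_compressed=True))
```

The point is that the characters are read out of the file's drawing instructions; nothing is being recognized from pixels. A real library does far more than this (object parsing, font maps, layout), and with PyPDF2 you would just use its reader and per-page text extraction instead of anything like the above.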