I think using vision models for browsing is the wrong approach. It's like running OCR on a PDF whose underlying text is already in digital form. It would make more sense to establish a standard, similar to meta tags, that enables the agentic web.
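As a purely hypothetical illustration (no such standard exists today, and the names below are invented), such markup might look like page-level hints pointing an agent at machine-readable actions:

```html
<!-- hypothetical markup: "agent-manifest" and "agent-actions" are invented names -->
<meta name="agent-manifest" content="/.well-known/agent.json">
<link rel="agent-actions" href="/api/actions" type="application/json">
```

The idea is the same as existing conventions like robots.txt or Open Graph tags: the page declares a structured interface so an agent doesn't have to infer one from pixels.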
If you're working from the markup rather than the appearance of the page, you're probably increasing the incentives for metacrap, "invisible text spam", and similar tactics.
PDFs are more akin to SVG than to a Word document: the format stores individually positioned glyph runs, not flowing text, so the text is often very far from "available". OCR can be the only way to reconstruct the document as it appears on screen.
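To see why, here is a minimal sketch of what a PDF content stream actually contains (the coordinates and font name are illustrative). A single word can be split across several show-text operations, each placed at explicit coordinates:

```
BT                              % begin text object
/F1 12 Tf                       % select font F1 at 12 pt
72 700 Td                       % move text cursor to (72, 700)
[(Ava) -20 (ilab) 10 (le)] TJ   % glyph runs with kerning adjustments between them
ET                              % end text object
```

Nothing guarantees that the order of these operations matches reading order, or that spaces exist as characters at all; an extractor has to reassemble words and lines from glyph positions, which is essentially the same problem OCR solves from pixels.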