With so much recent focus by OpenAI/Google on AI's visual capabilities, does any...

jazzyjackson · on May 14, 2024

You might enjoy this breakdown of the lengths one person went to in taking advantage of the iOS Vision API and creating a local web service for transcribing some very challenging memes:

https://findthatmeme.com/blog/2023/01/08/image-stacks-and-ip...

discussed on HN:

https://news.ycombinator.com/item?id=34315782

aragonite · on May 14, 2024

This is so good - thanks for sharing this!

nunez · on May 15, 2024

This is a work of fucking art.

thesandlord · on May 14, 2024

We use GPT-4o for data extraction from documents, and it's really good. I published a small library that does a lot of the document conversion and output parsing: https://npmjs.com/package/llm-document-ocr

For straight OCR, it does work really well, but at the end of the day it's still not 100%.

aragonite · on May 14, 2024

Thanks! Looking forward to checking this out as soon as I get home.