Hacker News

That is... astronomically different. Is GPT-5.1 downscaling and losing critical information or something? How could it be so different?




This is my default explanation for visual impairments in LLMs: they're trying to compress the image into about 3,000 tokens, so you're going to lose a lot in the name of efficiency.

I got much better results with smallish UI elements in large screenshots on GPT by slicing the image up manually and feeding the pieces in one at a time. I think it does severely lossy downscaling.
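The manual slicing can be sketched as a simple grid of crop boxes. This is a minimal, dependency-free sketch; the 512 px tile size borrowed from the sibling comment is an assumption, and the actual cropping and model call (e.g. `Image.crop` in Pillow, then one request per piece) are left out.

```python
def tile_boxes(width: int, height: int, tile: int = 512) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) crop boxes covering a width x height
    image in tile x tile pieces; edge tiles may be smaller than `tile`."""
    return [
        (left, top, min(left + tile, width), min(top + tile, height))
        for top in range(0, height, tile)
        for left in range(0, width, tile)
    ]

# A 1920x1080 screenshot splits into a 4-column x 3-row grid of pieces.
print(len(tile_boxes(1920, 1080)))  # 12
```

Each box can then be cropped out and sent as its own image, so small UI elements stay at native resolution instead of being downscaled with the whole screenshot.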

It has a rather low max resolution. Higher-resolution images get tiled, but only up to a point: 512 x 512 is, I think, the max tile size, and 2048 x 2048 the max canvas.
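Those limits line up with the ~3,000-token figure mentioned upthread. A back-of-the-envelope sketch, assuming the 512 px tile and 2048 px canvas above plus OpenAI's GPT-4-era vision pricing (85 base tokens + 170 tokens per 512x512 tile); it ignores any shortest-side rescaling step, and newer models may use different numbers.

```python
import math

def image_tokens(width: int, height: int, tile: int = 512,
                 canvas: int = 2048, base: int = 85, per_tile: int = 170) -> int:
    """Estimate token cost: clamp to the max canvas, count tiles, price them."""
    cols = math.ceil(min(width, canvas) / tile)
    rows = math.ceil(min(height, canvas) / tile)
    return base + per_tile * cols * rows

# A full 2048x2048 canvas is a 4x4 grid of tiles.
print(image_tokens(2048, 2048))  # 85 + 170 * 16 = 2805
```

So a maxed-out image costs roughly 2,800 tokens under these assumptions, which is consistent with "about 3,000 tokens" per image.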



