
And if you just want to go by the pixel data, look into "perceptual hashing". https://github.com/rivo/duplo works quite well for me, even when dealing with watermarks or slight colour correction / sharpening. You could even go further and improve your success rate with Neural Hash or something similar.
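For anyone unfamiliar with the idea, here is a minimal sketch of the simplest perceptual hash (an "average hash"). It is not how duplo works internally. It assumes the image has already been decoded, converted to greyscale and shrunk to a tiny grid; the 4x4 grids and all names below are made up for illustration.

```python
def average_hash(grid):
    """Return one bit per cell: 1 if the cell is brighter than the grid mean."""
    cells = [v for row in grid for v in row]
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]


def hamming(a, b):
    """Count the positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4x4 greyscale thumbnails (values 0..255).
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [220, 230, 30, 40],
    [225, 235, 35, 45],
]
# Same picture, slightly brightened (e.g. after a mild correction).
brightened = [[min(255, v + 10) for v in row] for row in original]
# A different picture: the original mirrored left to right.
mirrored = [list(reversed(row)) for row in original]

near = hamming(average_hash(original), average_hash(brightened))
far = hamming(average_hash(original), average_hash(mirrored))
print(near, far)
```

A small distance means "probably the same picture"; where to put the threshold depends on your data.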


Is there an option to just calculate a hash of the image data (the pixels only, not the full file of image data + metadata) without any transforms? That way, if it matches, you can be 100% certain it's the same image.
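One way to do this yourself, as a rough sketch: decode the image and feed only the raw pixel bytes (plus dimensions and pixel format) to a cryptographic hash. The function name and the example bytes are hypothetical. It assumes some decoder has already produced the raw bytes; with Pillow, for instance, `Image.size`, `Image.mode` and `Image.tobytes()` would supply them.

```python
import hashlib
import struct


def pixel_hash(width, height, mode, pixels):
    """SHA-256 over decoded pixel data only; file metadata never enters the hash."""
    h = hashlib.sha256()
    # Include dimensions and pixel format so 2x1 and 1x2 images with the
    # same bytes do not produce the same digest.
    h.update(struct.pack(">II", width, height))
    h.update(mode.encode("ascii"))
    h.update(pixels)
    return h.hexdigest()


# Hypothetical 2x1 RGB image: one red pixel, one blue pixel.
pixels = bytes([255, 0, 0, 0, 0, 255])

same = pixel_hash(2, 1, "RGB", pixels)
same_again = pixel_hash(2, 1, "RGB", bytes(pixels))
reshaped = pixel_hash(1, 2, "RGB", pixels)
changed = pixel_hash(2, 1, "RGB", bytes([254, 0, 0, 0, 0, 255]))
print(same == same_again, same == reshaped, same == changed)
```

Note that this only matches bit-identical pixels. Re-saving a JPEG, resizing or any colour conversion changes the decoded data and therefore the hash.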


Unfortunately, almost all image hashing algorithms don't detect color differences: they map images to greyscale first. This may be fine for many situations, but it will return the same result for a sepia tint, a full color original with incorrect white balance, and the final result you made after mucking with channels for a couple minutes.

I also found that there really isn't one "best" image hash algorithm. Using _several different_ image hash algos turns out to be only fractionally more expensive during both compute and query times, and substantially improves both precision and recall. I'm using a mean hash, gradient diff, and a DCT, all rendered from all three CIELAB-based layers, so they're sensitive to both brightness and color differences.
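Roughly what that looks like, as a simplified sketch rather than my actual code: compute more than one hash per channel and concatenate the bits. It assumes the image is already converted to CIELAB and shrunk to a tiny grid per channel, leaves out the DCT hash, and uses made-up 2x2 grids and names.

```python
def mean_hash(grid):
    """1 where a cell is above the channel mean."""
    cells = [v for row in grid for v in row]
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]


def gradient_hash(grid):
    """1 where a cell is greater than its right-hand neighbour."""
    return [1 if row[i] > row[i + 1] else 0
            for row in grid for i in range(len(row) - 1)]


def multi_hash(channels):
    """Concatenate both hashes for every channel grid given."""
    bits = []
    for grid in channels:
        bits += mean_hash(grid) + gradient_hash(grid)
    return bits


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


# Hypothetical L, a, b grids. The second image has identical lightness
# but its red/green (a) channel is inverted.
lightness = [[20, 80], [70, 30]]
original = [lightness, [[10, -10], [-5, 5]], [[0, 20], [20, 0]]]
recoloured = [lightness, [[-10, 10], [5, -5]], [[0, 20], [20, 0]]]

grey_only = hamming(multi_hash(original[:1]), multi_hash(recoloured[:1]))
all_channels = hamming(multi_hash(original), multi_hash(recoloured))
print(grey_only, all_channels)
```

The lightness-only hash sees no difference at all, while the three-channel hash does. One caveat: both of these hashes compare values relative to each other, so a perfectly uniform shift of a channel would still go unnoticed.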


The library I posted uses colour information. It won't map to greyscale first.



