
How is the hash represented? Have you looked at nearest-neighbor libraries like faiss [0] and NGT [1]? They can quite easily handle a nearest-neighbor search over 10 million vectors and, from my understanding, they turn the vectors into some kind of hash that is then searched.

[0] - https://github.com/facebookresearch/faiss

[1] - https://github.com/yahoojapan/NGT



The hashes are 16-character hexadecimal values stored as strings. I had a quick look at the faiss package and it looks promising. I would consider it for future versions.
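For anyone following along: a 16-character hex string carries 64 bits, so two hashes can be compared by Hamming distance (the number of differing bits). Below is a minimal brute-force sketch of that idea using only the standard library. The function names, the example hash values, and the `max_distance` threshold are all made up for illustration and are not taken from the package under discussion.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hash strings."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def near_duplicates(query: str, hashes: list[str], max_distance: int = 10) -> list[tuple[int, str]]:
    """Brute-force scan: return (distance, hash) pairs within max_distance, closest first.

    max_distance is an arbitrary illustrative threshold, not a recommended value.
    """
    hits = []
    for h in hashes:
        d = hamming(query, h)
        if d <= max_distance:
            hits.append((d, h))
    return sorted(hits)


def to_bytes(h: str) -> bytes:
    """Pack a 16-character hex hash into 8 raw bytes."""
    return bytes.fromhex(h)


if __name__ == "__main__":
    # Hypothetical hashes, chosen only to show the mechanics.
    stored = ["a1b2c3d4e5f60718", "a1b2c3d4e5f60719", "0000000000000000"]
    print(near_duplicates("a1b2c3d4e5f60718", stored, max_distance=4))
```

The linear scan is O(n) per query, which is the part a nearest-neighbor library would replace. The packed 8-byte form from `to_bytes` is probably closer to what such a library would expect as input than the hex strings themselves, though that is an assumption to check against its documentation.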


If you're interested in collaborating, I'd be happy to help with a prod-focused version. At work we need a shardable daemon for dedup tasks. My personal email is in my description and I'm also available via josh@xix.ai.

We also have an image-heavy production use case that could yield some nice metrics from this tool.



