How is the hash represented? Have you looked at nearest neighbor (NN) libraries like faiss [0] and NGT [1]? They can quite easily handle a nearest neighbor search over 10 million vectors and, from my understanding, they turn their vectors into some kind of hash that is then searched.
The hashes are 16-character hexadecimal strings. I had a quick look at the faiss package and it looks promising. I would consider it for the next versions.
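For anyone wondering what searching over such hashes could look like, here is a minimal sketch. It assumes each 16-character hex string encodes a 64-bit hash and that similarity is measured by Hamming distance (the number of differing bits). The function names and the `max_distance` threshold are hypothetical, chosen for illustration; this is not the tool's actual implementation.

```python
from typing import List, Tuple


def hex_to_int(hash_hex: str) -> int:
    """Parse a 16-character hex hash into an integer (assumed 64 bits)."""
    if len(hash_hex) != 16:
        raise ValueError(f"expected 16 hex characters, got {len(hash_hex)}")
    return int(hash_hex, 16)


def hamming_distance(a: str, b: str) -> int:
    """Count the bits that differ between two hex-encoded hashes."""
    # XOR leaves a 1 in every bit position where the two hashes disagree.
    return bin(hex_to_int(a) ^ hex_to_int(b)).count("1")


def nearest(query: str, candidates: List[str], max_distance: int = 10) -> List[Tuple[int, str]]:
    """Brute-force scan: return (distance, hash) pairs within max_distance, closest first.

    max_distance is a hypothetical threshold, not a value taken from the tool.
    """
    hits = [(hamming_distance(query, c), c) for c in candidates]
    return sorted(h for h in hits if h[0] <= max_distance)


if __name__ == "__main__":
    known = ["0000000000000001", "ffffffffffffffff", "0000000000000003"]
    for distance, match in nearest("0000000000000000", known, max_distance=2):
        print(distance, match)
```

A linear scan like this is O(n) per query, which is the cost a library such as faiss is meant to avoid. faiss also ships binary indexes (for example `IndexBinaryFlat`) that search by Hamming distance, so 64-bit hashes packed into 8 bytes may be a natural fit there.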
If you're interested in collaborating, I'd be happy to help with a prod-focused version. My work has a need for a shardable daemon for dedup tasks. My personal email is in my description and I'm also available via josh@xix.ai.
We also have an image-heavy production use case that could yield some nice metrics from this tool.
[0] - https://github.com/facebookresearch/faiss
[1] - https://github.com/yahoojapan/NGT