A lot of the comments here are getting really hung up on cracking non cryptographic hashes, because they're not supposed to be secure.
It's just a nice demonstration of the properties we want in a cryptographic hash function, by showing how easy it can be to abuse hashes that are not designed with those properties.
Thanks! Bookmarked. This could be a really useful resource when testing for e.g. thundering herd issues in data structures, while not being in control of the hash function.
None of these hashes are cryptographic. They're designed to be /fast/ and so therefore easier to break. They're mostly only used in developer-controlled situations (hash maps etc) where you probably control the keys (if you don't.. well you're at risk). The risk/cost of a collision (the break) is decreased performance, not a security failure.
Crypto hashes don't have a problem of being slow.. they are designed to be slow.
The conclusion tries to tie weak/fast hashes to crypto/secure hashes (the https mention), but they're worlds apart.
Cryptographic hashes are not designed to be slow. They are designed to be as fast as possible while still adhering to their security requirements (first and second preimage resistance, collision resistance).
Password hashes are designed to be slow, but those are a very distinct category of hash.
The point of this post is to demonstrate how easy it is to generate collisions for HashDoS attacks.
Cryptographic hash functions are usually designed to be fast since they are on the hot path for cryptographic signatures and for hash-based message authentication codes.
Password hashes are designed to be slow, but they are an entirely different beast.
The point about https in the conclusion is that the author’s PolyMur hash uses a polynomial construction related to GCM and poly1305, which makes it much harder to invert.
Slowness isn't always a desireable property of cryptographic hashing. Generally you only want that property when you are trying to do some kind of key derivation from a human-generated password. In plenty of circumstances you want them to be fast, but hashes that have the desired security properties are all slower than the hashes designed to be fast and work well enough with non-adversarial input.
> They're mostly only used in developer-controlled situations (hash maps etc) where you probably control the keys (if you don't.. well you're at risk).
> Crypto hashes don't have a problem of being slow.. they are designed to be slow.
No they're not. Only password-based key derivation functions are designed to be slow, every other hash tries to be as fast as possible within their security budget.
> The conclusion tries to tie weak/fast hashes to crypto/secure hashes (the https mention), but they're worlds apart.
Actually, the specific link I made between cryptographic strength universal hashing and maximum-speed universal hashing is precisely because they're not worlds apart. They use the same techniques, and stand on the same mathematical background.
Siphash and Blake3 or NOT reasonably fast, they are dirt slow. They should not be used in hash tables at all. Fighting DOS attacks can only be done with detecting DOS attacks, which is much cheaper than using a slow, better hash function. Count the collisions. And then either stop the attack, or fallback to a tree.
This person's testing has blake3 at 250 cycles for small hashes. It's not a thousand times slower. And for a lot of uses, 250 cycles is plenty fast. That's like one uncached memory access.
o1hash is a gimmick. And that's way faster than a CPU can load or store data so you can't do anything with it.
When we look at hashes mentioned in this discussion where people are trying to be fast, blake3 is at 3000MB/s when vectorized, and cityhash, murmurhash, and siphash range from 5x faster to 2x slower if I take the smhasher numbers. There's not that much difference.
But also the number we need here is the latency of a small hash, not the max throughput. If blake3 takes 250 cycles for that, how much faster do you think you can go? You can't fit a reasonable hash into 10 cycles. You can't even fit o1hash into 10 cycles. And all of those numbers are pretty fast for most purposes.
AES-NI is a great tool! But isn't that arguing in favor of a crypto hash?
I think it's a "reasonable hash" too, considering against non-interactive attackers it's provably secure (collision probability bounded up to n * 2^-60.2 for n bytes).
> I will quickly explain some of the basics of hash function security and then show how easy it is to break this security for some commonly used non-cryptographic hash functions.
OK so non-cryptographic hashes pretend to have security now?
> A lot of problems don’t necessarily require secure hash functions, and people would much prefer a faster hash speed.
Ah wait: no you're saying that instead of using a cryptographic hash, one may not require a secure hash so may pick a non-cryptographic hash. So now you're saying that non-cryptographic hashes aren't secure.
So which one is it? Are non-cryptographic hashes secure or not?
Because if they're not secure, I don't see which security there is to be broken there.
I was there when all the Java and PHP webservers could be DoS due to collisions. I don't dispute it's doable to find collision in a non-cryptographic hash.
But I'm not sure I'd define finding collisions in a non-cryptographic hash as "breaking its security".
I think you have to give the author a little credit. Sometimes developers pick a non-cryptographic hash and they don't realize they actually need collision resistance, e.g. because an attacker-controlled input can cause an O(1) hash table operation to go O(n).
These kinds of articles are great for red teams, since they actually have to demonstrate the problem instead of just switching out the algorithm.
I think the point is "here's the desired properties for a cryptographic hash, and here's a practical demonstration of how non-cryptographic hashes don't have them". I don't think this is meant to be a surprising outcome if you know about the definition of cryptographic hashes, more just an interesting demonstration of how specifically non-cryptographic hashes are non-cryptographic.
It's just a nice demonstration of the properties we want in a cryptographic hash function, by showing how easy it can be to abuse hashes that are not designed with those properties.