Breaking CityHash64, MurmurHash2/3, wyhash, and more

MattPalmer1086 · on Nov 3, 2024

A lot of the comments here are getting really hung up on cracking non cryptographic hashes, because they're not supposed to be secure.

It's just a nice demonstration of the properties we want in a cryptographic hash function, by showing how easy it can be to abuse hashes that are not designed with those properties.

aappleby · on Nov 2, 2024

Nooooo my hash functions! :D

-Austin (author of MurmurHash)

tommiegannert · on Nov 3, 2024

Thanks! Bookmarked. This could be a really useful resource when testing for e.g. thundering herd issues in data structures, while not being in control of the hash function.

gnabgib · on Nov 3, 2024

None of these hashes are cryptographic. They're designed to be /fast/ and so therefore easier to break. They're mostly only used in developer-controlled situations (hash maps etc) where you probably control the keys (if you don't.. well you're at risk). The risk/cost of a collision (the break) is decreased performance, not a security failure.

Crypto hashes don't have a problem of being slow.. they are designed to be slow.

The conclusion tries to tie weak/fast hashes to crypto/secure hashes (the https mention), but they're worlds apart.

stouset · on Nov 3, 2024

Cryptographic hashes are not designed to be slow. They are designed to be as fast as possible while still adhering to their security requirements (first and second preimage resistance, collision resistance).

Password hashes are designed to be slow, but those are a very distinct category of hash.

fanf2 · on Nov 3, 2024

The point of this post is to demonstrate how easy it is to generate collisions for HashDoS attacks.

Cryptographic hash functions are usually designed to be fast since they are on the hot path for cryptographic signatures and for hash-based message authentication codes.

Password hashes are designed to be slow, but they are an entirely different beast.

The point about https in the conclusion is that the author’s PolyMur hash uses a polynomial construction related to GCM and poly1305, which makes it much harder to invert.

rcxdude · on Nov 3, 2024

Slowness isn't always a desireable property of cryptographic hashing. Generally you only want that property when you are trying to do some kind of key derivation from a human-generated password. In plenty of circumstances you want them to be fast, but hashes that have the desired security properties are all slower than the hashes designed to be fast and work well enough with non-adversarial input.

astrange · on Nov 3, 2024

> They're mostly only used in developer-controlled situations (hash maps etc) where you probably control the keys (if you don't.. well you're at risk).

It's very common for keys to contain user input.

orlp · on Nov 3, 2024

> None of these hashes are cryptographic.

I never said they were.

> Crypto hashes don't have a problem of being slow.. they are designed to be slow.

No they're not. Only password-based key derivation functions are designed to be slow, every other hash tries to be as fast as possible within their security budget.

> The conclusion tries to tie weak/fast hashes to crypto/secure hashes (the https mention), but they're worlds apart.

Actually, the specific link I made between cryptographic strength universal hashing and maximum-speed universal hashing is precisely because they're not worlds apart. They use the same techniques, and stand on the same mathematical background.

Vecr · on Nov 3, 2024

Depending on the length of what you're hashing, either one of the cut down Siphashes or Blake3 are reasonably fast.

The Siphashes are too short I think for general use, but when cut down and keyed they're okay for a hashmap.

rurban · on Nov 3, 2024

Siphash and Blake3 or NOT reasonably fast, they are dirt slow. They should not be used in hash tables at all. Fighting DOS attacks can only be done with detecting DOS attacks, which is much cheaper than using a slow, better hash function. Count the collisions. And then either stop the attack, or fallback to a tree.

Vecr · on Nov 4, 2024

1) they're not that slow, and 2) I'm talking about the cut down versions of Siphash

rurban · on Nov 4, 2024

They are. See https://rurban.github.io/smhasher/

Fast hashes are around 20.000 MiB/sec, blake2 around 350, siphash13 1800. That's insanely slow, for no benefits.

tedunangst · on Nov 3, 2024

Crypto hashes are not designed to be slow.

rurban · on Nov 3, 2024

It's relative. They are about 1000x slower than fast, good enough hash functions. Dirt slow.

Nobody is talking about password hashes here.

Dylan16807 · on Nov 3, 2024

https://github.com/Cyan4973/xxHash/issues/450

This person's testing has blake3 at 250 cycles for small hashes. It's not a thousand times slower. And for a lot of uses, 250 cycles is plenty fast. That's like one uncached memory access.

That test does mention batching but I don't think it makes a particularly large difference, the paper shows 4 cycles per byte for 64 byte inputs and I don't think they're getting fancy: https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blak...

rurban · on Nov 4, 2024

The fastest realistic hash is o1hash with 11.629.440 MiB/sec. The typical crypto hash is 100—200 MiB/sec. Thats about 10.000x faster.

A modern AES-NI hash is 30.000 MiB/s. That's about 50x faster than blake3 or siphash13. And also not reversible.

Dylan16807 · on Nov 4, 2024

o1hash is a gimmick. And that's way faster than a CPU can load or store data so you can't do anything with it.

When we look at hashes mentioned in this discussion where people are trying to be fast, blake3 is at 3000MB/s when vectorized, and cityhash, murmurhash, and siphash range from 5x faster to 2x slower if I take the smhasher numbers. There's not that much difference.

But also the number we need here is the latency of a small hash, not the max throughput. If blake3 takes 250 cycles for that, how much faster do you think you can go? You can't fit a reasonable hash into 10 cycles. You can't even fit o1hash into 10 cycles. And all of those numbers are pretty fast for most purposes.

AES-NI is a great tool! But isn't that arguing in favor of a crypto hash?

orlp · on Nov 4, 2024

> You can't fit a reasonable hash into 10 cycles.

Not 10 cycles, but my polymur-hash can hash any value <= 49 bytes in 21 cycles on an Apple ARM machine: https://github.com/orlp/polymur-hash/.

I think it's a "reasonable hash" too, considering against non-interactive attackers it's provably secure (collision probability bounded up to n * 2^-60.2 for n bytes).

rurban · on Nov 6, 2024

Yes, polymur is probably the very best right now

TacticalCoder · on Nov 3, 2024

> I will quickly explain some of the basics of hash function security and then show how easy it is to break this security for some commonly used non-cryptographic hash functions.

OK so non-cryptographic hashes pretend to have security now?

> A lot of problems don’t necessarily require secure hash functions, and people would much prefer a faster hash speed.

Ah wait: no you're saying that instead of using a cryptographic hash, one may not require a secure hash so may pick a non-cryptographic hash. So now you're saying that non-cryptographic hashes aren't secure.

So which one is it? Are non-cryptographic hashes secure or not?

Because if they're not secure, I don't see which security there is to be broken there.

I was there when all the Java and PHP webservers could be DoS due to collisions. I don't dispute it's doable to find collision in a non-cryptographic hash.

But I'm not sure I'd define finding collisions in a non-cryptographic hash as "breaking its security".

jnwatson · on Nov 3, 2024

I think you have to give the author a little credit. Sometimes developers pick a non-cryptographic hash and they don't realize they actually need collision resistance, e.g. because an attacker-controlled input can cause an O(1) hash table operation to go O(n).

These kinds of articles are great for red teams, since they actually have to demonstrate the problem instead of just switching out the algorithm.

rcxdude · on Nov 3, 2024

I think the point is "here's the desired properties for a cryptographic hash, and here's a practical demonstration of how non-cryptographic hashes don't have them". I don't think this is meant to be a surprising outcome if you know about the definition of cryptographic hashes, more just an interesting demonstration of how specifically non-cryptographic hashes are non-cryptographic.