Cryptographic hashes and checksums serve different purposes. CRC32c has a hardware implementation on Intel that can do roughly 3-4 gigabytes a second per core (depending on whether you pipeline the instructions), while software implementations do around 800 megabytes a second per core.
Fast cryptographic hashes are more like 100 megabytes a second per core or slower. You can use cryptographic hashes in performance sensitive scenarios like git to get a unique fixed size handle on some bytes, but CRC32c is faster if all you want to detect is random changes.
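To make the distinction concrete, here is a minimal table-driven software CRC32c sketch in Python (the Castagnoli polynomial, reflected form 0x82F63B78 — the same checksum the SSE4.2 `crc32` instruction computes in hardware). This is for illustration only; a real implementation would process multiple bytes per step or use the hardware instruction.

```python
# Reflected Castagnoli polynomial used by CRC-32C (iSCSI, btrfs, ext4 metadata).
POLY = 0x82F63B78

# Precompute the 256-entry lookup table.
TABLE = []
for i in range(256):
    crc = i
    for _ in range(8):
        crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    TABLE.append(crc)

def crc32c(data: bytes, crc: int = 0) -> int:
    """Byte-at-a-time CRC-32C with the standard init/xorout of 0xFFFFFFFF."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

# Standard check value for CRC-32C:
print(hex(crc32c(b"123456789")))  # -> 0xe3069283
```

The output is a fixed 32-bit value — great for catching random bit flips, useless as a collision-resistant handle on content.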
I wouldn't expect a filesystem to need collision resistant cryptographic hashes.
There is hardware support for SHA-1 and SHA-2 now, but it is recent and I haven't seen an implementation or a benchmark. I doubt it's as fast as the hardware CRC32c implementation.
Your estimate of BLAKE2's throughput is indeed off by an order of magnitude <https://blake2.net/>, and the demand is certainly there for better checksum algorithms in filesystems, both for integrity and for deduplication. ZFS, for example, uses SHA-2.
BLAKE2's authors do say they designed a fast algorithm envisioned for storage, so that's why I think it fits btrfs.
Thanks, that's very informative. I checked out the link and it is very cool how BLAKE2 can be tuned for different roles. I hadn't thought about how newer filesystems are doing deduplication; my head is stuck in the ext4 era.
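The tuning the site describes is exposed directly in Python's `hashlib`, so a quick sketch of the roles (the `person` string and key below are made-up examples, not anything a real filesystem uses):

```python
import hashlib

data = b"some block of file data"

# Shorter digest when 256 bits is plenty (full blake2b is 512 bits).
h32 = hashlib.blake2b(data, digest_size=32)

# Keyed mode: BLAKE2 acts as a MAC directly, no HMAC construction needed.
mac = hashlib.blake2b(data, key=b"example secret key")

# Personalization: domain-separate different uses of the same hash,
# e.g. so dedup fingerprints can never collide with integrity checksums.
dedup = hashlib.blake2b(data, person=b"dedup-v1")

print(h32.hexdigest())
```

Same input, three unrelated digests — which is exactly what you want when one algorithm serves several jobs in a filesystem.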
I still think performance matters. Even at 800 megabytes a second you are talking about committing two entire cores to checksums on 10GbE if you need to move data around, or an entire core if you are sequentially scanning an SSD. I suppose this will stop mattering as we get more cores.
If a filesystem is using CRC32 for something, it doesn't need the properties of a cryptographic hash or they are doing it wrong. I can see how you can argue against CRC for reliability.
I am not sure you actually risk a corrupt block every 1 in 2^32 blocks. Most blocks won't be corrupt, so the CRC only has to detect a much smaller number of errors. Even assuming every block had an error needing detection, you would miss an error every 16 terabytes (assuming other things as well, like 4K blocks). Assuming 1% of blocks are corrupt, you would miss an error every 1.6 petabytes? Maybe I am thinking about this wrong, and I recall other factors like block size affecting CRC's reliability.
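The back-of-envelope arithmetic behind those numbers, assuming 4 KiB blocks and (pessimistically) that any corrupt block slips past the CRC with probability 1 in 2^32 — in reality CRC32c guarantees detection of whole classes of errors, like any burst up to 32 bits, so this only models random garbage:

```python
BLOCK = 4096          # assume 4 KiB filesystem blocks
MISS = 1 / 2**32      # assumed chance a corrupt block passes its CRC

# If every block were corrupt: one missed error per 2^32 blocks.
every_block = 2**32 * BLOCK            # bytes between missed errors
print(every_block / 2**40, "TiB")      # -> 16.0 TiB

# If only 1% of blocks are corrupt, you read 100x more data per miss.
one_percent = every_block / 0.01
print(one_percent / 2**50, "PiB")      # -> 1.5625 PiB
```

So the 16 TB and ~1.6 PB figures both follow from the same 1-in-2^32 assumption, scaled by the corruption rate.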
Forget about collision resistance. What makes me nervous about CRC32c is that it's slower than xxhash in software, approximately the same speed as blake2 in software, yet results in a much larger 1-in-2^32 chance that data could be corrupt yet the corruption won't be detected.
I'm also not sure what sort of risks and attacks cloud storage providers have to deal with, but AWS S3 for instance computes MD5 hashes for each object. If they or any other storage providers need guarantees about collision and preimage resistance, crc32c, or even xxhash, won't suffice (and MD5 may not, either, but the fact that they haven't run screaming from it yet suggests that they don't use it in a way where its known weaknesses matter).
BLAKE2b is under 3.5 cpb on Sandy Bridge, or ~570 MB/s at 2 GHz. It's 5 cpb in general on 64-bit x86 (400 MB/s), and 6-8 cpb in general on 32-bit x86 with SSSE3 (250-333 MB/s). It definitely might be overkill for some situations, but it's fairly affordable overkill.
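For anyone converting between the two units, the cycles-per-byte figures map to throughput like this (the 2 GHz clock is just the example from the comment above):

```python
def throughput_mb_s(cycles_per_byte: float, clock_ghz: float) -> float:
    # bytes/second = (cycles/second) / (cycles/byte)
    return clock_ghz * 1e9 / cycles_per_byte / 1e6

print(round(throughput_mb_s(3.5, 2.0)))  # -> 571 MB/s (Sandy Bridge estimate)
print(round(throughput_mb_s(5.0, 2.0)))  # -> 400 MB/s (generic 64-bit x86)
```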
FWIW, BLAKE2 does over 1 gigabyte per second on recent Intel chips. With the parallel modes it should be able to reach 3-4 gigabytes per second with multiple cores.