I develop backup software among other things, and I can tell you that a CRC is neat, but I've gotten many, many reports where a CRC did not catch data corruption (but SHA-2 did). Far fewer reports indicated that the CRC caught it first.
My conclusion: A CRC32 over a couple megs of data is only good enough to likely catch "gross" corruption, like truncating the data, not the finer corruption that memory and disks tend to produce, especially when we're talking about bit rot.
Also, unless you're using an optimized implementation using eg. PCLMUL or at least a slice-by-4/8 variant -- and zlib's is not optimized -- it's a waste of time, because eg. BLAKE2b isn't much slower than zlib's CRC, but will catch All The Corruption!1 [1] (Also, Python 3.6 added the full might of BLAKE2 to the stdlib, although it's the reference implementation and not some fancy SSE4/AVX(2) impl.)
[1] A technical note: There is a tradeoff between checksum length and data length; with a good checksum, increasing its length eventually increases the error rate over the same medium (aka false positives [the checksum is corrupted instead of the data]). However, it's usually more important these days to be certain about correctness and not to minimize total indicated BER over a certain channel.
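For reference, hashlib.blake2b and zlib.crc32 are both in the Python stdlib, so checking both over a large file is a one-pass loop. This is just a rough sketch (the function name and chunk size are mine, not from any particular backup tool):

    import hashlib
    import zlib

    def checksums(path, chunk_size=1 << 20):
        # One pass over the file, feeding both a CRC32 and a BLAKE2b digest.
        crc = 0
        b2 = hashlib.blake2b()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                crc = zlib.crc32(chunk, crc)
                b2.update(chunk)
        return crc, b2.hexdigest()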
You make an excellent point: probability says a 4-byte CRC32C provides much weaker guarantees as the length of the message gets longer. These CRCs are typically optimized for pretty short messages, and that is what I had in mind when I wrote that article (e.g. the kind of messages that might get exchanged as part of an online serving system).
I think it's an apple to orange comparison; sha1 needs to be compared to a proper crc160, and sha2 to a proper crc256.
(Proper means one nontrivial cycle, e.g. an irreducible generating polynomial of the appropriate size.)
I would be surprised if there's a material difference in error detection between sha2 and crc255 (other than the 1 bit difference).
However, crc is useless against an adversary - preimage attacks are trivial. At the same size, it becomes an apple to apple comparison, and which one you choose depends on your speed/preimage-resistance requirements.
Or go the other way: compare the crc32 to a 4-byte blake2 hash (blake2 can vary its output length in multiples of 8 bits).
I'd expect that, no matter how many bits were corrupted, the pattern of the corruption, or the size of the file, a 4-byte blake2 hash would always have a 1 in 4294967296 (2^32) chance of not catching the error; for crc32, the chance of not catching the error would increase with the size of the file or the number of corrupted bits. On the other hand, a CRC is guaranteed to catch certain corruption patterns (like all burst errors of up to a certain size), while for a cryptographic hash like blake2, there's no such guarantee.
(A cryptographic hash sent together with the data, by itself, is also useless against an adversary, since the adversary can change the hash. You need at least a keyed hash to protect against that.)
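For what it's worth, both points map directly onto Python's hashlib API -- digest_size for the 4-byte variant, key= for a keyed hash. A small sketch (the variable names are just for illustration):

    import hashlib

    data = b"payload to protect"

    # 4-byte BLAKE2b digest, the same size as a CRC32:
    short_tag = hashlib.blake2b(data, digest_size=4).digest()

    # Keyed BLAKE2b (a MAC), which an adversary can't forge without the key:
    mac = hashlib.blake2b(data, key=b"some shared secret").hexdigest()

    print(short_tag.hex(), mac)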
A crc32 will miss an error if and only if crc32(error-pattern) = 0, which happens with probability 1 in 2^32 for a random error pattern, assuming a full-cycle polynomial. While these patterns are easy to characterize, and are independent of the data (neither is true for a crypto hash), the probabilities are the same.
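A quick way to see the data-independence claim with Python's zlib (a sketch; note that zlib.crc32 applies init/xorout constants, so the "crc of the error pattern" shows up as crc32(err) ^ crc32(zeros) rather than crc32(err) itself):

    import zlib

    msg = b"x" * 1000                       # arbitrary data
    err = bytearray(1000)
    err[123] ^= 0x10                        # a fixed error pattern
    corrupted = bytes(m ^ e for m, e in zip(msg, err))
    zeros = bytes(1000)

    # The CRC "difference" caused by the error depends only on the error
    # pattern, not on the data it lands on:
    syndrome = zlib.crc32(corrupted) ^ zlib.crc32(msg)
    assert syndrome == zlib.crc32(bytes(err)) ^ zlib.crc32(zeros)
    # The corruption goes undetected exactly when that syndrome is 0.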
Is it really an apple to orange comparison? CRC can't do everything SHA does, but SHA can do what CRC does, so it's more like an apple to apple-and-orange comparison. And that's an easy comparison: pick the second one! You get a free orange!
Is there something that a proper CRC256 would be better at than SHA-256?
Crc is much simpler; depending on the polynomial, it can be as little as 2 xor gates and a shift register; if you do your own asic/fpga, I suspect it could be 10-100 times smaller and faster than sha256. On a modern CPU, you can do something like 100 bits/clock of CRC, and sha is 10-100 times slower.
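For the curious, the shift-register construction translates almost directly into software; a sketch of the bit-serial (reflected) CRC32, one shift and one conditional xor per input bit:

    import zlib

    def crc32_bitwise(data: bytes, poly: int = 0xEDB88320) -> int:
        # One conditional xor and one shift per input bit: the software
        # analogue of the shift-register construction described above.
        crc = 0xFFFFFFFF
        for byte in data:
            crc ^= byte
            for _ in range(8):
                crc = (crc >> 1) ^ (poly if crc & 1 else 0)
        return crc ^ 0xFFFFFFFF

    assert crc32_bitwise(b"hello world") == zlib.crc32(b"hello world")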
The other thing that crc can do that sha256 cannot is act as an error correcting code of sorts; if your errors are limited to a small subset (say, only bursts), you can tabulate the crc of those errors, and then (expected-crc xor computed-crc) is the crc of the error; if it's in your list, you know what the error is. I actually worked on a device where this was useful some 20 years ago.
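A sketch of that tabulation idea (here only for single-bit errors over a fixed-length message; the helper names are made up, and the "raw" CRC cancels out zlib's init/xorout constants so the syndrome lookup works):

    import zlib

    MSG_LEN = 64  # bytes; the table is specific to the message length

    def raw_crc32(pattern: bytes) -> int:
        # zlib.crc32 applies init/xorout constants; xoring with the CRC of an
        # all-zero message of the same length cancels them out, leaving the
        # "raw" CRC that (expected ^ computed) actually equals.
        return zlib.crc32(pattern) ^ zlib.crc32(bytes(len(pattern)))

    # Tabulate the raw CRC of every single-bit error pattern.
    error_table = {}
    for bit in range(MSG_LEN * 8):
        pattern = bytearray(MSG_LEN)
        pattern[bit // 8] ^= 1 << (bit % 8)
        error_table[raw_crc32(bytes(pattern))] = bytes(pattern)

    def correct_single_bit(received: bytes, expected_crc: int):
        syndrome = zlib.crc32(received) ^ expected_crc
        if syndrome == 0:
            return received                  # no (detectable) error
        pattern = error_table.get(syndrome)
        if pattern is None:
            return None                      # error is not in the tabulated set
        return bytes(r ^ p for r, p in zip(received, pattern))

    # Flip one bit of a 64-byte message and recover the original.
    original = bytes(range(MSG_LEN))
    corrupted = bytearray(original)
    corrupted[10] ^= 0x04
    assert correct_single_bit(bytes(corrupted), zlib.crc32(original)) == original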
That second bit is pretty cool, I didn't know about that. I actually experimented a bit with using cryptographic hashes as an error correcting code, but you have to brute force it, so it's only practical for small amounts of data and small errors.
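Presumably something along these lines -- a toy sketch that brute-forces a single flipped bit against a BLAKE2b digest; the cost grows linearly with the message size and combinatorially with the number of flipped bits:

    import hashlib

    def fix_single_bit_flip(received: bytes, expected_digest: bytes):
        # Try every single-bit flip until the hash matches again.
        if hashlib.blake2b(received).digest() == expected_digest:
            return received
        buf = bytearray(received)
        for bit in range(len(buf) * 8):
            buf[bit // 8] ^= 1 << (bit % 8)      # flip a candidate bit
            if hashlib.blake2b(bytes(buf)).digest() == expected_digest:
                return bytes(buf)
            buf[bit // 8] ^= 1 << (bit % 8)      # undo and move on
        return None                              # more than one bit was wrong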
CRC64 will probably[1] suffice for any data package you'll see in your life. A cryptographic hash of 64 bits won't.
That said, it's probably easier to find a reliable implementation of SHA256 than of some CRC64, so, if the added CPU cost doesn't bother you[2], I'd say SHA256 is the best choice.
1 - CRC40 will likely be enough. Those extra 24 bits are a huge safety margin.
2 - And it normally shouldn't, not even for performance sensitive applications. The hash is very fast, and you are doing it before IO. Except for some fast disappearing niches, it just won't matter.
What would a CRC64 do better than, say, using the first 64 bits of the SHA-256 (other than performance, of course)? I see talk of CRCs always catching contiguous errors smaller than the CRC size, is it just that a CRC is guaranteed to do that and a 64-bit cryptographic hash isn't?
Yes. CRC is linear in the data and the error. This makes it easy to analyze the crc of specific error sets, and if none of them has a zero crc, then any error in that set is guaranteed to be found.
Of course it's a bit of an apples to oranges comparison.
An interesting question about CRCs is whether you can implement them efficiently. CRC32 is possible with some help from the CPU (CLMUL as mentioned, or dedicated instructions for specific polynomials, like crc32c); eg. on Haswell you get to 14-16 GB/s with a CLMUL approach on CRC32. That's obviously much faster than the hashes I mentioned. With the zlib implementation, on the other hand, you only get about one GB/s, and that's already really close to BLAKE2b.
Table-based approaches (classic zlib implementation, or slice-by-n, which needs n-times as many tables) would not work well with longer polynomials; the tables would outgrow L1/L2 cache sizes rather quickly. I'd have to think a bit about the tableless CLMUL approach.
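If you want to see where your own build lands, a rough timing sketch with the stdlib (results depend heavily on the CPU and on how Python/OpenSSL were built, so treat the numbers as indicative only; the helper is mine):

    import hashlib
    import time
    import zlib

    buf = bytes(64 * 1024 * 1024)   # 64 MiB of zeros is enough for a rough rate

    def rate(label, fn):
        t0 = time.perf_counter()
        fn(buf)
        gbps = len(buf) / (time.perf_counter() - t0) / 1e9
        print(f"{label:8s} {gbps:6.2f} GB/s")

    rate("crc32",   zlib.crc32)
    rate("sha256",  lambda b: hashlib.sha256(b).digest())
    rate("blake2b", lambda b: hashlib.blake2b(b).digest())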
The zlib format does not use a crc at all (zip does, but zlib doesn't). It uses adler-32, which shares most properties with crc32 (prob of missing a random error is almost the same, error patterns are easy to analyze and are 16-bit additive and independent of data).
However, because SHA-256 is so widely used, efficient implementations are available off the shelf; I've even seen talk about providing it in hardware on the CPU. Given that, you might as well just go ahead and use it.
Various CPUs have hardware SHA-1/2 support, eg. everything with ARM NEON -- which means many smartphones and ARM chips, even some ARM boards. Most Sun/Oracle SPARC machines have crypto engines that do this (and many other algorithms); the same goes for POWER.
I'm not quite sure why no mainstream x86 extensions have extra stuff for this (VIA PadLock does SHA), but I can take a guess: ARX is already reasonably efficient in software (more so with newer designs, eg. BLAKE2 + AVX2) - unlike AES. And unlike AES there aren't really many problems with side channels, especially when we think about GCM.
>"A CRC32 over a couple megs of data is only good enough to likely catch "gross"
The CRC on a TCP segment (MSS) or on an Ethernet frame would never cover more than 1460 or 1500 bytes (or 9K in the case of jumbo frames), so how or where would a CRC over multiple megs come into play in regular networking?
CRC is used on jumbo Ethernet frames of 9K and it works just fine. In fact, this is the reason that 9K is the limit on jumbo frames.
The author of the article was arguing that you should add a CRC to the data being sent across TCP/IP. So that would encompass the whole payload, not just per-packet.