"You mean due to exhausting their endurance? That's something you monitor; you should have plenty of time to replace the drives before it becomes a concern.
For other failures, how's it going to be any different from normal HDDs? There's always the risk that having the same models/batches in the same conditions might lead to a cluster of failures, but RAID's still likely to save you from plenty of other failure modes."
He's talking about something else. Physical drives fail due to physical reasons, but SSDs fail for logical reasons.
A weird firmware bug or an unexpected usage or wear pattern (not just burning up the entire thing's lifetime) can cause the SSD to die.
If you have a mirror, and you have two identical SSDs, and you produce such a condition ... then no more mirror. They both die identically.
See my response to the parent about how you can easily address this by mixing SSDs in mirrors.
> Physical drives fail due to physical reasons, but SSDs fail for logical reasons.
HDDs also have a long and distinguished history of nasty firmware bugs, and this is only going to get worse as things like drive-managed SMR and hybrid flash caches become more common and their internal complexity ramps up.
Both also fail due to electrical reasons. Chips fail due to manufacturing defects, solder degrades, capacitors dry out, etc.
And I'll reiterate the uncorrectable bit error rate. Transient errors are more common with SSDs, and are unlikely to happen identically across multiple units.
> If you have a mirror, and you have two identical SSDs, and you produce such a condition ... then no more mirror. They both die identically
I have a pair of mirrored SanDisk Extreme Pros in my main server. Both suffered from a firmware bug that caused data corruption. ZFS was able to repair all the damage, because they didn't fail identically.
Also thanks to the mirror I was able to upgrade the firmware without taking the server down.
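The mechanism is worth sketching: a checksumming filesystem can repair a mirror as long as the two copies don't go bad in the same blocks. Here's a toy model in Python (hypothetical 512-byte blocks, SHA-256 as a stand-in for ZFS's real checksum tree and on-disk layout):

```python
import hashlib

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

# A mirrored "pool": every block is stored on two devices, with a
# checksum kept separately in metadata (roughly how ZFS detects
# corruption that the drive itself doesn't report).
data = [bytes([i]) * 512 for i in range(8)]
sums = [checksum(b) for b in data]
mirror_a = list(data)
mirror_b = list(data)

# Simulate a firmware bug corrupting DIFFERENT blocks on each drive.
mirror_a[2] = b"\x00" * 512
mirror_b[5] = b"\x00" * 512

def scrub(a, b, sums):
    """Verify every block against its checksum; repair a bad copy
    from its healthy mirror partner. Returns the number of repairs."""
    repairs = 0
    for i, s in enumerate(sums):
        if checksum(a[i]) != s and checksum(b[i]) == s:
            a[i] = b[i]
            repairs += 1
        elif checksum(b[i]) != s and checksum(a[i]) == s:
            b[i] = a[i]
            repairs += 1
    return repairs

print(scrub(mirror_a, mirror_b, sums))            # 2
print(mirror_a == data and mirror_b == data)      # True
```

The key point for the thread is the repair condition: each fix only works because the other drive's copy of that block is still good, i.e. because the two drives did not corrupt identically.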
Mixing different SSDs might be a good idea, but you can make much the same argument for doing that with HDDs, and as with HDDs, redundancy with "identical" drives is still better than nothing.
I am actually referring to endurance. Irrespective of the RAID mode, I would expect the same amount of data to be written to each drive at the same time. If all models are identical and were purchased at the same time, one would expect the same firmware to allocate the same writes to the same cells (in RAID 1 it will be the same data; in RAID 5 the data will differ slightly, but the size will be the same). So in theory the wear should be identical. Obviously this only applies to RAID modes with redundancy.
But I read that like 3-4y ago. We should have more experience with SSDs in production now so I was wondering if that thinking still applied.
> If all models are identical and have been purchased at the same time, one would expect that the same firmware will allocate the same writes to the same cells
Not really; you'd expect them to diverge over time, as the cells aren't all going to be 100% identical: they'll have different failures, different error rates, and different read patterns will result in different read-disturb errors, all of which will affect even completely deterministic block allocation.
Regardless, as I said, you monitor endurance. Drives are unlikely to wear out without warning unless you've been completely ignoring their SMART readings, in which case, more fool you.
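The divergence claim is easy to illustrate with a toy flash model (purely hypothetical numbers and a naive round-robin allocator, nothing like a real FTL): identical cells fed an identical write stream wear out identically, but a small per-cell endurance spread changes which cells fail.

```python
import random

def simulate_wear(n_cells, n_writes, endurance, seed):
    """Toy SSD: deterministic round-robin wear leveling.
    A cell dies once its erase count exceeds its individual
    endurance limit, which varies with manufacturing spread."""
    rng = random.Random(seed)
    limits = [endurance + rng.randint(-50, 50) for _ in range(n_cells)]
    erases = [0] * n_cells
    dead = set()
    for i in range(n_writes):
        cell = i % n_cells          # fully deterministic allocation
        if cell in dead:
            continue
        erases[cell] += 1
        if erases[cell] > limits[cell]:
            dead.add(cell)
    return erases, dead

# Two drives with literally identical silicon (same seed), mirrored
# so they see the same write stream, wear out identically:
a = simulate_wear(100, 300_000, 3000, seed=1)
b = simulate_wear(100, 300_000, 3000, seed=1)
print(a == b)  # True

# But real cells differ drive to drive (different seed), so the set
# of failed cells diverges even under identical writes:
c = simulate_wear(100, 300_000, 3000, seed=2)
print(a[1] == c[1])
```

In this model the "both die at the same instant" scenario requires the silicon to be as identical as the firmware, which is the part that doesn't hold in practice.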