Not all RAID cards support this, but most consumer devices do. If you have a card that doesn't, either it's a high-end jobby with another mechanism, or a pile of shite.
Second, when you actually bump into a hard error, you'll see entries in /var/log/messages along the lines of "disk timed out, retrying" (but by then you're probably fucked).
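If you'd rather not eyeball the log by hand, a minimal Python sketch like the one below can pull out the suspicious lines. The patterns are assumptions: the message quoted above is paraphrased, and the real kernel wording varies by driver, kernel version and distro (some distros log to /var/log/syslog or the journal instead). `find_disk_errors` and `scan_file` are hypothetical helpers, not part of any tool.

```python
import re
from typing import Iterable, List

# Assumed example patterns: real kernel messages differ between drivers and
# kernel versions, so adapt these to what your own logs actually contain.
DISK_ERROR_PATTERNS = [
    re.compile(r"timed out", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
]


def find_disk_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match any of the disk-error patterns."""
    return [line for line in lines
            if any(p.search(line) for p in DISK_ERROR_PATTERNS)]


def scan_file(path: str = "/var/log/messages") -> List[str]:
    """Hypothetical wrapper: scan a syslog-style file for disk errors."""
    with open(path, errors="replace") as f:
        return [line.rstrip("\n") for line in find_disk_errors(f)]
```

As the comment says, matching lines here usually means the disk is already in trouble; it's a last line of detection, not an early warning.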
However, with ZFS/BTRFS, data is checksummed when written and verified when read. This means bit flips and otherwise silent bit rot have a much better chance of being caught early. Running a "scrub" (a consistency check over every block) regularly makes you more likely to catch errors before you actually need that data. (This is another reason why you'll want ECC RAM: it'll detect, and for single-bit errors correct, bit flips in memory. Those undermine the on-disk checksums, because data corrupted in RAM before it's written gets checksummed as if it were correct.)
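To illustrate the checksum-on-write, verify-on-read, scrub idea, here's a toy Python block store. It is only a sketch of the concept: SHA-256 is a stand-in, and ZFS/BTRFS store their checksums in their own metadata with their own algorithms, not in a dict like this. `ChecksummedStore` is a hypothetical name.

```python
import hashlib
from typing import Dict, List


class ChecksummedStore:
    """Toy block store: checksum on write, verify on read, scrub on demand."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytearray] = {}  # stands in for the "disk"
        self.sums: Dict[int, bytes] = {}        # checksums kept separately

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = bytearray(data)
        self.sums[block_id] = hashlib.sha256(data).digest()

    def read(self, block_id: int) -> bytes:
        data = bytes(self.blocks[block_id])
        if hashlib.sha256(data).digest() != self.sums[block_id]:
            raise IOError(f"checksum mismatch on block {block_id}")
        return data

    def scrub(self) -> List[int]:
        """Verify every block and return the IDs whose checksums fail."""
        return [bid for bid, data in self.blocks.items()
                if hashlib.sha256(bytes(data)).digest() != self.sums[bid]]


store = ChecksummedStore()
store.write(0, b"hello")
store.write(1, b"world")
store.blocks[1][0] ^= 0x01  # simulate silent bit rot: flip one bit "on disk"
print(store.scrub())  # [1]
```

The scrub finds the flipped bit without anyone having to read block 1 first, which is the whole point of scrubbing regularly. On real systems the equivalents are `zpool scrub <pool>` and `btrfs scrub start <mountpoint>`.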
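Part of the reason people choose RAID6/RAID-Z2 is that rebuilding an array after a disk failure takes a long time: it involves reading all the surviving data again and recomputing the parity to recover the lost blocks. During that window a single-parity array has no redundancy left, so a second failure mid-rebuild means data loss.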
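Here's a minimal sketch of why a rebuild has to read everything, using the simplest case: single XOR parity as in RAID5. RAID6/RAID-Z2's second parity uses Galois-field arithmetic and isn't shown. The stripe contents are made up for illustration.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block (RAID5-style layout).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Lose disk 1, then rebuild its block from every surviving block in the stripe.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

A real rebuild repeats this for every stripe, so every surviving disk is read end to end. As a rough illustrative figure (assumed numbers), a 4 TB disk rebuilt at a sustained 150 MB/s takes at least 4e12 / 150e6 ≈ 26,700 s, about 7.4 hours, and usually longer while the array is also serving normal I/O.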
When you buy a job lot of disks, they tend to come from the same batch, which means they can be prone to the same bug/defect. In practice this means multiple disks are more likely to fail at around the same time. Having another parity disk means you can withstand more failures before total loss.
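A rough sketch of what the extra parity disk buys, under a loudly optimistic assumption: disks fail independently with the same probability over some window. Same-batch disks violate exactly that assumption, so treat the output as a lower bound on the real risk. The disk count and failure probability are hypothetical.

```python
from math import comb


def p_array_loss(n_disks: int, parity: int, p_fail: float) -> float:
    """P(more than `parity` of `n_disks` fail), assuming independent failures."""
    return sum(comb(n_disks, k) * p_fail**k * (1 - p_fail)**(n_disks - k)
               for k in range(parity + 1, n_disks + 1))


# Hypothetical numbers: 8 disks, 2% chance each fails in the window of interest.
for parity in (1, 2):
    print(f"parity={parity}: P(loss) = {p_array_loss(8, parity, 0.02):.5f}")
```

With these made-up numbers the second parity disk cuts the loss probability by more than an order of magnitude (roughly 1% down to roughly 0.04%). Correlated failures eat into that margin, which is why people who buy disks in bulk lean towards double parity.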
However, none of these mechanisms is a replacement for a backup. Even if you have a 36 gig RAID, you still need a backup. There will always be a time when something goes horrifically wrong and you need access to an archived version.
Anyone who fights against this has yet to experience the fun of total failure.
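ZFS and BTRFS both provide snapshot capability. This substantially changes the calculus for backups: snapshots give you cheap point-in-time versions on the same pool, so plenty of "I need yesterday's copy" cases never touch the backup at all. If disaster recovery is an absolute priority, you still need a separate backup, but that backup can carry your snapshot set with it.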
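For ZFS, "the backup carries the snapshot set" usually means replicating with `zfs send`. Here's a small Python sketch that only builds the command lines, so it's safe to run anywhere. The dataset and snapshot names are hypothetical; `-R` (replication stream including snapshots) and `-I` (include intermediate snapshots since a base) are real `zfs send` flags. BTRFS has its own `btrfs send`/`btrfs receive`, not covered here.

```python
from typing import List, Optional


def snapshot_cmd(dataset: str, snap: str) -> List[str]:
    """argv for taking a snapshot named `snap` of `dataset`."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def send_cmd(dataset: str, snap: str, since: Optional[str] = None) -> List[str]:
    """argv for a replication stream of `dataset` up to `snap`.

    -R includes the dataset's snapshots in the stream; with `since` set,
    -I makes it incremental and includes every snapshot in between.
    """
    cmd = ["zfs", "send", "-R"]
    if since is not None:
        cmd += ["-I", f"{dataset}@{since}"]
    return cmd + [f"{dataset}@{snap}"]


# Hypothetical dataset and snapshot names.
print(" ".join(snapshot_cmd("tank/data", "daily-2")))
print(" ".join(send_cmd("tank/data", "daily-2", since="daily-1")))
```

The output of the send command would normally be piped into `zfs receive` on the backup machine (for example over ssh). Keeping the commands as argv lists means they can go straight to `subprocess.run` without shell quoting issues.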
The most visible health indicator, even if you are not using ZFS, is the SMART output http://www.techrepublic.com/blog/linux-and-open-source/using... There are a number of metrics that might indicate the health of the drive, such as the reallocated and pending sector counts.
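A minimal sketch of checking a few of those metrics from `smartctl -A` text output. Assumptions: the attribute table uses the usual ten-column layout, and watching attributes 5, 197 and 198 is a common rule of thumb rather than a definitive health test; raw-value meanings are vendor-specific. The sample text is made up, and `parse_smart_attributes`/`smart_warnings` are hypothetical helpers.

```python
import re
from typing import Dict, List

# Commonly watched attributes; a nonzero raw value is worth a closer look.
WATCHED: Dict[int, str] = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# Made-up sample in the style of `smartctl -A` output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def parse_smart_attributes(text: str) -> Dict[int, int]:
    """Map attribute ID -> leading integer of RAW_VALUE."""
    attrs: Dict[int, int] = {}
    for line in text.splitlines():
        # RAW_VALUE can contain spaces, so split off at most 9 columns first.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            m = re.match(r"\d+", fields[9])
            if m:
                attrs[int(fields[0])] = int(m.group())
    return attrs


def smart_warnings(attrs: Dict[int, int]) -> List[str]:
    """List watched attributes whose raw value is nonzero."""
    return [f"{name} = {attrs[aid]}"
            for aid, name in WATCHED.items() if attrs.get(aid, 0) > 0]


print(smart_warnings(parse_smart_attributes(SAMPLE)))
```

On a live system the text would come from something like `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout`, with the device path adjusted to your own disks.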