
Tape Archive System: write a tape, restore it again, and run md5sum on the restored copy against the original data. Then we know it can be restored correctly, and the original data is deleted.

Should be bulletproof?

Alas, the 'write to tape' scripts I'd inherited didn't warn if they couldn't load a tape into the drive.

There was a tape jammed in the drive, so the tape robot was refusing to load any new tapes, but the scripts kept on writing to and restoring from that same jammed tape over and over again.

Stupidly, we didn't do any 'check a tape from 3 weeks ago' for a while.

Lost quite a bit of data. We still have the md5sums though... I still get shivers thinking about it.



I know of one case where contract operators managed to destroy every copy of a very large company's payroll by loading tape after tape onto a malfunctioning tape deck.


Yes, this was why I was paranoid about the whole write/restore/compare process.

I, being mainly a software guy, didn't consider the hardware robot as something that might fail w/o error.

Now, the process looks like:

Check drive is empty. Load tape. Write tape. Unload. Load tape "42" from another slot. Write 'slartibartfast' to that tape. Unload. Load original tape. Restore & compare. Unload. Load tape "42". Restore, and make sure all it has is 'slartibartfast'.

This seems to me to have removed most of the possible silent-failure situations. If anyone can think of part of this algorithm that might fail, let me know!
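
For anyone who wants to poke at it, here is a rough sketch of the sequence above in Python. It is not the real scripts: `FakeChanger` is a made-up in-memory stand-in for the robot (its method names are assumptions, not any real changer API), and a jammed drive is modelled as one that silently ignores load and unload requests.

```python
import hashlib

SENTINEL = b"slartibartfast"


class FakeChanger:
    """Hypothetical in-memory stand-in for a tape robot and its drive."""

    def __init__(self, slots, jammed_tape=None):
        self.tapes = {slot: b"" for slot in slots}  # slot -> tape contents
        self.jammed = jammed_tape is not None
        self.loaded = jammed_tape  # tape currently in the drive, or None
        if self.jammed:
            self.tapes.setdefault(jammed_tape, b"")

    def drive_empty(self):
        return self.loaded is None

    def load(self, slot):
        # A jammed drive ignores the request without raising any error.
        if not self.jammed:
            self.loaded = slot

    def unload(self):
        if not self.jammed:
            self.loaded = None

    def write(self, data):
        self.tapes[self.loaded] = data

    def read(self):
        return self.tapes[self.loaded]


def archive(changer, data, data_slot, sentinel_slot, check_empty=True):
    """Write data, then prove the robot really swapped tapes in between."""
    if check_empty and not changer.drive_empty():
        raise RuntimeError("drive is not empty before the first load")

    changer.load(data_slot)
    changer.write(data)
    changer.unload()

    # Overwrite a different tape, so a stuck tape no longer holds the data.
    changer.load(sentinel_slot)
    changer.write(SENTINEL)
    changer.unload()

    changer.load(data_slot)
    restored = changer.read()
    changer.unload()
    if hashlib.md5(restored).digest() != hashlib.md5(data).digest():
        raise RuntimeError("restored data does not match the original")

    changer.load(sentinel_slot)
    marker = changer.read()
    changer.unload()
    if marker != SENTINEL:
        raise RuntimeError("sentinel tape holds something other than the sentinel")
```

In this toy model the sentinel write is what catches the jam even if the empty-drive check is skipped: the stuck tape gets overwritten with 'slartibartfast', so the restore of the "original" tape no longer matches. Real hardware may fail in ways the model does not cover.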



