
> If it's deterministic enough to be worth sourcing drives from different batches,

I suspect you're using a mistaken premise.

It's worth sourcing from different batches because failures are not deterministic. Instead, we merely have probabilities based on past experiences (usually from vast data generously provided by operators of spindles at huge scale).

> why wouldn't it be enough to add small amount of writes on purpose

Well, it's not enough, because it might only protect against simultaneity of certain failures. It also doesn't actually reduce the potential impact of the failures; it merely buys more reaction time. By distributing a single batch of drives across many arrays, even a simultaneous failure is just an increased replacement/maintenance cost (if that's even the strategy, rather than enough hot spares and abandon-in-place), without the looming data loss. With software-staggered write amplification, each failure could be the start of a cascade, in which case replacement takes on a time-critical aspect. Handling that replacement emergency ends up being an operational (not software) solution, as well.
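To make the comparison concrete, here's a rough sketch of what I understand the software staggering proposal to be. This is purely illustrative: the device names, the 2% step, and the idea that wear tracks cumulative writes linearly are all my own assumptions, not anything from the proposal itself.

    # Hypothetical sketch of "stagger wear by padding writes in software".
    # Assumes wear scales linearly with total bytes written (a big assumption
    # for real drives) and that array members can be addressed individually.
    DRIVES = [f"/dev/sd{c}" for c in "abcdefghijkl"]  # 12-drive array, made-up names
    STEP = 0.02  # extra-write fraction per drive position (see the arithmetic below)

    for i, dev in enumerate(DRIVES):
        print(f"{dev}: pad writes by +{i * STEP:.0%}")

Note that even in this best case, once the most-padded drive fails, every remaining member is only a few weeks of wear behind it, which is exactly the cascade I'm describing.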

My worry would be that the software scheme provides a false sense of security.

Additionally, you may want to quantify what "small amount" is, considering you're suggesting such an algorithm would allow for failures multiple weeks apart. 3 weeks is about 2% of 3 years. For an array of 12 drives, does that mean the 12th drive would need on the order of 22% more writes than the 1st drive?
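The back-of-the-envelope version, just to show the per-drive step and the cumulative spread (assuming a 3-year write life and a uniform 3-week stagger, both taken from the numbers above):

    # Back-of-the-envelope version of the numbers above.
    weeks_per_year = 52
    stagger_weeks, life_years, drives = 3, 3, 12
    step = stagger_weeks / (life_years * weeks_per_year)  # ~0.019, i.e. the ~2% step
    spread = (drives - 1) * step                          # ~0.21 between 1st and 12th
    print(f"per-drive step: {step:.1%}, 1st vs 12th drive: {spread:.0%}")

A spread of roughly 20% in total writes across a single array is hard to call a "small amount".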

Of course, beyond any performance hit, write amplification for SSDs has other deleterious effects (as per the article). A software solution would have to account for yet another corner case... or just stop trying to re-invent in software what already has a pretty comprehensive solution in operations.

> Power cut / spinup / other conditions can be replicated from the OS level as well.

Not necessarily, although I suspect that's true nearly always on modern equipment. However, that's not what I meant. What I meant was failures that occur more frequently merely with the time the drive has spent powered on (or powered on and spinning). Even if that could somehow be simulated relativistically, that wouldn't be a software solution, either.

Also, adding a "chaos monkey" of the kind that powers down a drive in a running array would introduce both a performance hit that I expect a majority of environments would find unacceptable (more than would find write amplification acceptable) and additional wear and tear on mechanical drives. The latter may be worth it, but I'd be hard pressed to quantify it. It would be different if limited to hot spares, but that's also of limited utility.
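If someone insisted on trying it anyway, I'd confine it strictly to hot spares, something along these lines. This is a hedged sketch, not a recommendation: the device paths are invented, and hdparm -y is just one Linux-side way to spin a drive down.

    import random
    import subprocess
    import time

    HOT_SPARES = ["/dev/sdm", "/dev/sdn"]  # hypothetical spare devices only

    def exercise_random_spare() -> None:
        dev = random.choice(HOT_SPARES)
        subprocess.run(["hdparm", "-y", dev], check=True)  # put the spare into standby
        time.sleep(600)                                    # leave it spun down a while
        with open(dev, "rb") as f:                         # any read forces a spin-up
            f.read(4096)
        # A real version would re-check SMART data here and alert on any change.

Even then, it only exercises the spares, not the members actually holding data.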

You'd also have to be extremely careful in implementation, as a bug here could turn a previously viable array into one with data loss. If such a technique reveals a drive failure, I'd want it to stop immediately so that I could replace the drive with one from a different batch, and have enough replacements on hand in case all the rest suffer the same fate.

> I didn't list rather than ignored them.

Unfortunately, it's impossible to tell the difference in discussions on this topic because, as I mentioned, so few people have first-hand knowledge (or have done the research). Even before "the cloud", there was more mythology than hard data (including about temperature, until Google published data debunking that).


