
"Elevated Error Rates" is such a BS term. They were down. Man up and own the mistake.


As someone downstream of providers like Stripe who is on call for issues like this, that term is actually quite helpful to me. It tells me that I should be expecting delays and timeouts, and that some percentage of operations are likely to complete, whereas a complete outage likely means requests are failing immediately or failing to connect. This is important information when reviewing our options. During a full outage, aside from failover (when possible and not automated), we usually don’t need to take any action. When dealing with greatly increased error rates, it may be beneficial for us to disable the API on our end in order to avoid a lot of hung open connections and delayed responses for our users. We’d rather that operations fail immediately and completely instead of forcing users to wait around for operations that are unlikely to complete anyway.
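To make the "disable the API on our end" idea concrete, here is a rough sketch of a circuit breaker that fails fast once a provider starts timing out. Everything in it is an assumption for illustration: the `CircuitBreaker` and `CircuitOpenError` names, the failure threshold, and the cool-down period are made up and do not come from Stripe or any particular library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the provider."""


class CircuitBreaker:
    # Hypothetical defaults; tune them to your own traffic and timeouts.
    def __init__(self, max_failures=5, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to stay open before a retry
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail immediately instead of holding a connection open.
                raise CircuitOpenError("provider disabled; failing fast")
            # Cool-down elapsed: let one trial call through. A single
            # further failure re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the count
        return result
```

You would wrap each provider call as `breaker.call(charge_customer, order)`, where `charge_customer` is a placeholder for whatever function actually talks to the provider. During a full outage the underlying calls usually fail quickly on their own, so a breaker like this matters most in the partial-failure case, where requests would otherwise hang until they time out.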


We had a couple of payments go through during the "downtime". Maybe "Severely elevated error rates" would be better?


I'd agree if that were actually true, but it's not.

With large enough services there is always some acceptable level of errors due to 0.001% probability events. When there's an outage, it's not usually everything down, but even 0.1% of jobs failing ends up affecting a lot of users.

Even 10% of jobs failing still isn't "down", it's "partly down", even if you have to issue credits for SLA violations and publish a public postmortem later.


It now just says "Down".



