
2-3 hours per year is a lot of downtime. Most competent bare metal providers see maybe one major outage of less than an hour every 3-5 years. With the proper redundancies in place, nothing short of a facility-wide power failure (say, the load gets dropped because the generators don't start as they should) or a misbehaving, partially failed core network device should result in a major outage.

Specific providers aside, there's more complexity in a large cloud provider's infrastructure and therefore much more that can go wrong. Letting a code update or an orchestration issue at your infrastructure provider become a potential source of major outages is a huge and unnecessary risk. You don't need that much scale to avoid it: enough resources to fill a few whole physical machines, for a few hundred dollars a month, plus some globally distributed BGP anycast DNS and database replication, gives you enough redundancy to withstand most of the worst infrastructure failures.
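For the curious, a minimal sketch of the anycast half of that setup, assuming ExaBGP as the routing daemon; the prefix, addresses, and check logic are illustrative, and the exact announce/withdraw syntax varies a bit between ExaBGP versions. Each site runs a health-check process that announces the shared DNS prefix only while its local resolver answers, so a failed site simply drops out of the anycast pool:

    #!/usr/bin/env python3
    # Hypothetical ExaBGP health-check process (addresses and check are examples).
    # ExaBGP runs this script and reads announce/withdraw commands from its stdout.
    import socket
    import time

    ANYCAST_PREFIX = "192.0.2.53/32"   # example shared anycast service address
    LOCAL_DNS = ("127.0.0.1", 53)      # the resolver running on this node

    def dns_is_up(timeout=2.0):
        """Crude liveness check: can we open a TCP connection to port 53?"""
        try:
            with socket.create_connection(LOCAL_DNS, timeout=timeout):
                return True
        except OSError:
            return False

    announced = False
    while True:
        up = dns_is_up()
        if up and not announced:
            print(f"announce route {ANYCAST_PREFIX} next-hop self", flush=True)
            announced = True
        elif not up and announced:
            print(f"withdraw route {ANYCAST_PREFIX} next-hop self", flush=True)
            announced = False
        time.sleep(5)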

I would understand if AWS were super simple and convenient, but these days its learning curve seems far steeper than setting up the bare metal solution described above, while being almost an order of magnitude more expensive for the equivalent amount of resources.

How did we end up here? Does brand recognition just trump all technical and economic factors, or what am I missing?

Disclaimer: I run a bare metal hosting provider



> Letting a code update or an orchestration issue at your infrastructure provider become a potential source of major outages is a huge and unnecessary risk.

I trust any of the big cloud providers to do these things more reliably than I can. In particular, if I'm going to replicate a database across data centers within a region (availability zones as the big cloud providers call them), I'm quite sure that a managed database service will be more reliable than my own hand-configured cluster.
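For what it's worth, here's a rough sketch of what "let the provider handle the replication" looks like with boto3's RDS API; the identifiers, sizes, and password handling are placeholders, not a recommendation:

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # MultiAZ=True asks RDS to keep a synchronous standby in another
    # availability zone and to handle failover automatically.
    rds.create_db_instance(
        DBInstanceIdentifier="app-db",      # hypothetical name
        Engine="postgres",
        DBInstanceClass="db.m5.large",
        AllocatedStorage=100,
        MasterUsername="appuser",
        MasterUserPassword="change-me",     # use a secrets manager in practice
        MultiAZ=True,
    )

That's the whole "cluster" setup, versus hand-configuring streaming replication and failover yourself.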


Is 3 hours per year really that bad? That's roughly 99.96% uptime (a bit better than three nines), which I'd think is fine for most small to medium businesses.
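The arithmetic, for anyone who wants to check it (plain Python, assuming a 365-day year):

    hours_per_year = 365 * 24            # 8760
    for downtime_hours in (1, 2, 3):
        uptime = 1 - downtime_hours / hours_per_year
        print(f"{downtime_hours} h/year -> {uptime:.4%} uptime")
    # 3 h/year -> 99.9658% uptime, a bit better than "three nines" (99.9%)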


From a dedicated hosting provider's perspective, some customers will notice interruptions longer than 5 minutes (one monitoring cycle), start submitting trouble tickets at 15 minutes, and require RFOs at 30+ minutes of downtime. At an hour and up, even just once, we would probably start to see cancellations. I won't pretend I'm not envious that AWS seems to have a much more outage-tolerant customer base.


I don't want to advertise your particular company, but since we're talking about numbers, how does your bare metal offering compare to an Amazon EC2 offering, for example? And how would a customer that needs to scale their load do it?


A c4 dedicated host at AWS (picked simply as the second entry in the Dedicated Hosts configuration table, since c3, the first, doesn't show up on the Instance Types page) comes with an E5-2666 v3, 64GB of RAM, and no storage for $810/mo.

The CPU model is non-standard, but at 10 cores and 2.90GHz it's effectively a slightly higher-clocked version of the E5-2660 v3 (10 cores, 2.60GHz). The first Google result for a 2660 v3 dedicated server with an order page that lets you adjust the options (13 usable IPs, 64GB RAM, a minimal 120GB SSD) comes out to $275.

And this is a whole-box-to-whole-box comparison. The cost of the individual AWS instances equivalent to one of those boxes can be much higher depending on the type and size.
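A back-of-the-envelope version of that comparison, using only the two monthly prices quoted above:

    aws_c4_dedicated_host = 810      # $/mo, E5-2666 v3, 64GB RAM, no storage
    bare_metal_2660v3     = 275      # $/mo, E5-2660 v3, 64GB RAM, 120GB SSD

    ratio = aws_c4_dedicated_host / bare_metal_2660v3
    print(f"AWS costs about {ratio:.1f}x the bare metal box")   # ~2.9x

    # Per-instance (rather than whole-host) pricing widens the gap further,
    # which is where the "order of magnitude" figures upthread come from.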


Don’t mind me. I’m just here to corroborate your claims of downtime as a consumer of bare-metal hosting providers for 15-something years.


Did you ever face a situation where you or your clients needed more compute power and cloud scalability would have been more convenient or cheaper?


No, a datacenter holds a seriously large amount of compute. A single rack is 38U usable in most cases; depending on compute density and power availability you can get a good 2,000 CPU cores and a few dozen TiB of DDR4 out of a single rack (with something like a Dell MX7000 chassis).
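A rough sanity check on those rack numbers; the per-chassis figures below are assumed round numbers for illustration, not exact vendor specs:

    rack_usable_u      = 38
    chassis_height_u   = 7          # an MX7000-class blade chassis is about this tall
    blades_per_chassis = 8
    cores_per_blade    = 2 * 24     # assumed dual-socket, 24-core parts
    ram_per_blade_gib  = 768        # assumed DDR4 per blade

    chassis_per_rack = rack_usable_u // chassis_height_u                        # 5
    cores_per_rack   = chassis_per_rack * blades_per_chassis * cores_per_blade
    ram_per_rack_tib = chassis_per_rack * blades_per_chassis * ram_per_blade_gib / 1024

    print(f"{cores_per_rack} cores, ~{ram_per_rack_tib:.0f} TiB RAM per rack")
    # -> 1920 cores and ~30 TiB: "a good 2,000 cores and a few dozen TiB" is
    #    plausible, power and cooling permitting.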

And it's incredibly rare you'd be limited to a single rack of course.

Cloud has many tangible benefits, but "amount of compute available" is not one of them. Time to acquire compute, though, is. (and, obviously, management of the resources/datacenter operations).

Cloud is almost never cheaper, even factoring in salaries. It's just very convenient if you're too small to have people who can provide compute well internally (and internally, companies tend to understaff and underfund the teams that would do the same job the cloud operators do).


Cloud can often be cheaper than on-prem or a colo if you're willing to go "cloud native" and change your processes, and you have people who actually know what they're doing, not a bunch of "lift and shifters": old-school netops guys who got one AWS certification and now only know how to click around in the UI and duplicate an on-prem infrastructure.


Maybe if your load is unusually erratic. In the vast majority of cases you could purchase 2-3x more bare metal hosting resources than you need (with datacenter and hardware operations already outsourced), making scaling a non-issue, and still see significant cost savings compared to public cloud, which is typically 6-7x the cost for equivalent resources.


If all you care about is running a bunch of VMs, yes. But if you're just running a bunch of VMs on cloud hosting, you're doing it wrong.

- We don't want to maintain our own build servers. We just use CodeBuild with one of the prebuilt Docker containers or create a custom one. When we push our code to GitHub, CodeBuild brings up the Docker container, runs the build and the unit tests, and puts the artifacts on S3.

- We don't want to maintain our own messaging, SFTP, database, scheduler, load balancer, auth, object storage, servers, etc. We don't have to; AWS does that.

- We don’t want to manage our own app and web servers. We just use Lambda and Fargate and give AWS a zip file/docker container and it runs it for us.

- We need to run a temporary stress test environment or want to do a quick proof of concept: we create a CloudFormation template, spin up the entire environment, run our tests with different configurations, and kill it. When we want to spin it up again, we just run the template. (A rough sketch of that flow follows this list.)
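As promised above, a minimal sketch of that spin-up/tear-down flow using boto3; the stack name and template file are placeholders, and CodePipeline or the console would work just as well:

    import boto3

    cfn = boto3.client("cloudformation", region_name="us-east-1")

    with open("stress-test.yaml") as f:
        template_body = f.read()

    # Spin up the throwaway environment from the template...
    cfn.create_stack(
        StackName="stress-test",
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_IAM"],   # only if the template creates IAM resources
    )
    cfn.get_waiter("stack_create_complete").wait(StackName="stress-test")

    # ... run the load tests against the stack's outputs, then tear it all down.
    cfn.delete_stack(StackName="stress-test")
    cfn.get_waiter("stack_delete_complete").wait(StackName="stress-test")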

We don't have to pay for a staff of people to babysit our infrastructure; between our business support contract with AWS and the MSP, we can get all of the troubleshooting and busywork support as needed.

I’m a software engineer/architect by title, but if you look at my resume from another angle, I would be qualified to be an AWS architect. I just don’t enjoy the infrastructure side that much.


>I just don’t enjoy the infrastructure side that much.

That's fair, that's totally your right.

However, you're talking about absolute cost, and unfortunately your examples weave between true and false quite frenetically.

> - We don't want to maintain our own build servers. We just use CodeBuild with one of the prebuilt Docker containers or create a custom one. When we push our code to GitHub, CodeBuild brings up the Docker container, runs the build and the unit tests, and puts the artifacts on S3.

Like all things business, "want" and "cost" are different. In this case, depending on your size of course, it could easily be cheaper to have a dedicated "build engineer" maintaining a build farm. This is how the majority of people do it. (I work in the video games industry; it's _MUCH_ cheaper to do it this way for us.)

> - We don't want to maintain our own messaging, SFTP, database, scheduler, load balancer, auth, object storage, servers, etc. We don't have to; AWS does that.

Again, those are "wants"; TCO can be much lower out of the cloud. But again, it depends on scale (as in, lower scale is cheaper in the cloud, not larger scale).

> - We don’t want to manage our own app and web servers. We just use Lambda and Fargate and give AWS a zip file/docker container and it runs it for us.

I mean, 1 sysadmin can automate/orchestrate literally thousands of webservers.

> - We need to run a temporary stress test environment or want to do a quick proof of concept: we create a CloudFormation template, spin up the entire environment, run our tests with different configurations, and kill it. When we want to spin it up again, we just run the template.

Yes, this is a real strength of cloud.

> We don't have to pay for a staff of people to babysit our infrastructure; between our business support contract with AWS and the MSP, we can get all of the troubleshooting and busywork support as needed.

Yes, but you are paying "overhead" for all of that, and not having talented engineers on your payroll who understand your business critical systems is, in my opinion, foolish.

I've dealt with vendor support and it's incredibly hit and miss, and it's much more "miss" when you're a smaller customer to the vendor. Of course, this is anecdotal.


> depending on your size of course, it could easily be cheaper to have a dedicated "build engineer" maintaining a build farm. This is how the majority of people do it. (I work in the video games industry; it's _MUCH_ cheaper to do it this way for us.)

Cheaper to have a dedicated build engineer than using CodeBuild? I just looked at my bill for August; my startup has $50K/mo of AWS spend across 4 regions in US/EU/Asia. We use CodeBuild to build and deploy all of our infra from GitHub, including a ton of EC2 for our dedicated apps.

Guess how much my bill for CodeBuild was in August? 17 cents!

$0.17 CodeBuild total:

- $0.06 Asia Pacific (Singapore)
- $0.07 Asia Pacific (Tokyo)
- $0.02 EU (Frankfurt)
- $0.00 EU (Ireland)
- $0.02 US West (Oregon)

I'd like to see you hire a build engineer for $0.17. AWS services are dirt cheap because they let you automate all of the stuff that would otherwise require dedicated engineers, while you focus on your business, or on what differentiates you.
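For context on why that number is so small: CodeBuild bills per build minute, so a rough model looks like this (the per-minute rate is an assumed figure for a small Linux build instance, and the build volume is illustrative; check current pricing):

    rate_per_minute   = 0.005        # $/build-minute, assumed
    builds_per_month  = 10           # illustrative volume
    minutes_per_build = 3.5

    cost = rate_per_minute * builds_per_month * minutes_per_build
    print(f"~${cost:.2f}/month")     # ~$0.18 for quick builds at this volume

You only pay while a build is actually running, which is why a light month can land in the cents.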


I'm not disagreeing with you. The only times you save money by going to the cloud are when you can reduce the number of people you need or when you have a lot of elasticity in demand. I would never recommend anyone go to the cloud just to reproduce an infrastructure they could run at a colo.

> Like all things business, "want" and "cost" are different. In this case, depending on your size of course, it could easily be cheaper to have a dedicated "build engineer" maintaining a build farm. This is how the majority of people do it. (I work in the video games industry; it's _MUCH_ cheaper to do it this way for us.)

That’s $80K to $100K. You can buy a lot on AWS for that price...

> Again, those are "wants"; TCO can be much lower out of the cloud. But again, it depends on scale (as in, lower scale is cheaper in the cloud, not larger scale).

That’s another $100K to $200K....

> I mean, 1 sysadmin can automate/orchestrate literally thousands of webservers.

That's yet another $100K. You're up to at least $250K - $500K in salaries.

> Yes, but you are paying "overhead" for all of that, and not having talented engineers on your payroll who understand your business critical systems is, in my opinion, foolish.

I am one of the "talented engineers"; that's why I mentioned I could go out and get a job tomorrow as an AWS architect. My resume is very much buzzword-compliant with what it would take, on the development, DevOps, and netops sides, to hold my own at a small to medium-size company. I just find that side of the fence boring, so we outsource the boring work, or the "undifferentiated heavy lifting".

When that side got to be too much for me, we hired one dedicated sysadmin-type person to coordinate between what he does himself, our clients, and our MSP.

I'm actually trotted out as the "infrastructure architect" to our clients even though my official title and day-to-day work is as a developer. I haven't embarrassed us yet.

> I've dealt with vendor support and it's incredibly hit and miss, and it's much more "miss" when you're a smaller customer to the vendor. Of course, this is anecdotal.

I agree completely. If it’s something complex, I either do it myself or have very detailed requirements on what our needs are. But honestly, the more managed services you use, the less you have to do that part.


I'll note that he says he works in the video game industry, where infra engineer/sysadmin/etc. salaries are, in my anecdotal experience, significantly lower than even run-of-the-mill positions at "regular" companies, and especially lower than at SV/startup/big tech companies. Offers I've received from several game companies were less than 50% of what I had from other companies, and when I tried to negotiate I was told they wanted people who were passionate about games and what they were building, not people just looking for a cushy job. That can change the economics of the situation.


I'm not even coming from the perspective of a Silicon Valley big tech company; I'm in Atlanta. We would need at least three additional employees to handle our infrastructure/devops workload at a colo, and that would be around half a million dollars in fully allocated cost to hire them, as opposed to the additional cost of the business support plan, the cost of the MSP, and hiring senior developers who know their way around AWS. Also, we finally hired one person to coordinate everything and take the busywork off the backs of the leads.

We can do a lot with a half million dollars a year on AWS.


To add a little context: I currently spend around 500,000 CAD per month on infra in GCP, which is roughly half of my total infrastructure (in terms of raw compute/bandwidth use). The remaining metal costs 100,000 CAD/month.

As I was implying, you're just outsourcing your ops. At scale, you end up spending significantly more than you expect.


That's the difference. Whether you operate at a small scale or a large scale, if you have web servers, database servers, load balancers, build servers, network infrastructure, etc. at a colo, you still have a minimum number of people you have to hire, and no one who is any good is going to work below market rates; if they are any good, they would probably be bored out of their minds at a small company. "Outsourcing your ops" makes perfect sense until it doesn't.

Also, when I put my software architect hat on (and take my infrastructure hat off), it's a lot quicker to get things done by just asking our MSP to open an empty account in our AWS Organization, spinning up the entire infrastructure, piloting it, getting it approved and audited, and then running the same template in the production account, without having to wait on a change request, approvals, pre-approval security audits, etc.

I'm also not advocating going all-in on cloud. With a Direct Connect from your colo to your cloud infrastructure, a hybrid solution sometimes makes sense: everything from using your cloud infrastructure as a cold DR standby to using it for greenfield development where a team doesn't need to be shackled by change requests, committees, etc.


I think we’re agreeing but we draw the line in different places.

Cloud is great for speed of deployment. But once something is built, stable, and has a predictable load, it's a huge cost saving to bring it in-house. Many don't, probably because they've adopted some cloud-only technologies or fear the migration will take time.

So you just keep lining Bezos' pockets instead of using the cost savings to stay liquid.


3 hours of downtime per year equals roughly 99.96% uptime.

In what world is that a lot of downtime?


Reliability is weird: you're only as reliable as the combination of all your critical components, and for anything you depend on in series the availabilities multiply.

Usually you strive for "five nines" in infrastructure; obviously there's a lot of wiggle room depending on the business case. But reliability for individual components gets exponentially harder with each nine after the first two.

99.96% uptime for a datacenter is shockingly low once you take connection issues into account (i.e., the number of successful inbound packets vs. unsuccessful ones, not just served requests). For context, my company has around 15 datacenters around the world which routinely hit five nines, with only a few incidents of datacenters being down for 2-3 minutes during a particularly bad ISP outage.

The overwhelming majority of degradations are related to bad code being deployed. But since end-to-end availability is the product of each component's availability, it follows that tolerating more outages at the infrastructure layer is less acceptable, especially since those outages affect all, or at least the majority of, components in a given region.
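A quick illustration of that compounding effect, with made-up component availabilities:

    from math import prod

    def serial(availabilities):
        # Components you depend on in series: availabilities multiply.
        return prod(availabilities)

    def parallel(availability, copies):
        # Redundant copies: you're only down if every copy is down at once.
        return 1 - (1 - availability) ** copies

    # Four "five nines" components chained together:
    print(f"{serial([0.99999] * 4):.6f}")   # ~0.999960, worse than any single piece
    # One 99.96% datacenter vs. two of them behind anycast/failover:
    print(f"{parallel(0.9996, 2):.8f}")     # ~0.99999984, redundancy buys the nines back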


In a world where you have SLAs with your customers, in which you commit to something better?


Damn, these ships must really be run tightly.

In every company I have worked for, the downtime caused by bugs and other post-deployment issues alone was already above that number.



