Kubernetes is 5 years old. This is very, very young for mission-critical infrastructure management software.
Having a certain level of doubt in young open source projects is responsible, in my opinion. I'm interested to hear other people's perspective on production-readiness of k8s for mission-critical applications.
If security were the number one concern in deciding whether things get deployed, then sure, we could take a more conservative view.
However, realistically, k8s is in heavy deployment in a wide variety of industries including public sector, financial services, retail, technology ... and it's clear that this kind of concern is not the primary consideration.
The tradeoffs Monzo made are not ones that apply to most businesses. For most businesses, you have a profitable and sustainable model and you want to mitigate the possibility that you sink the ship by screwing the pooch on security or availability.
Monzo, on the other hand, was default-dead, so betting the farm on a relatively unproven technology perhaps wasn't risking as much. Nobody talks about the startups that used unproven tech and sank.
I don't think Monzo had to adopt k8s to survive. It's an infrastructure technology, not something that provides a unique advantage from an app development perspective.
In most other industries, saying something is in "heavy development" is usually the same as "unstable". (Unstable is usually interpreted as Bad in software engineering -- but the dictionary definition of "unstable" only means "prone to change", which I think is an accurate characterization of k8s considering its degree of maturity.)
Whether or not something is a smart choice to use in mission-critical production applications doesn't depend on the number of big banks or big tech companies that use the technology.
At the end of the day, Kubernetes is a tool that will change very rapidly over the next 5 years. I could see k8s being a decent choice to use in a tech project that you expect to actively maintain and improve for the next 5+ years, AND if you (and your developers) are willing to invest time (potentially a lot of time) every year keeping up to speed with how k8s evolves through every version release. That's the primary risk in using something like k8s.
Sure, rapid development is likely to mean lots of change, but k8s is far from alone in that regard.
The last decade has been dominated by rapid adoption of technologies that were under heavy development at the time, from Ruby on Rails, to Node.js, to Golang, to Rust.
The simple reality of modern IT is that companies are unwilling to wait until a technology has stabilized before making use of it.
Personally I'd rather they did, but my opinion has little weight in that regard.
Kubernetes has already seen far more production-hours of operation than most infrastructure management software will ever see. Age is no substitute for experience.
The better question is: when you deploy k8s in production, how do you ensure none of the risks are being exploited?
Given today's landscape of hardware and software exploits, adding a complex orchestration layer with identified issues seems like less than prudent behavior.
I currently work on Kubernetes in production and am migrating large clients into these systems. I see a distinct lack of knowledge around securing systems, and even more so once Kubernetes is added.
I'm not running antagonistic workloads in k8s though, I'm just running my own junk, each component of which also has its own laundry list of security nightmares.
"Only two remote holes in the default install, in a heck of a long time!"
Due respect to smart acquaintances who work on OpenBSD, but to most people who secure application deployment environments, this is not the reassuring statement OpenBSD seems to think it is.
What's funny about it is, if you're going to make up a benchmark (and theirs is contrived; it was "no remote vulnerabilities", as I recall, when I was involved with the project, then "no remote vulnerabilities in the default install", then "only one remote vulnerability in the default install"), make up one where your number is zero, not "just 2 in a heck of a long time".
But more substantively: the reason you run an operating system is to do stuff on it. It isn't 1996 any more and nobody gets public shell accounts on Linux systems or OpenBSD systems; similarly, remotely-exploitable vulnerabilities in other operating systems are also exceedingly rare, and so OpenBSD's benchmark excludes the LPEs that actually make up the meaningful attack surface of a modern OS.
What's a more important question is what features the operating system provides to harden the non-default programs that inevitably have to run on it. OpenBSD has historically lagged here, though they're upping their game recently.
Despite briefly being involved with the project during "The OpenBSD Security Audit" in the late 1990s, I have a longstanding bias against OpenBSD that I should be up front about: we shipped an appliance on OpenBSD at Arbor Networks, and I spent several days debugging a VMM problem that would zombify pages of memory and gradually suffocate our systems. When I presented evidence to Theo, he said (not a literal quote) "don't bother me about this, Chuck Cranor" --- I think it's Chuck Cranor but could be wrong --- "wrote this VMM as his graduate project and I've got nothing to do with it". For whatever that's worth, I've felt OpenBSD is an unserious option for deploying real systems other than near-stateless network middleboxes ever since.
To be fair, hardly anyone uses openbsd compared to kubernetes. And last I checked, most openbsd services are disabled by default, so it makes it hard to break in, but unusable in its default state.
If we have to count the exploits in every new thing against some grand total of allowable exploits then there will never be new things. The question was not whether k8s added to the universe of exploits, but whether the exploits make it unready for production. Personally I was more bothered by some of the code quality issues than the list of specific high severity exploits. It's a large project and issues like this will be found.
When the complexity of the attack surface gets to the degree of k8s I would say that is a problem.
The fact is that very few people, and I do mean very few, understand the low-level functions going on (like the multiple layers of NAT via iptables), and they are simply struggling to keep it running. It's pretty obvious they aren't qualified to run this in production.
I have been at Google HQ in Kubernetes discussions and it's frightening how little people know about the internals of it.
We already depend upon layer after layer of highly complex software. I'd argue that the complexity of k8s is not out of line with its scope. I don't want to get into a debate about specific things like netfilter. Yeah it's an odd setup and full of warts, but it's completely pluggable. On GKE for example you can now run in a mode where the pod networking is handled as a VPC subnet with load balancing directly to pods. And that's sort of the point: it's the maturing abstractions that are valuable, not the specific implementation of a part like networking.
As for struggling to run it, our experience has been different. Granted we're a small user. Our largest cluster has just over 100 nodes. Our highest volume service hits about 15k req/sec at peak. We're on GKE which is a well-managed implementation and that also makes it less risky. In two years of production the platform has been extremely reliable. Moreover we've been able to do things that would have been a lot harder before, such as autoscaling the service I mentioned above so that we're not paying for capacity we don't need off peak.
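For anyone wondering what that kind of autoscaling boils down to, here is a minimal sketch of the replica calculation documented for the Horizontal Pod Autoscaler: desired = ceil(current replicas * current metric / target metric). The function name, the 500 req/sec per-pod target, and the replica bounds are assumptions made up for illustration, not numbers from our setup. The real controller also applies a tolerance band and stabilization windows, which this ignores.

```python
import math


def desired_replicas(current_replicas, current_per_pod, target_per_pod,
                     min_replicas=1, max_replicas=100):
    """Simplified HPA-style calculation (hypothetical helper).

    current_per_pod and target_per_pod are average metric values per pod,
    e.g. requests per second. Tolerance and stabilization are ignored.
    """
    raw = math.ceil(current_replicas * current_per_pod / target_per_pod)
    # Clamp to the configured bounds, as an autoscaler would.
    return max(min_replicas, min(max_replicas, raw))


# Peak: 15k req/sec spread over 10 pods is 1500 req/sec per pod.
# With an assumed target of 500 req/sec per pod, scale up.
peak = desired_replicas(10, 1500, 500)

# Off peak: 3k req/sec spread over 30 pods is 100 req/sec per pod.
# Same target, so scale back down and stop paying for idle capacity.
off_peak = desired_replicas(30, 100, 500)

print(peak, off_peak)  # prints "30 6"
```

The point is just that the replica count follows load in both directions, which is where the off-peak savings come from.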
You keep saying that the attack surface is high, but is it higher than all other software we consider suitable for this purpose?
Does anyone understand the JVM and servlet containers? Does anyone understand OpenSSL's state machine? Does anyone understand hardware load balancers? Does anyone understand speculative execution? Does anyone understand the Postgres query planner? Does anyone understand all the same-origin policies? Does anyone understand their laptop's power supply?
I've seen a lot of people build a lot of successful systems on things they don't know every detail of, even when not knowing those details is quite dangerous. That Kubernetes is yet another one of these building blocks isn't an indictment of Kubernetes, it's an indictment of the compulsion to understand everything.
Can you name one security vulnerability from this document that, in a functionally-similar architecture that used OpenBSD and didn't use Kubernetes, would have been prevented by OpenBSD's security model?
("Don't build the system you want to build, build the system I want you to build" isn't an answer.)
Thing is, everyone I have worked with uses k8s because it's the new cool toy. None of them have a requirement to create a large, expensive platform which costs more than simple hardware so a company can bring products to market faster.
Everyone thinks they can save money with k8s. You won't, especially in AWS.
It's production ready; folks have been running it in production and will keep running it in production. Sure, it has issues from inside the cluster. But if you secure it and it's not accessible from outside, it's good to go. Probably more secure than trying to run 500 boxes at once.
Yes, it significantly reduces the number of machines. That's the main benefit. You can binpack your pods by sizing them well and maxing out resources on each machine.
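To make the binpacking point concrete, here is a rough sketch using first-fit decreasing, a textbook bin-packing heuristic. This is not how the Kubernetes scheduler actually places pods (it filters and scores nodes); it only illustrates why well-sized resource requests let you fill machines. The pod CPU requests and the 4000-millicore node size are made-up numbers.

```python
def first_fit_decreasing(pod_requests, node_capacity):
    """Pack pod CPU requests (millicores) onto as few nodes as the
    first-fit-decreasing heuristic manages. Returns a list of nodes,
    each a list of the requests placed on it. Hypothetical helper."""
    nodes = []
    for request in sorted(pod_requests, reverse=True):
        if request > node_capacity:
            raise ValueError(f"pod requesting {request}m fits on no node")
        for node in nodes:
            # Place the pod on the first node with enough room left.
            if sum(node) + request <= node_capacity:
                node.append(request)
                break
        else:
            # No existing node had room, so start a new one.
            nodes.append([request])
    return nodes


# Eight pods with assumed CPU requests, in millicores.
pods = [2000, 500, 1500, 1000, 3000, 500, 2500, 1000]
packed = first_fit_decreasing(pods, node_capacity=4000)

for i, node in enumerate(packed):
    print(f"node {i}: {node} ({sum(node)}m of 4000m used)")
```

With these numbers the eight pods land on three fully used nodes, versus eight machines if each workload got its own box.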