Can you do that without knowing what apps are running on them? Your entire platf...

secondcoming · on Jan 27, 2022

> Your entire platform say "do a rolling update" and don't look back?

Yes. GCP literally has a button that allows you to Replace/Restart all the instances in an Instance Group. Our deployment system just pushes a new GCP template and GCP does all the work rolling it out. No Kubernetes.
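
For anyone wondering what "GCP does all the work" amounts to, here is a rough sketch of a rolling replacement in plain Python. It is a toy model, not the GCP API: `Instance`, `rolling_replace`, and `max_unavailable` are hypothetical names made up for illustration, and a real rollout would also wait on health checks before moving to the next batch.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    template: str
    healthy: bool = True


def rolling_replace(instances, new_template, max_unavailable=1):
    """Replace instances batch by batch (hypothetical helper, not a GCP call).

    Returns the lowest number of healthy instances seen during the rollout.
    """
    min_healthy = len(instances)
    for start in range(0, len(instances), max_unavailable):
        batch = instances[start:start + max_unavailable]
        # Take the batch out of service.
        for inst in batch:
            inst.healthy = False
        min_healthy = min(min_healthy, sum(i.healthy for i in instances))
        # Recreate the batch from the new template and put it back in service.
        for inst in batch:
            inst.template = new_template
            inst.healthy = True
    return min_healthy


group = [Instance(f"vm-{n}", "template-v1") for n in range(5)]
lowest = rolling_replace(group, "template-v2", max_unavailable=2)
```

With five instances and `max_unavailable=2`, the group never drops below three healthy instances, which is why this only works when the app can run on reduced capacity for a while.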

koffiezet · on Jan 28, 2022

Yes, there are other solutions out there that have been pre-engineered, but he was talking about VMs and apps and rebooting them. Doesn't sound like instance groups to me.

bsagdiyev · on Jan 27, 2022

If things are architected to be updated in a rolling fashion. If it's some single-node app, you're screwed. But we're talking about K8s, so I would hope you're using it for its purpose and not running some tiny cluster for a 3-requests-per-minute app.

res0nat0r · on Jan 27, 2022

https://kubernetes.io/docs/concepts/workloads/controllers/de...

koffiezet · on Jan 28, 2022

I was talking about doing this on non-k8s platforms :)

hedora · on Jan 27, 2022

Yes. You can do this without knowing about or modifying the applications. There was a big push for hardware consolidation a decade or so ago, where legacy apps were crammed onto fewer and fewer machines, and were cheaper/faster/more reliable as a result.

However, a permanent machine failure or unexpected reboot implies a few seconds or minutes of downtime.

These systems usually use a local disk, a synchronously updated disk across town (low latency, but on a different power grid, outside most natural disaster blast radii), and a far away asynchronously updated disaster recovery disk.

The disaster recovery disk provides a crash-consistent state that's less than a few seconds out of date.
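
A toy model of the three tiers described above, in case it helps. This is only a sketch under assumptions: `ReplicatedDisk`, `write`, and `ship_async` are hypothetical names, and real storage arrays replicate blocks, not Python lists. The point is that a write is acknowledged once the local and synchronous copies have it, while the disaster recovery copy trails behind but always holds a prefix of the writes in order.

```python
from collections import deque


class ReplicatedDisk:
    """Toy model: local disk, synchronous replica, asynchronous DR replica."""

    def __init__(self):
        self.local = []
        self.sync_replica = []
        self.dr_replica = []
        self._pending = deque()  # writes not yet shipped to the DR site

    def write(self, record):
        # Acknowledged only after both the local and synchronous copies have it.
        self.local.append(record)
        self.sync_replica.append(record)
        # The DR copy is updated later, in the original write order.
        self._pending.append(record)

    def ship_async(self, max_records):
        # Drain up to max_records queued writes to the DR site, oldest first.
        shipped = 0
        while self._pending and shipped < max_records:
            self.dr_replica.append(self._pending.popleft())
            shipped += 1
        return shipped


disk = ReplicatedDisk()
for order in ["order-1", "order-2", "order-3"]:
    disk.write(order)
disk.ship_async(max_records=2)
```

If the primary site vanished at this point, the DR copy would hold `order-1` and `order-2` but not `order-3`: consistent as of a slightly earlier moment, but missing the most recent write.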

These days, the "disks" tend to be deduped and compressed SSDs that use parity-encoded RAID. Their hardware cost is lower than a single copy on ext4, but they come with an enterprise tax / support contract, etc.

In practice, it's durable unless corporate HQ is wiped out. If that happens, it won't be the weak link in your business continuity plan (but a few customers' orders might've been dropped).

This stuff is tremendously unsexy, but it'll plug along just fine until you have some workload that can't be partitioned and that needs more than roughly 40-100 Gbit/s of network/disk bandwidth, 128 cores, or 1 TB of DRAM.

(Edit: Forgot to mention that the apps run inside a VM infrastructure that supports live migration.)

koffiezet · on Jan 28, 2022

Well, I didn't claim it was impossible, I only claimed I wouldn't want to pay for the engineering that would go into enabling this when building that from scratch. But there are of course alternatives to k8s which also enable this.

dosethree · on Jan 27, 2022

for stateless apps (with HA, capacity, etc.) it's easy