I've seen devs toss crud into infra with debug logs enabled, millions of lines of deprecated log messages, etc., and the logging costs eat into their infra budget.
It's insane. Unless you're literally Facebook, or ingesting data from CERN's LHC... what possible use case requires 100GB of text data ingest per day?
Maybe it's a case of someone throwing a Service Mesh into a Microservices K8s cluster and logging all the things?
4-5 MB/min per VM of input/output traffic in compressed application-server logs, across roughly 100 VMs per site. That's about half a GB/min, or 700GB+/day in plain-text logs per site from app servers alone.
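For what it's worth, the back-of-envelope math on those figures checks out (taking the upper end of the 4-5 MB range and decimal GB; variable names are just for the sketch):

```python
# Sanity check of the stated figures, upper end of the 4-5 MB/min range.
mb_per_min_per_vm = 5          # compressed app-server logs, per VM
vms_per_site = 100             # "around 100ish VMs/site"

mb_per_min = mb_per_min_per_vm * vms_per_site  # 500 MB/min, i.e. half a GB/min
gb_per_day = mb_per_min * 60 * 24 / 1000       # 1440 minutes in a day

print(int(gb_per_day))  # 720 -> consistent with the 700GB+/day figure
```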
Normally that's no issue, as the data is stored on SANs and not sent to the cloud for analysis; just giving some perspective.
It's absolutely caused by exactly the things you've mentioned. I think we could easily cut it by 75% if people simply set severity levels correctly and we disabled storing debug logs outside experimental environments.
90%+ of our logs are severity INFO, or have no severity at all. It's like pulling teeth to even get devs to output logs using the corporate-standard json-per-line format with mandatory fields.
Still, once you're running hundreds of VMs processing a big data pipeline it's not hard to end up with massive amounts of logs. It's not just logging, really, it's also metrics and trace information.