Yes, we can do that with flamegraphs. For any container running on our system, we can get the call stacks of the running process, the time taken, the resources used, and so on. We can also attach to containers, proxy their traffic, and more, all through Kubernetes.
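For readers unfamiliar with how flamegraphs are built: a sampling profiler repeatedly captures the call stack of the running process, and identical stacks are counted and written in a "folded" format (`frame1;frame2;frame3 count`), which tools such as Brendan Gregg's `flamegraph.pl` render as a flamegraph. The sketch below only illustrates that folding step. It assumes the stacks have already been sampled (root frame first), and the function names in the sample data are hypothetical, not taken from our system.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def fold_stacks(samples: Iterable[Sequence[str]]) -> List[str]:
    """Collapse sampled call stacks (root frame first) into folded lines.

    Each output line is "frame1;frame2;...;frameN count", where count is
    how many samples had exactly that stack. Empty stacks are skipped.
    """
    counts = Counter(";".join(stack) for stack in samples if stack)
    # Sort for stable, diff-friendly output; flamegraph tools do not require it.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples: wider bars in the flamegraph mean more samples.
samples = [
    ["main", "handle", "parse"],
    ["main", "handle", "parse"],
    ["main", "handle", "db_query"],
    ["main", "idle"],
]
for line in fold_stacks(samples):
    print(line)
```

Running this prints `main;handle;db_query 1`, `main;handle;parse 2`, and `main;idle 1`; the width of each frame in the rendered flamegraph is proportional to these counts.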
As I said earlier, we've essentially separated operational concerns from our applications. That lets us rely more on Linux knowledge, which is easier to hire for, and we can reuse that knowledge and all of our tooling with any other language we want to use.
We run between 200k and 2M processes per BEAM VM. I don't see how we could get metrics as precise as we need by relying only on Linux utilities. And within the same cluster, some processes that run identical code have dramatically different workloads.