
Google's SRE book covers some of this (if you aren't cheekily referring to that). E.g. chapters 21 and 22 are "Handling Overload" and "Addressing Cascading Failures". The SRE book also covers mitigation by operators (e.g. manually setting traffic to 0 at the load balancer and ramping back up, or manually adding capacity), but it also talks about engineering the service to withstand overload in the first place.

This is definitely a familiar problem if you rely on caches for throughput (I think caches are most often introduced for latency, but eventually the service gets re-sized to match traffic and ends up needing the cache for throughput without anyone intending it). You can, e.g., pre-warm caches before accepting requests, or load-shed. Load-shedding is really useful and more general than pre-warming, so it's probably worth deploying throughout the service anyway. You can also load-shed on the client, so servers don't even have to accept, shed, and then close a bunch of connections.
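
To make the server-side shedding concrete, here's a minimal sketch in Python. Everything in it is an illustrative assumption on my part (the threaded HTTP server, the MAX_IN_FLIGHT threshold, the 503 + Retry-After response), not something prescribed by the SRE book:

    # Minimal load-shedding sketch: reject work beyond a fixed
    # concurrency cap instead of queueing it.
    import threading
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    MAX_IN_FLIGHT = 100                       # capacity the service can actually sustain
    in_flight = threading.Semaphore(MAX_IN_FLIGHT)

    class SheddingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Try to claim a slot without blocking; if the service is
            # at capacity, shed immediately with a cheap 503 rather
            # than letting the request sit in a queue.
            if not in_flight.acquire(blocking=False):
                self.send_response(503)
                self.send_header("Retry-After", "1")
                self.end_headers()
                return
            try:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"ok\n")
            finally:
                in_flight.release()

    if __name__ == "__main__":
        ThreadingHTTPServer(("", 8080), SheddingHandler).serve_forever()

The point of the semaphore is that rejecting is much cheaper than serving, so the server stays healthy and the requests it does accept finish before their clients give up.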

The more general pattern behind load-shedding is to make sure you handle a subset of the requests well instead of degrading all requests equally. E.g. processing incoming requests FIFO means that as queue depth grows, every request gets slower. Using LIFO lets some requests stay just as fast while the rest time out.
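
A minimal sketch of that FIFO/LIFO contrast, assuming an illustrative 1-second client deadline (the helper names and numbers are mine, not from any particular system):

    # FIFO vs LIFO draining of an overloaded request queue.
    import collections
    import time

    DEADLINE = 1.0  # seconds a client waits before timing out (assumed)

    queue = collections.deque()  # (arrival_time, request), oldest at the left

    def enqueue(request):
        queue.append((time.monotonic(), request))

    def next_request_fifo():
        # FIFO: oldest first. Once queueing delay exceeds DEADLINE,
        # every request is served after its client has already given
        # up, so all requests degrade equally.
        if queue:
            return queue.popleft()[1]
        return None

    def next_request_lifo():
        # LIFO: newest first. The freshest request is served at full
        # speed; anything that has sat past its deadline is dropped
        # unserved, since its client has timed out anyway.
        while queue:
            arrival, request = queue.pop()
            if time.monotonic() - arrival < DEADLINE:
                return request
        return None

This is essentially the "adaptive LIFO" idea from the Facebook article linked below: under overload you deliberately sacrifice the stale tail of the queue to keep some fraction of requests fast.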



Your comment reminds me of this excellent ACM article by Facebook on the topic: https://queue.acm.org/detail.cfm?id=2839461

I've read the first SRE book, and having worked on large-scale systems, I'd say it's impossible to relate to the book or internalise the advice/processes outlined in it unless you've been burned by scale.

I must note that there are two Google SRE books in circulation now: https://landing.google.com/sre/books/



