In my experience on services with billions of users, no one knows the whole thing. There are potentially thousands of hops in a round trip through a given system, from the user to some source of truth and back. The larger companies grow, the more complex these systems get and the higher the load, so the more likely we are to see a break. Systems break constantly, recover constantly, and very rarely does the user see it. So perhaps another way to frame this question is: why are the users seeing it now?