There’s a ton of stuff on this list that you bloody well can test in preproduction, and you’re a damn fool (or you work for one) if you don’t or can’t.
- A specific network stack with specific tunables, firmware, and NICs
- Services loosely coupled over networks
- Specific CPUs and their bugs; multiprocessors
- Specific hardware RAM and memory bugs
- Specific distro, kernel, and OS versions
- Specific library versions for all dependencies
- Build environment
- Deployment code and process
- Specific containers or VMs and their bugs
- Specific schedulers and their quirks
That’s 40% of that list, or 5/8ths of the surface area of 2 problem interactions. CI/CD, Twelve Factor… you can fill an entire bookcase with books on this topic. Some of those books are almost old enough to drink. Someone whose by-line is “been on call half of their life” has had time to read some of them.
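A quick sanity check on that arithmetic, as a rough sketch. The 25-item total is my assumption, inferred from 10 items being 40%; the original list isn't reproduced here. It counts the pairs of items where at least one of the two can be exercised in preprod.

```python
from math import comb

total = 25      # assumption: 10 items being 40% implies a 25-item list
testable = 10   # the items listed above

untestable = total - testable
# Pairs where neither item can be tested in preprod
untestable_pairs = comb(untestable, 2)
all_pairs = comb(total, 2)
# Pairs where at least one item can be tested in preprod
covered = 1 - untestable_pairs / all_pairs

print(f"{testable / total:.0%} of items, {covered:.0%} of pairs")
```

Under that assumption the pair coverage comes out around 65%, which is in the same ballpark as 5/8ths.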
To be fair, I've had to argue with a lot of managers prior to The Cloud about how the QA team was given shit hardware instead of identical hardware. The IT manager even had a concrete use case for identical hardware that I thought was for sure going to win me that argument but it didn't.
If you don't have enough identical hardware for pre-prod, then you probably don't have spare servers for production either. If you get a flash of traffic from a news article, or one of your machines develops a hardware fault, then you have to order replacements. At best you might be able to pull off an overnight FedEx, but only if the problem happens in the morning.
If, however, you have identical QA hardware, you can order the new hardware and cannibalize QA. Re-image the machine and plop it into production. QA will be degraded for a couple of days but that's better than prod having an issue.
With the Cloud, the hardware is somewhat fungible, so you can generally pick identical hardware for preprod and prepare an apology if anyone even notices you've done it. If the nascent private cloud computing vendors manage to take off, they'll have to address that phenomenon or lose a lot of potential supporters at customer sites.
I'm sure there are clueless companies and managers in infra land that don't quite get it (and that are still great places and people to work for, with good products to work on). If you find yourself in one of those situations, it's pretty rational to need prod when it's the only place your problem shows up, because of large divergences in the things you and the article mention. You're not wrong. But something that I've been a stickler on since our company's beginnings is that dev is really, as much as is feasible and useful, an exact copy of prod. And it's working so far. We have yet to scale to massive heights, I'll admit that. But it's a principle that I've seen more than a few companies simply neglect.
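For what "an exact copy of prod" can look like as a check, here is a minimal sketch and not anyone's actual tooling. It assumes each environment can dump a flat manifest of facts (kernel, library versions, firmware); the function name `parity_drift`, the manifest keys, and the version strings are all hypothetical.

```python
def parity_drift(prod: dict[str, str], preprod: dict[str, str]) -> dict[str, tuple]:
    """Return every key whose value differs between the two environments."""
    drift = {}
    for key in sorted(prod.keys() | preprod.keys()):
        a, b = prod.get(key), preprod.get(key)
        if a != b:
            # A key missing on one side shows up as None on that side
            drift[key] = (a, b)
    return drift


# Hypothetical manifests; the keys and versions are made up for illustration.
prod = {"kernel": "5.15.0-91", "openssl": "3.0.2", "nic_firmware": "14.32.1010"}
dev = {"kernel": "5.15.0-91", "openssl": "3.0.13", "nic_firmware": "14.32.1010"}

drift = parity_drift(prod, dev)
for key, (p, d) in drift.items():
    print(f"{key}: prod={p} dev={d}")
```

Run something like this in CI and fail the build on any drift, and the divergence gets noticed before it turns into a prod-only bug.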