Interesting read. Coincidentally, this is a current topic of conversation at our office (10K+ FTEs): should we use/extend open source, or develop institutional proficiency in aspects of distributed computing?

Ours is the usual case: we have homegrown brokered message queues plus Kafka and 21 more. We have a homegrown single-master replicated DB, but we also use DB2, Oracle, Postgres, and MySQL. We have private in-core caches with specialty code to knit data together, and we also use Memcache and Redis. Some of that code is installed from DPKGs; some of it is slightly edited and installed as a service so internal users aren't bothered with the icky details.

We have the Jekyll & Hyde behavior mentioned in OP's write-up and by commenters: for anything considered core to the company and its clients in a sufficiently complex system, we're not going to rely on third-party trouble-ticketing systems and support; it's in-house. But that resolve falls away in stages and never lasts. We also have people who strongly push the practical business side: we're not here to write Azure or Kafka, we're here to make business apps, so get focused and get client-focused. That comes with an interesting blend of boredom and brand awareness gone wrong: "I looked around and saw or heard that people in Dept X are working on caching solutions, so... what would be the point of you doing it?"

Now, here's the interesting question: suppose we needed a distributed ledger (without blockchain), or we needed 2PC for some homegrown caching. Now what? Do we write that in-house as a reusable component? Dig it out of the PG or MySQL code, if they have it and we know where to look? Suppose we make a business case to upgrade a cluster to kernel-bypass NICs. Should we go into Redis and modify the code ourselves to support that I/O path? No? Frankly, what then is the purpose of a CS guy/gal with an MS in distributed computing?

My own perspective on this:

- The first third of this back-and-forth is a proxy conversation that flies above reality; it's more about risk management and the perception of failure, i.e. company reputation and gossip-mongering.

- The second third is what people talk about because we can't talk about the underlying CS concepts. Most employees can't formally describe 2PC, or even sketch it in pseudo-code on a whiteboard. Or take it another way: see if you can get the lead on your homegrown replicated DB to explain how and why it works the way we'd see in a senior-level CS class: clear, specific, yet abstract enough to stay clear of superfluous implementation details. Can the audience understand it?

- Insufficient grounding in customer needs, which is what lets you determine what clients really need and what's lacking in the current offerings. Failure here will send you on goose chases to nowhere.
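For reference, this is roughly the whiteboard-level 2PC sketch in question: a minimal, single-process Python illustration, not a production protocol. The names `Participant`, `two_phase_commit`, and the `can_commit` flag are hypothetical, and there is no networking, no timeouts, and no durable log, all of which a real implementation needs.

```python
from dataclasses import dataclass, field
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


@dataclass
class Participant:
    """Hypothetical in-memory resource manager, e.g. one cache node."""
    name: str
    can_commit: bool = True  # simulates local validation passing or failing
    staged: dict = field(default_factory=dict)
    committed: dict = field(default_factory=dict)

    def prepare(self, txn_id: str, writes: dict) -> Vote:
        # Phase 1: stage the writes (a real node would persist them) and vote.
        # A YES vote is a promise to commit if the coordinator says so.
        if not self.can_commit:
            return Vote.NO
        self.staged[txn_id] = dict(writes)
        return Vote.YES

    def finish(self, txn_id: str, decision: Decision) -> None:
        # Phase 2: apply or discard the staged writes per the single decision.
        writes = self.staged.pop(txn_id, {})
        if decision is Decision.COMMIT:
            self.committed.update(writes)


def two_phase_commit(txn_id: str, participants: list, writes: dict) -> Decision:
    # Phase 1 (voting): commit only if every participant votes YES.
    votes = [p.prepare(txn_id, writes) for p in participants]
    decision = Decision.COMMIT if all(v is Vote.YES for v in votes) else Decision.ABORT
    # A real coordinator must durably log `decision` here, before phase 2,
    # so it can finish the protocol after a crash.
    # Phase 2 (completion): send the same decision to everyone.
    for p in participants:
        p.finish(txn_id, decision)
    return decision


if __name__ == "__main__":
    ok = [Participant("cache-a"), Participant("cache-b")]
    print(two_phase_commit("txn-1", ok, {"user:42": "v1"}).value)  # commit
    mixed = [Participant("cache-a"), Participant("cache-b", can_commit=False)]
    print(two_phase_commit("txn-2", mixed, {"user:42": "v2"}).value)  # abort
```

The point a whiteboard explanation should land on: every participant gets the same decision, and a YES vote is binding. That binding vote is also 2PC's well-known weakness: if the coordinator dies after participants vote YES but before they learn the outcome, they are stuck holding staged writes until it recovers, which is why the decision has to be logged durably.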

Gedanken experiment: you and I start a new software company broadly in distributed ledgers and caches. We manage to hire 25% of MS's distributed computing leads (not all of them), but none of these guys and gals are slouches. What now? Do we write it in-house, or expand and extend some IBM offering? The difference is that we have talent, and there's some reason to believe they can deal with a distributed system's worth of risk.

To end this with humor, here's a nice analogy I once heard: On a Saturday, mom and dad are sick of the noise of their three kids, and the house is a mess. So they send them outdoors to work off their energy. At sunset the house is clean, and they're sitting on the porch with some wine, watching the sun go down. They see the kids coming back, filthy dirty. They see the work ahead: all three kids need baths, the clothes have to be washed, and the kids will probably destroy the bathrooms that were just cleaned. So the father looks at the mother and says: "As I see it, we could clean these kids up or make a new one. Not sure making a new one is wise, but it's a hell of a lot more fun."