IT Operations has a Cultural Problem

_pghu · on Dec 1, 2011

Let's see. Linkbait title, latest buzzword, consistent use of "bureaucracy" (I do not think that word means what you think it means - you cannot wave a magic DevOps wand and make bureaucracy go away) and insinuations that operations departments are "outdated"... Yeah, seems like a worthwhile article with a good point.

I sure want Joe Random Engineer committing code that goes live on our real, grown-up site, where we make real money, and a failure leads to us losing real money. Ops departments exist so that that can't happen. If that means you have to wait a day before testing your latest code in production, I don't see this as a bad thing.

The "cultural problem" is in people who think that operations departments don't need to exist because "like, how hard is it to run servers? We'll just put it 'on the cloud' and magically all of our security, reliability, and availability problems will be solved".

The biggest piece of nonsense is this concept of a "private cloud". What the fuck is a private cloud? Oh, you mean a remote datacenter, like we've had since the 70s. OK.

gabrtv · on Dec 1, 2011

Jamming ephemeral cloud infrastructure into an ITIL-style bureaucracy is like jamming a square peg into a round hole. You can push as hard as you want -- it ain't gonna fit.

jvehent · on Dec 1, 2011

looks like somebody didn't like it when boss said "no, you can't put the new accounting system on heroku"

ddw · on Dec 1, 2011

A few of the responses here are snarky, and understandably so considering the author's reasons for this post, yet the problem remains. How does cloud computing fall within the traditional IT ops model? Anecdotally, I've worked for a large city, and they're still a little hesitant about cloud computing because they see it as a threat to their employees. I'm not sure how it'll shake out, but they'll move towards cloud computing eventually, and developers instead of operations could/should manage it.

gabrtv · on Dec 1, 2011

I would argue that operations engineers need to become more like developers, not that developers should be operating critical systems.

The cultural problem can also be framed as a transition from a server-centric operations model to an application-centric one -- something James Urquhart wrote a great post about for GigaOM: http://gigaom.com/cloud/what-cloud-boils-down-to-for-the-ent...

ogghead · on Dec 1, 2011

"agile management" is pretty much always going to mean "less management," so it's understandable that management is kind of schizophrenic about the DevOps movement

Schmidt · on Dec 1, 2011

It's not about less management, it's about trusting your employees and their judgement. It's about accepting that failures happen and learning from the mistakes.

_kotv · on Dec 1, 2011

In my experience this is completely true of large organizations: "Most operations departments are inflexible and inefficient because they rely on specialized engineers glued together with manual processes and a large IT bureaucracy – all fundamentally at odds with the fast-moving, application-centric world of cloud computing."

This is a cycle. If the management of the Operations organization is measured on reducing downtime, they control what they can: Release & Change Management. This kills frequent small releases, so development teams have to build big releases. If management in development organizations is measured mostly on delivering on schedule, they cut scope. You end up w/ development organizations delivering the minimum to ensure they meet the project's mostly artificial timelines for huge releases. Suggesting small frequent releases sounds good to Development (assuming they can reduce the operational paperwork associated w/ releasing), but it jeopardizes Operations' control of stability, so Operations resists it. Suggesting that more get delivered in each huge release jeopardizes Development's ability to meet project deadlines because there is so much unknown and the commitment is expected up front, a quarter or more (I've seen 18 months) in advance.

There are reasons for all of this; it's not bad people, just a consequence of large organizations. Reducing downtime reduces costs because you can cut support staff. Delivering on time increases productivity because code that isn't being used is useless code.

drivingmenuts · on Dec 1, 2011

If my local server providing a vital service goes down, I catch hell. If my cloud server providing a vital service goes down, I catch hell and can't do anything about it except bitch at customer service who has their own set of priorities and a TOS protecting them from any meaningful action on my part.

So, what's the right option there?

gabrtv · on Dec 1, 2011

Clearly the right option depends on the specifics of the service and the team managing it. However, unless you have a spare server sitting around, you're in the same boat either way, right?

Outages at serious cloud providers like AWS are usually restricted to availability zones, though there have been a few high-profile exceptions where entire regions were affected. In general, though, with AWS you can redeploy your server rapidly if you keep your infrastructure blueprints as code and your data backed up to EBS snapshots or S3.

Just this week I had a high-traffic WordPress blog shit the bed on EC2/RDS. Using the tools we built at OpDemand, I was able to clone the platform and get it back up and running in < 60 minutes without any HA. I think < 60 min recovery time is probably a stretch for most on-premises environments.

r00fus · on Dec 1, 2011

This is service contract management, not technical or systems management.

Cloud is IT infrastructure outsourcing and (just like other outsourcing) requires a strong contract. Being able to specify and clarify contract terms, hold your provider to them, and make your management comfortable with the arrangement is becoming part of the IT operations job.

You need to set expectations with management. One of the benefits of outsourcing is that you should have an exit or the ability to switch to a similar provider... otherwise you are a captive customer.

zenpocalypse · on Dec 1, 2011

troll much?