I built a system where our developers can do instant deployments of any of our software packages (and instant point-in-time rollbacks), and then do zero-downtime restarts of services.
Now we deploy dozens of times a day and I never get called on a Friday night because someone did something stupid.
Edit: I do get called when I did something stupid and it broke the deployment system. But that's gotten much rarer lately.
I've built a similar system, around five years ago. Users were able to deploy any version to any cluster from a nice UI. Basically you could select any version, click the "Install" button and follow the logs in real time.
Behind the scenes it was a decentralized continuous delivery system. Very cool stuff, highly automated. Reduced a lot of work and sped up development cycles from months to minutes. Served quite a large software development organization (1000+). I think we had 1000 servers in 5 datacenters around the globe.
Nowadays I'm working on an open source version of that system. It's still missing a few critical features, but hopefully I'll get the first release out next spring.
btw, I'm looking for projects/contacts that would be interested in trying out how the system would fit their needs.
Hey Mikko; I'm a sysadmin at a research university, and I'd be very curious about at least "picking your brain" about your tool. I can't make any promises about actual usage, but I always love to see novel approaches to relevant problems.
Cool, sysadmin at a research university sounds like a nice position to be at.
Yes, it's already on github. Unfortunately, since it's missing those critical features it's not easy to see how the whole system is going to work. If you want to talk, just drop me an email at gmail. mikko.apo is the account.
I really hate the idea that deploying on a Friday afternoon is a bad idea. It's only bad when you have shit developers or shit processes that don't catch broken code.
Personally, I think it's better to release at 5pm on a Friday. Once people stay late a few times to fix their broken shit they'll be smarter about not checking in crap.
> It's only bad when you have shit developers or shit processes that don't catch broken code.
Or when the bug is only triggered in specific user profiles.
Or when all the devs went on a retreat in the mountains with no cell service.
Or when a dev makes a mistake (which we know never happens to even the best devs)
Or when the only developer who knows which one of the 1000 changes that were pushed could be the one breaking things has turned his phone off.
Or when a flaw is discovered in the process for the first time (which we know never happens because everyone's process is perfect, until it isn't)
Or how change management's requirement that the fix be tested and verified by all affected teams might have people staying a few hours after 5pm on a Friday when they just want to get their weekend started.
Or how 10 different people from 10 different teams might need to be called in and kept working until 2am because the change can't be pulled: the database was already modified, the old client data has already expired from the cache, and a refresh would destroy the frontend servers.
Yes, this! "Good code" and a CI box and deployment automation and some chef recipes don't spell ultimate success.
It drives me nuts when people tell me off for saying 'yeah yeah, no, automating our entire infrastructure of 5 servers isn't really worth it right now', like I'm some unprofessional bozo.
I pretty much have experience with all but one or two of your suggested scenarios, and by now I have no patience for annoying software developers who think that using chef or puppet somehow sufficiently embiggens them to run ops on their own (of course dev ops is almost a political assault on existing ops guys, not merely a nice new solution to existing problems).
Sigh. This is why I don't work on teams these days (if I can help it).
EDIT: Though I also agree with the sub-parent that deploying at 5pm is fine on certain teams and certain projects. The most important thing is: are the guys pushing the deploy going to own it? Are they going to hang around for another 60 minutes to check everything is OK? Are they going to be available at 10pm or on Saturday if something goes wrong, and are they going to own it? If the answer is no, then nope, don't do it.
I'd agree that in a perfect scenario you should be able to push code at any time confidently. But for many companies and projects that's simply not attainable. As well, in many organizations the person who has to fix broken stuff is not the same as the person who develops and pushes code. I'm not saying that's a good thing, but it is reality for many people.
Even if it only happens once in your career: once you've had a dev push out code at 5pm on a Friday, jet out the door and hit the bar, while you (the sysadmin/ops on call) get woken up at 1am by a site-down alert and have to debug/rollback the changes while the dev who pushed them is unreachable, you learn to really avoid Friday evening pushes. Fool me once...
It's sad that most places don't have a proper technical copy (with a full copy of live data) to do full tests on. TDD is all very well, but you need to test the entire system.
Yeah, because all problems are foreseeable and only ever caused by crap code... right.
No matter how great your processes and your code are, no test can catch everything that can go wrong in a live environment, and doubly so if your system interfaces with anything third party.
It's not always the person who pushed it on a Friday that ends up fixing it, though. They can be unreachable, without a computer, etc etc. It's just easier to change less during hours when you have fewer people on hand, is all.
Maybe it's me, but I have no problem staying late on a Friday to fix my screw-up. However, I'm terrified of having to fix something Monday morning while everyone else is watching.
But the real reason we deploy weekday mornings is so everyone is on deck and we can get outside help if required. When I was doing system integration, the problem was never in my code, it was the vendor's. Testing can only get you so close to the real world.
Seen this pattern many times before. One of those 'en' strings is the current user's language being written into the source, the other is hardcoded. If your server-side templating engine is impotent and only supports variable interpolation without conditionals, this approach is easier than pulling the right JS snippet from somewhere else.
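A minimal sketch of how you end up with that (the {{ user_lang }} placeholder and the snippet itself are made up for illustration): the engine can only substitute a variable into the inline JS, not branch on it, so the "conditional" becomes a runtime comparison between the interpolated value and a literal.

  // template: the engine only interpolates, it can't branch
  if ('{{ user_lang }}' === 'en') {
    // English-only behaviour
  }

  // rendered for an English user, hence the 'en' === 'en'
  if ('en' === 'en') {
    // English-only behaviour
  }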
- Somebody deployed new features on a Friday at 5pm.
- Fifteen hundred machines running mod_perl.
- Supporting Oracle - TWICE.
- It turns out your entire infrastructure is dependent on a single 8U Sun Solaris machine from 15 years ago, and nobody knows where it is.
- Troubleshooting a bug in a site, view source.... and see SQL in the JS.