If you're interested in having this kind of setup for your own team, without having to muck with servers, you can use something like https://circleci.com. We use LXC, and pay a lot of attention to having fast I/O. [disclosure: I'm founder/CTO of Circle]
We switched to CircleCI from our own, painfully maintained Jenkins box to test our Rails apps. Between their clearly highly tuned infrastructure, nice little tweaks like caching the installed gemset between builds, and automatic parallelization of the test suite, our average build times dropped by 50-75% and became much more consistent. We did give up some flexibility over which branch names automatically trigger builds, or whether builds run only once a pull request is opened, but it's well worth it.
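For anyone curious how that kind of gemset caching works in general, here's a hedged sketch (not CircleCI's actual implementation; the cache path and layout are made up): key a cache of the installed bundle on a checksum of Gemfile.lock, so a build only pays for "bundle install" when the dependency set actually changes.

    import hashlib, os, subprocess, tarfile

    CACHE_DIR = "/var/cache/ci"  # hypothetical cache location

    def cache_key():
        # The lockfile fully describes the gemset, so its hash makes a good key.
        with open("Gemfile.lock", "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    def install_gems():
        cached = os.path.join(CACHE_DIR, "bundle-%s.tar.gz" % cache_key())
        if os.path.exists(cached):
            # Cache hit: restore vendor/bundle instead of reinstalling.
            tarfile.open(cached).extractall(".")
        else:
            # Cache miss: install, then save the result for the next build.
            subprocess.check_call(["bundle", "install", "--path", "vendor/bundle"])
            with tarfile.open(cached, "w:gz") as tar:
                tar.add("vendor/bundle")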
At Codeship (https://codeship.io, I am one of the founders) we support Bitbucket and have for a while. We use LXC as well and are currently looking into Docker for our next infrastructure steps.
I love GitHub and have never used Bitbucket or any other service. However, I'm always disappointed that no one has tried creating a library that provides an abstraction layer over the features of GitHub and similar services. That would help foster diversity in a market that is currently a monoculture.
It looks like they're not leveraging LXC via Docker. I wonder if that's because they've been doing it this way pre-Docker, or if there are some technical reasons why it made sense to skip it.
They've probably been doing it since long before Docker existed, as have most people who use LXC. Also, it makes sense to skip it because Docker adds relatively little for their use case.
Yeah, in general if you have a production system in place there's not much incentive to rip it out and replace it with something else - especially if your system is relatively new and shiny.
It becomes more interesting when a) your system has been running long enough that you've learned operational and design lessons that justify an upgrade, or b) you're starting fresh on a new architecture.
Most of the Docker community is made up of these groups A and B. If you're starting from scratch, or making significant changes anyway, it makes a lot of sense to federate the development effort and reap the benefits of code reuse, common tools, etc.
That is absolutely correct. We started this before Docker existed. Also LXC is simple enough to do on your own that we haven't had a compelling reason to replace our tools with any others. Docker definitely has some cool features though!
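For the curious, "do it on your own" can be as small as a thin wrapper over the lxc-* command-line tools. A rough sketch of the shape of such a wrapper (not Etsy's actual tooling; container names and the test command are placeholders, and a real setup also needs networking and a proper wait for boot):

    import subprocess

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    def run_tests_in_throwaway_container(base="ci-base", name="ci-run-1",
                                         test_cmd=("make", "test")):
        # Clone a pre-built base container so every run starts from a clean image.
        run(["lxc-clone", "-o", base, "-n", name])
        try:
            run(["lxc-start", "-n", name, "-d"])
            run(["lxc-wait", "-n", name, "-s", "RUNNING"])
            # Run the test command inside the container.
            run(["lxc-attach", "-n", name, "--"] + list(test_cmd))
        finally:
            # Throw the container away no matter what the result was.
            subprocess.call(["lxc-stop", "-n", name])
            subprocess.call(["lxc-destroy", "-n", name])

    if __name__ == "__main__":
        run_tests_in_throwaway_container()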
Drawbacks are that it's nontrivial to set up and requires some rigid formalism in developer output that sometimes demands training and/or cultural change. But it's definitely something everyone should consider.
In my (currently internal, heavily LXC-utilizing, but infrastructure- and OS-neutral) project, I am looking at exactly this sort of automation, but for complex topologies of interdependent services, HA clustering layers, complex emulated network topologies (bonded links, multiple VLANs), etc. Plans are to include failure testing at the communications level (slow links, lossy links, cable snaps, switch failures, etc.) in addition to the resource level (disk, memory, etc.).
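The slow/lossy-link part is mostly a thin layer over tc/netem. Roughly like this (interface name and the delay/loss numbers are made-up examples; it needs root and the iproute2 tools):

    import subprocess

    def impair_link(iface, delay_ms=200, jitter_ms=50, loss_pct=5):
        # netem adds artificial delay, jitter and packet loss to outgoing traffic.
        subprocess.check_call([
            "tc", "qdisc", "add", "dev", iface, "root", "netem",
            "delay", "%dms" % delay_ms, "%dms" % jitter_ms,
            "loss", "%d%%" % loss_pct,
        ])

    def restore_link(iface):
        # Remove the netem qdisc to bring the link back to normal.
        subprocess.check_call(["tc", "qdisc", "del", "dev", iface, "root"])

    # e.g. degrade a container's (hypothetical) host-side veth while a test
    # scenario runs, then put it back:
    # impair_link("veth-ci0"); run_scenario(); restore_link("veth-ci0")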
Outputs of a successful automated testing environment can include amazingly detailed information for capacity planning and automatically generated security policies (for container-side, host-side, and infrastructure-side deployment).
It's a fascinating area and one that is ripe for great change. Many people have needs here; the question is how to meet them at the intersection of current infrastructure and codebases, existing teams, business-level concerns, varying hardware availability, etc. Both pre-commit and post-commit hooks are useful for different types of automation. IMHO LXC's blazing speed significantly broadens what can be tested pre-commit.
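As a trivial illustration of the pre-commit side, the hook itself can be tiny; everything interesting lives behind the test-runner command (which is hypothetical here):

    #!/usr/bin/env python
    # Minimal sketch of a .git/hooks/pre-commit gate: run a fast subset of the
    # suite (e.g. inside a throwaway container) and block the commit on failure.
    # "ci-run-fast-tests" is a made-up wrapper command, not a real tool.
    import subprocess, sys

    if subprocess.call(["ci-run-fast-tests"]) != 0:
        sys.stderr.write("pre-commit: fast test subset failed, commit aborted\n")
        sys.exit(1)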
You say your devs' local development environments are different to prod. Are you guys letting each dev set up their own environment by hand, or have you provided a Puppet or Chef repo that they can clone and have an exact replica up and running within minutes with Vagrant?
It's only somewhat different. We try to keep everything as similar as possible. We use Chef to get new and old VMs up to date on our current setup, reusing recipes from prod in dev whenever possible. That being said, every dev is allowed to modify their VM in any way they see fit. It is recommended that they speak with us before making any wild configuration changes that might cause Chef runs to start failing or make the VM a less faithful representation of production.
We also allow our developers to connect to a proxy to our production MySQL shards from their development environments in a read-only mode. This allows them to leverage the large data sets that are quite hard to replicate in our development architecture. There is also a limited read/write mode that we are working on (with the proxy filtering dangerous queries). But all that is another blog post for another day.
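The filtering is roughly this shape (a naive sketch of the idea only, not our actual proxy code; a real proxy needs proper SQL parsing rather than regexes):

    import re

    READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|EXPLAIN|DESCRIBE)\b", re.IGNORECASE)
    FORBIDDEN = re.compile(r"^\s*(DROP|TRUNCATE|ALTER|GRANT)\b", re.IGNORECASE)

    def allowed(statement, read_write=False):
        # Read-only mode: only plain reads get through.
        if READ_ONLY.match(statement):
            return True
        # Limited read/write mode: still refuse schema changes and the like.
        if read_write:
            return not FORBIDDEN.match(statement)
        return False

    assert allowed("SELECT * FROM listings LIMIT 10")
    assert not allowed("DELETE FROM listings")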
We also do not use Vagrant, opting for QEMU/KVM on physical hardware. The same tooling you saw in part 2 of my blog post creates our development VMs as well.
It doesn't matter. Developers almost never develop/test in an environment that mimics production - multiple load balancers, multiple app servers, multiple database servers, failover to a second data center, etc.
If your dev environment is not different from prod, you're either insanely rich or your server setup is trivial.
I would argue that a dev environment that is identical to prod, no exceptions, is too constraining. As the OP points out, having root access in the VM to go willy nilly and try out new tools is a must for developers.
I'm kind of surprised they didn't have Jenkins set up from the start; I'm also a bit taken aback that they don't use automated code reviews before accepting patches to their "deploy" branch. Even for a small project, it's not that hard to set up Jenkins+Gerrit to reject patches that break tests (or have to pass whatever other hurdles you want).
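The gate itself is small. Something like the following as a Jenkins build step is one hedged way to do it (the test command, host, port and user are placeholders, the exact flag syntax varies by Gerrit version, and many setups just let the Gerrit Trigger plugin vote from the build result instead):

    import subprocess, sys

    # Run the suite; "./run-tests" stands in for whatever the project uses.
    ok = subprocess.call(["./run-tests"]) == 0

    # Vote on the patchset over Gerrit's ssh interface; change number and
    # patchset are assumed to arrive as arguments from the trigger.
    change = "%s,%s" % (sys.argv[1], sys.argv[2])
    subprocess.call(["ssh", "-p", "29418", "jenkins@gerrit.example.com",
                     "gerrit", "review",
                     "--verified=%s" % ("+1" if ok else "-1"), change])

    # Non-zero exit marks the Jenkins build as failed.
    sys.exit(0 if ok else 1)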
We only really have one branch, "master". We encourage the engineers to push small changes, behind config flags if necessary, all the time, so there are never any huge merge conflicts, etc. This also means you don't push your code to master until you are up in our push queue.
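For readers unfamiliar with the pattern, "behind config flags" means roughly this (flag name and code paths are made-up stand-ins, not our actual code):

    # Gate a new code path on a config value so unfinished work can land on
    # master without being exposed to users until the flag is flipped.
    CONFIG = {"new_search_ranking": False}

    def old_ranking(query):
        return sorted(query.split())            # stand-in for current behaviour

    def new_ranking(query):
        return sorted(query.split(), key=len)   # stand-in for the new code path

    def search(query):
        if CONFIG.get("new_search_ranking"):
            return new_ranking(query)
        return old_ranking(query)               # the default stays the old path

    print(search("vintage brass lamp"))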
We've also had Jenkins set up for a long time now; we just used LXC to drastically improve our performance and scalability.
We use a review script that creates a temporary branch in GitHub and sends an email to everyone you specify to review it. We then kill that branch when the review is over. Any time you push code, you run our test suite on your changes and then create a review. Since changes are encouraged to be small and behind config flags so they don't affect all our users immediately, this happens quite often. Once feedback from the review has been addressed, you enter our push queue and push the change out yourself. If the code is possibly dangerous, we recommend holding those pushes from Friday night until Monday morning, for safety's sake.
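In spirit, the review script does something like this (a simplified sketch, not our actual script; the repo URL, SMTP host, sender address and branch naming are all made up):

    import getpass, smtplib, subprocess
    from email.mime.text import MIMEText

    def start_review(reviewers):
        # Name the temporary branch after the user and the current commit.
        sha = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"]).strip().decode()
        branch = "review/%s-%s" % (getpass.getuser(), sha)
        subprocess.check_call(
            ["git", "push", "origin", "HEAD:refs/heads/%s" % branch])

        # Mail the chosen reviewers a link to the temporary branch.
        msg = MIMEText("Please review: "
                       "https://github.com/example/repo/tree/%s" % branch)
        msg["Subject"] = "Code review: %s" % branch
        msg["From"] = "%s@example.com" % getpass.getuser()
        msg["To"] = ", ".join(reviewers)
        smtplib.SMTP("localhost").sendmail(msg["From"], reviewers, msg.as_string())
        return branch

    def finish_review(branch):
        # Kill the temporary branch once the review is over.
        subprocess.check_call(["git", "push", "origin", "--delete", branch])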
Luckily, most of my projects fall into the "production is trivial" category. Unluckily, that tends not to stop teams I work with from breaking parity somehow.
They have two use cases. The first is this testing one, whose rationale is linked from 'workload' in the article:
"Run end-to-end, the 7,000 trunk tests would take about half an hour to execute. We split these tests up into subsets, and distribute those onto the 10 machines in our Jenkins cluster, where all the subsets can run concurrently..."
...so clearly running these tests on a single dev's machine would be a bottleneck (see the splitting sketch at the end of this comment). The other use case is the dev env: in a previous blog post they described how they're using their own internal cloud to run the dev VMs faster on dedicated hardware (with easy, one-click provisioning):
which makes sense. Why emulate prod on devs' own boxes when you can pool the hardware, get better utilisation, and at the same time run them faster?
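For what it's worth, the splitting they describe is essentially packing tests into buckets by runtime. A hedged sketch of one common way to do it (the timings and file names are invented, and this says nothing about how Etsy actually partitions their suite):

    import heapq

    def split_tests(timings, workers=10):
        # timings: {test_file: seconds}, taken from a previous run.
        buckets = [(0.0, i, []) for i in range(workers)]  # (total, bucket_id, files)
        heapq.heapify(buckets)
        # Greedy "longest processing time" heuristic: place the slowest tests
        # first, always into the currently least-loaded bucket.
        for test, secs in sorted(timings.items(), key=lambda kv: -kv[1]):
            total, i, files = heapq.heappop(buckets)
            files.append(test)
            heapq.heappush(buckets, (total + secs, i, files))
        return [files for _, _, files in sorted(buckets, key=lambda b: b[1])]

    example = {"test_checkout.php": 90, "test_search.php": 60,
               "test_cart.php": 45, "test_login.php": 30, "test_profile.php": 20}
    print(split_tests(example, workers=3))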
It is a good way to have faith in your ability to run your stuff in new/unfamiliar/heterogeneous environments, which can be valuable. People may be geographically distributed. People may wish to work offline. People may want a degree of control not available or even feasible on shared hardware resources.
The 'offline' one I agree with, though I don't think it's precluded (they have the Chef recipes, so why not) - it's just an option rather than the common case. The last one too: we've got a similar setup, and there are times when issues with our Xen VMs were easier to explore outside of the shared cluster. One more justification would be overflow - this has happened here; we don't have room for all the VMs on the cluster, so some of us run them locally.
So yes, don't rule it out, but still seems to make economic sense to share hardware.