
Surely, at minimum, the Hadoop developers could tell you!


Have you ever been on a project where the developers didn't know how to build it? It's a strange situation: huge environments get passed from one computer to another and are treasured with more care than the code itself.


This happened to me about a decade ago. A very smart sysadmin at the company created an Acronis image for machine deployments. They very carefully documented everything they changed and how to recreate it. Then someone else created an image from one of the imaged machines without documenting what they changed. This happened a couple dozen times, until the image was pretty much a mess of hand-installed binaries, configuration hacks, etc. It literally took another person six months to untangle what was actually on the machine: md5summing everything, guessing at versions until they found a match, and documenting it.
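For anyone facing the same archaeology, the core of that loop looks something like this (a rough sketch; paths and version names are hypothetical):

    # hash every hand-installed binary on the mystery machine
    find /usr/local /opt -type f -exec md5sum {} \; > mystery.md5
    # for each guessed upstream version, hash a candidate build
    # and look for matching checksums
    md5sum candidate-1.2.3/bin/* | awk '{print $1}' | \
        grep -Ff - mystery.md5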

That sounds like the state of a lot of docker images.


Well fuck me. I just spent two weeks fiddling with Vagrant and Docker and finally got everything up and humming only to come into this thread. Going to refrain from slapping the SysAdmin title on myself for now.


Docker is awesome, but you shouldn't be using blind base images. Use Dockerfiles; they're self-documenting.
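Even something this small already beats an opaque image (a minimal sketch; the package and config file are just illustrative):

    FROM debian:wheezy
    # every change to the image is recorded here, not hand-applied
    RUN apt-get update && apt-get install -y --no-install-recommends nginx \
        && rm -rf /var/lib/apt/lists/*
    COPY nginx.conf /etc/nginx/nginx.conf
    CMD ["nginx", "-g", "daemon off;"]

Anyone can rebuild it from scratch with `docker build .`, which is the whole point.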


Unless you build your own base images... odds are you will be using something someone else built. Even the host OS probably wasn't compiled by you.

In general, my base images are debian:wheezy, ubuntu:trusty, or alpine:latest... From there, a number of times I've tracked down the Dockerfiles (usually on GitHub) for a given image. For the most part, if the image is a default image, I've got a fair amount of trust in it (the build system is pretty sane in that regard)... though some bits aren't always straightforward.
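Even without the Dockerfile, `docker history` shows the commands that built each layer (truncated by default):

    docker history debian:wheezy
    docker history --no-trunc debian:wheezy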

I learned a lot just from reading/tracing through the Dockerfiles for iojs and mono... What's interesting is that often the Dockerfile simply adds a repository and installs package X using the base OS's package manager. I'm not certain it's nearly as big a problem as people make it out to be (with the exception of hadoop/java projects, which tend to be far more complicated than they should be).
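The pattern usually boils down to a couple of lines like these (the key ID, repo URL, and package name are placeholders, not the real iojs ones):

    FROM debian:wheezy
    # add the upstream repository and trust its signing key
    RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DEADBEEF \
     && echo "deb https://example.com/apt wheezy main" \
        > /etc/apt/sources.list.d/example.list
    # then just lean on the base OS's package manager
    RUN apt-get update && apt-get install -y package-x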

golang's onbuild containers are really interesting. I've also been playing with building in one node container with build tools, then deploying the resulting node_modules + app into another, more barebones base image.
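The golang one really is a two-line build; the node split can be approximated by building in the fat image and lifting the artifacts out (a sketch, image and path names hypothetical):

    # Dockerfile for a Go app -- onbuild compiles the build context
    FROM golang:onbuild

    # node: build in the tools image, then extract the result
    docker build -t myapp-build -f Dockerfile.build .
    docker create --name tmp myapp-build
    docker cp tmp:/usr/src/app ./dist && docker rm tmp
    # ./dist (app + node_modules) then gets COPY'd into a slim base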


Well, you have to trust something somewhere. Unless you're always compiling from source (which you can do with Docker), and you've read the source, etc. But even then, you have to trust the compiler and the hardware.

Anyway, yes, you can make your own base images. But images should be light enough that you can rebuild them on each iteration. I've done dev stacks where literally every save/commit/test run rebuilt the docker container from the Dockerfile in the background! With the caching Docker does, it really doesn't add any overhead to the process.
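A minimal version of that loop, assuming inotify-tools is installed and tests run via npm (just a sketch):

    # rebuild + retest on every save; layer caching keeps it fast
    while inotifywait -r -e modify ./src; do
        docker build -t myapp-test . && \
            docker run --rm myapp-test npm test
    done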

> What is interesting is often the dockerfile simply adds a repository, and installs package X using the base os's package manager.

Yup! Pretty much. Other than some config stuff for very specific use cases (VPN, whatever).


The legend at one company is that, about five years ago, its next world-shaking product was being built partly on a single computer that was shipped around from office to office, because no one knew how to recreate the build environment. This was circa 2010. :-)



