
Although vulnerability scanners can be useful tools, I find it very troublesome that you can utter the sentence "this package contains XX vulnerabilities, and that package contains YY vulnerabilities" and then stop talking. You've provided barely any useful information!

The number of vulnerabilities in an image is not, by itself, very useful information. A large number of vulnerabilities in a Docker image does not necessarily imply that there's anything insecure going on. Many people don't realize that a vulnerability is usually defined as "has a CVE security advisory", and that CVEs get assigned based on a worst-case evaluation of the bug. As a result, having a CVE in your container barely tells you anything about your actual exposure. In fact, most of the time you will find that having a CVE in some random utility doesn't matter. Most CVEs in system packages don't apply to most of your containers' threat models.

Why not? Because an attacker is very unlikely to be able to use vulnerabilities in these system libraries or utilities. Those utilities are usually not in active use in the first place, and even when they are, an attacker is usually not in a position to trigger the vulnerable code.

Just as an example, a hypothetical outdated version of grep in one of these containers could carry many CVEs. But if your Docker service doesn't use grep, then you would need to manually run grep to be vulnerable. And an attacker who is able to run grep in your Docker container has already owned you - it doesn't make a difference that your grep is vulnerable! This vulnerable version of grep therefore makes no difference to the security of your container, despite containing many CVEs.

It's the quality of these vulnerabilities that matters. Can an attacker actually exploit them to do bad things? The answer for almost all of these CVEs is "no". But that's not really the product that Snyk sells - Snyk sells a product that shows you as many vulnerabilities as possible. Any vulnerability scanner company thinks it can provide the most business value (and make the most money) by reporting as many vulnerabilities as it can. A scanner can certainly help you pinpoint the few vulnerabilities that are exploitable, but that's where your own analysis has to come in.

I'm not saying there isn't a lot to improve in terms of container security - there's a whole bunch to improve there. But focusing on quantities like "number of CVEs in an image" is not the solution - it's marketing.



This whole pattern of depending on a base image for setup and then essentially throwing away everything you get from the package manager is part of the issue. There is no real package management for Docker. Hell, there isn't even an official way to determine whether your image needs an upgrade (I wrote a ruby script that gets the latest tags from a docker repo, extracts the ones with numbers, sorts them in order and compares them to what I have running).
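
Roughly the same check as a shell sketch (assuming the registry speaks the standard Registry HTTP API v2 with anonymous pull tokens, as Docker Hub does; the repository name is just an example):

    REPO=library/nginx   # example repository
    # anonymous pull token for Docker Hub
    TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${REPO}:pull" | jq -r .token)
    # list tags, keep the purely numeric ones, version-sort, print the newest
    curl -s -H "Authorization: Bearer ${TOKEN}" "https://registry-1.docker.io/v2/${REPO}/tags/list" \
      | jq -r '.tags[]' | grep -E '^[0-9]+(\.[0-9]+)*$' | sort -V | tail -n 1
    # then compare that against the tag you currently have deployed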

Relying on an Alpine/Debian/Ubuntu base helps to get dependencies installed quickly. Docker could have just created their own base distro and some mechanism to track package updates across images, but they did not.

There are guides for making bare containers that contain nothing .. no ip, grep, bash .. only the bare minimum libraries and requirements to run your service. They are minimal, but incredibly difficult to debug (sysdig still sucks unless you shell out money for enterprise).

I feel like containers are alright, but Docker is a partial dumpster fire. cgroup isolation is good, the crazy way we deal with packages in container systems is not so good.

Sure, if you're just checking base-distro packages for security vulnerabilities, you're going to find security issues that don't apply (e.g. an exploit in libpng even though your container runs nothing that even links to libpng), but that doesn't excuse the whole issue with the way containers are constructed.

I think this space is really open too, for people to find better systems that are also portable: image formats that are easy to build, easy to maintain dependencies for, and easy to run as FreeBSD jails OR Linux cgroup managed containers (Docker for FreeBSD translated images to jails, but it's been unmaintained for years).


I agree the tooling is a tire fire :(

> e.g. an exploit in libpng even though your container runs nothing that even links to libpng

it's a problem because some services or APIs could be abused, giving the attacker a path to these vulnerable resources; they can then use that vulnerability regardless of whether your service currently uses it.

I like my images to only contain what's absolutely needed to do that one job. It's not so difficult to do, provided people are willing to architect systems from the ground up, instead of pulling in a complete debian or fedora installation and then removing things (that should be outlawed imho lol). Not only do I get less attack surface, but also smaller updates (which in turn is an incentive to update more often), less complexity, fewer logs, easier auditability (now every log file or even log line might give valid clues), faster incident response, easier troubleshooting, sorry for going on and on ...

It's a cultural problem too: where people work in an environment where it's normal to have every command available on a production system (really?), where there is no barrier to installing anything new that is "required" without discussion & peer review (what are we pair programming for?), and where nobody even tracks the dead weight in production or whether configs are locked down.

I sometimes think many companies lost control over this long ago. [citation needed] :(


> where people work in an environment where it's normal to have every command available on a production system (really?)

Yes, really.

The ops folks I work with bandy about the same idea that you're getting at here, that engineers should not have access to the production system they maintain. I'll ask you the question I ask them: when the system has a production outage¹, how am I supposed to debug it effectively? To do that, I need to be able to introspect the state of the system, and that pretty much necessitates running arbitrary commands.

Even if I'm stripped of such abilities… I write the code. I can just change the code to surface whatever data I need, and redeploy. That can be incredibly inefficient, as deploying often resets the very state I seek to get at, so I sometimes have to wait for the issue to recur naturally if I don't have a solid means of reproducing it.


I think you'll find that at larger companies, or companies that need to take their security posture seriously, that is the norm. Even before we grew, only a couple of engineers had prod access.

You debug it via tooling, instrumentation and logs. I realize if you're accustomed to sudoing on prod when troubleshooting, this sounds crazy. Trust me, it works fine; better, in fact. Far fewer weird things happen in well-controlled environments.


Yeah, I've occasionally wondered how (if) the DevOps approach is implemented at places which have to be SOX compliant.

https://en.wikipedia.org/wiki/Sarbanes%E2%80%93Oxley_Act

One of the technical things it does is segregate (on purpose) developers from any kind of production access.

This is because (from historical experience) a "bad apple" developer can do amazingly fraudulent things, and have a more than reasonable chance of covering it up. o_O


Engineering manager (& previously a developer) at a ~2k developer SOX-compliant company chiming in here.

We have a platform as a service team which maintains our PaaS infrastructure (think an internal version of Heroku), and they are the only ones who can SSH into any production systems (<50 engineers, I'd guess).

Engineers write the code (mandatory green build and peer review required before merge to protect against bad actors.. but that's just good engineering practice too!), build their own deployment pipelines on Bamboo or Bitbucket Pipelines to push up code & assets to a docker registry, and ultimately the deployment pipelines make some API calls to deploy docker images. Engineers are also responsible for keeping those services running; most products (such as Jira, Confluence, Bitbucket, etc) also have a dedicated SRE team who are focused on improving reliability of services which support that product.

The vast majority (95%) of our production issues are diagnosed by looking at Datadog metrics (from CloudWatch, and services publish a great deal of metrics too) and Splunk (our services log a lot, we log all traffic, and the host systems also ship their logs off). Fixes are usually to do an automated rollback (part of the PaaS), turn off a feature flag to disable code, redeploy the existing code to fix a transient issue (knowing we'll identify a proper fix in the post incident review), or in rare cases, roll forward by merging a patch & deploying that (~30 mins turnaround time - but this happens <5% of the time). Good test coverage, fast builds (~5 mins on avg), fast deploys, and automated smoke tests before a new deploy goes live all help a lot in preventing issues in the first place.

It's not perfect, but it works a lot better than you might expect.


Interesting. With the deployed docker images, is that only for internal applications, or do you do external (public facing) applications as well?

Asking because this is one of the conceptual things I've been trying to figure out.

Still currently using non-Docker deployment of production stuff for our public services. Have looked at Docker a few times, but for deployment to the public internet, where being accessible to clients on both IPv4 and IPv6 is mandatory, it just doesn't seem to suit.

Docker (swarm) doesn't seem to do IPv6 at all, and the general networking approach in non-swarm docker seems insecure as hell for public services + it also seems to change arbitrarily between versions. For a (currently) 1 man setup, it seems like a bad use of time to have to keep on top of. ;)

Maybe using Nginx on the public services, reverse proxying to not-publicly-accessible docker container hosts would be the right approach instead?

Btw - asparck.com (as per your profile info) doesn't seem to be online?


Internal and external apps are both deployed on the same PaaS. Only difference is that internal apps aren't reachable "outside the VPN"; when it comes to building your service, it's an extra line of yaml in your service config. There's a network engineering team who works with the PaaS team to make that happen - it's definitely a nice luxury of a big company that you don't need to worry about setting up VPCs etc yourself.

The actual PaaS relies pretty heavily on AWS Cloud Formation - it predates swarm, mesosphere, kube, etc. So when we deploy a new version of a service, it's really "deploy an auto scaling group of EC2 instances across several AZs fronted by an ELB, then there's an automatic DNS change made which makes the new stack of EC2 instances live once the new stack is validated as working". The upside of the one service per EC2 image approach is no multi-tenancy interference - downsides are cost and it takes a bit longer to deploy. There's a project underway to switch compute to being Kube-based though, so that's promising.

All this is apples and oranges though - solutions for big companies don't make sense for a 1-person shop. I still have side projects being deployed via ansible and git pull instead of Docker, because it hasn't been worth the ROI to upgrade to the latter.

Re asparck - yeah, it was my personal site but I struggled to find the time to work on it. In the end I decided it was better to have it offline than terribly out of date, but hopefully I'll resurrect it some day.


I work in an org that has strong partitions between devs and sysads. Dev can make and test code. They have no access to prod. Sysads can configure and execute what devs make. They have read-only access to the source repos.

Problems with this are as follows (real, not imagined)

1. AWS CloudFormation scripts - who makes them? If dev does, sysads can't change them.

2. Does dev have the security mindset to maintain configurations in IaaS things like Cloudformation? Who reviews things like NACLs, Security Groups, VPCs, and the like?

3. Scripts - how big or impactful does a script need to be before it should be written by a sysad rather than a dev?

4. Oncall - normally the sysads' job, but when you implement strong gates between dev/sysad, you need oncall devs.


Thanks, that all falls in line with roughly the kind of problems I'd expect. :)


I work on a SOX compliant project, it works pretty much as described. As a developer I have no access to prod, relying on other teams to make system changes and creating a lengthy mandated paper trail for the SOX audit team to look over. Not to say there aren't headaches with the approach, but thankfully at a big enough organization it's become a relatively smooth process.


Tooling requires access. Are you saying no dev should ever `strace` a process? (This requires not only access but, presuming my UID != the service's UID, sudo too.)

Note that I'm not saying devs should have access to every production machine; I'm only saying that access should be granted to devs for what they are responsible for maintaining.

Sure, one can write custom tooling to promote chosen pieces of information out of the system into other, managed systems. E.g., piping logs out to ELK. And we do this. But it often is not sufficient, and production incidents might end up involving information that I did not think to capture at the time I wrote the code.

Certain queries might fail, only on particular data. That data may or may not be logged, and root-causing the failure will necessitate figuring out what that data is.

And it may not be possible to add it to the code at the time of the incident; yes, later one might come back and capture that information in a more formal channel or tool now that one has the benefit of hindsight, but at the time of the outage, priority number one is always to restore the service to a functioning state. Deploying in the middle of that might be risky, or simply make the issue worse, particularly when you do not know what the issue is. (Which, since we're discussing introspecting the system, I think is almost always the part of the outage where you don't yet know what is wrong.)


Your system should already surface the information necessary for you to do your job. Or you aren't doing your job.

This is what I always felt was the more appropriate sense of tech debt. You are literally borrowing against tech built by others that likely did not have the same requirements as you.

Is it convenient? Yeah. But it breeds bad choices.


> Your system should already surface the information necessary for you to do your job. Or you aren't doing your job.

I only have so much time, and very little of it is budgeted towards things like pushing information into managed systems. I do that when and where I can, but I do not get (and frankly, have never gotten) sufficient support from management or ops teams to build tooling/infrastructure to the point where I can introspect the system sufficiently during issues w/o direct access to the system itself.

The only place where I really disagree on principle (that is, what you propose is theoretically possible given way more time & money than I have, except here) is unexpected, unanticipated outages, which IMO should be the majority of your outages. Nearly all of our production issues are actual, unforeseen, novel issues with the code; most of them are one-offs, too, as the code is subsequently fixed to prevent recurrence.

But right at the moment it happens, we generally have no idea why something is wrong, and I really don't see a way to figure that out w/o direct access. We surface what we can: e.g., we send logs to Kibana, metrics to Prom/Grafana. But that requires us to have the foresight to send that information, and we do not always get that right; we'd need to be clairvoyant for that. What we don't capture in managed systems requires direct access.


Apologies for the slow response.

I'm not really disagreeing. There will be "break glass" situations. I just think these situations should be few and far between, and we should be working to make them fewer and farther between. Consider: when was the last time you needed physical access to a machine? Folks used to fight to keep that ability, too.


> I'll ask you the question I ask them: when the system has a production outage¹, how am I supposed to debug it, effectively?

High-quality testing of your systems, leading to engineers generating playbooks that cover the vast majority of production incidents, could be one approach that some might consider. Designing in metrics that aid debuggability could even be possible in some scenarios! Taken together, this can mean engineers get woken up less often for trivial things.

This isn't impossible. It's not even difficult or complex. It is time-consuming, and definitely requires a shift in mindset on the part of engineers.


> leading to engineers working on generating playbooks that cover the vast majority of production incidents

For any incident that happens, I'm going to — if at all possible — fix it in code s.t. it doesn't happen again, ever. There is no playbook: the bug is fixed, outright.

That only leaves novel incidents, for which a playbook cannot exist by definition. Had I thought to write a playbook, I would have just fixed the code.

(I am not saying that playbooks can't exist in isolated cases, either, but in the general case of "system is no longer functioning according to specification", you cannot write a playbook for every unknown, since you quite simply can't predict the myriad of ways a system might fail.)


You're right! For bugs, they should be fixed and never recur. There is no play book for this. For novel one-offs, they also cannot be anticipated, and thus cannot be planned for.

These points are very wise and correct. Yet, is it possible that situations might occur that don't fall into either category? For instance, a hardware failure, a third-party service failure, or a common process that is mis-applied and needs to be reversed. There could be a vast number of potential scenarios that are neither bugs nor novel events, for which playbooks could be authored. There is non-trivial value to be gained in making it easy for an operational team to handle such events, particularly when events that do recur have their handling codified.

You are, of course, absolutely correct to note that many events either will not recur or cannot be anticipated. Yet, might there also be value to be gained by recognizing that there are events outside these categories that can be anticipated and planned for?


You're ignoring regressions/maintainability.

Otherwise what's the point of automated testing? Just fix any bugs when they show up and never write tests!


> The ops folks I work with bandy about the same idea that you're getting at here, that engineers should not have access to the production system they maintain

that is a point, but it wasn't at all _my_ point. with "what is available" I was referring to the commands that are installed inside the container, which allow potential breakout of the container once it is compromised.

fwiw there is a breaking point with teams that don't restrict access to the production environment. once too many people have access it becomes unmanageable.


I'm not familiar with Docker infrastructure but what is the alternative to "pulling in a complete debian or fedora installation and then removing things"? Compiling your own kernel and doing the whole "Linux From Scratch" thing? Isn't that incredibly time-intensive to do for every single container?


Just have an image with a very minimal userland. Compiling your own kernel is irrelevant because you need the host kernel to run the container, and container images don't contain a kernel.

The busybox image is a good starting point. Take that, then copy your executables and libraries. If you are willing to go further, you can rather easily compile your own busybox with most utilities stripped out. It's not time intensive because you need to do it just once, and it takes just an afternoon to figure out how.
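
A minimal sketch of that approach with a multi-stage Dockerfile (names and paths are made up; a statically linked binary avoids having to copy shared libraries over, otherwise you also copy whatever `ldd` lists):

    # build stage: full distro with a compiler
    FROM debian:stretch AS build
    COPY myservice.c /src/myservice.c
    RUN apt-get update && apt-get install -y --no-install-recommends gcc libc6-dev \
        && gcc -static -O2 -o /myservice /src/myservice.c

    # runtime stage: busybox plus the one binary
    FROM busybox:musl
    COPY --from=build /myservice /usr/local/bin/myservice
    # run unprivileged; numeric UID so no /etc/passwd entry is needed
    USER 65534
    ENTRYPOINT ["/usr/local/bin/myservice"]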


I don't think this is a tooling problem at all.

"The tooling makes it too easy to do it wrong." Compared to shell scripts with package manager invocations? Nobody configures a system with just packages: there are always scripts to call, chroots to create, users and groups to create, passwords to set, firewall policies to update, etc.

There are a bunch of ways to create LXC containers: shell scripts, Docker, ansible. Shell scripts preceded Docker: you can write a function to stop, create an intermediate tarball, and then proceed (so that you don't have to run e.g. debootstrap without a mirror every time you manually test your system build script; so that you can cache build steps that completed successfully).

With Docker images, the correct thing to do is to extend FROM the image you want to use, build the whole thing yourself, and then tag and store your image in a container repository. Neither should you rely upon months-old liveCD images.

"You should just build containers on busybox." So, no package management? A whole ensemble of custom builds to manually maintain (with no AppArmor or SELinux labels)? Maintainers may prefer for distros to field bug reports for their own common build configurations and known-good package sets. Please don't run as root in a container ("because it's only a container that'll get restarted someday"). Busybox is not a sufficient OS distribution.

It's not the tools, it's how people are choosing to use them. They can, could, and should try and use idempotent package management tasks within their container build scripts; but they don't and that's not Bash/Ash/POSIX's fault either.


> With Docker images, the correct thing to do is to extend FROM the image you want to use, build the whole thing yourself, and then tag and store your image in a container repository. Neither should you rely upon months-old liveCD images.

This should rebuild everything. There should be an e.g. `apt-get upgrade -y && rm -rf /var/lib/apt/lists` in there somewhere (because base images are usually not totally current, and neither are install ISOs).

`docker build --no-cache --pull`

You should check that each Dockerfile extends FROM `tag:latest` or the latest version of the tag that you support. It's not magical; you do have to work at it.
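
Putting those two together, a sketch of the rebuild pattern (image and tag names are illustrative):

    # Dockerfile
    FROM debian:stretch-slim
    # pick up security fixes the base image hasn't been rebuilt with yet
    RUN apt-get update && apt-get upgrade -y && rm -rf /var/lib/apt/lists/*
    # install your own packages and app here

    # rebuild regularly, ignoring the cache and re-pulling the base image
    docker build --no-cache --pull -t registry.example.com/myapp:1.2.3 .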

Also, IMHO, Docker SHOULD NOT create another Linux distribution.


say you're setting up a firewall: it would be a bad idea to start with a rule that allows all traffic and then think about which services should be rejected. You would start by rejecting everything and then selectively allow ports/traffic.

When bootstrapping a VM or container image it's the same principle (imo): it's safer to think once about what is needed and have that locked down and under control, rather than having to keep track of what else was forgotten or might suddenly become a flaw in this ever-changing "threat landscape". E.g. the libpng example from above: what else is lying around that is suddenly vulnerable tomorrow?

If you've got time to compile your own kernel then why not. It's a process IMO and I usually get to that stage once everything else is locked down. To start, a minimal set-up using alpine (or busybox) instead of ubuntu or whatever is already a huge reduction in attack surface (and complexity). Next I want restriction of all system calls on a per-process basis (if a syscall isn't whitelisted for this specific process it's an error). Once this is in place I might think about the kernel, but in my scenarios this is rare. It really depends what your service does (what you want to protect).
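
(Docker's seccomp support is one way to get close to that syscall whitelisting. A sketch - the profile file is hypothetical, typically Docker's default profile trimmed down to what the service actually needs, with everything else returning an error:)

    docker run \
      --security-opt seccomp=./myservice-seccomp.json \
      --cap-drop ALL \
      --read-only \
      registry.example.com/myservice:1.2.3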

It's imo less work (lower maintenance cost) in the long run to always know what you have (and why), rather than having to think "oh, I didn't know there was foo too - what does foo do?". Because as code changes and as you ship updates you will be attending to updates that suddenly break your setup (and run up against your whitelist) anyway. But doing this review/audit during an upgrade, with a system that suffers from "feature and scope creep", is impossible (at least imho).

generally the less I have installed the less I need to track what might be suddenly a problem. Once my container/VM image is the way I want it I push it to my own docker registry and only deploy from there to production.

EDIT: really good talk on docker security (not just microservices) https://www.youtube.com/watch?v=346WmxQ5xtk


There are supported linux distributions like Container Linux intended for barebones deployments. Docker images layer, so you can essentially build your own base images from these primitive linux distros, and then have those available for your various applications.


I think they mean we should prefer using stripped down base images, then installing ONLY what is needed for your application/ci+cd tasks to run.


Everything inside your container shares the host kernel, so you wouldn't need a kernel (and I think most official distro containers don't have one either).


Use software packages and stay away from containers.


I’m in awe that your experience is that that sort of behavior is rare.

Sanitary environments seem to be in the extreme minority to me!


The rise of Docker is a particular pity, as we were getting this right before with Bundler in Ruby, pipenv etc. for Python, Maven for the JVM and NPM for Javascript (well, maybe not right in the case of NPM, but better than Docker on this point): we declaratively describe which semantic version of a dependency we need, lock down the specific version used for testing and deployment (in a .lock file), and have an easy process to update the specific version to the newest semantic version available and test it before redeployment.

With Docker and its procedural dependency declarations ("apt-get install this", "wget | bash that") we've lost all that precious version information and reproducibility.

Imagine if Docker was like Bundler/Maven/… but for Debian/Ubuntu repositories. You could deterministically reproduce an image from an easily auditable package file and basically do an apt-get update on the .lock file to fix the security vulnerabilities in the selected versions.


Randomly picking on your post, but I could easily have responded to many in this thread. Docker doesn't solve the problem that you are complaining about. It doesn't even try to solve that problem. It's just an environment for building and running containers. You can pick any package manager you want with Docker. The fact that there currently aren't any good package managers to use with Docker isn't Docker's fault.

I think one of the big problems is that people are using insane base images (and by "people" I include myself, because I'm guilty of it too). A Debian distro is not an appropriate base image for a Rails server, for instance.

For what it's worth, though, there is nothing stopping you from using apt the way you want. You have to do it all by hand (i.e. specify the versions you want when you install them). The dockerfile is your lockfile. I have been thinking that something like Guix is a much better fit, though. I just haven't gotten around to trying it...
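
A sketch of that "Dockerfile as lockfile" idea with apt version pinning (package versions shown are illustrative; one caveat is that old versions eventually disappear from the default mirrors, which is part of why this isn't a full lockfile substitute):

    FROM debian:stretch-slim
    # pin exact package versions so the image is reproducible and auditable
    RUN apt-get update && apt-get install -y --no-install-recommends \
            curl=7.52.1-5+deb9u9 \
            libpng16-16=1.6.28-1+deb9u1 \
        && rm -rf /var/lib/apt/lists/*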


There are alternatives to Dockerfiles for container building and maintenance, such as Packer [1] and Ansible Container [2].

[1]: https://www.packer.io/docs/builders/docker.html

[2]: https://docs.ansible.com/ansible-container/

FWIW I made a presentation years ago about this topic (it's in the second half):

https://go-talks.appspot.com/github.com/gigablah/talks/20150...


But Docker is the problem, because all that Docker isolates you from is the self-made problem of mixed library situations on host systems, at the expense of losing an easy upgrade path for said libs, running the Docker daemon as root, and losing the host authentication context. A very bad trade-off.


Out of curiosity, what's not right about NPM?


Even if you have `package-lock.json`, if there's a new patch version of a package that still satisfies the version spec, NPM will install the new version of that package on your machine and silently update your package-lock.json file.

Until recently, they didn't enforce immutable versioning: a publisher could remove a package at will.

The NPM team seems to ignore everything that existing package managers before them ever learned. They just happened to get included with NodeJS and became the de facto standard by convenience.


Also out of curiosity, what is right about pipenv?


Not OP, but given the theme of this thread, npm very directly enables library/dependency bloat, if not flat out encourages it.

I remember when I first started dabbling in web development from an avionics systems background and I'd just made a little toy express app that did a couple tricks and I was kind of exploring and naive and took an innocent gander into that specially icon'd node_modules directory and it was filled with other projects and peeking into a few of those I started to get this feeling of dread, the kind that starts in your stomach and slowly diffuses through the rest of you with that leaden feeling of growing hopelessness. Looking back, I should have stopped there, I really should have, but instead I popped open a terminal and, I can still hear each key as my fingers thumped it out: `tree ./node_modules`

Watching the unraveling was me bearing witness to everything I had ever known and loved and had believed in and strived for get thrown naked and defenseless and pathetic into the abyss and still it kept unraveling and still I sat and watched the descent until I no longer knew where I was or even who I was and still the modules unraveled relentlessly.

I spent that night looking into the mirror as if looking into the eyes of a stranger. The person staring back at me was someone I had not only ceased to know but had never known to begin with. And when the sun rose I had changed and I knew then as the sparrows began to chirp into the cold winter twilight spreading light over a now alien and unrecognizable planet that I had lost something I would never regain because to lose it is to simply realize it was never there to begin with.

I steeled my nerves and pulled on my itchy long underwear and buttoned up my plaid button up and I did not put on socks and headed downstairs to the kitchen and put on Now That's What I Call Music 23 and I drank a glass of orange juice and ate my oatmeal and the mix I had hitherto so much enjoyed now struck me as chaotic and ill-thought out and then Abba came on and the song was about a Dancing Queen and I resolved myself once more and brought my dishes to the sink to rinse them and decided there was nothing to do besides async.waterfall the next set of promises only after the necessary precedents resolved (which is how we spoke back in 2016s).

And I found myself back at my machine and worked like hell to reign my terror in because the tree command I had run the previous evening was still printing to stdout and I closed my eyes and blindly mapped my fingers to ctrl+c and felt just a little twinge of reassurance as the reflexive `clear` followed without me having to have consciously willed it and I took a deep breath and scratched my leg because of the itchy long underwear and then I was typing out `npm --i numbButAlive` and by the shift+b,u tap dance I began to feel hope one last time and finger resting slack on the carriage return I couldn't bring myself to push it yet for some reason and I froze and I clearly remember hearing Agnetha at that exact moment singing about that 17 year old dancing queen, "Having the time of your life, oh, see that girl" and then the hope was gone because it wasn't ever there it was really there like the hopelessness of hoping to feel hope and I felt an anger and I revolt in me rise up and the room filled with my voice "Fuck it, fuck it all!" and I meant it: fuck this world and fuck Abba and fuck Now That's What I Call Music Vol. 3 and up (with the exception of Vol. 7, maybe) and fuck all the empty dreams of all the fucking vapid nobodies polluting it with their nonsense parade of nonsense distraction after distraction after distraction from the truth that their life will end in death and so will their children's and their children's children and there's no greater significance to that, no hidden meaning or lesson, only death and emptiness and going through life knowing this and knowing how all everything you do to whittle away at the precious little time you've been alotted is a complete waste despite your protests and denials and self-deception and still you keep denying it to your dying breath because here even acceptance is denial in that acceptance is meaningless to that which is what is with or without your fucking acceptance and I found myself laughing a hollow, bitter laugh, a laugh that didn't sound like me, a sound I could never had made until that moment, this moment, a moment without joy or sadness or anything besides utter resignation and defeat and I tacked a `-g` onto the command for no other reason than fuck it, fuck everything and everyone and all their delusional basis for anything and it was easy to push the carriage return now and as I did so and I tasted something new to me then and it must have been my first taste of true freedom in all its terrifying glory, a freedom that leaves nothing to its beholder and is beholden to no one and ultimately sentenced to an eternity imprisoned within itself to as its becoming of self is just a much a defiance to what it should have been free to become.


> I feel like containers are alright, but Docker is a partial dumpster fire. cgroup isolation is good, the crazy way we deal with packages in container systems is not so good.

Can you expand on this? With respect to packages a container is just a mounted file system. A base image is just a way to start with a known file system state. You can do whatever you want to the file system state using whatever package manager you like. In what way should docker have attempted to exert more control over the file system to improve package management?


See my sibling comment to your post: the problem is exactly that it is just a mounted filesystem on which you can run programs. A container should not be described by a Dockerfile, which is basically a glorified shell script with a caching layer on top (and no concept of cache invalidation), but by a dependency file for a proper package manager.

Docker is my go-to example for worse-is-worse, because of that. They have solved only the easy problems and gotten a phenomenally approachable UI as a result (everyone who has used the console to install dependencies can write their own Dockerfile). But in the process they have occupied the niche in which a better packaging solution could have evolved and grabbed all the mindshare with enormous marketing effort (aided by an easy-to-use product).


A Dockerfile does not describe a container. A Dockerfile describes how to build a container image. Running that image creates a container.

You can create images in other ways than Docker or Dockerfiles (ocra-build, img). Other programs can run container images (runc, containerd).

For example, Google's BLAZE/BAZEL build software can directly output a container image (and upload it to a registry) and then you can run that with runc on any platform and you haven't touched Docker or a Dockerfile once.


I disagree. You seem to prefer a strong docker-centric idea of package management. I think that by drawing the line where they did they made it possible for anyone to use whatever package management scheme made sense for a given application. There are already so many alternatives for driving the file system to a particular state that I really fail to see how docker taking an opinionated position would have helped.


I think we have very different ideas about the problem docker is the solution for.

From the docker.com website ( https://www.docker.com/get-started ):

> Building and deploying new applications is faster with containers. Docker containers wrap up software and its dependencies into a standardized unit for software development that includes everything it needs to run: code, runtime, system tools and libraries. This guarantees that your application will always run the same and makes collaboration as simple as sharing a container image.

Basically I understand that as "write it on my machine, deploy it anywhere". "Everything it needs to run" are the dependencies in my lingo. So for me, all of this is dependency management. I have never asked for a way to drive a file system to a particular state, in the same way that I don't particularly care how a 'node_modules' folder is structured, as long as I can `require` whatever I want inside my programs.

(My point is muddied by the other task docker fulfills for me: Software configuration by creating a directory structure with the following access rights here, writing an nginx config file there. But for me, the ideal scenario would be to reduce the accidental complexity involved in the configuration (I don't care where exactly my program is stored and how exactly it is called, I just want to run it at much reduced privileges and the way I know how to achieve that is to create a custom user and run my program under that user) and define the rest declaratively.)


I work for a network hardware and security vendor and it's utterly disheartening how many customers come to us and don't actually care about the impact of any of the vulnerabilities they ask us about; they just care about the CVSS score, its PCI impact and their often bizarre policies about them. There's often less concern about actually doing something about security risks and more concern about meeting their compliance goals. Now, this may be biased by who actually reaches out, but it is scary that big names have underlings who don't know the first thing about some of the security issues they're "investigating".

In another discussion the other day, I had heard programming these days compared to slowly transitioning out of the hunter-gatherer phase and into more structured society. From what I have seen this largely rings true, we are still relying on software that is largely not engineered, but written with loose engineering. The security industry seems to largely be like this, but more of a wild west (as depicted in Westerns) feel to it. Some companies and organizations have structured strategies for security, but even in large organizations like Equifax theres still a kinda "go shoot the bad guys and tie up the gate so the cattle dont get out" aspect to it, very ad hoc.

I am hoping the industry moves more towards engineering things, standardizing interactions, characterizing software modules, etc so that the security industry can spend less time on wild goose chases when trying to figure out how something is supposed to work and how this latest vulnerability applies to that.


> I am hoping the industry moves more towards engineering things, standardizing interactions [...]

Good luck with that. From my PoV, the "industry" has moved away from standardization ever since around 2008 when the "consumerization of IT" became a thing. Almost no meaningful standardization has been carried out in this decade. Previous efforts are being derided (XML bashing etc.) by people who haven't experienced 1990s lock-in. The web is driven ad absurdum with folks standing by and cheering. Enterprises are catering to the uneducated with "REST microservice" spaghetti, and younger devs mostly get their information from random (or even targeted) product blogs on the 'web.


For companies that need to meet regulations, compliance goals do indeed trump fixing real security issues. Companies can go out of business or lose big clients if they don't get PCI certification or whatever. The executive in charge of meeting compliance goals has a very clear pass/fail - the company is certified or it isn't. The policies are bizarre because they are driven by auditing firms who are incentivized to make compliance a lot of work, but attainable. A lot of work so they can bill a lot of hours. Attainable because nobody will hire them if they don't provide certification in the end. Effectiveness of the policies is not a factor. It is an entire universe of perverse incentives. The engineers toiling away to meet the goals know it, but there's nothing they can do.


It comes down to risk, cost, reward. If it costs you one developer 3 months to build a utility used by 5 people in your company, but would cost 3 years for a team of 5 to write an "engineered" version with security focus, it may never happen.

It depends on need and risk.


> It comes down to risk, cost, reward.

In theory I agree that there are trade-offs like this, but in practice I rarely see them being applied properly. A small startup using Electron to build a cross-platform app, for instance - I can see how that's a good trade-off. But then you see multi-billion dollar companies with hundreds of devs and millions of users building Electron apps when they could easily dedicate the resources for native ones.

Security tends to be similar, giant (non-tech) companies with lots of important data are the ones that optimize for cost the most and don't care about the risk.


I'm not sure that I entirely agree... VS Code is really well supported across platforms that otherwise may have been left behind. Contrast to say MS Teams, which doesn't have as broad support outside Windows/Mac, that irks me more.

I'm a pretty big fan of Electron + Cordova to reach a broader base of users. I don't think it's a net bad as a user who prefers the same tools on windows, linux and mac as much as possible.

There are a lot of things you get in the box with a browser based platform beyond the cross browser support. I mean even reflows and alignments working right are far more consistent, more easily. CSS/Styling is the same as the browser which is very flexible/capable. Some may dislike JS, but it gets the job done.

But on the flip side, I've seen people build an entire application (executable) or service from what could be a simple script in any given scripting language.


I often compare that to the professional approach of picking the correct tool to solve a problem.

If a client doesn't need extensive security I would strongly advise them to make compromises to push their product out faster.

Professionals should put client needs first and help meet them without making the client run out of money chasing some utopian image they don't even need.

There is a saying which also expresses the same intent: "life is a series of compromises".


If you've ever worked in the safety industry, it's not all that different. When it comes to safety features, compliance, and training, most companies take the same approach at meeting minimum legal requirements and passing liability wherever they can for cost savings. If they can cut corners and not hurt profit margins, they will gamble.

If businesses tend to ignore safety and gamble with respect to human life (in scenarios where doing so results in net cost savings), I have no illusions that they'll care about data security in cases where human life isn't at risk (again, unless it's shown to reduce overall costs and management can be convinced of that).


> Can an attacker actually exploit the vulnerabilities to do bad things? The answer for almost all of these CVEs is "no".

This is a really dangerous way of thinking of container security.

First of all, if I give employee A a list of 500 "vulnerabilities" to fix, and employee B a list of three real, serious vulnerabilities to fix, employee A is much more likely to wait for the problem to go away and employee B is much more likely to find the issue approachable and get it resolved. It sounds like a big difference until you understand that actually employee A was given the same three serious vulnerabilities plus 497 "unexploitable" vulnerabilities, and just didn't know which was which because the three serious vulnerabilities got lost in the noise. You need to instill a zero-tolerance culture to make sure that you don't let serious vulnerabilities stick around. Compliance regimes acknowledge this which is why they're ultimately reasonably effective at getting large, lumbering enterprises to be secure.

Second of all, while much of the open-source software inside your container may not be directly invokable without an exploit, the fact of the matter is that virtually no organization is subjecting every release of software produced in-house to rigorous security auditing. Yeah, your software needs to be pwned before those vulnerabilities matter, but Murphy would like to remind you that your software was being pwned while you wrote your comment and that the attackers exploited the other software in the container while you were reading mine. And maybe it makes a difference, for example, if your containerized service runs under a limited-privilege service user but a vulnerability in adjacently installed software permits the attacker to escalate to root within the container.

You're right in that most orgs probably have lower-hanging fruit that provides better bang for the buck to improve their org's security posture. But adopting an attitude of "meh, not all CVEs are really CVEs" is irresponsible at best.


> This is a really dangerous way of thinking of container security.

Isn't "really dangerous" a bit hyperbolic here? I'm describing a process by which you figure out the actual risk of vulnerabilities before treating them further. You can find quotes in my comment to make it looks worse than it is, but I was not expecting to find the process I described to be controversial.

But yes, it's true: I'm advocating a risk-based approach to such vulnerabilities rather than a compliance-based one. I guess which is better depends on organizational fit and personal taste.

I'm also confused what to do with your example of "fixing container vulnerabilities" in this context of base image vulnerabilities. Both employees A and B would have to fix their set of vulnerabilities by either (a) updating the vulnerable base image or (b) switching to a different base image. Fixing base image vulnerabilities is not the pick-and-choose versus all-or-nothing affair you seem to be describing.


> I guess which is better depends on organizational fit and personal taste.

The assumption is that we're talking about big-enough orgs here. If you're running a small enough org, take a weekend, put your system in distroless containers and be done with it. Container scanners don't add much value to small orgs to begin with, their whole value proposition is "you run so many containers that you can barely keep track of them, so here's a tool that helps you understand the true state of things." Process and compliance are practically a given.

> either (a) updating the vulnerable base image or (b) switching to a different base image

Non-trivial in practice. Hopefully your org has standardized on a single base image which the org maintains and takes responsibility for, so (b) is a non-starter. If you could just update the base image fleet-wide overnight without issues then we wouldn't need containers in the first place; if you tried to do that twenty years ago you'd instantly cause rolling outages (now you'll roll back after your canary dies, but it's a moot point, you still aren't de-facto instantly updating). Containerization made it easier and safer to deploy services, but it didn't give you a "click here to magically update everything with no risk of rejection" button. Vulnerable services have often been vulnerable for long periods of time, with complicated update paths, possibly needing in-house patches, etc.


Well yes, we're talking past each other then. I'm coming from a perspective where I get the space to actually perform risk assessment and where that actually matters. I don't need to comply with a needlessly highly regarded CVE database, nor do I have trouble communicating this. Furthermore, I still find value in vulnerability scanners - to a limited extent - because they allow me to automate manual work.

If you are working in a more political organization (that is not a value judgment - it often comes with organizational scale) then other things influence your processes. I'm sorry, but that's not the perspective I take. That doesn't make my approach any more dangerous though - I think it's an appropriate perspective and I'm happy I can take it.


If I may:

> I'm describing a process by which you figure out the actual risk of vulnerabilities before treating them further.

The problem here is that your assessment of how a vulnerability might be leveraged or accessed is bound by your own team's limited knowledge. The reality is that the attacker's knowledge and creativity is more or less unbounded (and unknown). So making that judgment call about what is a real risk, versus having zero tolerance, is a huge gamble IMHO, especially if your teams are not red-team wizards.


No security team can defend against attackers with unknown, unbounded knowledge and creativity. Everybody has a risk tolerance of some degree.

Moreover, it seems you're stating that "zero tolerance" should mean having no CVEs in container images. Does the CVE database really have that level of authority for people? It seems like the wrong thing to focus on even in these hypothetical zero-tolerance situations; I'm really not sure what to tell you there.


I mostly agree that it’s nearly impossible to end up with a container image with zero CVEs listed unless you are some sort of wizard. However, I think building and deploying images when there is an available patch is foolish (CVEs without patches are a different story).


I agree that a simple count of vulnerabilities is not useful for determining the actual security of a system.

But it is useful for maintaining a system if you examine this as an ongoing business process in which you're continually trying to minimize a set of "unknowns". For third-party libraries, I argue it's generally cheaper to get rid of unknowns (when you can) rather than take the time to quantify them. What's left over is easier to prioritize.

As you point out, all these things are probably not vulnerabilities, but they might be. What's the likelihood? Well, by upgrading or patching, the probability becomes zero, and then you can stop caring about exactly what it is. Patch it, move on.

(And, to be clear, some unknowns are related to your code, so you do want to investigate those, but those are presumably the unknowns that you have the most expertise with, so they're much cheaper.)


Exactly!

It often takes more time to assess whether your system is truly vulnerable to a given public exploit than it takes to just grab a newer version of the component.

Also worth considering: getting pwned because of a 0day is no fun, but getting pwned because of an unpatched CVE in your system - priceless.


One thing to bear in mind when evaluating the significance of vulnerable libraries is that there are different degrees of owned-ness.

* The ability to get a containerized app to promote you to an in-app admin

* Getting RCE as the application user

* Escalating from the application user to the container's root

* Going from container root to attacking the host

Each of these represents, broadly, an increase in threat. Each attack can be aided by outdated and vulnerable versions of libraries or utilities. It's not always obvious what in your container can be attacked or used to escalate, and how a developer intends a container to be used isn't always a good guide.

Designing for safety means designing for safe failure. Designing for security means designing for being pwned and minimizing the blast radius. The common term of art is "defense in depth".


>> Can an attacker actually exploit the vulnerabilities to do bad things? The answer for almost all of these CVEs is "no".

Yes this is so true, almost all of the time, the vulnerability is only a vulnerability under certain very specific scenarios which are not relevant to the project. For example, it might only be a vulnerability if you pass user input to the function but for all other possible use cases, it's not a vulnerability at all... In fact your system may not even be using that function at all. Snyk will still mark all dependent modules as being vulnerable; but it's a lie; I feel that they are intentionally overzealous just to grab people's attention to their platform... Much like this post "Top ten most popular docker images each contain at least 30 vulnerabilities" - It's attention grabbing but it's not true. The real title should be "Top ten most popular docker images each contain at least 30 possible vulnerabilities none of which are actual vulnerabilities"

I think that Snyk has been very useful for the Node.js ecosystem in terms of encouraging module authors to get rid of unnecessary dependencies but it doesn't change the fact that Snyk is a liar and that we should be cautious with it (some misinformation can be a catalyst for positive change, but too much can be dangerous).

The bad thing about Snyk is that they can only publicly shame open source projects; not commercial solutions (which are usually far worse). They should definitely try to make a distinction between 'vulnerability' and 'possible vulnerability' because it's becoming downright deceptive and it's going to start hurting open source as a whole.

Either they should fix their platform to have fewer false positives, or they should fix their communication around it so that they're not blatantly lying to people and harming the reputation of open source developers who are producing high quality, secure code.


Completely agree. I found with work projects that often it's client-side libraries it flags up, e.g. with a regular expression denial of service vulnerability that needs some crazy-specific conditions to be met for user input to reach the part of the library that eventually uses it in a regex.


I think it's important to be careful with drawing conclusions about what might or might not be used by whatever is running in your container and exposed to end users, if only for the simple fact that it might not be too obvious how a particular vulnerability gets exploited in the wild. CTF writeups often illustrate very nicely how there are quite creative ways to exploit vulnerabilities. It's not safe to assume you can predict every scenario. The path to an attack can be very complicated (but easy to execute, once the exploit is scripted).

I agree that the article doesn't emphasize what is the actual important point here (it is clickbait-ish), but the numbers they're presenting should (hopefully) trigger people to actually think about putting "continuous eyes" on the container images they're using. Just like you should continuously monitor your code, your application's dependencies, and your host system libraries.

A hacker needs to be right once, you need to be right 100% of the time. That's not marketing.


Quay.io's scanning feature does one really cool thing: it tells you the version of the package that actually fixes the CVE, eg:

CVE-2017-15232 libjpeg-turbo

Current version: 1.5.2-r0

Fixed version: 1.5.3-r0

This actually helps you get actionable data. You can even sort by "stuff I can fix".


Although I'm giving Snyk (at least their marketing guys) a hard time for this, their tool does the same. An awesome feature is that they can actually open pull requests with fixed versions (if available) in your repository to get this stuff fixed. Depending on your setup, this means you immediately get continuous integration results and a developer can quickly decide whether to automerge (which, in the case of minor version updates, you usually can). Really cool stuff.


'My image only has one vulnerability! (it's a terminal that's reachable from the public internet)'


'My image has zero vulnerabilities! (it's a proprietary terminal that's reachable from the public internet)'.

The flip side of "this reports too many vulnerabilities" is "this reports too few vulnerabilities", it should always be made clear we are talking only about publicly known vulnerabilities, which is a subset of all discovered vulnerabilities, which is a subset of all vulnerabilities.


Hi, I have a terminal that's reachable from the public Internet that lets you execute arbitrary code: http://rkeene.org:8080/


What you are describing is known as the distinction between a "local exploit" vs. a "remote exploit".

https://en.wikipedia.org/wiki/Exploit_(computer_security)#Cl...

Side note: this is why it is more difficult (from a security perspective) to run a computer lab than to host a web application. Much greater attack surface area when you have users who have shell access.


No I'm not.

It doesn't matter whether an exploit is potentially locally or remotely exploitable. It matters whether it's actually exploitable by an attacker in your situation.

For example, CVE-2017-5645 is a remote code execution vulnerability in Log4j that will light up your vulnerability scanner like a Christmas tree, but requires you to use Log4j functionality that you will never realistically use in an application container.


What’s alarming to me is not the marketing nature of the message, it’s the fact that you pull a container in its entirety. A container is not a single CVE. CVEs may be evaluated against the worst-case scenario, but they’re not evaluated against each other. When you stack vulnerabilities you get difficult-to-detect, sophisticated attacks. Not many APTs I can recall rely on a single vulnerability.

Sure, it would be nice to see some examples where chaining the CVEs from a popular image can lead to an actual attack, but I wouldn’t write off the problem just because someone is marketing their product with it.


I’m a hobbyist game developer and I sometimes package web games with Electron. The build/package scripts for the games usually have a small set of npm dependencies, and I’ll usually get a warning indicating that there’s a vulnerability every once in a while.

I usually don’t look into them, since the script rarely gets run and isn’t exposed on the internet, but I do wonder if there could be any real vulnerability.


Depends on what modules are used, how they are used and how/when your builds happen and are/were configured.


IMO the reason why these vulnerabilities matter is that you don't want to offer additional attack surface _if an attacker breaches a layer_. This is similar to defense in depth. We don't just want a shell of security, but also additional challenges for when each layer is breached.


I'm really not talking about defense in depth. There are vulnerabilities that are not relevant to any layer of defense, such as the hypothetical grep vulnerability I talked about earlier.

Taking a risk-based approach doesn't get you to skip out on thinking about any category of vulnerabilities, at any layer of defense.



