The Configuration Complexity Clock (2012)

lmm · on July 9, 2019

Good analysis but too timid in its conclusions.

At every level of complexity, hard-coding a solution is the least evil option. Your codebase is a living expression of your business rules, and it's already using the best format you have for expressing your business domain logic (if this isn't true, use a better language). When the business rules change, the codebase should change. You already need to be able to deploy code changes quickly (e.g. to fix code bugs), so having to do a release/deploy to make a change to your business logic should not be a scary proposition; if it is, improve your release process.

Hard-code everything.

cle · on July 9, 2019

There is a tension between delivery speed and operational safety. The faster you deploy new service code, the faster you can take down your service because of a bug that slipped past your automated tests. This is why one-box deployments and traffic shifting and feature flags exist (which are increasingly config-driven).

And if you have multiple instances of your service running (e.g. regional endpoints), and you care about availability, then you'll be deploying sequentially to each endpoint and watching for anomalies before moving to the next endpoint.

There is still immense value in dynamic config in these scenarios.
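The sequential, watch-then-continue rollout described above can be sketched as a loop. This is a minimal illustration under assumptions: `deploy`, `healthy` and `rollback` are hypothetical callbacks standing in for whatever your deployment tooling provides, and "healthy" collapses all anomaly detection into a single boolean.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    regions: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> List[str]:
    """Deploy one region at a time, checking health before moving on.

    On the first unhealthy region, roll back that region and every region
    already updated (most recent first), then return an empty list.
    Otherwise return the regions now running the new version.
    """
    updated: List[str] = []
    for region in regions:
        deploy(region)
        if not healthy(region):
            for r in [region] + updated[::-1]:
                rollback(r)
            return []
        updated.append(region)
    return updated
```

Note that the loop does not care whether the artifact being rolled out is a new build or a new config file.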

lmm · on July 9, 2019

> There is a tension between delivery speed and operational safety. The faster you deploy new service code, the faster you can take down your service because of a bug that slipped past your automated tests.

100% agreed - but people miss that this applies just as much to deploying a "config change" as it does to a code change. Particularly if you're talking about feature flags, a config change and a code change are effectively equivalent - so you should apply the same standard to both. (What that standard should be is still a tradeoff.)

> This is why one-box deployments and traffic shifting and feature flags exist (which are increasingly config-driven).

> And if you have multiple instances of your service running (e.g. regional endpoints), and you care about availability, then you'll be deploying sequentially to each endpoint and watching for anomalies before moving to the next endpoint.

I would agree that feature flags are config-like - and view them as the same antipattern. Traffic shifting and sequential deployment are great ideas, but they work great when deploying a code change. Make the change, gradually deploy the new version, roll back if needed. What does dynamic config gain you in that scenario? Only more possibilities to make mistakes, IME (e.g. the new version looked great on the "canary" node, but actually that instance had a different config from the other nodes).
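One way to catch the "canary had a different config" failure mode is to fingerprint each node's effective config and compare the fingerprints. A minimal sketch, assuming each node's config can be represented as a JSON-serializable dict; the node names are made up.

```python
import hashlib
import json
from collections import Counter
from typing import Dict, List


def config_fingerprint(config: dict) -> str:
    """Stable short hash of a config; key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def drifted_nodes(configs: Dict[str, dict]) -> List[str]:
    """Names of nodes whose config differs from the most common one."""
    prints = {node: config_fingerprint(cfg) for node, cfg in configs.items()}
    majority, _count = Counter(prints.values()).most_common(1)[0]
    return sorted(node for node, fp in prints.items() if fp != majority)
```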

HelloNurse · on July 9, 2019

Not all software is "your service" that you can "take down" to install changes; there are cases in which there are multiple deployments with essentially independent and different configuration data that cannot be part of program code.

A popular example: the Git client, which stores layered configuration files for repositories, users and installations including, for instance, local file names (such as diff tools) and user identities.

Git also serves as an example of how the line between configuration files and data can be blurred: it's customary to commit .gitignore and similar configuration files into the repository itself, because it's useful and Git only cares that they are present, not how they are managed.
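Git's layered configuration can be modelled, very roughly, as a chain of lookups from the most to the least specific layer. The sketch below uses Python's `collections.ChainMap` to mimic the documented precedence (repository-local settings override the user's global file, which overrides the system-wide file); it is a simplification, not how Git itself is implemented, and the setting values are invented.

```python
from collections import ChainMap
from typing import Dict


def effective_config(system: Dict[str, str], user: Dict[str, str], repo: Dict[str, str]) -> ChainMap:
    # ChainMap searches its maps left to right, so the most specific layer goes first.
    return ChainMap(repo, user, system)


system = {"core.editor": "vi", "diff.tool": "vimdiff"}
user = {"user.name": "Alice", "diff.tool": "meld"}
repo = {"user.email": "alice@work.example"}

cfg = effective_config(system, user, repo)
```

Here `cfg["diff.tool"]` resolves to the user's choice, while `core.editor` still comes from the system layer.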

hinkley · on July 9, 2019

The best thing you can do here is that if you use "8080" in a bunch of places in your code, replace all the ones that mean the same thing with a constant (e.g., what port I'm running on), and the ones that mean something else (e.g., what port the auth token engine is running on) with a different constant. Better still, use the URL library from your ecosystem to do this, and pester your 3rd party library authors to stop parting out URLs. If I had a dollar for every URL interpolation bug I've had to fix...
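A minimal sketch of both suggestions, with made-up names: one constant per *meaning* (even when the numbers happen to coincide), and URLs assembled with the standard library's `urllib.parse` helpers rather than string interpolation, so that query values are escaped correctly.

```python
from typing import Mapping
from urllib.parse import urlencode, urlunsplit

# Same number, different meanings, so two constants (hypothetical names).
SERVICE_PORT = 8080       # the port this service listens on
AUTH_ENGINE_PORT = 8080   # the port the auth token engine listens on


def auth_engine_url(host: str, path: str, params: Mapping[str, str]) -> str:
    """Build the auth engine URL from parts; urlencode escapes the query values."""
    netloc = f"{host}:{AUTH_ENGINE_PORT}"
    return urlunsplit(("http", netloc, path, urlencode(params), ""))
```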

That way when you have a customer whose antivirus is running on 8080, you can finally pull it out into config files, and you 'just' have to change the code that consumes the constant to pull from config, because you've already telegraphed the intent to do this.
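Once a customer needs a different port, the constant can become the default and its consumers can read from a config file instead. A sketch assuming an INI-style file read with the standard `configparser` module; the section, option and constant names are hypothetical.

```python
import configparser

# The old hard-coded value survives as the default when the config file is silent.
DEFAULT_AUTH_ENGINE_PORT = 8080


def auth_engine_port(cfg: configparser.ConfigParser) -> int:
    """Consumers call this instead of reading the constant directly."""
    return cfg.getint("network", "auth_engine_port", fallback=DEFAULT_AUTH_ENGINE_PORT)
```

Because every consumer already goes through one named value, the switch touches a single function rather than every literal `8080`.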

A few jobs back I saw another side benefit to this: you can entice people to participate in improving the code. When code is really wrong people will ignore it. They can't be bothered to get invested in it. But when it's almost right they are often motivated to fix it the rest of the way. Either they see the potential and are inspired (new blood, trying to make their mark), or the unfulfilled potential grates on their sensibilities (crotchety veterans).

mrkeen · on July 9, 2019

Yes!

Source code is config which type-checks.

If you want to "reconfigure" your app while it's running, why not give it some input, rather than tracking down a textual YAML file and editing that?
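A minimal sketch of "config which type-checks": settings expressed as a frozen dataclass in ordinary source code. The names are hypothetical. Python only checks the annotations statically, so a checker such as mypy (not the interpreter) would reject something like `RetryPolicy(max_attempts="5", backoff_seconds=0.5)`; `frozen=True` does enforce at runtime that the values cannot be reassigned after construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int
    backoff_seconds: float


# The "config" lives in code, is reviewed like code, and ships with the build.
PRODUCTION_RETRIES = RetryPolicy(max_attempts=5, backoff_seconds=0.5)
STAGING_RETRIES = RetryPolicy(max_attempts=2, backoff_seconds=0.1)
```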

tempguy9999 · on July 9, 2019

> If you want to "reconfigure" your app while it's running

I don't think anyone was implying having a config file obviated a program restart. I'd add that restarting an unchanged executable with an altered config file is going to be considerably faster than recompiling it.

> why not give it some input

If that's not a config file, then what are you suggesting?

lmm · on July 9, 2019

> I'd add that restarting an unchanged executable with an altered config file is going to be considerably faster than recompiling it.

Why should it be? We presumably have a VCS tag corresponding to the current version such that it would be very easy to check out the corresponding version and change only one file. Then an incremental rebuild with just that one file change should not take a significant amount of time. The business already needs to be able to fix code bugs quickly - hoping that any given issue could be resolved by a config change rather than a code change is not a sound strategy. So you need to have a process for deploying code fixes fast, and then you can just reuse that process.

For large systems compilation time is not actually the bottleneck for deploying a change - rather it's the testing (particularly integration testing). But actually a config change is just as dangerous as a code change, and the same level of testing is usually appropriate. In my experience many - perhaps most - production outages tend to be caused by "config changes".

> If that's not a config file, then what are you suggesting?

At some point an application does need to have dynamic behaviour in response to some form of input. But I'd argue that we're quite good at dealing with the things we see as first-class input - we know the importance of validating input before processing it, testing different input-handling code paths, and so on. "Config" tends to not be treated the same way - all too often the testers see it as part of the code and don't test config changes like other user input changes, but the coders don't test them like code changes either. It's a dangerous ambiguity; the system is made more robust by forcing everything to be one or the other.
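Treating config as first-class input mostly means validating it at load time and failing loudly, just as you would for user input. A minimal sketch, assuming the config has already been parsed into a dict (from YAML, JSON, or similar); the keys and limits are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping


class ConfigError(ValueError):
    """Raised when a config is rejected at load time."""


@dataclass(frozen=True)
class ServerConfig:
    port: int
    timeout_seconds: float


ALLOWED_KEYS = {"port", "timeout_seconds"}


def parse_server_config(raw: Mapping[str, Any]) -> ServerConfig:
    # Unknown keys are usually typos; rejecting them beats silently ignoring them.
    unknown = set(raw) - ALLOWED_KEYS
    if unknown:
        raise ConfigError(f"unknown keys: {sorted(unknown)}")

    port = raw.get("port")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(port, int) or isinstance(port, bool) or not 1 <= port <= 65535:
        raise ConfigError(f"port must be an integer in 1..65535, got {port!r}")

    timeout = raw.get("timeout_seconds", 30.0)
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or timeout <= 0:
        raise ConfigError(f"timeout_seconds must be a positive number, got {timeout!r}")

    return ServerConfig(port=port, timeout_seconds=float(timeout))
```

Tests can then exercise `parse_server_config` with bad configs exactly as they exercise any other input handler.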

tempguy9999 · on July 9, 2019

> an incremental rebuild [...] should not take a significant amount of time.

> But actually a config change is just as dangerous as a code change, and the same level of testing is usually appropriate

Points well made, unarguably so.

> In my experience many - perhaps most - production outages tend to be caused by "config changes".

I have no comparable experience but it sounds all too plausible and I shall bear that strongly in mind henceforth.

> "Config" tends to not be treated the same way [as code] [...]

From there to the end is brilliant. Thanks so much for such an excellent response. I'm going to print this out and keep it.

collyw · on July 9, 2019

Are you being sarcastic?

tempguy9999 · on July 9, 2019

This is totally wrong.

Source: decades of experience.

tempguy9999 · on July 9, 2019

To whoever downvoted me, perhaps you're right. At the very least I should have tried to say why.

OK, let's try and interpret the parent post in a way that makes sense to me.

If your config data is effectively embedded in code and well isolated from the rest so that it could be extracted into an ini file easily, I'd buy that. I guess you lose flexibility as you have to recompile, but ok. Perhaps it's safer too.

> and [the code is] already using the best format you have for expressing your business domain logic

Well, often a good format for an abstract statement of actions is a list or grid of data which drives the code's decisions and actions. In that sense, if you've respected that, that's part of your config file (whether external, as a file, or embedded in the code though hopefully well isolated).
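A minimal sketch of such a grid: a table of data, kept isolated inside the code, that drives the decisions. The shipping rules and prices are invented; the same table could later be moved to an external file without touching `shipping_price`.

```python
from typing import List, Tuple

# (max_weight_kg, express, price). The first matching row wins, so keep rows
# ordered by ascending weight.
SHIPPING_RULES: List[Tuple[float, bool, float]] = [
    (1.0, False, 4.00),
    (1.0, True, 9.00),
    (5.0, False, 7.50),
    (5.0, True, 15.00),
]


def shipping_price(weight_kg: float, express: bool) -> float:
    for max_weight, is_express, price in SHIPPING_RULES:
        if weight_kg <= max_weight and express == is_express:
            return price
    raise ValueError(f"no shipping rule for {weight_kg} kg (express={express})")
```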

If none of this is your intention, could you let me know what you're thinking, with concrete examples please?

JLangley · on July 9, 2019

REALLY worth reading the Graham Poulter comment in the original post about the difference between spatial and temporal variations.

The idea that you can hard-code everything assumes there is no variation in deployments and use of a piece of software.

If that were the case, you're talking about a SUPER simple piece of software. So sure, hard code everything.

But as soon as you're talking about a piece of software that will be used in multiple locations and with possibly different release / roll-out schedules, "there will be issues".

catern · on July 9, 2019

That comment describes a real issue that the main post overlooks. But the comment also overlooks the obvious solution. It says:

>While "temporal" variations can easily be hardcoded if you have a short release cycle, "spatial" variations are not so easily hardcoded: you end up maintaining a source branch for each active variant.

But that's not the only way. There's a way that we are all very familiar with for customizing a component for being used in different situations: Passing in different arguments to the constructor.

If you have two places you want to deploy a piece of software, which have different environments and need to, e.g., access resources at different paths, then just have two main() functions, each invoked by one of two different executables. Those different main() functions can then hardcode all the specific details of whatever place you're deploying to, in your normal programming language, without you having to create configuration files or anything.
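The constructor-argument approach can be sketched as follows: everything deployment-specific goes into a plain settings object passed to the constructor, and each deployment gets its own hard-coded entry point. The paths, URLs and names are hypothetical; in a real build each `main_*` function would be wired to its own executable, whereas here they return the app so the wiring is easy to inspect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    data_dir: str
    api_base_url: str


class App:
    def __init__(self, settings: Settings) -> None:
        # The component only sees its arguments; it never reads a config file.
        self.settings = settings

    def describe(self) -> str:
        return f"data in {self.settings.data_dir}, API at {self.settings.api_base_url}"


def main_site_a() -> App:
    # Site A's details, hard-coded in the normal programming language.
    return App(Settings(data_dir="/srv/site-a/data", api_base_url="https://a.example.com/api"))


def main_site_b() -> App:
    # Site B's details, likewise hard-coded.
    return App(Settings(data_dir="/opt/site-b", api_base_url="https://b.example.com/api"))
```

Adding a third site means adding a third entry point, which goes through review and type checking like any other code.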

vinceguidry · on July 9, 2019

The driver here is needing to make changes to the behavior of the app in specified ways, faster than the release cycle. My advice is to go no further than key-value configuration settings, and keep a documented set of Postman requests in the repo to serve as your UI for developers to invoke.

This way you get the immediacy of being able to change prod behavior outside of the release cycle, the safety of knowing only your devs can make those changes, and the ability to easily build a real UI later if the hidden features become features you want visible to non-technical users or your customers.
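A minimal sketch of "no further than key-value settings": a fixed, documented set of keys with defaults, where anything outside that set is rejected and values carry no conditions or logic, so the store cannot quietly grow into a rules engine. The keys are hypothetical, and the HTTP endpoint that the Postman requests would call is left out.

```python
from typing import Dict, Union

Value = Union[bool, int, str]

# Hypothetical documented settings: key -> default. The type of each default
# is also the only type accepted for that key.
DEFAULTS: Dict[str, Value] = {
    "checkout.max_items": 50,
    "search.enable_fuzzy": False,
}


class RuntimeSettings:
    def __init__(self) -> None:
        self._values: Dict[str, Value] = dict(DEFAULTS)

    def get(self, key: str) -> Value:
        return self._values[key]

    def set(self, key: str, value: Value) -> None:
        if key not in DEFAULTS:
            raise KeyError(f"unknown setting {key!r}")
        expected = type(DEFAULTS[key])
        if type(value) is not expected:
            raise TypeError(f"{key} expects {expected.__name__}, got {type(value).__name__}")
        self._values[key] = value
```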

A rules engine is where the descent into madness begins. Every single thing the rules engine tweaks needs to be an actual feature with actual RESTful routes dedicated. Overloading a configuration regime, which is only supposed to handle keys and values, into the key instrumentation for the entire application inevitably bolts poorly documented semantics onto the application.

Different devs or departments will see the two competing regimes and pick whichever one they like the most to add on to. You'll end up with two kingdoms at war. You want peace reigning throughout your empire.

Configuration is part of your application infrastructure. Rules engines generate competing semantics. Semantics are how the brain understands systems. You want one overarching paradigm, one source of truth for how things get done in your application.

hinkley · on July 9, 2019

Back pressure is an important concept in distributed computing. It's also an important concept in project management.

By and large, "We need changes faster than our release cycle" is a condemnation of your release cycle, your definition of "need", or both.

I've 'forced' a fast, low-stress build pipeline on several teams and nobody wants to go back to before. And much to management's surprise, people will defend it after I'm gone (I think they see it as me being control freak bossy, rather than personal trainer bossy).

They also participate in pushing back to a degree on these "emergencies" because we have a "fast enough" that is reliable enough that half the team is willing to be responsible for pushing the buttons. No, you can't have that before lunch. You can have it tomorrow morning, like we always do. Maybe you should think things through a bit more in the future?

My current project was so big and Balkanized that I haven't found enough cracks to operate this way. So things are better, but feel worse in some ways because it's gone from nonspecific pain to specific pain. This is going to be the first time in 15 years I haven't left a team with a good CI/CD pipeline.

But at least all the things you can change in between releases will be under version control.

avip · on July 9, 2019

>Initially there was hope that non-technical business users would be able to use the GUI to configure the application, but that turned out to be a false hope

Entire companies, products, and developers encapsulated in one beautiful sentence.

mti27 · on July 9, 2019

I once worked at a place that reached DSL on the clock; no one understood the undocumented DSL (not even the programmers supporting it) but a few power users. I advocated re-writing rules in Python, using modern CI/CD techniques to allay fears of hard-coding. But it was too big of a philosophy change. The counter argument was "We don't want end users writing code!" but of course they were already writing code, just in a non-Google-able language...

dang · on July 9, 2019

A thread from 2017: https://news.ycombinator.com/item?id=14298715

2016: https://news.ycombinator.com/item?id=11155128

proc0 · on July 10, 2019

Good article, interesting topic. I think this clock represents the layers of abstraction an application or piece of code goes through during its lifetime. At first there is no abstraction, just literal values. As you abstract the code, you use the language as a tool to hide anything that repeats and expose only the essentials (a.k.a. abstracting). After DSLs at 9am, in order to avoid getting back to hard-coded values at 12pm, you HAVE to start abstracting at the conceptual/"business" level of the application. This would be the domain; however, if the domain, when implemented, is causing a trip around the clock, the initial concepts are breaking the abstraction.

alok-g · on July 9, 2019

Here is the guidance I usually provide:

Problem Statement:

There are application features and behaviors that all users need, that a majority need, that a minority need, and that only some need. Users have personal preferences and individual information that the software needs to know.

We tend to handle some of the above using code, some using configurations.

Guidance #1

Configuration is not a solution across the board. Configurations are often a premature optimization towards saving future efforts.

Code _is_ configuration for the processor. Coding has development methodologies and tools designed by the industry over decades, most of which are not applicable to configurations. So don't make code run-time configurable. Change the code as and when needed. Refactor.

Exceptions:

- For personal preferences and individual information.

- If runtime behavior of the code must be changeable without rebuilding the code.

Guidance #2

If the software behavior can be changed with fewer lines of configuration than lines of code, develop better abstractions in the code. (Do not invent DSLs; create better abstractions in the code itself.)

Keep code configurable via hard-coded configurations at an appropriate place within the code. This encourages modularity. However, limit the overhead of that flexibility to at most 30% of the overall development effort beyond what the currently known requirements need. If the development overhead for the flexibility is much higher, that flexibility is a premature optimization (keeping in mind that you already have flexibility via the ability to change the code). If you are not thinking beyond the currently best known requirements at all, you may find the requirements changing faster than you can keep pace with.

Guidance #3

Instead of configurations, find a more specific alternative.

- Machine Learning models are technically code configurations, though we do not see it that way. ML comes with needed tooling to manage.

- Knowledge graphs.

- Data exchange file formats.

- Etc.

Guidance #4

Configurations are not for SDEs. Identify the owner who would be responsible for changing the configurations and see them as customers, in the current phase of development. Think about what help and tools you are providing them to manage the configurations.

Guidance #5

Do not let the space of configurations multiply. Configuration parameters must be modular (i.e., independent) just like code.

Just as functions having more than three parameters should be avoided, same applies to configurations impacting the behavior of a function. Avoid more than three of them taken together for any function.

Guidance #6

A configuration is also a contract. Pay no less attention to it than to function interface or API design.

Configurations need to be equivalently documented. Think of them as command line arguments. If the user needs to know the implementation internals to understand the command-line arguments, default against having them.
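Treating each setting like a command-line argument means giving it a name, a type, a default and help text written in terms the user understands. A sketch using Python's standard `argparse`; the program and its options are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Export a report (hypothetical tool).")
    parser.add_argument(
        "--output-format",
        choices=["csv", "json"],
        default="csv",
        help="File format of the exported report (default: %(default)s).",
    )
    parser.add_argument(
        "--max-rows",
        type=int,
        default=1000,
        help="Stop after this many rows (default: %(default)s).",
    )
    return parser
```

Running the tool with `--help` prints that documentation, and an invalid value such as `--output-format xml` is rejected before any work starts.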

Guidance #7

Backward compatibility and blast radius reduction are not valid arguments to have distributed configurations. Depend on automated and manual testing instead. Reason correctly about how many of the users would need the variation in software behavior.

If a code change is to be made (i.e., it makes sense), try to apply it everywhere. (I presume when Microsoft went to fixed and mandatory Windows update cycles, it would have helped them a lot.)

Guidance #8

If different Product Managers serving different regions or types of users ask for different requirements and you as developers see no valid reason for it, make them talk to each other and document their collective reasoning before getting back to you.