All posts in Configuration

Simple Steps to Operations Team Success

When you work in software operations, there are plenty of practices and tools you can use to be more efficient at your job: keep all of your scripts and code in source control (just like developers do!), maintain proactive monitoring, work closely with your development team on operational requirements, read DevOps on Windows (ahem), and much more.

But even before all of that, what’s one task operators have to do frequently? Make changes to the production environment. And if you take away all the fancy tools and processes around implementing and managing these changes, there are three simple, non-technical steps you can take to help ensure a successful, and hopefully panic-free, change implementation.

1. Plan what you’re changing upfront

This may sound obvious, but doing something as simple as putting your change plan down in writing can be a huge help. Let’s say you’re responsible for moving some application from one server to another. You’ve done it a hundred times, in production or otherwise, but next time, before you do it, try writing it down. Your document doesn’t have to be some big bureaucratic process or a huge, highly polished document. It can be as simple as a list of what you’re going to do to implement a given change. And it doesn’t have to be the process document from now until the end of time for all cases, just what you plan to do this time.

Of course, ideally a task like moving an application isn’t a terribly complicated, multi-step, manual procedure. But either way, writing your plan down in a document (on a wiki, in some internal document management system, in an issue tracking ticket, or whatever works for you) is the perfect way to organize and solidify your thoughts on the process. And it has two great side effects – it can help eliminate a little bit more of your organization’s bus factor, and it can help create or add to an internal knowledge base if your plan document is in a location that is accessible to the whole operations team.

2. Plan for rollback

Bad Things™ happen, so always have a rollback plan ready ahead of time. In my experience, very few changes are completely irreversible. (And if and when changes are irreversible, you’d better damn well make sure that’s crystal clear to your team and all the stakeholders!) It’s only to your benefit to think about and document a rollback plan. In the heat of the moment, when something goes wrong during a change implementation, it’s nice to be able to refer to a document or procedure that clearly states your path out of the mess and at least brings the system back to the state it was in prior to the change.

And if you took the time to plan up front to begin with, odds are your rollback plan is basically already written – just reverse the steps in your original plan. Again, it doesn’t need to be a highly polished rollback procedure for all cases until the end of time, just a rollback plan for the task at hand. Of course, not every change is trivial enough for a quick document, but hopefully you get the idea of how planning out the change up front can guide a rollback plan.

3. Validate what you changed

In test-driven development, a common (nay, required?) practice when a production bug is discovered is for the developer to first create a unit test that reproduces the issue. Of course, the test will fail against the current production version’s code base because the issue is still in the code. Then, once the bug has been fixed and the test passes, the developer has greater confidence that the issue is truly resolved because their unit test validated the change for them.

In software operations you can apply a similar concept to your system changes. How can you know your change was successful if you can’t validate it? Do you just hope it worked? Do you wait for a user to call and report the issue again? By asking questions of your teammates, system owners, or subject matter experts, no matter how stupid the questions sound, you can increase your own confidence and the quality of your work simply by validating your changes.


Again, these steps may sound like common sense, but a little bit of reflection on what you’re actually doing can go a long way towards increased quality when it comes to making changes to a software system.

And as an operator, always remember – your number one goal is the stability of the production environment, which ensures your organization can keep doing what it needs to do. You accomplish this goal not by blocking change to the environment, but by enabling it.

Mrowr Central Configuration Meow Mew!

Meow meow mew?

Mrowr mew meow burrurr mreow DevOps – cross-team collaboration mreow mew mew mew maintainable software. Meow mew not more automation meow mur – browr mew configuration meow mrowr meow mur.

Meow mew mmmmmew meow config files burrowr “single source of truth” mreow. Mrowr browrrrr central configuration service mreow meow mew, mur, browr. Meow meow meow murrr configuration data murr meow not XML file.

Meow central configuration service mew mrowr burr key-value store mmmew. Murrrr key-value store, mew mmmew. Meow mmmew mur production, mmmew mur development; mmmew meow UI mur mmmew meow server. Mreow meow mew mew meow “context” meow. Mmmew, mmmew, mmmew meow mew. Browr browr meow browr.

Meow, meow browr key-value store. Meow client API mew meow “context” mrowr meow query meow service. Meowwww “context” meow mmmur environment, meow application name, meow “instance” meow application, meow user name, meow machine name, brrowrowr mreow architecture. Browr mew mreow meow client API mew mew mur mew local file-based backup configuration data mreow browr meow mmmmur fall back mreow meow mreow meow central service mew unavailable. Browr mew mreow meow application code — meow mew mur not provide “default” value meow meow configuration data – meow mur mur configuration burrr meow from service. Meow moew in code mew mew meow violates DRY, brrrrowr mew undermines meow single source of truth.

Mew mew meow mainstream DevOps mew “configuration as code” – mrrowr mew misguided. Configuration mew not mewww operational concern mreow. Configuration mew mew meow mew first-class citizen, meow configuration management browr mew burrr managing your data, meow miaw not suite of automation scripts. Meeeow configuration right meow moew requires application to change. Mrowr mew DevOps — burrurrs meow, meow mur MEW!

(This was a guest post by the DevOps on Kittens’ Chief Cat, Azrael)



Previously, we discussed the importance of centralized configuration management and proposed the context-aware key-value store as a means of implementing it. We have outlined the basic API for such a system, but never discussed an implementation. We propose to do so in this article.

Before diving in, let’s quickly review the API and requirements. A traditional key-value store has the following interface:

string GetValue(string key)
void SetValue(string key, string value)

There is also a use case for deleting values, but we can ignore that for the purposes of this discussion. In a standard key-value store, there is a single value for a given key, and looking up the value is straightforward. In the context-aware variant, the value can be different depending on the context. This implies a slightly more complex API:

string GetValue(string key, Context context)
void SetValue(string key, Context context, string value)

The context can be implemented as a string-to-string dictionary where the key is the “dimension” (environment, user, etc) and the value is the context’s “location” on that dimension (prod or dev for “environment”, for example). The SetValue function records an explicit value for a key in a specific context, but the GetValue function is free to return values that are not explicitly set in the requested context – it may return a value that is explicitly set in a context that the provided context “inherits” from. Herein lies the power of this approach for centralized configuration – it allows us to set common properties once, while retaining the flexibility to override values in very specific situations if necessary.
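As a rough illustration of the data model described above, here is a minimal Python sketch of a context as a string-to-string dictionary, plus a store that records explicit values only. The names `ExplicitStore`, `freeze`, and the key “timeout” are hypothetical and not part of the API above. The store deliberately does no inheritance, so a lookup in a narrower context finds nothing; that gap is what the matching algorithm has to fill.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# A context maps each dimension to its location, e.g. {"environment": "prod"}.
Context = Dict[str, str]


def freeze(context: Context) -> FrozenSet[Tuple[str, str]]:
    """Turn a context into a hashable form that ignores dimension order."""
    return frozenset(context.items())


class ExplicitStore:
    """Hypothetical store that records explicit values only (no inheritance)."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, FrozenSet[Tuple[str, str]]], str] = {}

    def set_value(self, key: str, context: Context, value: str) -> None:
        # Record an explicit value for this key in exactly this context.
        self._values[(key, freeze(context))] = value

    def get_explicit(self, key: str, context: Context) -> Optional[str]:
        # Exact-context lookup; returns None if nothing was set right here.
        return self._values.get((key, freeze(context)))


store = ExplicitStore()
store.set_value("timeout", {"environment": "prod"}, "30")

exact = store.get_explicit("timeout", {"environment": "prod"})
narrower = store.get_explicit("timeout", {"environment": "prod", "machine": "box2"})
```

Here `exact` is found, while `narrower` is `None` even though we would like the prod-wide value to apply to every machine in prod.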

But how does the GetValue function figure out what the value should be if it is no longer a simple lookup by key?

We know that the GetValue function will either return null (in the event that no appropriate value can be found) or one of the values that has been explicitly set for the requested key (it would make no sense to return a value that was set for a different key). So, we start the lookup algorithm by fetching all the values set for the requested key, regardless of the context in which those values are explicitly set. Call this list the “candidate values”.
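One way to make the “fetch all candidate values” step cheap is to index the explicitly set values by key, so the candidates for a key are a single dictionary lookup. The sketch below assumes an in-memory store; `entries_by_key`, `record`, and `candidate_values` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Context = Dict[str, str]
Entry = Tuple[Context, str]  # (context the value was set in, value)

# Every explicitly set value, grouped by key.
entries_by_key: DefaultDict[str, List[Entry]] = defaultdict(list)


def record(key: str, context: Context, value: str) -> None:
    """Remember an explicit value for a key in a given context."""
    entries_by_key[key].append((dict(context), value))


def candidate_values(key: str) -> List[Entry]:
    """All values set for the key, regardless of context."""
    return list(entries_by_key.get(key, []))


record("number", {}, "one")
record("number", {"environment": "prod"}, "two")
record("other", {}, "unrelated")

candidates = candidate_values("number")
```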

The problem then reduces to determining which one of the candidate values is set in a context that most closely matches the requested context. It is tempting to say that we can find the best match by simply choosing the candidate that matches the requested context along the most dimensions. However, this won’t quite work, as we need a mechanism for breaking ties, and we likely want certain dimensions to carry more weight than others. For example, we probably want to require the environment to match before we match on other dimensions.

Therefore, we need one additional piece of information in order to implement a matching algorithm. This additional information is a list of “search paths”, each of which is in turn a list of dimensions. For example, if your dimensions are “environment”, “application”, and “machine”, you may want to configure two search paths, “environment/application/machine” and “environment/machine/application”. The search paths allow us to break ties and they guide the structure of the search algorithm.

Next, we need to “expand” the search paths by adding all “parent” paths of the specified paths, up to and including the root. Once we have performed the expansion, the order of the dimensions in the paths becomes irrelevant (the order only matters for the purposes of performing the expansion), so we can remove duplicates. Using the above paths as an example, we end up with the following expanded set of search paths: “”, “environment”, “environment/application”, “environment/application/machine”, and “environment/machine” (note that we removed the duplicate “” and “environment”, as well as “environment/machine/application”, which has the same dimensions as “environment/application/machine”).
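The expansion step can be sketched in a few lines of Python. This is one possible reading, not a definitive implementation: each search path is a “/”-separated string, every prefix (including the empty root) is a parent path, and each expanded path is stored as a set of dimensions so that ordering no longer matters and duplicates collapse. The function name `expand_search_paths` is our own.

```python
from typing import FrozenSet, List


def expand_search_paths(search_paths: List[str]) -> List[FrozenSet[str]]:
    """Return every prefix of every search path, de-duplicated, in first-seen order."""
    expanded: List[FrozenSet[str]] = []
    seen = set()
    for path in search_paths:
        dims = path.split("/") if path else []
        # Prefix lengths 0..len(dims): the root, each parent, then the full path.
        for n in range(len(dims) + 1):
            prefix = frozenset(dims[:n])
            if prefix not in seen:
                seen.add(prefix)
                expanded.append(prefix)
    return expanded


expanded = expand_search_paths([
    "environment/application/machine",
    "environment/machine/application",
])
```

For the two example paths this produces five sets, in the same order as the list in the text: the root, “environment”, “environment/application”, “environment/application/machine”, and “environment/machine”.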

Now, associate each candidate value with a “candidate path”. This is the expanded search path whose dimensions are exactly the dimensions specified in the candidate context. If there is no such path, the candidate value cannot be returned in any requested context (it is “unsearchable”).
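Here is a small sketch of the candidate-path lookup. It assumes the candidate path is the expanded path whose dimension set equals the candidate context’s dimension set exactly; we infer this from the worked example, where a value set in application/machine has no candidate path. `EXPANDED` is written out by hand and `candidate_path` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, List, Optional

Context = Dict[str, str]

# Expanded search paths for "environment/application/machine" and
# "environment/machine/application", stored as order-free dimension sets.
EXPANDED: List[FrozenSet[str]] = [
    frozenset(),
    frozenset({"environment"}),
    frozenset({"environment", "application"}),
    frozenset({"environment", "application", "machine"}),
    frozenset({"environment", "machine"}),
]


def candidate_path(context: Context) -> Optional[FrozenSet[str]]:
    """Return the expanded path with exactly the context's dimensions, or None."""
    dims = frozenset(context)  # iterating a dict yields its keys (the dimensions)
    for path in EXPANDED:
        if path == dims:
            return path
    return None  # the value is unsearchable


searchable = candidate_path({"environment": "dev", "application": "dow"})
unsearchable = candidate_path({"application": "dow", "machine": "box2"})
```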

Next, select the candidate values that have a valid candidate path and for which every dimension location in the candidate context matches the corresponding dimension location in the requested context. These candidate values become “finalist values”.

If there are no finalists, there is no match, and GetValue should return null. Otherwise, GetValue should return the finalist value that has the most dimensions specified in its context. Ties are broken in favor of the value whose candidate path derives from the search path that was listed first.
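The finalist filter and the final selection might look like the sketch below. It assumes each candidate already carries the index of its candidate path in the expanded list (or `None` if it is unsearchable), and it breaks ties by the lower index, which is our interpretation of “listed first”. The names `is_finalist` and `select_value` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Context = Dict[str, str]
# (context the value was set in, index of its candidate path or None, value)
Candidate = Tuple[Context, Optional[int], str]


def is_finalist(context: Context, path_index: Optional[int], requested: Context) -> bool:
    """A finalist is searchable and agrees with the requested context on every dimension it sets."""
    if path_index is None:
        return False
    return all(requested.get(dim) == loc for dim, loc in context.items())


def select_value(candidates: List[Candidate], requested: Context) -> Optional[str]:
    finalists = [c for c in candidates if is_finalist(c[0], c[1], requested)]
    if not finalists:
        return None
    # Most dimensions wins; among equals, the earlier candidate path wins.
    best = max(finalists, key=lambda c: (len(c[0]), -c[1]))
    return best[2]


candidates: List[Candidate] = [
    ({"environment": "dev"}, 1, "three"),
    ({"environment": "dev", "application": "dow"}, 2, "four"),
    ({"environment": "dev", "machine": "box2"}, 4, "five"),
    ({"application": "dow", "machine": "box2"}, None, "six"),
]

picked = select_value(
    candidates,
    {"environment": "dev", "application": "dow", "machine": "box2"},
)
nothing = select_value(candidates, {"environment": "prod"})
```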

Let’s work through an example. Utilizing the example search paths and dimensions used above, say we have specified the following values for the key “number”:

  • “one” in the default (empty) context. The candidate path is “”.
  • “two” in environment=prod. The candidate path is “environment”.
  • “three” in environment=dev. The candidate path is “environment”.
  • “four” in environment=dev/application=dow. The candidate path is “environment/application”.
  • “five” in environment=dev/machine=box2. The candidate path is “environment/machine”.
  • “six” in application=dow/machine=box2. There is no candidate path, and the value is unsearchable.

Now, let’s perform some searches.

  • A search in the default context returns “one”. It is the only finalist.
  • A search in environment=prod/application=dow returns “two”. “one” and “two” are both finalists, and “two” has more dimensions in its context.
  • A search in environment=dev/application=dow/machine=box2 returns “four”. “one”, “three”, “four”, and “five” are all finalists. “four” and “five” tie for having the most dimensions in their contexts, but since the search path “environment/application/machine” was listed before the search path “environment/machine/application”, “four” wins.
  • A search in environment=dev/application=app/machine=box2 returns “five”. The logic is similar to the previous search, but “four” is not a finalist because the requested context does not match “application=dow”. Hence, “five” wins.
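To check the worked example end to end, here is a compact Python sketch of the whole lookup. It is an illustration under our own assumptions rather than a reference implementation: the class name `ContextStore` and its method names are hypothetical, expanded paths are stored as dimension sets, a candidate path must match the candidate context’s dimensions exactly, and ties go to the path that appears earlier in the expanded list.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Context = Dict[str, str]


class ContextStore:
    def __init__(self, search_paths: List[str]) -> None:
        # Expand each path into all of its prefixes (root included), de-duplicated.
        self._paths: List[FrozenSet[str]] = []
        for path in search_paths:
            dims = path.split("/") if path else []
            for n in range(len(dims) + 1):
                prefix = frozenset(dims[:n])
                if prefix not in self._paths:
                    self._paths.append(prefix)
        self._entries: Dict[str, List[Tuple[Context, str]]] = {}

    def set_value(self, key: str, context: Context, value: str) -> None:
        entries = self._entries.setdefault(key, [])
        for i, (existing, _) in enumerate(entries):
            if existing == context:  # overwrite a value set in the same context
                entries[i] = (dict(context), value)
                return
        entries.append((dict(context), value))

    def get_value(self, key: str, context: Context) -> Optional[str]:
        best: Optional[Tuple[Tuple[int, int], str]] = None
        for candidate, value in self._entries.get(key, []):
            dims = frozenset(candidate)
            if dims not in self._paths:
                continue  # no candidate path: unsearchable
            if any(context.get(d) != loc for d, loc in candidate.items()):
                continue  # disagrees with the requested context: not a finalist
            # More dimensions ranks higher; earlier candidate path breaks ties.
            rank = (len(candidate), -self._paths.index(dims))
            if best is None or rank > best[0]:
                best = (rank, value)
        return best[1] if best else None


store = ContextStore([
    "environment/application/machine",
    "environment/machine/application",
])
store.set_value("number", {}, "one")
store.set_value("number", {"environment": "prod"}, "two")
store.set_value("number", {"environment": "dev"}, "three")
store.set_value("number", {"environment": "dev", "application": "dow"}, "four")
store.set_value("number", {"environment": "dev", "machine": "box2"}, "five")
store.set_value("number", {"application": "dow", "machine": "box2"}, "six")

results = [
    store.get_value("number", {}),
    store.get_value("number", {"environment": "prod", "application": "dow"}),
    store.get_value("number", {"environment": "dev", "application": "dow", "machine": "box2"}),
    store.get_value("number", {"environment": "dev", "application": "app", "machine": "box2"}),
]
```

Under these assumptions, `results` holds “one”, “two”, “four”, and “five”, matching the four searches above. The linear scan over candidates keeps the sketch short; a real service would likely index entries differently.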

This algorithm is a bit complicated and we still consider it a work in progress. But it provides a flexible approach to implementing the context-aware key-value store, which we believe to be the key to a quality firm-wide configuration solution.

A DevOps Checklist

Categories: Configuration, Deployment, Philosophy

This will sound hypocritical once you read this article, but if you couldn’t tell by now, we’re not fans of checklists here at DevOps on Windows. Checklists can easily mislead people into a false sense of security. “We’re following the checklist – we must be DevOps now! But why do we still have so many production issues?!”

As we’ve said before, to us DevOps isn’t about following some rote methodology, but about understanding the principles behind operations-friendly software and following best practices to move your processes forward.

But we also understand that sometimes a “checklist-ish-type-of-list” can be a helpful guide. With that in mind, here’s our “DevOps Checklist”!

This entire checklist can be boiled down to our first principle of DevOps: is your software simple to operate and easy to change?