Previously, we discussed the importance of centralized configuration management and proposed the context-aware key-value store as a means of implementing it. We have outlined the basic API for such a system, but never discussed an implementation. We propose to do so in this article.
Before diving in, let’s quickly review the API and requirements. A traditional key-value store has the following interface:
string GetValue(string key)
void SetValue(string key, string value)
There is also a use case for deleting values, but we can ignore that for the purposes of this discussion. In a standard key-value store, there is a single value for a given key, and looking up the value is straight forward. In the context-aware variant, the value can be different depending on the context. This implies a slightly more complex API:
string GetValue(string key, Context context)
void SetValue(string key, Context context, string value)
The context can be implemented as a string-to-string dictionary where the key is the “dimension” (environment, user, etc) and the value is the context’s “location” on that dimension (prod or dev for “environment”, for example). The SetValue function records an explicit value for a key in a specific context, but the GetValue function is free to return values that are not explicitly set in the requested context – it may return a value that is explicitly set in a context that the provided context “inherits” from. Herein lies the power of this approach for centralized configuration – it allows us to set common properties once, while retaining the flexibility to override values in very specific situations if necessary.
But how does the GetValue function figure out what the value should be if it is no longer a simple lookup by key?
We know that the GetValue function will either return null (in the event that no appropriate value can be found) or one of the values that has been explicitly set for the requested key (it would make no sense to return a value that was set for a different key). So, we start the lookup algorithm by fetching all the values set for the requested key, regardless of the context in which those values are explicitly set. Call this list the “candidate values”.
The problem then reduces to determining which one of the candidate values is set in a context that most closely matches the requested context. It is tempting to say that we can find the best match by simply choosing the candidate that matches the requested context along the most dimensions. However, this won’t quite work, as we need a mechanism for breaking ties, and we likely want certain dimensions to carry more weight than others. For example, we probably want to require the environment to match before we match on other dimensions.
Therefore, we need one additional piece of information in order to implement a matching algorithm. This additional information is a list of “search paths”, each of which is in turn a list of dimensions. For example, if your dimensions are “environment”, “application”, and “machine”, you may want to configure two search paths, “environment/application/machine” and “environment/machine/application”. The search paths allow us to break ties and they guide the structure of the search algorithm.
Next, we need to “expand” the search paths, by adding all “parent” paths — up to and including the root — of the specified paths. Once we have performed the expansion, the order of the dimensions in the paths becomes irrelevant (the order only matters for the purposes of performing the expansion), so we can remove duplicates. Using the above paths as an example, we end up with the following expanded set of search paths : “”, “environment”, “environment/application”, “environment/application/machine”, and “environment/machine” (note that we removed a duplicate “environment”, as well as “environment/machine/application”).
Now, associate each candidate value with a “candidate path”. This is the search path where every dimension specified in the candidate context is present in the path. If there is not a candidate path, the candidate value cannot be returned in any requested context (it is “unsearchable”).
Next, select the candidate values with a valid candidate path where every dimension location in the candidate context matches the corresponding dimension locations in the target context. These candidate values become “finalist values”.
If there are no finalists, there is no match, and GetValue should return null. Otherwise, GetValue should return the finalist value that has the most dimensions specified in its context. Ties are broken in favor of the value whose candidate path that was listed first.
Let’s work through an example. Utilizing the example search paths and dimensions used above, say we have specified the following values for the key “number”:
- “one” in the default (empty) context. The candidate path is “”.
- “two” in environment=prod. The candidate path is “environment”.
- “three” in environment=dev. The candidate path is “environment”.
- “four” in environment=dev/application=dow. The candidate path is “environment/application”.
- “five” in environment=dev/machine=box2. The candidate path is “environment/machine”.
- “six” in application=dow/machine=box2. There is no candidate path, and the value is unsearchable.
Now, let’s perform some searches.
- A search in the default context returns “one”. It is the only finalist.
- A search in environment=prod/application=dow returns “two”. “one” and “two” are both finalists, and “two” has more dimensions in its context.
- A search in environment=dev/application=dow/machine=box2 returns “four”. “one”, “three”, “four”, and “five” are all finalists. “four” and “five” tie for having the most dimensions in their contexts, but since the search path “environment/application/machine” was listed before the search path “environment/machine/application”, “four” wins.
- A search in environment=dev/application=app/machine=box2 returns “five”. The logic is similar to the previous search, but “four” is not a finalist because the requested context does not match “application=dow”. Hence, “five” wins.
This algorithm is a bit complicated and we still consider it a work in progress. But it provides a flexible approach to implementing the context-aware key-value store, which we believe to be the key to a quality firm-wide configuration solution.