All posts in Philosophy

Simple Steps to Operations Team Success

When you work in software operations supporting software applications, there are plenty of practices and tools you can use to be more efficient at your job: keep all of your scripts and code in source control (just like developers do!), maintain proactive monitoring, work closely with your development team on operational requirements, read DevOps on Windows (ahem.), and much more.

But even before all of that, what’s one task operators have to do frequently? Make changes to the production environment. And if you take away all the fancy tools and processes around implementing and managing these changes, there are three simple, non-technical steps you can take to help ensure a successful, and hopefully panic-free, change implementation.

1. Plan what you’re changing upfront

This may sound obvious, but doing something as simple as putting your change plan down in writing can be a huge help. Let’s say you’re responsible for moving some application from one server to another. You’ve done it a hundred times, in production or otherwise, but next time, before you do it, try writing it down. Your document doesn’t have to be the product of some big bureaucratic process, or a huge, highly polished document. It can be as simple as a list of what you’re going to do to implement a given change. And it doesn’t have to be the process document from now until the end of time for all cases, just what you plan to do this time.

Of course ideally a task like moving an application isn’t a terribly complicated, multi-step, manual procedure. But either way, writing your plan down in a document (like on a wiki, some internal document management system, an issue tracking ticket, or whatever works for you) is the perfect way to organize and solidify your thoughts on the process. And it has two great side effects – it can help eliminate a little bit more of your organization’s bus factor, and it can help create or add to an internal knowledge-base if your plan document is in a location that is accessible by the whole operations team.

2. Plan for rollback

Bad Things™ happen, so always have a rollback plan ready ahead of time. In my experience, very few changes are completely irreversible. (And if and when changes are irreversible, you better damn well make sure that’s crystal clear to your team and all the stakeholders!) It’s only to your benefit to think about and document a rollback plan. In the heat of the moment, when something goes wrong during a change implementation, it’s nice to be able to refer to a document or procedure that clearly states your path out of the mess and at least brings the system back to the state it was in prior to the change.

And if you took the time to plan up front to begin with, odds are your rollback plan is basically already written – just reverse the steps in your original plan. Again, it doesn’t need to be a highly polished rollback procedure for all cases until the end of time, just a rollback plan for the task at hand. Of course not every change is trivial enough for a quick document, but hopefully you get the idea how planning out the change up front can guide a rollback plan.
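To make the "just reverse the steps" idea concrete, here is a minimal Python sketch. It is only an illustration, not a tool recommendation: the `Step` and `run_plan` names are made up for this example, and it assumes every step of your plan can be paired with an action that undoes it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One step of a written change plan (hypothetical structure)."""
    description: str
    apply: Callable[[], None]  # what you do to make the change
    undo: Callable[[], None]   # how you reverse this one step


def run_plan(steps: List[Step]) -> bool:
    """Apply steps in order; if one fails, undo the completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.apply()
            completed.append(step)
        except Exception as exc:
            print(f"FAILED: {step.description} ({exc}); rolling back")
            # The rollback plan is the original plan, walked backwards.
            for done in reversed(completed):
                done.undo()
            return False
    return True
```

Even if you never automate anything, the structure is the useful part: for each line in your plan, write the line that reverses it.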

3. Validate what you changed

In test-driven development, a common (nay, required?) practice when a production bug is discovered is for the developer to first create a unit test that reproduces the issue. Of course, the test will fail against the current production version’s code base because the issue is still in the code. Then, once the bug has been fixed and the test passes, the developer has greater confidence that the issue is truly resolved because their unit test validated the change for them.

In software operations you can apply a similar concept to your system changes. How can you know your change was successful if you can’t validate it? Do you just hope it worked? Do you wait for a user to call and report the issue again? Ask your teammates, system owners, or subject matter experts how to confirm the change worked, no matter how stupid the questions sound. Simply by validating your changes, you increase your own confidence and the quality of your work.
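Here is a small Python sketch of that test-first idea applied to an operational change. The function name and the three result strings are invented for illustration, and it assumes you can express "did it work?" as a check that returns true or false.

```python
from typing import Callable


def apply_with_validation(check: Callable[[], bool],
                          change: Callable[[], None]) -> str:
    """Run the check before and after a change, like a failing-then-passing test."""
    if check():
        # The check already passes, so it cannot tell us whether the change worked.
        return "already-satisfied"
    change()
    return "validated" if check() else "not-validated"
```

The check itself could be anything you would otherwise do by hand: request a health page, look for a running service, query a row in a database.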

 

Again, these steps may sound like common sense, but a little bit of reflection on what you’re actually doing can go a long way towards increased quality when it comes to making changes to a software system.

And as an operator, always remember – your number one goal is the stability of the production environment which ensures your organization can keep doing what it needs to do, and you accomplish this goal not by blocking change to the environment, but by enabling change to the environment.

Mrowr Central Configuration Meow Mew!

Meow meow mew?

Mrowr mew meow burrurr mreow DevOps – cross-team collaboration mreow mew mew mew maintainable software. Meow mew not more automation meow mur – browr mew configuration meow mrowr meow mur.

Meow mew mmmmmew meow config files burrowr “single source of truth” mreow. Mrowr browrrrr central configuration service mreow meow mew, mur, browr. Meow meow meow murrr configuration data murr meow not XML file.

Meow central configuration service mew mrowr burr key-value store mmmew. Murrrr key-value store, mew mmmew. Meow mmmew mur production, mmmew mur development; mmmew meow UI mur mmmew meow server. Mreow meow mew mew meow “context” meow. Mmmew, mmmew, mmmew meow mew. Browr browr meow browr.

Meow, meow browr key-value store. Meow client API mew meow “context” mrowr meow query meow service. Meowwww “context” meow mmmur environment, meow application name, meow “instance” meow application, meow user name, meow machine name, brrowrowr mreow architecture. Browr mew mreow meow client API mew mew mur mew local file-based backup configuration data mreow browr meow mmmmur fall back mreow meow mreow meow central service mew unavailable. Browr mew mreow meow application code — meow mew mur not provide “default” value meow meow configuration data – meow mur mur configuration burrr meow from service. Meow moew in code mew mew meow violates DRY, brrrrowr mew undermines meow single source of truth.

Mew mew meow mainstream DevOps mew “configuration as code” – mrrowr mew misguided. Configuration mew not mewww operational concern mreow. Configuration mew mew meow mew first-class citizen, meow configuration management browr mew burrr managing your data, meow miaw not suite of automation scripts. Meeeow configuration right meow moew requires application to change. Mrowr mew DevOps — burrurrs meow, meow mur MEW!

(This was a guest post by the DevOps on Kittens’ Chief Cat, Azrael)


There are all kinds of software development checklists and principles out there. Some are better than others. My personal favorites are the SOLID principles and The Atwood System of Real Ultimate Programming Power. Here at DevOps on Windows, we’ve tried our hand at such checklists before. However, most software checklists (including our own) tend to focus on specific “how-to” coding or operational guidelines. Don’t repeat yourself, depend on abstractions, deploy frequently, use centralized configuration. These are all great practices to follow, but they are also just a means to an end. The end, of course, is creating software that is easy to operate.

At this site, we have always maintained that “doing DevOps” well, and by extension, creating quality software, is not something that you can accomplish by following rote guidelines and checklists. If it were, robots could do our job. As we should know (because it is our job to create them), robots lack the ability to creatively apply best practices and first principles to craft solutions to new problems. In order to defeat the robots, we need to take advantage of our ability to do so. The first step is identifying and understanding the first principles of quality software. Therefore, we present our Robot-Defeating First Principles Of Quality Software:

Clarity

Quality software must be clear and understandable. Strive to write code that you will be able to understand a year from now after you have forgotten all the details. Alternatively, try to write code that won’t inspire another developer to curse your name years from now when a bug is found or a new feature needs to be added. It is far more important (in most cases) that your code be clear in its intent than it be “clever” or even that it perform optimally. However, software is more than just code. Quality software must also be clear to system operators and end-users. Software should be a “white box” – errors should be clearly displayed, raw data should be accessible to allow double-checking of results, functionality should be easily discoverable, behaviors should be logical and understandable.

Consistency

Quality software must be consistent. The days of monolithic software solutions are over. Service-oriented architectures and cloud-based services rule the day, because they allow us to create simple, independent, specialized components that work together to form a cohesive whole. In order to support such an architecture, the myriad software components that make up your system must behave in a consistent way. They should all use the same deployment, configuration, and monitoring mechanisms. They should communicate using a standardized messaging layer that is solid, battle-tested, and flexible. User interfaces should use common controls with a consistent look and feel. Your software should be consistent from the perspective of developers (who use consistent APIs to build applications), operators (who use the same set of tools to configure, deploy, and monitor applications), and end-users (who are presented with one common experience when interacting with your system). Consistency is king.

Simplicity

Finally, and perhaps most importantly, quality software needs to be simple. To paraphrase Einstein, software should be as simple as possible, but no simpler. This is a tall order, because the world is a complex place, and it can be very challenging to create a simple system to model such complexity. Yet that is what we must strive for. Don’t write code that is a simple reflection of your requirements. Look for abstractions that allow you to create a simple model from which your required use cases can flow cleanly. Avoid “special cases” and “hard-coded” solutions. When done right, the simple solution is often the most flexible solution, and will allow your software to grow and change with your business. Creating a simple “mental model” for the way the software works is critical to making it understandable for developers, operators, and users alike.

No system of principles can ever capture the full picture of what “quality software” really means. However, striving to create software solutions that are clear, simple, and consistent is a good place to start. Remember also that the clarity, simplicity, and consistency must be evident from all perspectives — developers, operators, and users must all see these properties. Equipped with these principles and our creativity, we have a fighting chance of defeating the robots.

Application Lifecycle Management - Build a Mesh Not a Monolith

When evaluating solutions to automate and streamline your application lifecycle management process, your goal should be to build a mesh, not a monolith. By this we mean, rather than implementing a large, monolithic, enterprise workflow automation type of solution, your solution should focus on smaller, loosely coupled system components that don’t require each other to function, yet build on each other to create a streamlined workflow in the end. Even though our site’s focus is on Windows, this may sound oddly familiar – it’s basically the Unix philosophy applied to your application lifecycle management process.

You need a process for getting your applications from source code to deployable artifacts. The process needs to be frictionless enough to not get in your development team’s way, but it also needs to ensure that operational requirements, like audit trails and build reproducibility, are met.

We hope you’ll notice as we step through this process that none of the components necessarily require one another. Each component can function without the others, and the loss or outage of one component does not prevent software releases from reaching production.

Source Control

This may sound obvious, but first and foremost, anything created by your developers and operations teams must be in source control. Centralized source control of course gives you backups and version history, and it serves as your source of truth about all software changes (who, when, what).

Build System

Many developers use an IDE like Visual Studio for interactive builds, but to automate your build process, including compilation, version stamping, testing, code coverage/analysis, etc., you need a build system. Having a solid build system makes it easy to standardize build automation for most, if not all, of your projects.

Continuous Integration

Next, whenever there’s more than one developer on a project (or even with just one…) you’ll need a continuous integration (CI) process. There are many products available, but the process is the same: the CI system polls your source control repository for changes to a particular project’s source code, then executes your build system (the exact same way your developers would!) and produces build artifacts, including code analysis reports and binaries.
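Stripped of product features, the polling step is small enough to sketch in a few lines of Python. This is an illustration of the idea only, not how any particular CI product works; `get_head_revision` and `run_build` are hypothetical callables standing in for your source control query and your build system's entry point.

```python
from typing import Callable, Optional


def poll_once(
    get_head_revision: Callable[[], str],
    run_build: Callable[[str], None],
    last_built: Optional[str],
) -> Optional[str]:
    """Build the newest revision if it has not been built yet.

    Returns the revision that is now considered built.
    """
    head = get_head_revision()
    if head == last_built:
        return last_built  # nothing new since the previous poll
    run_build(head)  # the same build entry point a developer would run
    return head
```

A real system would call something like this on a timer for each project and remember the returned revision between polls.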

The build binaries should be dropped in an accessible location (e.g., a file share) that is only writeable by the account that runs the CI system. This ensures that binaries produced by the CI system are consistent with the source code that produced them. Of course, limiting access to your build servers is important as well.

Given adequate build server capacity and an easy enough process for adding new projects to your CI system, your operations team can then use it as a way to enforce process: applications bound for production machines must be produced by the continuous integration system.

Release Branching Service

Once you’re following a branch-per-release branching scheme with standard branch naming conventions and practices, you can build additional automation to glue together your branching policy and your continuous integration system. This depends a lot on your choice of a continuous integration system, as some systems are more friendly to customization and automation than others.

You can create integrations and automation that allow developers to easily create a release branch from a given build (i.e. source control revision number) and have that release branch created as a new build project in your CI system so it begins “being watched” and gets built as changes are committed.
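As a rough Python sketch of that glue, under loudly stated assumptions: the `releases/<app>/<version>` naming convention is invented for this example, and `create_branch` and `register_ci_project` are hypothetical callables wrapping whatever your source control and CI systems actually expose.

```python
from typing import Callable


def release_branch_name(app: str, version: str) -> str:
    """Assumed naming convention for illustration: releases/<app>/<version>."""
    return f"releases/{app}/{version}"


def create_release(
    app: str,
    version: str,
    revision: str,
    create_branch: Callable[[str, str], None],
    register_ci_project: Callable[[str], None],
) -> str:
    """Create a release branch from a built revision and have CI watch it."""
    branch = release_branch_name(app, version)
    create_branch(branch, revision)  # branch from the revision that was built
    register_ci_project(branch)      # CI starts watching the new branch
    return branch
```

Keeping the two integrations as separate callables fits the mesh idea: if the CI registration is unavailable, the branch can still be created and the project added by hand.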

Release Management

Finally, your development team needs to communicate the desire to release a particular application and version to the operations team. There’s a lot of information that needs to be noted (application name, version number, location of binaries, etc.). Of course as you advance towards continuous deployment, this process itself may be automated away, but in the meantime…

Using a lightweight release tracking tool, you can give developers a means of notifying the operations team of their intent to release, including all the relevant details, and you have an opportunity to automate the staging of binaries from the location the continuous integration system drops them to a production staging area or to your deployment system. This production staging area can be another file share that is only writeable by the process running your staging process.
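The staging step can be sketched in Python as a copy from the CI drop location to the staging area. The `<drop>/<app>/<version>` folder layout and the function names are assumptions made for this example, and the checksum comparison is an extra safeguard we added to the sketch rather than something the process above requires.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_release(drop_dir: Path, staging_dir: Path, app: str, version: str) -> List[Path]:
    """Copy one build's files from the CI drop to the staging area."""
    source = drop_dir / app / version  # assumed layout: <drop>/<app>/<version>
    if not source.is_dir():
        raise FileNotFoundError(f"no CI output found at {source}")
    target = staging_dir / app / version
    target.mkdir(parents=True, exist_ok=True)
    staged: List[Path] = []
    for item in sorted(source.iterdir()):
        if item.is_file():  # top-level files only, to keep the sketch short
            copy = target / item.name
            shutil.copy2(item, copy)
            # Confirm the staged file matches what the CI system produced.
            if sha256_of(item) != sha256_of(copy):
                raise IOError(f"checksum mismatch staging {item.name}")
            staged.append(copy)
    return staged
```

Run under the one account that is allowed to write to the staging share, a script like this becomes the only path from CI output to staged binaries.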

Depending on the amount of customization possible with your continuous integration system you can even build integrations with your release management tool, since your continuous integration system knows most of the information needed in the release request already.

Hopefully you see how each of these pieces builds on the others but doesn’t necessarily require them, nor does the end goal – a production application release – require them. You don’t need a build system to compile software, nor do you need a continuous integration system to package your software, nor do you need a release branching system to create your release branches and release builds, and you certainly don’t need a release management tool to keep track of your releases and stage them for deployment. You can start from a completely manual process and slowly build up simple components to automate stages of the process, making life a little bit simpler as you go, while retaining the know-how to plug the gaps manually in the event that one of the components becomes unavailable in the future.

By integrating these automation components, building a mesh instead of a monolith, you can create a fairly automated and streamlined application lifecycle management process that is frictionless enough for developers to get things done, yet gives operations the information they need to perform the deployment, with auditing and reliability every step of the way.