The Biggest Challenge for Infrastructure as Code
What do you do when you come across a piece of open source software that you’d like to try? You download its source code tarball, extract the files, then build and install it following the rules and conventions of the given programming language (./configure && make && make install, ruby setup.rb build, python setup.py install, perl Makefile.PL) - and you end up with a usable product. This simple fact is at the very core of the entire open source ecosystem - without an easy and reliable way to transform source code into runnable software, open source might not exist at all.
I think that the biggest challenge for Infrastructure as Code today is its lack of anything resembling a Makefile - a relatively simple description of how input can be transformed, end to end, into output that is ready for use, given a set of basic tools or a preset build environment (for example, for a project written in C it would be apt-get install build-essential on Debian and its derivatives). If you want an example, please take a look at the deployment instructions for openstack/nova ("on the cloud controller, do this... on volume node, do that..."). While it is indeed infrastructure code, its end-to-end build and deployment instructions are provided in textual form, not as code.
Why is it a problem, you may ask? First and foremost, build/deploy instructions provided in textual form can’t be easily consumed by a machine - it feels like we are back in the dark ages before APIs, when all work had to be performed manually.
Secondly, because they are not fully formalized, they can’t be shared as easily - many context requirements may go uncaptured, so different people can transform identical inputs into outputs that do not function identically. And if they are not shared, the same functionality ends up being built by many separate teams at the same time, which leads to incompatible, sometimes competing implementations and wastes effort that code reuse would have saved.
Thirdly, since they are not code, they are harder to test (and to measure test coverage for), to fork and merge, and to port to other platforms.
My point is that while individual parts or steps of an infrastructure deployment may be automated, the whole thing rarely is, especially when a system is to be deployed to multiple hosts connected over a network. This is similar to a software project with various directories, each with its own Makefile but without a top-level Makefile - so you’d have to follow a HOWTO telling you which arguments to pass to make in each directory and in which order to run the commands.
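To make the "top-level Makefile" analogy a bit more concrete, here is a minimal sketch in Python of what the missing piece might look like: a single description of which roles depend on which, from which a deployment order is derived. The role names and dependencies are hypothetical, picked only for illustration, and the sketch uses the standard library's graphlib module (Python 3.9 or later). It is not how any particular project does it.

```python
from graphlib import TopologicalSorter

# Hypothetical roles and dependencies, chosen only for illustration.
# Each key is a role; its value is the set of roles that must be deployed first.
DEPENDS_ON = {
    "database": set(),
    "cloud_controller": {"database"},
    "volume_node": {"cloud_controller"},
    "compute_node": {"cloud_controller"},
}


def deployment_order(depends_on):
    """Return roles in an order where every role follows its prerequisites."""
    return list(TopologicalSorter(depends_on).static_order())


def deploy(depends_on, run_step):
    """Call run_step(role) once per role, in dependency order."""
    order = deployment_order(depends_on)
    for role in order:
        run_step(role)
    return order


if __name__ == "__main__":
    # A real run_step would invoke a cookbook, manifest or script for the role;
    # here it only prints what it would do.
    deploy(DEPENDS_ON, lambda role: print(f"deploying {role}"))
```

The ordering lives in data rather than in a HOWTO, so a machine can read it, test it and refuse to run when the dependencies form a cycle.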
What to do? I call on all infrastructure projects to make every attempt to ship deployment instructions not as textual step-by-step howto documents, but as code - be it Chef cookbooks, Puppet manifests, shell scripts, Fabric/Capistrano scripts and so on, or a combination of any of the above. Please consider providing cloud images (in at least one region of at least one public cloud) with your canonical build environment (your equivalent of build-essential). Please consider including canonical network topologies for your deployment - since you can't predict the IP addresses each user is going to allocate, all configuration files will need to be autogenerated or built from templates.
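As a rough illustration of building configuration files from templates, the following Python sketch fills a config template from a description of the user's topology. The template contents, key names and addresses are all made up for the example and do not correspond to any real project's configuration format. Only the standard library's string.Template is used.

```python
from string import Template

# Hypothetical config template; the section and key names are illustrative only.
CONFIG_TEMPLATE = Template(
    "[database]\n"
    "host = $db_host\n"
    "[controller]\n"
    "listen = $controller_host:$controller_port\n"
)


def render_config(topology):
    """Fill the template from a topology mapping.

    Raises KeyError if the topology is missing a value the template needs,
    so an incomplete description fails loudly instead of producing a broken file.
    """
    return CONFIG_TEMPLATE.substitute(topology)


if __name__ == "__main__":
    # Example addresses; each user would supply their own.
    topology = {
        "db_host": "10.0.0.5",
        "controller_host": "10.0.0.10",
        "controller_port": 8080,
    }
    print(render_config(topology))
```

The project ships the template and each user supplies only the topology, which keeps the user-specific part small and explicit.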
I am well aware it’s easier said than done, but if we do this, I hope a tentative consensus on best practices for infrastructure as code deployments will emerge over time, which could in turn facilitate the creation of a common “infrastructure make” tool.