Infrastructure-as-Code (IaC) is an approach to automating the provisioning, configuration and deployment of infrastructure resources by means of machine-readable files. These files are usually configuration files given as input to a software agent that processes them and executes the tasks needed to provision, configure and deploy the user-defined infrastructure. The configuration files that drive infrastructure automation can therefore be considered infrastructural software in all respects.
The main advantage of using IaC is repeatability: once the right process is codified, it can be repeated as many times as desired in exactly the same way.
Infrastructure-as-Code tools can be classified according to the following aspects:
Main purpose of the tool, where each purpose reflects one of the main DevOps activities: Provisioning, Configuration, Deployment, Orchestration.
Programming methodology, which can be either Declarative or Imperative. Declarative programming describes the desired end state of the system, whereas imperative programming explicitly lists all the steps needed to reach that end state.
The tool architecture, which can be client/server or client-only. Client/server tools need an agent running on the target system which receives the target configuration for that system from a central server, whereas client-only tools remotely access each target system to configure it.
Resilience to failures, which relates to the ability of the tool to recover from an incomplete or unsuccessful realization of the target infrastructure. If part of the executed task fails, the target system may be brought back to its initial state, as in atomic transactional systems, or it may be left in a partially configured state, or even in an unknown state. In this context, the atomicity property becomes relevant: a tool guarantees an atomic infrastructure instantiation when the modifications to be made to different systems are applied, in a single operation, to either all or none of those systems. For tools that do not ensure atomic behaviour, another property becomes important: idempotence, i.e. applying the same configuration more than once yields the same result as applying it once.
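The all-or-none behaviour described above can be illustrated with a minimal Python sketch. It is not taken from any specific tool: the infrastructure is modelled as a plain dictionary, and the names apply_atomically, create_resource and failing_step are hypothetical helpers introduced only for this example. Each step carries its own undo action, and the completed steps are undone in reverse order if a later step fails.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
# A step is a pair (apply, undo); both act on the shared state dictionary.
Step = Tuple[Callable[[State], None], Callable[[State], None]]


def apply_atomically(state: State, steps: List[Step]) -> bool:
    """Apply every step, or undo the completed ones if any step fails."""
    completed: List[Step] = []
    try:
        for step in steps:
            apply, _undo = step
            apply(state)
            completed.append(step)
    except Exception:
        # Roll back in reverse order so the state returns to its initial value.
        for _apply, undo in reversed(completed):
            undo(state)
        return False
    return True


def create_resource(name: str, spec: str) -> Step:
    """Hypothetical step: create a resource that does not exist yet."""
    def apply(state: State) -> None:
        state[name] = spec

    def undo(state: State) -> None:
        state.pop(name, None)

    return apply, undo


def failing_step() -> Step:
    """Hypothetical step that always fails, to simulate a provisioning error."""
    def apply(state: State) -> None:
        raise RuntimeError("simulated provisioning failure")

    def undo(state: State) -> None:
        pass

    return apply, undo
```

Calling apply_atomically({}, [create_resource("vm1", "small"), failing_step()]) returns False and leaves the dictionary empty, whereas a non-atomic tool would leave "vm1" behind in a partially configured state.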
In the following, the main IaC tools are reviewed according to the criteria listed above, starting from their purpose:
Provisioning – Infrastructure provisioning tools help automate the basic lifecycle steps of infrastructure resources: create, update, and delete. These provisioning steps usually target virtual resources, either on premises or in the cloud, such as Virtual Machines (VMs), but can also target physical resources by using suitably flexible hardware platforms such as HPE Synergy. The main tools available for the provisioning task are Terraform, AWS CloudFormation, and OpenStack Heat (others are Azure Resource Manager templates and Google Cloud Deployment Manager configurations).
Most of the tools mentioned above focus on a specific cloud platform and therefore lack interoperability and portability. OpenStack Heat offers some compatibility with CloudFormation and can accept many CloudFormation templates in JSON format. A notable exception is Terraform, which defines providers as plugins. Terraform can execute provisioning operations with most of the major cloud providers and virtualization platforms, and can be extended to support a new provider just by defining the related plugin. Terraform automatically manages the infrastructure state and, whenever a change in the configuration is made, it applies the related updates to the corresponding part of the infrastructure, managing all the dependencies among resources. The infrastructure configuration and its state are described by text files, which can be saved into code versioning systems, therefore fully implementing the Infrastructure as Code paradigm: the infrastructure is created and managed as if it were a software code artifact. Configuration files can be written in JSON format, or in Terraform's own language, the HashiCorp Configuration Language (HCL). While the configuration files are parsed, a dependency graph is built, which allows dependencies to be managed automatically.
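The role of the dependency graph can be sketched in a few lines of Python. This is an illustration of the general idea, not Terraform's actual implementation: the resource names are hypothetical, and the ordering is computed with the standard-library graphlib module (Python 3.9 or later), in which each node maps to the set of nodes it depends on.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical resources; each one maps to the resources it depends on.
dependencies: Dict[str, Set[str]] = {
    "network": set(),
    "subnet": {"network"},
    "security_group": {"network"},
    "vm": {"subnet", "security_group"},
}


def creation_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = creation_order(dependencies)
```

In the resulting order, "network" comes first and "vm" comes last; deleting the resources would follow the reverse order. A cyclic dependency makes static_order raise graphlib.CycleError, which is the kind of error a provisioning tool has to report before executing anything.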
All the evaluated tools (Terraform, CloudFormation and Heat) follow a declarative approach to provisioning infrastructure. The developer writes a template describing the desired target state of the infrastructure; the tool validates the template for possible syntax errors and then creates an execution plan that is reviewed by the developer (CloudFormation calls it a "Change Set"). If the plan is correct, the developer can then execute it to make the infrastructure reach the target state. If something fails during execution, CloudFormation and Heat should roll back to the initial state, whereas Terraform just records the incomplete state, so that the template can be applied again to reach the target state incrementally. If the initial configuration is modified after it has been applied, Terraform is able to carry out that specific modification only, without impacting the rest of the infrastructure resources.
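The plan step of this declarative workflow can be approximated as a comparison between the recorded state and the desired state. The following Python sketch is a simplification under stated assumptions: resources are plain dictionaries keyed by a hypothetical name, and the function plan is an illustrative helper, not the algorithm of any of the tools above.

```python
from typing import Dict, List, Tuple


def plan(current: Dict[str, dict], desired: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Compute the actions that bring `current` to `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical recorded state and desired template.
current_state = {"vm1": {"size": "small"}, "vm2": {"size": "small"}}
desired_state = {"vm1": {"size": "large"}, "vm3": {"size": "small"}}

actions = plan(current_state, desired_state)
# actions == [("update", "vm1"), ("create", "vm3"), ("delete", "vm2")]
```

Resources that are unchanged produce no action, which is why a modification of the template only touches the affected part of the infrastructure. Planning against a state that already matches the template yields an empty list.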
Configuration – By infrastructure configuration we mean the process of creating and updating a software environment on existing servers according to a given set of requirements. This means, for example, installing software packages, then configuring and starting them, but also configuring networks. Several tools are available to automate this process, for example Chef, Puppet, SaltStack, Ansible and CFEngine. All the listed tools except Ansible have a client/server architecture, i.e. they need agents to be installed on the target system. SaltStack, Puppet and CFEngine are mainly based on a declarative methodology, i.e. they describe the target state of a system, whereas Chef and Ansible can be classified as imperative tools. With respect to resilience to failures, none of the listed tools can be identified as transactional; only SaltStack provides support for manual rollback in case of failures, by integrating with Snapper and taking a snapshot of the Linux filesystem before and after every modification. The declarative tools (Puppet and CFEngine) claim that they do not need rollback support, since their change management model is based on the concept of convergence to a known end state and their execution is idempotent by default. Ansible code can also be written to be idempotent, but only if the developer follows specific guidelines.
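The difference between an idempotent, convergent step and a plain imperative one can be shown with a small Python sketch. It does not reproduce the API of any of the tools above: a configuration file is modelled as a list of lines, and ensure_line and append_line are hypothetical helpers introduced for illustration.

```python
from typing import List


def ensure_line(lines: List[str], line: str) -> bool:
    """Idempotent step: add `line` only if it is missing.

    Returns True when a change was made, False when the target
    state had already been reached.
    """
    if line in lines:
        return False
    lines.append(line)
    return True


def append_line(lines: List[str], line: str) -> None:
    """Non-idempotent step: every run adds another copy of `line`."""
    lines.append(line)


# Hypothetical configuration file content.
config = ["port 80"]
first_run = ensure_line(config, "max_clients 10")   # changes the file
second_run = ensure_line(config, "max_clients 10")  # finds nothing to do
```

After both runs the configuration contains "max_clients 10" exactly once, so a failed or interrupted run can simply be repeated. Running append_line twice would instead leave two copies of the line, which is the kind of drift that imperative scripts must guard against explicitly.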
Deployment – This is a higher-level topic: it requires a provisioned and configured infrastructure on which a released version of an application is finally deployed into production. DevOps and Agile practitioners usually speak about Continuous Delivery and Continuous Deployment as the last steps after Continuous Integration (CI). Some available tools are specific to deploying container-based applications (Rancher, Containership), while others are more generic (e.g. Apache Brooklyn, Spinnaker, Alien4Cloud, Cloudify) and can deploy both traditional and container-based applications. Some tools are based on the TOSCA open standard and provide a GUI to graphically define the topology of applications (Cloudify, Alien4Cloud); others rely on a Domain Specific Language (DSL) using JSON (Spinnaker) or YAML+Java (Brooklyn). All the available Deployment tools support multiple cloud providers, either directly (Spinnaker) or through specific plugins (Alien4Cloud, Cloudify); Brooklyn uses the Apache jclouds multi-cloud toolkit. Most tools also support deployment on existing (HW or VM) nodes, a feature usually referred to as Bring Your Own Node (BYON). Some deployment tools (Brooklyn, Cloudify) also support post-deployment orchestration, by setting up monitoring to detect potential issues and remediate them.
Orchestration – At the top level of application deployment and runtime lifecycle management reside cloud application orchestrators, whose role is to run full workflows of low-level operations such as provisioning resources, configuring and installing components, connecting components to satisfy dependencies, or tearing down individual components. Orchestrators can work with any resource type: compute, networking, storage, services and more. Note that some of the Orchestration tools also have other capabilities, such as Provisioning or Deployment, which is why they have already been mentioned in the related sections above. Some orchestrators are platform- or technology-dependent (e.g., OpenStack Heat, Kubernetes), while others are in principle platform-agnostic (e.g., Apache Brooklyn, Alien4Cloud, Cloudify, ARIA TOSCA, OpenTOSCA). The most widely used and advanced orchestration platform today is Kubernetes, an open-source platform for running and managing containerized applications. Its orchestration applies to computing, networking, and storage infrastructure. Kubernetes is available as open-source software, as part of the Red Hat OpenShift platform, or as one of the Kubernetes-as-a-service offerings from many cloud providers (Amazon EKS, Google GKE, Azure AKS, and others). Kubernetes deploys containers in groups called Pods on a cluster usually composed of one master and one or more worker nodes. Kubernetes can be configured through its "declarative" API, which allows the target state of all the involved objects to be specified, or through YAML configuration files applied with a command line tool (kubectl).