A Look at Current "Infrastructure as Code" Trends

Unless you have been hiding under a rock, you have probably heard the term “Infrastructure as Code” thrown around. Infrastructure as Code - largely made possible by the rise of cloud infrastructure - has revolutionized automation because it was the last missing piece of the puzzle. It filled gaps in existing system provisioning solutions which, prior to the emergence of the cloud, could not interact with “infrastructure” in a programmable way.

[Image: Infrastructure was the last piece of the puzzle to achieve fully integrated end-to-end provisioning pipelines]

To clarify, the term Infrastructure as Code has also been used to describe a comprehensive automation approach. In that sense, “infrastructure” refers to the entire stack that makes up an organization’s system infrastructure, including both "hardware" and software components. I say “hardware” in quotation marks because in the cloud world the term has largely been abstracted and replaced by the notion of “cloud resources”. My use of the term Infrastructure in this article refers specifically to “cloud resources”, in order to distinguish them from other components higher in the stack (as you will see later, we will break “as-code” tools down into four categories). Cloud resources are the cloud implementation of physical infrastructure. In AWS terms, for example, this would include resources such as EC2 instances, ELBs, Lambda functions, S3 buckets, etc.

When looking at current “as-code” trends, we can divide the associated tools into four categories:
  1. Infrastructure as Code tools - for Infrastructure Orchestration.
  2. Configuration as Code tools - for Configuration Management.
  3. Containers as Code tools - for Applications Containerization.
  4. Pipeline as Code tools - for Continuous Integration and Delivery.
For each category, there are tools that achieve similar objectives but are not considered “as-code” tools. For example, we would like to distinguish between an ordinary CI/CD tool and a Pipeline as Code tool (and similarly for the other categories). To do so, we need to define the characteristics of an “as-code” tool:
  • Version-controlled: Version control is obviously the most important part of persisting infrastructure, configuration, containers and pipelines as code. The tool’s configuration files should be persistable in a readable, version-controllable syntax. This disqualifies binary files such as large VM images.
  • Modularity: Ask any “coder” and they will tell you that the best code is reusable code. In my view, for a tool to support “modularity”, it must allow modules to be parameterized and reused multiple times (see the sketch below).
  • Instantiable / Deployable: The tool must be able to take code as input and “deploy” it; the deployed outcome will differ based on the category. For instance, it could be a pipeline, a configured instance, a container or a change to cloud infrastructure (e.g. creation or deletion of cloud infrastructure). This disqualifies many tools that rely only partially on “scripts” and require that those scripts be manually uploaded or configured in some UI.
Each of the traits above comes with inherent benefits that are beyond the scope of this article, but examples include auditability, repeatability, efficiency and maintainability.
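To make the “modularity” and “instantiable” traits concrete, here is a minimal Terraform sketch of a parameterized module that can be instantiated any number of times with different parameters. The module layout, variable names and AMI ID are hypothetical:

  # modules/app_instance/main.tf (hypothetical module)
  variable "environment" {}

  variable "instance_type" {
    default = "t2.micro"
  }

  resource "aws_instance" "app" {
    ami           = "ami-0123456789abcdef0" # placeholder AMI ID
    instance_type = var.instance_type
    tags = {
      Environment = var.environment
    }
  }

  # Elsewhere in the code base, the module is instantiated with its own parameters:
  module "staging_app" {
    source        = "./modules/app_instance"
    environment   = "staging"
    instance_type = "t2.small"
  }

Running terraform apply against such code is what makes it “instantiable” in the sense described above.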

Trending Tools

This is by no means a comprehensive list, but based on my experience these seem to be the popular ones. Note that I disqualified Jenkins from the Pipeline as Code tools because its design still relies on (and allows for) the concept of Plugins - external dependencies that can neither be version-controlled nor deployed alongside the pipeline code. However, others may consider Jenkins’ implementation of Pipeline as Code (i.e. Jenkinsfiles) a proper one; regardless, big steps are indeed being taken by the Jenkins community to move in that direction.
  • Infrastructure as Code: Terraform, CloudFormation, Ansible (limited)
  • Configuration as Code: Chef, Puppet, Ansible, SaltStack
  • Containers as Code: Docker, Kubernetes
  • Pipeline as Code: Drone.io, Concourse CI

Code Organization Patterns

The rapid increase of DevOps tools in the last few years has led to the emergence of a wide variety of innovative as-code patterns. In this part of the series, I will cover one of the trending patterns; more patterns will follow in the coming parts.

Pattern 1: Coupling Application with Infrastructure, Configuration and Pipeline (Decentralized)

In this pattern, you couple the application source code with its infrastructure code, pipeline code and configuration code. This means you have one and only one code repository to build, provision and deploy your entire stack. How cool is that? An example layout of your application code repository would look like:
 ROOT
  |- src/ # application source code
  |- ci/ # pipeline code
  |- containers/ # containers code
  |- infrastructure/ # infrastructure code
  |- config/ # provisioning configuration scripts
  |- tools/ # scripts to facilitate deployment (mostly wrapper scripts)
This pattern could also be described as decentralized, since there is no central repository for the organization’s infrastructure, configuration or pipelines; instead, they are distributed among application repositories.

Technology Choices for this Pattern:

Infrastructure:
  • Terraform is a great choice for this pattern since it supports bootstrapping your cloud resources with a “provisioner”; the provisioner in this case would be your Configuration as Code, and it can range from simple bash scripts to Chef roles (a minimal sketch follows this list). Note that for this approach to work, you would likely have to stick to non-centralized provisioning methods (see below), and therefore Chef roles are not applicable to this pattern.
  • CloudFormation (AWS-specific) can be made to work with this pattern by using an AWS::CloudFormation::Init script to bootstrap EC2 instances. For other types of resources, you may use Lambda functions as CloudFormation resources (official term: "AWS Lambda-backed Custom Resources"), but I will leave it to you to determine whether the complexity overhead is worth it. AWS::CloudFormation::Init also encourages coupling your infrastructure AND configuration code in the same template, which I consider a dangerous anti-pattern, as stated in the following point.
  • Ansible has limited capabilities when it comes to orchestrating infrastructure. Of course, you can always interact directly with your cloud provider’s APIs via Ansible’s uri module, but that would add immense complexity, especially as your infrastructure needs grow more complex over time. It also becomes tempting to orchestrate infrastructure AND configure it in the same code (“playbook”), but that is a dangerous anti-pattern that is likely to result in regrets over time, when your configuration needs get more complex and the need for reusability arises. Decoupling infrastructure and configuration code is as important as decoupling front-end and back-end code in the web development world.
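To illustrate the Terraform option above, here is a minimal sketch of a cloud resource bootstrapped by a provisioner. The AMI ID, key pair, SSH user and packages are placeholder assumptions:

  resource "aws_instance" "app" {
    ami           = "ami-0123456789abcdef0" # placeholder AMI ID
    instance_type = "t2.micro"
    key_name      = "deployer-key"          # assumed existing key pair

    connection {
      type        = "ssh"
      user        = "ec2-user"
      private_key = file("deployer-key.pem") # assumed local key file
      host        = self.public_ip
    }

    # Configuration as Code hook: anything from a one-line bash command
    # to a full Chef Solo run can be triggered here.
    provisioner "remote-exec" {
      inline = [
        "sudo yum -y install nginx",
        "sudo systemctl enable --now nginx",
      ]
    }
  }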
Configuration: Your choice of configuration tools may be limited by your infrastructure tool. Terraform is amazingly flexible in how you can configure your cloud resources after creating them: it gives you the ability to execute any script either locally (on the deployer’s machine) or remotely (on the resource, i.e. the instance). Based on that, I recommend the following tools for this pattern when using Terraform for infrastructure orchestration:
  • Bash scripts: Consider using bash scripts if you have really simple and minimal configuration.
  • Ansible Playbooks: Ansible Playbooks are my favorite for this pattern: you can describe the desired state of your cloud resources and trigger the playbook execution using the local-exec provisioner (the deployer machine needs to have the Ansible binaries installed locally). A sketch follows this list.
  • Chef Solo: This approach is the reverse of the Ansible Playbooks approach. Chef recipes are uploaded using Terraform’s file provisioner, followed by a remote-exec provisioner that executes a script which downloads Chef and runs the recipe on the target instance. This approach is more self-contained than the Ansible Playbooks option, so no binaries are required on the deployer’s machine; however, the recipe runs in isolation from other resources’ provisioners (consider a case where one provisioner needs to read the output of another).
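Here is a minimal sketch of the Ansible Playbooks option described above. The AMI ID, playbook path and SSH user are hypothetical, and the deployer’s machine is assumed to have Ansible installed:

  resource "aws_instance" "web" {
    ami           = "ami-0123456789abcdef0" # placeholder AMI ID
    instance_type = "t2.micro"

    # Run the playbook from the deployer's machine against the new instance;
    # the trailing comma turns the IP into an ad-hoc single-host inventory.
    provisioner "local-exec" {
      command = "ansible-playbook -u ec2-user -i '${self.public_ip},' config/site.yml"
    }
  }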
When using CloudFormation for infrastructure orchestration, you are limited to the following approaches for Configuration:
  • AWS::CloudFormation::Init allows you to execute scripts and configure your EC2 instances; however, it is usually included in the same template as your EC2 instance, and it therefore encourages coupling Infrastructure with Configuration code in the same template, which is an anti-pattern as mentioned earlier.
  • AWS Lambda-backed Custom Resources: You can implement any custom logic in a Lambda function and have CloudFormation execute it for you. This logic could be a “Configuration” script that configures an EC2 instance, an Elasticsearch index or any other custom target. But do consider the complexity of this route over the long run.
Pipelines: Both Concourse CI and Drone.io do a great job of implementing pipelines as code. In this pattern you have a great opportunity to integrate your Infrastructure, Configuration and Application deployment in one pipeline. Consider these advantages:
  • Coupling application releases with their configuration or infrastructure dependencies. For example, if the next release of the application requires a newer JVM version, you can push a configuration code commit that installs the newer JVM alongside the application changes that require it - both changes are coupled together and deployed in the same release.
  • Ability to create ad-hoc environments as part of the application deployment pipeline: Suppose you want your pipeline to stress-test your application with the same load generated last Black Friday. You don’t want to test against an active QA or Staging environment, and you don’t want to run a dedicated environment for load-testing that would sit idle most of the time. In this pattern, your infrastructure is a reusable module! You can write your pipeline code so that it instantiates this module (with production-like parameters) and uses it for the load test, proceeding to deploy on success and failing otherwise (see the sketch after this list).
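For the ad-hoc environment scenario above, a sketch of what the pipeline might apply (and later destroy) could look like the following; the module path, variable names and sizing are assumptions:

  # Hypothetical pipeline step: instantiate the same infrastructure module the
  # application normally uses, with production-like parameters, as a temporary
  # load-test environment.
  module "loadtest_env" {
    source         = "./infrastructure/modules/app_stack"
    environment    = "loadtest"
    instance_type  = "c5.2xlarge" # match production sizing
    instance_count = 12
  }

The pipeline job would run terraform apply against this configuration before the load-test step and terraform destroy once the test completes.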

Advantages of Decentralized Pattern:

  • Simple.
  • Single repository for the entire stack.
  • Easier to implement an integrated end-to-end provisioning and deployment process.
  • Ability to couple application releases with infrastructure or configuration changes, therefore adding:
    • Ability to provision ad-hoc infrastructure for temporary use during deployment pipeline.
    • Ability to push application releases along with required infrastructure configuration changes (such as JVM example above).

Limitations of Decentralized Pattern:

  • Does not support shared infrastructure well; for example, infrastructure shared across separate applications (e.g., a shared database).
  • Difficult to enforce separation of duties; for example, all collaborators will have access to the application source code, infrastructure code, configuration code, etc.

Conclusion

This concludes part 1 of this series. We will cover more patterns in the coming parts and you will see that a single pattern may not always be an exact fit. In practice, organizations typically follow a hybrid pattern which works well when there are clear rules on shared responsibilities as well as discipline in following those rules.
Learn more about Pythian IT Infrastructure Services.
