Recently I was given the opportunity to get involved with the building of the development and production AWS environments for a greenfield project. As someone with extremely little previous experience in Ops or AWS (the first time I tried hosting a website with AWS I created an S3 Bucket and spent over an hour trying to work out how to SSH into it), this was a somewhat daunting prospect. Luckily, I was given a great deal of encouragement and support, and had some great open-source tools at my disposal.

A Case for TDD

The very idea of Test Driven Development made me and my peers shudder at University. Testing itself was almost glossed over: we were told testing is important, but taught very little about how to actually do it. As a result, the suggestion of employing TDD in a team project at University was usually taken as a sarcastic joke.

Of course, in the real world, testing cannot be glossed over. It is one of the most important parts of the software development process, if not the most important. I found this out the hard way while training for my first development role.

Thankfully, a rather pragmatic colleague showed me the Light Side of TDD while we were pair-programming on a migration project. This, for me, was a fantastic use case for TDD: when you're thrown into a huge piece of software and need to get to grips with its workings as quickly and thoroughly as possible. We taught ourselves how the huge codebase worked (or at least the part we were concerned with) in just a few days, by writing unit tests and playing around until the tests eventually passed. We were then able to continue working on the project in this way, and by the end we already had a suite of unit tests ready, so we didn't have to spend several days writing them from scratch. From this one project alone, I realised how wrong I'd previously been about TDD, and told myself that I'd use it wherever possible in future.

TDD for Infrastructure

The case I presented just now was for migrating part of a massive, pre-existing codebase. However, the project this post is about was in a very early stage infrastructure-wise when I started getting involved. There was no existing infrastructure or resources, with the exception of the client's management network. To add to this, I had no solid skills or prior exposure to this type of development, so this is a very different use case.

That's where AWSpec comes in.

AWSpec is a framework based on RSpec that allows developers to test AWS resource configurations using English-like Ruby statements. For example, a test for an EC2 instance might look something like this (the instance name and attribute values here are illustrative):
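require 'awspec'

# Check that an instance exists, is running, and matches the expected
# configuration. 'my-web-server' and the values below are illustrative.
describe ec2('my-web-server') do
  it { should exist }
  it { should be_running }
  it { should have_tag('Environment').value('dev') }
  its(:instance_type) { should eq 't2.micro' }
end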

It's extremely intuitive and comes with a long, well-documented list of supported resource types and attributes. It also supports loading values from YAML files via Ruby's standard YAML library, allowing you to abstract the configuration specifics away from the test functions, aiding readability and maintenance of your ever-evolving infrastructure.

For me, AWSpec came into its own when I realised I could use this configuration abstraction to build iterative test functions, by wrapping the test block above in a function and calling it repeatedly from another function that iterates through a list of resources. For example, something along these lines (the YAML path, layout and names are hypothetical):
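require 'awspec'
require 'yaml'

# A hypothetical resources.yml describing the instances we expect:
#   ec2_instances:
#     - name: web-server-1
#       instance_type: t2.micro
#     - name: app-server-1
#       instance_type: t2.small
config = YAML.load_file('spec/config/resources.yml')

# Wrap the single-resource test block in a function...
def test_ec2(expected)
  describe ec2(expected['name']) do
    it { should exist }
    it { should be_running }
    its(:instance_type) { should eq expected['instance_type'] }
  end
end

# ...and call it for every instance defined in the YAML file.
config['ec2_instances'].each { |expected| test_ec2(expected) }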

This setup allows the tester to define the desired configuration of their AWS resources purely within a YAML file of arbitrary complexity, so that others can understand exactly how the resources will fit together within the infrastructure without raking through Ruby code that they may not fully understand. It also saves a remarkable amount of time when it comes to maintenance, even for something as simple as a change of naming convention.

By far the biggest advantage of this approach, though, is the ability to use the same testing code for testing multiple distinct environments.

I didn't fancy copying and pasting individual resource tests for each of the four environments that needed to be built, so I built an environment "switch" using an environment variable and a second YAML file defining paths to the different environments' configurations. A sketch of the idea (the file paths and keys here are hypothetical):
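require 'yaml'

# A hypothetical environments.yml mapping each environment to its own
# resource configuration:
#   dev: spec/config/dev.yml
#   test: spec/config/test.yml
#   staging: spec/config/staging.yml
#   production: spec/config/production.yml
environments = YAML.load_file('spec/environments.yml')

# Pick the configuration to test against via an environment variable:
#   $ ENVIRONMENT=staging bundle exec rspec
config = YAML.load_file(environments[ENV['ENVIRONMENT'] || 'dev'])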

Another AWSpec feature that helps get a new Ops engineer off the ground quickly is the generate command, which generates AWSpec tests for resources that already exist within a VPC you own:

$ awspec generate ec2 vpc-ab123cde >> spec/ec2_spec.rb

I wouldn't recommend relying on this command for generating your actual test code, but it's useful for providing concrete examples of syntax and the resource attributes that are available.

So you can see just how simply, and with how few lines of (very readable) code, you can write tests for your AWS infrastructure. This lends itself particularly well to Test Driven Development, as it's very fast to get the tests set up and easy to bolster or change them as you move forwards.

To use Test Driven Development efficiently, however, you need a way of iterating, but how does one iterate infrastructure without starting to dream about the AWS console at night?

Terraform

"HashiCorp Terraform enables you to safely and predictably create, change, and improve infrastructure. It is an open source tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned." — terraform.io

Put simply, Terraform allows you to write infrastructure as code. Specifically, you define resources using HashiCorp's HCL (HashiCorp Configuration Language), designed for Terraform, and then Terraform generates a plan showing you exactly what changes it's going to make to your infrastructure. You can then take the time to review the plan so that there aren't any nasty surprises later, and confirm whether or not you want Terraform to go ahead and apply it.
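For example, a minimal resource definition might look something like this (the AMI ID, names and tags are illustrative):

resource "aws_instance" "web_server" {
  ami           = "ami-12345678"  # illustrative AMI ID
  instance_type = "t2.micro"

  tags = {
    Name        = "web-server-1"
    Environment = "dev"
  }
}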

This makes creation of resources highly repeatable — perfect for iterative development.

Terraform also stores information about the state of your infrastructure in a .tfstate file, which can be stored either locally or remotely, so that if you want to add new resources to existing infrastructure, make small changes to a resource (such as adding tags) or completely tear down your infrastructure, it knows exactly which resources it needs to create, modify or destroy.
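For instance, you might keep shared state in an S3 bucket using a backend block along these lines (the bucket, key and region are hypothetical):

terraform {
  backend "s3" {
    bucket = "my-terraform-state"           # hypothetical bucket name
    key    = "greenfield/terraform.tfstate"
    region = "eu-west-1"
  }
}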

This is extremely useful, particularly to someone who doesn't know exactly what they're doing. It helped me learn exactly how AWS resources are configured and how they relate to each other, and I was able to use trial and error to see what effects certain attribute values have on resources. Some more reasons to use Terraform include:

  • It can be used with a wide range of infrastructure-as-a-service providers, such as AWS, Google Cloud, Azure and many more, making cloud migration much simpler than doing it by hand or using provider-specific tools (such as AWS CloudFormation)
  • It can teach you the basic principles of infrastructure management
  • Your infrastructure can be version controlled, and even deployed and tested automatically.
  • It's really fast. You're not going to be able to deploy even a single resource through the AWS management console in the time it'll take Terraform to bring up a fairly complex environment. This can come in handy if you're not using auto-scaling groups and need to bring up a new resource quickly in the event of a resource becoming unhealthy (this is unfortunately still a thing)
  • The code can be made accessible to the wider development team, so they can refer to it and understand exactly how their code is being deployed
  • It's actually pretty fun, at least in my opinion. There's something very satisfying about being able to define infrastructure as code and then build, edit or destroy it with a keystroke.
  • Resources can be defined as general-purpose modules, so that you can re-use configurations across several projects.

Things start getting a bit more complicated when you define and reference general-purpose modules that all of your projects need to be able to use. While modules are beneficial in the long term, they can make it difficult to work out where certain values in the finished resources are coming from. You couldn't, for example, hand a developer access to a Terraform repository and expect them to understand the infrastructure it builds without considerable effort and flitting between three or four files at once. Terraform is pretty intuitive, but it's also extremely powerful, so the structure can get complex.
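For example, a module call might look something like this (the module and its variables are hypothetical), with the values it consumes spread across the module's own files:

module "web_app" {
  source        = "../modules/web_app"  # general-purpose module shared between projects
  environment   = "dev"
  instance_type = "t2.micro"
}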

That's where AWSpec comes back in, allowing you to quickly test the resources Terraform built and to locate missing resources and misconfigured attributes efficiently. With that done, you can head back to your Terraform code, edit it, and let Terraform make the necessary changes to the infrastructure. Because Terraform only changes what it needs to, this should be pretty quick, so you'll be able to re-test in no time at all.
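In practice, the loop can be as simple as this (assuming your AWSpec suite lives under spec/):

$ terraform plan
$ terraform apply
$ bundle exec rspec spec/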

With this approach, you're in complete control of the size of the feedback loop. You could re-run the tests every time you make a small change, or add them as a job in a CI/CD pipeline (in Jenkins, for example) so that the infrastructure is only tested when updates are made to your project's applications. You also get the added benefit of a human-readable YAML file defining all the resources and details of your infrastructure that you care about, which you can share with the rest of the team on request.

Some Handy Tips

Hopefully this post has given you some food for thought and you're considering using Terraform and AWSpec to build and test your next project's infrastructure. Before you jump straight in, though, here's a set of hot tips that'll make your life a little easier going forwards.

  1. Read up on Terraform's interpolation syntax before you do anything.

    In an ideal world, developers would read a tool's entire documentation before using it (and with Terraform you should; it's very well written). In my limited experience this isn't what happens, as it's often tempting to dive straight in. When learning Terraform, though, you should at a minimum read the section on interpolation syntax. It's pretty powerful, and there's some very handy functionality hidden in there.

    Don't be like me: I discovered the cidrsubnet interpolation function (there's a sketch of it after these tips) only after spending a day tearing my hair out while learning about multi-tier subnetting.

  2. If you're going to use Terraform, use it for everything.

    Terraform manages your infrastructure for you. If you add or delete resources manually, Terraform won't keep track of them. So when you're in a hurry to tear down your test infrastructure at the end of the working day and it just won't finish, because an ElastiCache cluster that you added manually and forgot about is still sitting in one of your subnets, that's on you.

  3. Keep your tests simple to begin with.

    AWSpec tests are only as complicated as you make them. For example, the most basic test AWSpec can perform on a resource is existence. While you'll want to test more attributes later, it may be worth starting by testing fewer attributes across more resources. Or don't. It's up to you.
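Here's the cidrsubnet sketch promised in tip 1 (the CIDR block and count are illustrative, and aws_vpc.main is assumed to be defined elsewhere):

variable "vpc_cidr" {
  default = "10.0.0.0/16"
}

# Carve consecutive /24 subnets out of the VPC's /16 block.
resource "aws_subnet" "app" {
  count      = 3
  vpc_id     = "${aws_vpc.main.id}"
  cidr_block = "${cidrsubnet(var.vpc_cidr, 8, count.index)}"  # 10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24
}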

Keep this advice in mind and you'll be creating complex and robust environments in no time.