Developing with an Immutable Infrastructure


What is Immutable Infrastructure?

Simply put, your hardware stack is created and maintained using the programming concept of immutability: once something is instantiated, it is immutable and does not change. If an update is needed (whether for a scheduled upgrade or a bug fix), a new instance is created to replace the existing one. Thus, once a component is launched it is never changed, only replaced.

Benefits of Immutable Infrastructure (II)

Speed – The Koddi dev team strives to continually move fast and push new features for our clients. We have multiple autoscaled web server stacks (development, internal, production, white label, etc.) that all get updated multiple times per week. Using traditional infrastructure methods to maintain all of these servers (package/security updates, code changes, distro/kernel upgrades, config files) while ensuring each server stays in line with the others would drastically slow productivity. With II, every server is provisioned from the same base image, so everything always matches.

Reliability – Before a new feature or platform update is launched at Koddi, it must undergo multiple rounds of testing/QA. This includes automated unit tests as well as human QA in our internal environment. Occasionally, however, a bug makes it into production. II allows us to quickly roll back our entire stack to a previous version that we know is free from error and debug the issue in a dev environment. With traditional architecture, changes would have to be undone on production in the hope that the issue gets fixed. Being able to quickly switch back to a previous version ensures that our clients experience as little impact as possible.

What we use

As we’ve discussed before on our blog, Koddi uses Amazon Web Services as our cloud provider. Our process for server deployment consists of two parts: packing our release and deploying new servers.


Once a new feature or update is ready to be released to our production environment, a tag is created from our GitHub master branch. We then use Packer from HashiCorp to create an Amazon Machine Image (AMI) that will be deployed. Packer does a few awesome things behind the scenes to simplify this process.

  • Spin up a temporary instance in our VPC
  • Provision a temporary security group that allows only our deployment server to connect to the temporary instance
  • Copy/Update any config files we use
  • Pull the given tag’s code from GitHub
  • Create an AMI from the temporary instance
  • Destroy the temporary instance and security group
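
A Packer template for the steps above might look roughly like the following sketch. The region, AMI IDs, repository URL, and file paths here are illustrative placeholders, not Koddi's actual configuration; Packer's amazon-ebs builder handles the temporary instance and security group automatically:

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-east-1",
      "source_ami": "ami-xxxxxxxx",
      "instance_type": "t2.micro",
      "ssh_username": "ubuntu",
      "vpc_id": "vpc-xxxxxxxx",
      "subnet_id": "subnet-xxxxxxxx",
      "ami_name": "app-release-{{timestamp}}"
    }
  ],
  "provisioners": [
    {
      "type": "file",
      "source": "config/app.conf",
      "destination": "/tmp/app.conf"
    },
    {
      "type": "shell",
      "inline": [
        "sudo mv /tmp/app.conf /etc/app/app.conf",
        "git clone --branch v1.2.3 --depth 1 https://github.com/example/app.git /opt/app"
      ]
    }
  ]
}
```

Running `packer build` against a template like this produces the tagged AMI; everything temporary is torn down when the build finishes.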

One key benefit of a packed AMI is the significant reduction in the time it takes an instance to be ready when launched. Since the image already applied all of the updates/code changes during packing, none of that work is repeated when the instance spins up. This is especially beneficial when scaling up an Auto Scaling Group (ASG). After the AMI is created, we can deploy the new image at any time.


For deploying our environment, we implement a blue-green deployment using Ansible. Ansible is extremely customizable, and the Koddi use case is just one of the many possibilities. Below is our process:

  • Create a Launch Config using the AMI from the Packer build. (The Launch Config is a set of instructions that the auto-scaler uses when launching new instances to the load balancer.)
  • Create an Auto Scaling Group using the new Launch Config
  • Set up the ASG scaling policies
  • Attach the new ASG to our load balancer
  • Spin up the desired number of instances and wait for them to become healthy
  • Remove the existing ASG after the new instances are verified healthy
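
The blue-green steps above could be sketched as an Ansible playbook along these lines, using the `ec2_lc` and `ec2_asg` modules. The names, sizes, and variables (`build_id`, `packer_ami_id`, `previous_build_id`) are hypothetical stand-ins, not our real playbook:

```yaml
# Hypothetical blue-green deploy; all names and sizes are illustrative.
- hosts: localhost
  connection: local
  tasks:
    - name: Create a launch configuration from the freshly packed AMI
      ec2_lc:
        name: "app-lc-{{ build_id }}"
        image_id: "{{ packer_ami_id }}"
        instance_type: t2.medium
        region: us-east-1

    - name: Create the new ASG and attach it to the load balancer
      ec2_asg:
        name: "app-asg-{{ build_id }}"
        launch_config_name: "app-lc-{{ build_id }}"
        load_balancers: ["app-elb"]
        min_size: 2
        max_size: 10
        desired_capacity: 2
        health_check_type: ELB
        wait_for_instances: yes
        region: us-east-1

    - name: Remove the previous ASG once the new instances are healthy
      ec2_asg:
        name: "app-asg-{{ previous_build_id }}"
        state: absent
        region: us-east-1
```

Because `wait_for_instances` blocks until the new instances pass their health checks, the old ASG is only torn down after the green stack is verified healthy.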

After all of our instances are deployed, we do some background cleanup like removing any LCs/ASGs/AMIs older than our last five deployments. Similar to Packer, Ansible allows us to automate the entire process from start to finish.
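
The cleanup rule ("keep only the last five deployments") is simple enough to sketch in a few lines of Python. The data below is made up, and a real cleanup task would call the AWS APIs to delete the stale LCs/ASGs/AMIs rather than just returning their names:

```python
def prune_old_deployments(deployments, keep=5):
    """Return the deployment IDs to delete, keeping only the `keep` newest.

    `deployments` is a list of (deployment_id, created_at) tuples, where
    created_at is anything sortable (e.g. a datetime or an ISO 8601 string).
    """
    # Sort newest first; everything past the first `keep` entries is stale.
    ordered = sorted(deployments, key=lambda d: d[1], reverse=True)
    return [dep_id for dep_id, _ in ordered[keep:]]

# Hypothetical deployment history (ISO timestamps sort lexicographically).
history = [(f"asg-{i}", f"2016-05-{i:02d}") for i in range(1, 8)]
print(prune_old_deployments(history))  # the two oldest: ['asg-2', 'asg-1']
```

The same selection logic applies whether the resources are launch configs, ASGs, or AMIs, so one helper covers all three cleanup passes.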

Next Steps

Now that we have our base II system in place, our plan is to continually make our platform faster and more fault tolerant. Here are some ideas for what we might do next:

  • Implement Terraform to allow quick creation and destruction of our entire infrastructure.
  • Set up Chaos Monkey to test our disaster recovery process.
  • Use Jenkins to enable true “push-button” deployment.

If you have questions about our process or have comments on how we could do it better, please reach out in the comments below. We’d love to hear from you!