Deployment with Docker and AWS Fargate

I’ve intended to write about our deployment stack with AWS Fargate for a while but kept putting it off. We’ve gotten tremendous value from Fargate and there’s a serious dearth of approachable material online. I think these are related: Fargate is scary to write about. It’s an abstraction over your entire deployment, so there’s necessarily a lot of magic going on under the hood. The documentation is filled with buzzy words like orchestration and serverless and – as with all AWS docs – refers you to an ever-growing number of other AWS docs for acronyms like ELB, EC2, VPC, and EBS. But without being experts we’ve managed to use Fargate to set up continuous, rolling deployment of multiple applications. These have been running for two months now without any downtime. So what follows is a beginner’s guide to Fargate, written by a beginner. Let’s start by establishing some background.

Deployment

Deploying is the process of taking the web applications you run locally during development and getting them running on the public internet (in this article, on AWS). This is harder than it sounds, for a number of reasons.

  • Resources: When running locally you take your computer’s resources – CPU, RAM, disk – for granted. These all have to be provisioned on some machine in the cloud. Traditionally this meant provisioning an EC2 instance.
  • Operating System: Again taken for granted locally, but your provisioned instance needs an operating system – usually some Linux distro – that is compatible with the technologies your application runs on.
  • Publishing and running the code: You need to get your code onto the instance, either as raw source or a compiled binary, and then build and run it. You also want each new deploy to roll over the old one seamlessly, without any downtime. On top of all this, you might have multiple applications to do this for.
  • Reliability: Your production deployment needs to keep running indefinitely. If an intermittent error crashes one of your applications, that process needs to restart automatically or you’ll have downtime.
  • Services: Your application will almost certainly use a database like Postgres, and perhaps other services like Redis. These need to be installed and run somewhere your instance can access them.
  • Networking: When running locally, all of your processes run on the same machine, making communication trivial. This will not be the case in the cloud, so you have to manage how processes on different machines talk to one another.
  • Security: A deployed application is accessible to the world. All of your processes’ endpoints and internal communication need to be secure.
  • Secrets: Your applications will likely hold many API keys and tokens to authenticate with other services. These need to be available on each instance, but they are highly sensitive and so should not be transferred frequently or over insecure channels.

I’m sure there are many more that I’m missing, but this is already a daunting list. Traditionally, each of these steps involved configuring something in the AWS Console UI or CLI for each service. In addition to being a huge pain in the ass, this is dangerous: it amounts to a huge pile of manually managed state. You have no easy way to track, let alone revert, changes made in the UI. There’s no way to test changes before making them. If you need to scale, you have to manually provision new machines, take the old ones offline, and redo all the networking and secrets configuration. It’s almost impossible to do this without some scheduled downtime.
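
To make this concrete, here is the kind of one-off, imperative command the old workflow revolves around; every value below is illustrative:

# Manually provision a single EC2 instance – state that the console
# now tracks, but your version control never sees
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.small \
  --key-name my-ssh-key \
  --security-group-ids sg-0123456789abcdef0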

Serverless Deployment

AWS Fargate uses a different paradigm called Serverless Deployment. This is a bit of a misnomer since plenty of servers are still involved. But what’s meant here is that no EC2 server instances are ever provisioned or configured manually. Instead you describe in code what infrastructure and configuration you want, pass this code to Fargate, and let AWS handle the provisioning and setup.

There are huge benefits to this arrangement. Because the configuration is now code that lives in version control, you can manage and audit changes through your normal PR review process. You can easily review and roll back any change. You can set up tests to run in your CI to ensure nothing breaks.

More philosophically, we’ve switched from an imperative to a declarative paradigm. Instead of issuing a series of commands (imperatives) in the AWS Console that create a huge amount of state to manage, we now simply declare: “this is the correct state of the world.” A new deployment is as straightforward as declaring a new configuration, and a rollback is just declaring the old one.

The code that declares all of this lives in two places: one or more Dockerfiles and one or more Task Definitions.

Dockerfile

Docker is an incredible tool; entire books have been written about it. The short version is that Docker enables containerization: packaging your source code together with everything required to run it. In a file called a Dockerfile you declare, in code, the environment your application runs in (for example, Linux with Python installed), the dependencies it needs, and the steps to build and run it. Any containerized application can then be run with a simple docker run. This is an extremely powerful abstraction: it lets container orchestration tools like Kubernetes and Fargate run and manage deployments of multiple applications without knowing anything about the internals of those apps.

Practically speaking, here’s one of our Dockerfiles – the one that deploys our Rust backend:

FROM rustlang/rust:nightly-stretch
WORKDIR /usr/src/sheets/server
COPY Cargo.toml ./
COPY server ./server
RUN cargo build --release -p server
RUN cargo install --path server
EXPOSE 7000 8000
CMD ["/usr/local/cargo/bin/server"]

In order:

  1. Use a container image with the latest Rust nightly build installed. This includes a Debian (Stretch) install and other basic dependencies.
  2. Set up a working directory in the container
  3. Copy the needed source code into the container
  4. Compile the Rust code to a binary
  5. Install the Rust binary
  6. Expose two ports (one for Websockets and one for HTTPS)
  7. Run the Rust binary

Now Fargate can deploy this application. Further, other developers can run it themselves without worrying about installing anything on their machines or about mismatched dependencies.
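
To see this in practice: with the Dockerfile above, another developer (or a CI job) needs nothing but Docker installed to build and run the server. The image tag here is just an illustrative name:

# Build the image using the Dockerfile in the repo root
docker build -t sheets-server .

# Run it locally, publishing both exposed ports
docker run -p 7000:7000 -p 8000:8000 sheets-server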

Task Definitions

We use a task definition to tell Amazon Elastic Container Service (ECS) how to run our Docker containers. This covers most of the deployment steps from our original list that aren’t handled by the Dockerfile.

You can find plenty of templates in the official documentation and I’ve uploaded a redacted version of ours (we use the NestJS framework, hence the names). Most of it is boilerplate, but to highlight the interesting parts:

{
  "containerDefinitions": [
    {
      ...
      "portMappings": [
        {
          "hostPort": 80,
          "protocol": "tcp",
          "containerPort": 80
        }
      ],
      ...
      "environment": [
        {
          "name": "NEST_PORT",
          "value": "80"
        }
      ],
      ...
      "secrets": [
        {
          "name": "ASM_GOOGLE_SECRET",
          "valueFrom": "arn:aws:secretsmanager:us-west-1:12345:..."
        }
      ],
      "image": "12345.dkr.ecr.us-west-1.amazonaws.com/repo:latest",
      ...
    }
  ],
  "memory": "512",
  "cpu": "256",
  ...
}

In order, we are:

  1. Defining how to map our container ports
  2. Setting environment variables
  3. Setting up our secrets using AWS Secrets Manager
  4. Defining what container image to use (using Amazon Elastic Container Registry)
  5. Defining what resources the machine we deploy to should have (CPU, memory, etc.)

These are the steps that, in the old deployment methodology, we’d have to perform manually each time we set up a new machine: provision an EC2 instance, configure the networking, and copy over the secrets and environment variables. Instead we declare all of these steps in code and Fargate handles them for us.
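
For reference, handing this file to Fargate boils down to two AWS CLI calls; the cluster, service, and task family names below are hypothetical:

# Register a new revision of the task definition
aws ecs register-task-definition \
  --cli-input-json file://task-definition.json

# Point the service at it; Fargate provisions the new containers
# and rolls them in
aws ecs update-service \
  --cluster production \
  --service server \
  --task-definition server-task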

Additional Benefits

This level of automation is hugely valuable on its own. But Fargate also gives us plenty of additional benefits “for free.”

Because Fargate knows exactly how to provision and deploy our containers, we can configure it to add capacity automatically as needed. So if our site suddenly comes under tremendous load (say, from a press push), Fargate can automatically add new resources to handle the scale. This is an incredible feature for preventing downtime and slowness.
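
As a sketch of how this gets wired up: scaling for an ECS service is configured through Application Auto Scaling. The resource IDs and limits below are hypothetical:

# Allow the service to run anywhere from 1 to 8 copies of the task
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/production/server \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 1 \
  --max-capacity 8

# Add or remove tasks to hold average CPU around 75%
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/production/server \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration \
    '{"TargetValue": 75.0, "PredefinedMetricSpecification": {"PredefinedMetricType": "ECSServiceAverageCPUUtilization"}}'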

Fargate also does safe, rolling deployments. When we deploy new code there is no downtime: Fargate only takes down the old version once the new deployment is running safely. If the new deployment fails because its health check doesn’t respond, the old code stays up, again preventing major downtime.
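
The health check itself can also be declared in code. One option ECS supports is a container-level check inside the task definition; here’s a minimal sketch, assuming a hypothetical /health endpoint and curl available in the image:

"healthCheck": {
  "command": ["CMD-SHELL", "curl -f http://localhost:80/health || exit 1"],
  "interval": 30,
  "timeout": 5,
  "retries": 3
}

This block would sit alongside portMappings in the containerDefinitions above.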

Conclusion

Our Fargate experience has been amazing. We’ve been doing continuous deployment, including adding new resources and services, without any downtime for months. Our code deploys every time a change merges; deploys take twelve minutes. Fargate’s guard rails have saved us from downtime multiple times. We deploy with confidence, even right before important demos.
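
Our CI setup deserves its own post, but the deploy step it runs boils down to something like this sketch (the registry URL is the one from the task definition above; the cluster and service names are placeholders):

# Authenticate Docker against our private ECR registry
aws ecr get-login-password --region us-west-1 \
  | docker login --username AWS --password-stdin 12345.dkr.ecr.us-west-1.amazonaws.com

# Build and push the new image
docker build -t 12345.dkr.ecr.us-west-1.amazonaws.com/repo:latest .
docker push 12345.dkr.ecr.us-west-1.amazonaws.com/repo:latest

# Kick off a rolling deploy of the new image
aws ecs update-service --cluster production --service server --force-new-deployment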

I fully recommend this deployment stack to anyone, even novice AWS users. The setup seems daunting, but you derive a huge amount of value from the effort: replicating all of the features mentioned above by hand would take far more work. With this upfront cost paid, we’re ready to scale easily for the foreseeable future.
