In the next series of blog posts we aim to build and deploy a production-grade data stack on Amazon's Elastic Container Service (ECS). The goal is a solution that lets a lean team build and maintain a data cluster with minimal overhead. Low-touch infrastructure offers the lone data engineer (with an overly optimistic CTO) a chance at a fair fight. We'll use AWS Copilot to manage all the deployment tasks. We'll create a data cluster that contains an orchestrator able to launch on-demand tasks that can update a data store. Our data store of choice will be TimescaleDB (though we won't use it in this post). We'll use Dask to parallelize compute (in a separate post).
Data workflows are usually designed as independent, self-contained components (think deployable artifacts) that can be run on their own or composed into a larger workflow. When a workflow is composed of several independent sub-flows, the result is a Directed Acyclic Graph (DAG). If we consider a simple gaming system that awards points to a user at the completion of various milestones (or competitions), a task to build leaderboards by competition would require a basic workflow like this.
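To make the idea of a DAG concrete, here is a minimal sketch using only Python's standard library (Python 3.9+). The task names are hypothetical stand-ins for the leaderboard example and are not taken from the actual project; the point is only that each task lists what it depends on, and a valid run order falls out of that.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
# The task names are hypothetical, chosen to mirror the leaderboard example.
leaderboard_dag = {
    "extract_milestones": set(),
    "extract_users": set(),
    "score_points": {"extract_milestones", "extract_users"},
    "build_leaderboards": {"score_points"},
}


def execution_order(dag):
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(leaderboard_dag)
print(order)
```

An orchestrator does the same bookkeeping for us, and additionally handles scheduling, retries and state tracking.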
While Apache Airflow, Luigi and Dagster have become industry standards, our stack will use the Prefect orchestrator. In our example we'll consider a very basic DAG and focus more on the deployment of our pipeline. In the process we make certain assumptions around familiarity with Prefect, while briefly introducing it.
Our primary reason for choosing Prefect is its hybrid model. It allows us to use its cloud orchestrator and UI while ensuring our workloads run on our own infrastructure. Prefect's agents manage all the communication between the cloud orchestrator and our infrastructure. This allows us to focus on maintaining just the agent and our flows.
Copilot is AWS's answer to kubectl for their ECS stack. Unless you've been tasked with blowing your recent Series A on building custom deployment tooling, AWS Copilot is the answer and is fast becoming a standard way to deploy containerized workloads on ECS.
Our workflow involves the following steps:
Creating a single container with all relevant dependencies for our project
Being able to run our tasks locally with Prefect's LocalRun
Being able to run individual Prefect Tasks on ECS using ECSRun
Being able to deploy Prefect's ECS Agent, which checks in with the orchestrator and manages task operations.
Being able to trigger runs using Prefect Cloud UI
Getting familiar with Copilot
While the Copilot documentation outlines all the operations beautifully, we're going to build a basic playbook that'll work for any application. We'll go over deploying Web APIs with load balancers in a separate post. Here we focus purely on backend workloads that don't need a public endpoint.
Here we have a simple script that can be run as a shell script to get the installation working.
Our script here installs version v1.7.1 (however, I'd recommend going with the latest version) and makes it executable. Running the following command should tell you if you have a working Copilot installation.
$ copilot -h
Creating an IAM user and permissions for Copilot
Now, assuming you have already created an AWS account, we're going to:
Create a new user called copilot.
Create a group called deployment and add our copilot user to that group.
Attach a few permissions to our deployment group, specifically:
AmazonEC2FullAccess
AmazonEC2ContainerRegistryFullAccess
AmazonS3FullAccess
AWSES_FullAccess
AmazonSSMFullAccess
AmazonECSTaskExecutionRolePolicy
AWSCloudFormationFullAccess
SecretsManagerReadWrite
AmazonVPCReadOnlyAccess
AssumeAnyRolePolicy
IAMRolePolicy
While IAM policies deal with a lot of intricacies, we'll try to cover the bare essentials needed to understand why these permissions are needed. We live in a litigious society and I'm obligated to state that if you shoot yourself in the foot as a consequence of allocating these privileges, you're on your own. Needless to say, the AWS access key and secret for the copilot user need to be stored carefully. The IAMRolePolicy grants your user permission to pass a role to another IAM entity or task role, among other things.
The AssumeAnyRolePolicy lets you set up trust relationships. To explain the need for the sts:AssumeRole permission, I'll refer to @bvtyjo's brilliant explanation on the copilot-cli issues page on GitHub:
… Copilot's credential story is structured around a central administration account and environment accounts. The application account needs broad permissions, as it is responsible for creating environments, vpcs, pipelines, etc, but access can be restricted at the environment level. To deploy to any given environment as a developer, all you need is to be able to call sts:AssumeRole on the environment's EnvManagerRole. This role is created by Copilot and contains all the permissions needed to deploy and operate Copilot services, with no write-level permissions on any environment resources. From <https://github.com/aws/copilot-cli/issues/1771>
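As an illustration of what such a policy could contain, here is a hedged sketch that builds an IAM policy document allowing sts:AssumeRole on a given list of roles. The exact contents of the AssumeAnyRolePolicy used in this post are not shown, so this is an assumption about its shape; the account ID, the app name (synapse), the environment (staging) and the resulting role ARN are all hypothetical.

```python
import json


def assume_role_policy(role_arns):
    """Build an IAM policy document allowing sts:AssumeRole on the given roles."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": list(role_arns),
            }
        ],
    }


# Hypothetical account ID, app name and environment name; substitute your own.
policy = assume_role_policy(
    ["arn:aws:iam::123456789012:role/synapse-staging-EnvManagerRole"]
)
print(json.dumps(policy, indent=2))
```

Scoping the Resource to specific environment manager roles, rather than "*", is one way to keep a developer's access restricted at the environment level, as the quote describes.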
Now we save the AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY on our machine as a new Profile. More details on the IAM roles for Tasks here
Our Prefect Application
If you're interested in how to get started with Prefect, stay tuned for an in-depth walkthrough.
Here we'll assume you have a working application that is structured a bit like the one below. The folder structure shows a copilot folder, which we'll create from scratch. But before that, let's briefly look at the Dockerfiles in our project. Our application structure looks like this:
You'll notice we use two separate Dockerfiles. We'll start with the EcsAgent_Dockerfile, which builds the Prefect ECS agent that communicates with the Prefect Cloud orchestrator and triggers tasks for us. When using Prefect's cloud backend, our workflow includes two steps:
Launching Prefect's ECS Agent
Registering our tasks with the Agent.
The Dockerfile is the one we use to register our tasks with the Prefect ECS Agent.
EcsAgent_Dockerfile
For the most part our EcsAgent_Dockerfile is not special. It uses Poetry for dependency management and sets up the Python environment. The critical bit here is that we instruct Poetry not to create a virtualenv. This is essential when Prefect executes a specific task in a new container. We run our jobs as local scripts while ensuring all the module dependencies needed to run those scripts are available.
In order for the task to find all the necessary dependencies, we install them without a virtualenv.
Launching Prefect's ECS Agent
Our script does a few things before launching the ECS agent. It switches the backend to Prefect Cloud and passes the ECS cluster name, launch type and the environment name as labels.
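The original launch script is a shell script; the sketch below only illustrates, in Python, how the command line it issues could be assembled. It assumes the Prefect 1.x CLI (`prefect backend cloud` followed by `prefect agent ecs start` with `--cluster`, `--launch-type` and `--label`), and it assumes the environment name is the value passed as a label. The cluster name shown is hypothetical.

```python
import shlex


def ecs_agent_command(cluster, launch_type, labels):
    """Assemble the argv for starting a Prefect 1.x ECS agent (assumed CLI flags)."""
    argv = [
        "prefect", "agent", "ecs", "start",
        "--cluster", cluster,
        "--launch-type", launch_type,
    ]
    for label in labels:
        # One --label flag per label; flows are matched against these.
        argv += ["--label", label]
    return argv


# The backend switch runs first, then the agent starts.
backend_cmd = ["prefect", "backend", "cloud"]
# Hypothetical cluster name; the environment name doubles as the label.
agent_cmd = ecs_agent_command("synapse-staging-Cluster", "FARGATE", ["staging"])

print(shlex.join(backend_cmd))
print(shlex.join(agent_cmd))
```

Building the command as a list keeps quoting problems out of the picture if a cluster name or label ever contains unusual characters.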
Registering our tasks
The Dockerfile for this looks identical except for the final CMD it triggers to start the container. The launch-register_tasks.sh shell script triggers a set of tasks that:
Set the Prefect backend to use the cloud backend
Run database migrations or setups on our data stores
Register the tasks.
We execute this using Copilot's task run command (copilot task run), which runs a single task and exits. The container launched for this task is discarded afterwards. Now that we've identified all the moving pieces, we'll begin our deployment. Notice the set of environment variables starting with COPILOT_ that we use in our launch-register_tasks.sh script.
Once we set up our environment with Copilot, all of these environment variables will be available to us.
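For code that needs the same information from Python, a small helper along these lines can read the Copilot-provided variables. This is a sketch: COPILOT_APPLICATION_NAME, COPILOT_ENVIRONMENT_NAME and COPILOT_SERVICE_NAME are the names Copilot documents for services, while the helper name and the "local" fallback are assumptions made for illustration.

```python
import os


def copilot_context(environ=None):
    """Read the Copilot-injected variables, falling back to 'local' off-cluster."""
    environ = os.environ if environ is None else environ
    return {
        "app": environ.get("COPILOT_APPLICATION_NAME", "local"),
        "env": environ.get("COPILOT_ENVIRONMENT_NAME", "local"),
        "service": environ.get("COPILOT_SERVICE_NAME", "local"),
    }


# Example with an explicit mapping instead of the real process environment.
ctx = copilot_context({
    "COPILOT_APPLICATION_NAME": "synapse",
    "COPILOT_ENVIRONMENT_NAME": "staging",
    "COPILOT_SERVICE_NAME": "flow",
})
print(ctx)
```

The fallback makes it easy to tell a laptop run apart from a run inside a Copilot-managed container.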
Why run the same container
The goal is to use a single image that contains all the dependencies needed to run every flow. The downside to this is having to redeploy the latest container and re-register the newer tasks. However, nothing stops us from running more than one agent, which gives you the freedom to avoid disrupting tasks that don't need alteration. Having the right combination of labels between the agents and the tasks you deploy is one way to achieve this. We also ensure that every container our Prefect agent uses to execute a flow loads the same container image. This simplifies versioning for a small team maintaining a lot of flows. The cognitive overhead is restricted to staying aware of the latest container image and the version of code it runs, so you know whether your flow will execute without trouble.
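The label matching mentioned above can be sketched in a few lines. As I understand Prefect 1.x, an agent picks up a flow run only when the flow's labels are a subset of the agent's labels; treat that rule, and the label values used here, as assumptions to verify against the Prefect version you run.

```python
def agent_accepts(agent_labels, flow_labels):
    """True when every label on the flow is also present on the agent."""
    return set(flow_labels) <= set(agent_labels)


# Hypothetical labels: one agent per environment, optionally per image version.
staging_agent = ["staging", "image-v2"]

print(agent_accepts(staging_agent, ["staging"]))              # flow needs only 'staging'
print(agent_accepts(staging_agent, ["staging", "image-v1"]))  # flow pinned to an older image
```

With two agents carrying different version labels, flows pinned to the older image keep running on the older agent while newer flows move to the new one.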
Setting up the application
$ copilot init
This asks for an app name and requires you to answer a few questions about the project. The app name is the project name. For example, if our project is a card game called Synapse, then we'll call the project synapse. Our project can have multiple services:
1. Frontend Service
2. Backend Service
3. Background Tasks Service
We would also require multiple environments to deploy these services.
1. test
2. staging
3. production (prod)
The set of questions as described in the copilot documentation would be
1. “What would you like to name your application” - an application is a collection of services. In this example we’ll only have one service in our app, but if you wanted to have a multi-service app, Copilot makes that easy. Let’s call this app example-app.
2. “Which service type best represents your service's architecture?” - Copilot is asking us what we want our service to do - do we want it to service traffic? Do we want it to be a private backend service? For us, we want our app to be accessible from the web, so let's hit enter and select Load Balanced Web Service.
3. “What do you want to name this Load Balanced Web Service?” - now what should we call our service in our app? Be as creative as you want - but I recommend naming this service front-end.
4. “Which Dockerfile would you like to use for front-end?” - go ahead and choose the default Dockerfile here. This is the service that Copilot will build and deploy for you.
We'll choose to not deploy the service during the initialization but instead initialize our environments first. This creates the VPCs and ECS clusters that we'll need for the subsequent steps. We also indicate that our service is a backend service which is reflected in the manifest.yml file generated. More on the manifest.yml later.
$ copilot env init --name staging
Naming the Service
The type of service we choose is a Backend Service and the Dockerfile is the EcsAgent_Dockerfile. We name the service flow. This is a limitation of the current Prefect ECS Agent, which expects a service named flow; it is detailed in this thread on the Prefect Issues list (check the Conclusions). You'll notice a copilot folder in your project with a manifest.yml file, which contains all the details required to deploy a single service. We'll now add a few environment variables for our service using a YAML file. Our YAML file will look like this:
The Prefect Cloud API key is available on the Prefect Cloud Settings tab of your account.
Secrets to be injected into your environment
We now upload our secrets by running
$ copilot secret init --cli-input-yaml copilot/flow/your_yaml_file.yml
This loads all the variables into AWS's Parameter Store, and they are injected into each container when it starts. Once the secrets have been uploaded, update the manifest.yml to reference them.
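The input file for `copilot secret init --cli-input-yaml` maps each secret name to its value per environment. The sketch below renders that layout from a Python dictionary without any third-party YAML library. The secret names and values are hypothetical placeholders, and the helper name is made up for illustration.

```python
import json


def render_secret_input(secrets):
    """Render {secret_name: {environment: value}} in the YAML layout Copilot expects."""
    lines = []
    for name, per_env in secrets.items():
        lines.append(f"{name}:")
        for env, value in per_env.items():
            # json.dumps yields a double-quoted scalar, which is also valid YAML.
            lines.append(f"  {env}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical secret names and placeholder values.
secret_yaml = render_secret_input({
    "PREFECT_API_KEY": {"staging": "replace-me"},
    "DB_PASSWORD": {"staging": "replace-me-too"},
})
print(secret_yaml)
```

Whatever produces the file, keep it out of version control, since it holds the secrets in plain text.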
Deploying our ECS Agent Service
Once we have everything in place we call
$ copilot svc deploy -e staging -n flow
And let Copilot do its thing. If everything goes well, you should see your ECS Agent show up on the Prefect dashboard UI. Now, that's half the job done.
Configuring our tasks to use ECSRun
Now we want to continue using the LocalRun for all our work locally (if any) and use the ECSRun for all other tasks. The ECSRun config takes a few parameters
When we register our flows, we want them to be registered with the latest task definition version available. We'll also need the container configuration that specifies the CPU and memory of the container. We simply fetch the task_role_arn and execution_role_arn associated with the task definition. To do this we create a small module and use its get_run_config method to create the run_config for each task.
Our tasks now use the get_run_config method to specify their run configurations.
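The original module is not reproduced here, so the following is only a sketch of how a get_run_config helper might look under a few assumptions: Prefect 1.x (`prefect.run_configs.ECSRun` and `LocalRun`), boto3's `describe_task_definition`, which returns the latest active revision when given just the family name, and the presence of COPILOT_ENVIRONMENT_NAME as the signal that we are running on ECS. The helper `ecs_run_kwargs` and the sample task definition are hypothetical.

```python
import os


def ecs_run_kwargs(task_definition, labels):
    """Map a described ECS task definition onto ECSRun keyword arguments."""
    return {
        "task_definition_arn": task_definition["taskDefinitionArn"],
        "task_role_arn": task_definition.get("taskRoleArn"),
        "execution_role_arn": task_definition.get("executionRoleArn"),
        "cpu": task_definition.get("cpu"),
        "memory": task_definition.get("memory"),
        "labels": list(labels),
    }


def get_run_config(family, labels, on_ecs=None):
    """Return LocalRun off-cluster, or ECSRun built from the latest task definition."""
    # Imported lazily so the pure helper above works without Prefect or boto3.
    from prefect.run_configs import ECSRun, LocalRun  # assumes Prefect 1.x

    if on_ecs is None:
        # Assumption: this variable is only present inside Copilot-managed containers.
        on_ecs = "COPILOT_ENVIRONMENT_NAME" in os.environ
    if not on_ecs:
        return LocalRun(labels=list(labels))

    import boto3

    described = boto3.client("ecs").describe_task_definition(taskDefinition=family)
    return ECSRun(**ecs_run_kwargs(described["taskDefinition"], labels))


# Hypothetical task definition, shaped like the boto3 response.
sample = {
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/synapse-staging-flow:7",
    "taskRoleArn": "arn:aws:iam::123456789012:role/example-task-role",
    "executionRoleArn": "arn:aws:iam::123456789012:role/example-execution-role",
    "cpu": "256",
    "memory": "512",
}
kwargs = ecs_run_kwargs(sample, ["staging"])
print(kwargs)
```

A flow would then set `flow.run_config = get_run_config(family, labels)`, where the family name is whatever Copilot generated for the flow service.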
Registering our flows
Now that we have all the pieces ready to be deployed, we add a startup script to register all our flows in one go. We could also register flows on a "when required" basis, but we'll leave that for a separate post.
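A minimal sketch of registering everything in one pass is shown below. It assumes each object behaves like a Prefect 1.x Flow, meaning it has a `name` and a `register(project_name=..., labels=...)` method; the helper name, the project name and the stand-in class used for the demonstration are hypothetical.

```python
def register_all(flows, project_name, labels=None):
    """Register every flow with the given Prefect project and return their names."""
    registered = []
    for flow in flows:
        flow.register(project_name=project_name, labels=labels)
        registered.append(flow.name)
    return registered


class FakeFlow:
    """Stand-in for prefect.Flow so the sketch runs without Prefect installed."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def register(self, project_name, labels=None):
        self.calls.append((project_name, labels))


demo_flows = [FakeFlow("build_leaderboards"), FakeFlow("award_points")]
names = register_all(demo_flows, project_name="synapse", labels=["staging"])
print(names)
```

In the real startup script, the list would hold the actual Flow objects imported from the project's modules.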
Now we trigger the launch-register_tasks.sh shell script and we have our data setup ready to use.