Architecting a Code Deployment System
Once developers have written code for a site, they need to place it on the web servers. That process is called code deployment. It can include code that fixes bugs, adds new features, or upgrades the underlying platform.
Some examples of continuous-deployment tools are:
- AWS CodeDeploy
- Octopus Deploy
Q: What exactly do we mean by a code-deployment system? Are we talking about building, testing, and shipping code?
A: We want to design a system that takes code, builds it into a binary (an opaque blob of data—the compiled code), and deploys the result globally in an efficient and scalable way. We don't need to worry about testing code; let's assume that's already covered.
Possible steps in designing the system:
1. Gathering System Requirements
We're building a system that involves repeatedly (in the order of thousands of times per day) building and deploying code to hundreds of thousands of machines spread out across 5-10 regions around the world.
Building code will take up to 15 minutes and will produce a binary file of up to 10GB, and we want the entire deployment process (building and deploying code to our target machines) to take at most 30 minutes.
Each build will need a clear end-state (SUCCESS or FAILURE), and though we care about availability (2 to 3 nines), we don't need to optimize too much on this dimension.
2. Coming Up With A Plan
It seems like this system can naturally be divided into two clear subsystems:
- Build System that builds code into binaries
- Deployment System that deploys binaries to our machines across the world
3. General Overview
From a high-level perspective, we can call the process of building code into a binary a job, and we can design our build system as a queue of jobs. Jobs get added to the queue, and each job has a commit identifier (the commit SHA) for what version of the code it should build and the name of the artifact that will be created (the name of the resulting binary). Since we're agnostic to the type of the code being built, we can assume that all languages are handled automatically here.
We can have a pool of servers (workers) that are going to handle all of these jobs. Each worker will repeatedly take jobs off the queue (in a FIFO manner—no prioritization for now), build the relevant binaries (again, we're assuming that the actual implementation details of building code are given to us), and write the resulting binaries to blob storage (Google Cloud Storage or S3 for instance). Blob storage makes sense here, because binaries are literally blobs of data.
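The worker behavior above can be sketched in a few lines of Python. This is a minimal in-memory model: the Job, BlobStore, and build() names are stand-ins invented for illustration, not the SQL-backed queue or GCS/S3 client a real system would use.

```python
from dataclasses import dataclass
from collections import deque

# In-memory stand-ins for the real queue, blob store, and build step.

@dataclass
class Job:
    id: str
    commit_sha: str
    name: str          # name of the resulting binary in blob storage
    status: str = "QUEUED"

class BlobStore:
    def __init__(self):
        self.blobs = {}

    def put(self, name, data):
        self.blobs[name] = data

def build(commit_sha):
    # Placeholder for the actual (up to 15-minute) build step.
    return b"binary-for-" + commit_sha.encode()

def run_one_job(queue, blob_store):
    if not queue:
        return None                        # nothing queued; poll again later
    job = queue.popleft()                  # FIFO dequeue, no prioritization
    job.status = "RUNNING"
    try:
        binary = build(job.commit_sha)
        blob_store.put(job.name, binary)   # persist the binary first...
        job.status = "SUCCEEDED"           # ...then record a clear end state
    except Exception:
        job.status = "FAILED"
    return job
```

Note the ordering: the binary is written to blob storage before the job is marked SUCCEEDED, so a SUCCEEDED status always implies a persisted binary.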
4. Job Queue
We can implement the queue using a SQL database, since an in-memory queue would be problematic for handling state: if the server holding the queue died, we'd lose every queued job.
5. SQL Job Queue
We can have a jobs table in our SQL database where every record in the database represents a job, and we can use record-creation timestamps as the queue's ordering mechanism.
Our table will be:
- id: string, the ID of the job, autogenerated
- created_at: timestamp
- commit_sha: string
- name: string, the pointer to the job's eventual binary in blob storage
- status: string, one of QUEUED, RUNNING, SUCCEEDED, FAILED
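Using SQLite as a lightweight stand-in for the real SQL database, the table might look like this; the column types and CHECK constraint are illustrative, not prescriptive:

```python
import sqlite3

# Illustrative schema for the jobs table. A production system would use a
# managed SQL database and a real autogenerated-ID scheme.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id         TEXT PRIMARY KEY,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        commit_sha TEXT NOT NULL,
        name       TEXT NOT NULL,  -- points to the binary in blob storage
        status     TEXT NOT NULL DEFAULT 'QUEUED'
                   CHECK (status IN ('QUEUED', 'RUNNING', 'SUCCEEDED', 'FAILED'))
    )
""")
conn.execute(
    "INSERT INTO jobs (id, commit_sha, name) VALUES (?, ?, ?)",
    ("job-1", "abc123", "binary-abc123"),
)
conn.commit()
```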
6. Concurrency
ACID transactions will make it safe for potentially hundreds of workers to grab jobs off the queue without unintentionally running the same job twice (we'll avoid race conditions). Our actual transaction will look like this:
BEGIN TRANSACTION;

SELECT * FROM jobs_table
WHERE status = 'QUEUED'
ORDER BY created_at ASC
LIMIT 1;

-- If the SELECT returns no rows, we ROLLBACK instead.

UPDATE jobs_table
SET status = 'RUNNING'
WHERE id = <id from the previous query>;

COMMIT;
All of the workers will be running this transaction every so often to dequeue the next job; let's say every 5 seconds. If we arbitrarily assume that we'll have 100 workers sharing the same queue, we'll have 100/5 = 20 reads per second, which is very easy to handle for a SQL database.
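The dequeue transaction can be sketched end-to-end with SQLite. SQLite has no SELECT ... FOR UPDATE, so this sketch uses BEGIN IMMEDIATE to take the write lock up front (a server database like Postgres would use row locking instead), and the connection is opened in autocommit mode so the transaction boundaries are explicit:

```python
import sqlite3

# Autocommit mode: we issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE jobs (
        id TEXT PRIMARY KEY, created_at TIMESTAMP,
        commit_sha TEXT, name TEXT, status TEXT DEFAULT 'QUEUED')
""")

def dequeue(conn):
    # BEGIN IMMEDIATE acquires the write lock immediately, so two workers
    # running this concurrently cannot both claim the same job.
    conn.execute("BEGIN IMMEDIATE")
    row = conn.execute(
        "SELECT id FROM jobs WHERE status = 'QUEUED' "
        "ORDER BY created_at ASC LIMIT 1"
    ).fetchone()
    if row is None:
        conn.execute("ROLLBACK")   # nothing queued; try again later
        return None
    conn.execute("UPDATE jobs SET status = 'RUNNING' WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row[0]
```

Each worker would call dequeue on its polling interval; a None return simply means the queue is empty.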
7. Lost Jobs
If a worker dies mid-build, its job will be stuck in the RUNNING state forever. To guard against these lost jobs, workers can write a last_heartbeat timestamp to the row of the job they're running (say, every few minutes), and an auxiliary service can periodically re-queue any RUNNING job whose heartbeat has gone stale, since its worker has likely died. The transaction that this auxiliary service will perform will look something like this:
UPDATE jobs_table
SET status = 'QUEUED'
WHERE status = 'RUNNING'
AND last_heartbeat < NOW() - 10 minutes;
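A runnable sketch of this recovery pass, again with SQLite; the table is trimmed to the relevant columns, and the last_heartbeat column is assumed to be written periodically by workers while their job runs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id TEXT PRIMARY KEY, status TEXT, last_heartbeat TIMESTAMP)
""")

def requeue_lost_jobs(conn):
    # Any RUNNING job whose heartbeat is more than 10 minutes old is
    # assumed lost and is put back on the queue.
    conn.execute("""
        UPDATE jobs SET status = 'QUEUED'
        WHERE status = 'RUNNING'
          AND last_heartbeat < DATETIME('now', '-10 minutes')
    """)
    conn.commit()
```

Because the dequeue transaction is atomic, a re-queued job will be picked up by exactly one healthy worker.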
8. Scale Estimation
We previously arbitrarily assumed that we would have 100 workers, which made our SQL-database queue able to handle the expected load. We should try to estimate if this number of workers is actually realistic.
With some back-of-the-envelope math, we can see that, since a build can take up to 15 minutes, a single worker can run 4 jobs per hour, or ~100 (96) jobs per day. Given thousands of builds per day (say, 5,000-10,000), we would need roughly 50-100 workers (5000 / 96 ≈ 52). So our arbitrary figure was in the right ballpark.
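The arithmetic can be double-checked in a couple of lines:

```python
# Back-of-the-envelope check of the worker-count estimate.
BUILD_MINUTES = 15
jobs_per_worker_per_day = (24 * 60) // BUILD_MINUTES  # 96, roughly 100

def workers_needed(builds_per_day):
    # Ceiling division: partial workers don't exist.
    return -(-builds_per_day // jobs_per_worker_per_day)

print(workers_needed(5000))   # → 53
print(workers_needed(10000))  # → 105
```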
Even if the builds aren't uniformly spread out (in other words, they peak during work hours), our system scales horizontally very easily. We can automatically add or remove workers whenever the load warrants it. We can also scale our system vertically by making our workers more powerful, thereby reducing the build time.
9. Storage
When a worker completes a build, it can store the binary in GCS or S3 before updating the relevant row in the jobs table. This ensures that a binary has been persisted before its relevant job is marked as SUCCEEDED.
Since we're going to be deploying our binaries to machines spread across the world, it'll likely make sense to have regional storage rather than just a single global blob store.
We can design our system based on regional clusters around the world (in our 5-10 global regions). Each region can have a blob store (a regional GCS or S3 bucket). Once a worker successfully stores a binary in our main blob store, the worker is released and can run another job, while the main blob store performs some asynchronous replication to store the binary in all of the regional GCS or S3 buckets. Given 5-10 regions and 10GB files, this step should take no more than 5-10 minutes, bringing our total build-and-deploy duration so far to roughly 20-25 minutes (15 minutes for a build and 5-10 minutes for global replication of the binary).
10. Replication-Status Service
We can have a global service that continuously checks all regional GCS or S3 buckets and aggregates the replication status of successful builds (in other words, it checks that a given binary in the main blob store has been replicated across all regions). Once a binary has been replicated across all regions, this service updates a separate SQL database with rows containing the name of the binary and its replication status. Once a binary has a "complete" replication status, it's officially deployable.
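A sketch of the aggregation logic, with the regional buckets faked as sets of binary names; a real service would instead list objects in each regional bucket through the GCS/S3 APIs, and the function name here is invented for illustration:

```python
# Regional buckets are modeled as {region: set_of_binary_names}.
def aggregate_replication(main_bucket, regional_buckets):
    # For each binary in the main blob store, report "complete" once every
    # region holds a copy, and "incomplete" otherwise.
    return {
        name: "complete"
        if all(name in bucket for bucket in regional_buckets.values())
        else "incomplete"
        for name in main_bucket
    }
```

The resulting name-to-status mapping is what would be written to the separate SQL database that gates deployability.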
11. Blob Distribution
Since we're going to deploy 10GB binaries to hundreds of thousands of machines, even with our regional clusters, having every machine download a 10GB file directly from its regional blob store is going to be extremely slow. A peer-to-peer-network approach will be much faster and will allow us to hit our 30-minute time frame for deployments. All of our regional clusters will behave as peer-to-peer networks.
12. Trigger
Let's describe what happens when an engineer presses a button on some internal UI that says "Deploy build/binary B1 to every machine globally". This is the action that triggers the binary downloads on all the regional peer-to-peer networks.
To simplify this process and to support having multiple builds getting deployed concurrently, we can design this in a goal-state oriented manner.
The goal-state will be the desired build version at any point in time and will look something like "build_version: B1"; it can be stored in a dynamic configuration service (a key-value store like etcd or ZooKeeper). We'll have a global goal-state as well as regional goal-states.
Each regional cluster will have a K-V store that holds configuration for that cluster about what builds should be running on that cluster, and we'll also have a global K-V store.
When an engineer clicks the "Deploy build/binary B1" button, our global K-V store's build_version will get updated. Regional K-V stores will be continuously polling the global K-V store (say, every 10 seconds) for updates to the build_version and will update themselves accordingly.
Machines in the clusters/regions will be polling the relevant regional K-V store, and when the build_version changes, they'll try to fetch that build from the P2P network and run the binary.
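The goal-state propagation above can be sketched with plain dicts standing in for the global and regional key-value stores; the function names and the fetch_from_p2p callback are invented for illustration, and the polling intervals are elided:

```python
# Dicts stand in for ZooKeeper/etcd; a real system would watch or poll them.

def sync_regional_store(global_kv, regional_kv):
    # Regional stores copy the desired build_version from the global store
    # on every polling cycle (say, every 10 seconds).
    regional_kv["build_version"] = global_kv.get("build_version")

def machine_poll(regional_kv, current_version, fetch_from_p2p):
    # Each machine compares its running version against the regional goal
    # state and fetches the new binary over the P2P network on a change.
    desired = regional_kv.get("build_version")
    if desired is not None and desired != current_version:
        fetch_from_p2p(desired)
        return desired
    return current_version
```

Because every layer just converges toward the stored goal-state, multiple concurrent deploys reduce to updating a single key.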
13. System Diagram