This repository contains the Terraform code to deploy an Amazon Elastic Kubernetes Service (EKS) cluster in Amazon Web Services (AWS).
Terraform backend state is stored in an S3 bucket with state locking enabled via a DynamoDB table. This setup lets multiple people work with the pipeline and deploy simultaneously without conflicts, because the state is locked during operations such as plan and apply.
Outputs from this infrastructure pipeline are persisted into the Terraform backend state so they can be read by the application pipeline (xyz_app_poc) that uses this infrastructure.
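For illustration, the application pipeline can read those outputs with a `terraform_remote_state` data source. This is a minimal sketch: the bucket, key, Region, and the `cluster_name` output name are placeholders, not the actual values used in this repository.

```hcl
# Read the infrastructure pipeline's outputs from the shared S3 backend.
# Bucket, key, Region, and output names are placeholders.
data "terraform_remote_state" "infra" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state-bucket"
    key    = "infra/terraform.tfstate"
    region = "us-east-1"
  }
}

# Example: consume an output exported by the infrastructure pipeline.
locals {
  cluster_name = data.terraform_remote_state.infra.outputs.cluster_name
}
```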
The deployment is currently configured to deploy to a staging environment first (from the `stage` branch) and then to a production environment (from the `main` branch). Other environments can be added.
| Environment type | Short name | Branch | Notes |
|---|---|---|---|
| Production | prod | `main` | Default branch |
| Staging | stage | `stage` | The only source of pull requests into prod |
| Development | dev | (varies) | Created by developers for pull requests into stage. They use their own AWS accounts as environments. |
This code provisions several AWS resources, including:
- Virtual Private Cloud (VPC)
- 3 Public Subnets
- 3 Private Subnets
- NAT Gateway
- Internet Gateway
- Elastic Kubernetes Service (EKS) Cluster
- CloudWatch log group for the EKS Cluster
- EKS managed node group, backed by an EC2 Auto Scaling group
  - `min_size = 1` and `max_size = 5`; this can be adjusted in main.yml (see the sketch after this list)
- Security Group for the cluster nodes
- Subnets and nodes are distributed across 3 availability zones
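To make the scaling settings above concrete, here is a minimal Terraform sketch of an EKS managed node group; the referenced resources (`aws_eks_cluster.this`, `aws_iam_role.node`, `aws_subnet.private`) are placeholders and not necessarily the names used in this repository.

```hcl
# Illustrative EKS managed node group; all referenced resources are placeholders.
resource "aws_eks_node_group" "default" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "default"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = aws_subnet.private[*].id # private subnets across 3 AZs

  scaling_config {
    desired_size = 2
    min_size     = 1 # matches the min_size noted above
    max_size     = 5 # matches the max_size noted above
  }
}
```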
Cost
- The rough estimate of the cost to run this infrastructure is about $140/month (details here) per environment.
- This cost is affected by how much traffic is served, the number of nodes, and whether you require Kubernetes extended support.
The pipeline automation here currently supports only the `stage` and `prod` environments.
The AWS Regions for each of these should be different. You configure the Regions to use in `stage.tfvars` and `main.tfvars` respectively.
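For example, the two files set different values for `aws_region` (the Region values shown here are placeholders):

```hcl
# stage.tfvars
aws_region = "us-east-2"

# main.tfvars
aws_region = "us-west-2"
```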
Developers create their own `dev` branches to work in (using their own AWS accounts as environments) and then create pull requests from these into `stage`.
Three GitHub Actions workflows are defined:
- Terraform (`terraform.yml`) initializes and provisions the infrastructure defined in the configuration files for the `stage` and `prod` environments.
- Enforce Flow (`enforce-flow.yml`) enforces the order in which changes flow through the environments in the pipeline. Currently, with only the two environments, it enforces that `prod` will only accept merges from `stage`. It also checks that all tests in the previous environment have passed before accepting the merge.
- Environment Tests (`ci.yml`) runs tests applicable to every environment. The workflow exists as a placeholder to add future tests. Note: Stage Tests and Prod Tests workflows can similarly be created for tests that only apply to a specific environment.
The following GitHub rules are enforced on branches `stage` (staging) and `main` (production):
- Require approvals: Requires at least one code review with resolution before merging.
- Require a pull request before merging: Requires all commits to be made to a non-target branch and submitted via a pull request before they can be merged.
- Require status checks to pass:
- Checks Enforce Flow to ensure pull requests are only accepted from the designated previous environment in the pipeline, after all tests have passed on it.
- Checks Environment Tests to ensure all environment tests pass on the current environment.
The following GitHub environments for deployment are defined: `stage`, `prod`.
- Clone the repository.
- Create an IAM User in the AWS account you wish to use, with the necessary credentials (see below). Enter `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` values for that IAM User in GitHub secrets (Settings ➡ Secrets and variables ➡ Actions ➡ New repository secret).
- Create an S3 bucket and DynamoDB table in a Region in this same account to use as the backend store for Terraform state.
  - For the DynamoDB table, use `LockID` (type String) as the partition key.
  - Update the `bucket` name, `dynamodb_table` name, and AWS `region` values in `providers.tf` under the Remote backend setting (see the sketch after this list).
- [optional] Update `stage.tfvars` and `main.tfvars` to set the `aws_region` you want to use for each. These must be different from each other.
- The staging environment infrastructure will automatically build when you merge a pull request into branch `stage`.
- The production environment infrastructure will automatically build when you merge a pull request from branch `stage` into branch `main`.
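A minimal sketch of what that Remote backend setting in `providers.tf` looks like; the bucket, table, key, and Region values are placeholders to be replaced with your own:

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state-bucket" # your S3 bucket name
    key            = "eks-infra/terraform.tfstate"    # placeholder state key
    region         = "us-east-1"                      # Region of the bucket and table
    dynamodb_table = "example-terraform-locks"        # table with LockID (String) partition key
    encrypt        = true
  }
}
```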
- Add other environment stages to the pipeline, such as `test`.
- Deploy environments to separate AWS accounts instead of separate Regions in the same account.
  - This provides better fault isolation between environments.
  - This maintains closer parity between environments, since availability of services and instance types can vary with Region.
  - Implementing this would require specifying AWS credentials for each account used in GitHub Secrets, and then adding logic to the `terraform.yml` GitHub action to select the appropriate credentials based on environment type.
- Improve AWS authentication.
  - AWS recommends using short-lived credentials such as an IAM Role instead of an IAM User (long-lived credentials).
  - `aws-actions/configure-aws-credentials` can use IAM Roles if you set up a GitHub OIDC provider in the AWS account.
- Provide a sample IAM Policy for AWS credentials that follows the principle of least privilege (a sketch follows this list). The AWS credentials used here require the following permissions:
  - EC2 actions: Required for creating and managing VPCs, subnets, NAT gateways, and security groups.
  - EKS actions: Needed for creating and managing the EKS cluster and node groups.
  - IAM actions: Needed for creating and managing IAM roles and attaching policies (like the EKS cluster role).
  - Auto Scaling actions: Required for managing EKS managed node groups, since they use Auto Scaling groups.
  - STS AssumeRole: Allows the EKS service to assume the IAM role for the cluster.
- Improve how Terraform is used to deploy multiple environments.
  - The current implementation uses multiple `.tfvars` files and Terraform workspaces. However, built-for-purpose systems such as Terragrunt exist for this.
- Supply an AWS CloudFormation template to create the S3 bucket and DynamoDB table for Terraform backend state.
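As a rough starting point for that sample policy, the following Terraform sketch groups those permission categories into a managed policy. It is an assumption-laden illustration, not a vetted least-privilege policy: the wildcarded actions and the policy name are placeholders that should be narrowed for real use.

```hcl
# Illustrative only: the broad action lists below are placeholders, not a
# vetted least-privilege policy. Narrow them to the specific actions you need.
resource "aws_iam_policy" "terraform_deployer" {
  name = "terraform-eks-deployer" # placeholder name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "InfrastructureProvisioning"
        Effect = "Allow"
        Action = [
          "ec2:*",         # VPCs, subnets, NAT gateways, security groups
          "eks:*",         # EKS cluster and node groups
          "autoscaling:*"  # Auto Scaling groups behind managed node groups
        ]
        Resource = "*"
      },
      {
        Sid    = "ClusterRoleManagement"
        Effect = "Allow"
        Action = [
          "iam:CreateRole",
          "iam:DeleteRole",
          "iam:GetRole",
          "iam:PassRole",
          "iam:AttachRolePolicy",
          "iam:DetachRolePolicy",
          "sts:AssumeRole"
        ]
        Resource = "*"
      }
    ]
  })
}
```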