This section outlines the Terraform modules and configurations necessary to deploy and test a scalable, highly available, and cost-effective infrastructure on AWS. By leveraging Infrastructure as Code (IaC), we automate the provisioning of key resources such as VPCs, subnets, EC2 instances, an Application Load Balancer (ALB), an RDS PostgreSQL database, security groups, and an S3 bucket for static content hosting. This approach ensures consistent, repeatable deployments while optimizing for security, availability, and cost efficiency.
- Task 2: Automated Infrastructure with Terraform
- Table of Contents
- Modules Overview
- Environment Setup
- How to Deploy, Update, and Destroy the Infrastructure
- Testing Plan for Infrastructure Setup
- 1. Ensuring EC2 Instances Can Serve Web Traffic and Scale Correctly
- 2. Verifying that the ALB Distributes Traffic Across Instances
- 3. Checking that the RDS Instance Is Accessible from EC2 but Not from the Internet
- 4. Ensuring Static Assets Are Correctly Stored and Accessible from the S3 Bucket
- 5. Running TFsec to Check Security Compliance in Our Code
- 6. Troubleshooting Tips
- VPC module
  - Purpose: Creates a VPC, along with public and private subnets, routing tables, an Internet Gateway, and a NAT Gateway.
- EC2 Auto Scaling module
  - Purpose: Launches an EC2 Auto Scaling Group that scales based on CPU utilization. The instances are placed in the public subnets and serve traffic via an ALB.
  - Important: We configured a small script in the user_data so that the instances launch an Nginx server, allowing us to evaluate load balancing through the hostname, and expose a /health endpoint used by the target group health checks (a sketch of such a script follows this list).
- ALB module
  - Purpose: Sets up an ALB to balance incoming HTTP/HTTPS traffic across the EC2 instances.
- RDS module
  - Purpose: Provisions a highly available RDS PostgreSQL instance in private subnets with a Multi-AZ setup.
- S3 module
  - Purpose: Creates an S3 bucket for storing static assets or backups.
- IAM module
  - Purpose: Provides IAM roles and policies following the least-privilege principle. EC2 instances can access S3, CloudWatch, and other AWS resources as needed.
- Security groups module
  - Purpose: Configures security groups to control access between resources such as the ALB, EC2 instances, and RDS.
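For illustration, a minimal sketch of the kind of user_data script described above. The actual script in this repo may differ; the Amazon Linux 2023 package commands and file paths are assumptions.

```bash
#!/bin/bash
# Minimal user_data sketch (assumed; the script in this repo may differ).
# Installs Nginx, serves the instance hostname on /, and exposes /health for
# the ALB target group health checks. Assumes an Amazon Linux 2023 AMI.
set -euo pipefail

dnf install -y nginx

# Index page shows the hostname so load balancing can be observed per instance.
echo "<h1>Served by $(hostname -f)</h1>" > /usr/share/nginx/html/index.html

# Plain-text health endpoint that returns HTTP 200 at /health.
echo "OK" > /usr/share/nginx/html/health

systemctl enable --now nginx
```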
- Docker: Make sure Docker is installed.
- Docker Compose: Ensure Docker Compose is installed.
- Make: Make is required to execute all commands in this project.
- AWS IAM User with Required Permissions: Ensure the IAM user running Terraform has the necessary permissions, as outlined in the IAM policy section.
To follow the principle of least privilege, the user that runs Terraform should only have the permissions necessary to create, manage, and destroy the resources in your AWS infrastructure.
You can create an IAM policy with the following permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:*",
        "iam:PassRole",
        "iam:GetRole",
        "iam:CreateRole",
        "iam:AttachRolePolicy",
        "iam:DetachRolePolicy",
        "iam:DeleteRole",
        "rds:*",
        "s3:*",
        "elasticloadbalancing:*",
        "autoscaling:*",
        "cloudwatch:*",
        "logs:*",
        "route53:*"
      ],
      "Resource": "*"
    }
  ]
}

Note that VPC actions (CreateVpc, CreateSubnet, etc.) are covered by the ec2:* namespace; there is no separate vpc: service prefix in IAM.
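If you prefer to create this policy from the command line, a hedged sketch using the AWS CLI (the policy name, file name, user name, and account ID below are placeholders, not values from this repo):

```bash
# Save the JSON above as terraform-deployer-policy.json (example name),
# create a managed policy from it, and attach it to the IAM user running Terraform.
aws iam create-policy \
  --policy-name terraform-deployer \
  --policy-document file://terraform-deployer-policy.json

aws iam attach-user-policy \
  --user-name <terraform-user> \
  --policy-arn arn:aws:iam::<account-id>:policy/terraform-deployer
```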
To ensure Terraform manages your infrastructure correctly, you must define certain environment variables in a .env file. This file will be used by Docker Compose to load all necessary environment variables, ensuring proper functionality of the automation.
#Set env
ENV=stg
#AWS config vars
AWS_ACCESS_KEY_ID=xxxxx
AWS_SECRET_ACCESS_KEY=xxxxx
AWS_DEFAULT_REGION=us-east-1
#Tf vars
TF_VAR_rds_db_username=xxxxx
TF_VAR_rds_db_password=xxxxx
TF_VAR_local_public_ip=x.x.x.x/32
TF_VAR_ssh_public_key="ssh-rsa xxxxx"
#Test vars
NUM_REQUESTS=10000
NUM_CONCURRENT=50
NUM_CURL_REQUESTS=20
- ENV: Specifies the environment (`stg` for staging or `prod` for production).
- AWS_ACCESS_KEY_ID: Your AWS access key ID.
- AWS_SECRET_ACCESS_KEY: Your AWS secret access key.
- AWS_DEFAULT_REGION: The AWS region where the infrastructure will be deployed (e.g., `us-east-1`).
- TF_VAR_local_public_ip: Your local machine's public IP address, used for whitelisting in security groups.
- TF_VAR_ssh_public_key: The SSH public key used to access EC2 instances.
- TF_VAR_rds_db_username: Username for the RDS database administrator.
- TF_VAR_rds_db_password: Password for the RDS database administrator.
- NUM_REQUESTS: Number of requests sent to the ALB.
- NUM_CONCURRENT: Number of concurrent workers sending requests to the ALB.
- NUM_CURL_REQUESTS: Number of requests used to verify load balancing.
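Two shell one-liners that can help populate these values; the checkip.amazonaws.com service and the key path shown are just one convenient option, not a requirement of this setup:

```bash
# Look up your current public IP and format it as a /32 CIDR for TF_VAR_local_public_ip.
echo "$(curl -s https://checkip.amazonaws.com)/32"

# Generate an SSH key pair if you don't have one; paste the .pub contents into TF_VAR_ssh_public_key.
ssh-keygen -t rsa -b 4096 -f ~/.ssh/terraform-demo -N ""
cat ~/.ssh/terraform-demo.pub
```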
To simplify the infrastructure creation process with Terraform, I created a small automation using Docker. This setup allows for easy management of environments while securely handling AWS credentials, SSH keys, database credentials, and personal IP addresses.
The `tf-code/$ENV` folder contains the Terraform configurations for the `$ENV` environment. If we start with the `stg` environment, then to create a production environment, simply create a new folder named `tf-code/prod` and copy the configurations from the `stg` folder, modifying variables or resource names as needed to distinguish between environments.
Set the `ENV` variable according to the environment you want to work in, make sure the corresponding folder exists (`tf-code/stg` or `tf-code/prod`), and initialize Terraform:
make init
Run the following command to review the execution plan for the infrastructure:
make plan
Run the following command to deploy the infrastructure:
make apply
Terraform will show you the planned infrastructure changes and proceed with the deployment.
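For context, the make targets wrap Terraform running inside Docker. The sketch below shows roughly equivalent commands, assuming a Compose service named `terraform`; the service name and flags are assumptions rather than the repo's exact Makefile contents:

```bash
# Approximate equivalents of the make targets, run through Docker Compose.
docker compose run --rm terraform init       # make init
docker compose run --rm terraform plan       # make plan
docker compose run --rm terraform apply      # make apply
docker compose run --rm terraform destroy    # make destroy
```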
If you need to update the infrastructure (e.g., change the number of instances in the ASG or modify security group rules or delete a resource):
Edit the relevant `.tf` files in the `tf-code/stg` or `tf-code/prod` folder.
Once the changes are made, run the following command to update the infrastructure:
make apply
Terraform will detect the changes and apply them.
If you need to tear down the entire environment:
CAUTION! Terraform will destroy the entire environment when you execute the destroy command.
make destroy
To create another environment (e.g., production), simply duplicate the `tf-code/stg` folder and rename it to `tf-code/prod`:
cp -r tf-code/stg tf-code/prod
Modify variables in `tf-code/prod/` as needed (e.g., instance types, resource names) to distinguish between environments.
Modify the `ENV` variable in `.env` as needed (e.g., `stg`, `prod`) to select the environment.
Follow the deployment steps (`make init` and `make apply`).
Objective: Confirm that the EC2 instances behind the Auto Scaling Group (ASG) can serve web traffic and scale based on the load.
For this challenge, I built a small script that uses ApacheBench (ab) to run a basic load test against the EC2 instances and then validates that load balancing is working correctly.
- Set up the test suite:
  - Set the environment variables in `.env`:
    - NUM_REQUESTS: Number of requests to send to the ALB.
    - NUM_CONCURRENT: Concurrency of the requests.
    - NUM_CURL_REQUESTS: Number of curl requests used to validate that load balancing is functioning correctly.
- Simulate load to test auto scaling:
  - Run the test script:
    make test
  - Monitor the Auto Scaling Group (ASG) in the AWS Console or use CloudWatch to check whether instances are scaling up or down based on CPU load. (To drive the load manually, see the sketch after this list.)
- Web traffic should be served by the EC2 instances via the ALB.
- The number of EC2 instances should scale based on the CPU utilization as defined in the ASG policy.
- Use CloudWatch metrics to observe scaling activities.
- Check the EC2 Instances in the ASG to ensure they are increasing or decreasing as the load fluctuates.
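If you prefer to drive the load and watch scaling by hand instead of via `make test`, a minimal sketch using ApacheBench and the AWS CLI (assumes both are installed locally; the ALB DNS name and ASG name are placeholders you need to fill in):

```bash
# Send load to the ALB, reusing the NUM_REQUESTS/NUM_CONCURRENT values from .env.
ab -n "${NUM_REQUESTS:-10000}" -c "${NUM_CONCURRENT:-50}" "http://<alb-dns-name>/"

# Watch the ASG react: desired capacity and instance count should grow under sustained CPU load.
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names <asg-name> \
  --query 'AutoScalingGroups[0].{Desired:DesiredCapacity,Instances:length(Instances)}'
```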
Objective: Ensure the ALB is correctly distributing incoming traffic across the available EC2 instances.
- Test Load Distribution:
- Using the script output from the previous step, observe whether the requests are hitting different EC2 instances by checking the Hostname in the output (a standalone curl loop is sketched after this list).
- Requests should be evenly distributed across the EC2 instances.
- Logs should indicate that different instances are receiving requests as the ALB distributes traffic.
- Check the output of the test script to ensure that each EC2 instance hostname is different, confirming traffic distribution.
- Use the AWS Console or CloudWatch to verify that traffic is being routed to multiple targets within the ALB Target Group.
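A minimal standalone check, assuming the Nginx index page embeds the serving hostname (as in the user_data sketch earlier) and with the ALB DNS name as a placeholder:

```bash
# Fire a handful of requests at the ALB and count how many distinct backends respond.
# Each response body contains the instance hostname, so multiple distinct values
# indicate that traffic is being spread across instances.
for i in $(seq 1 "${NUM_CURL_REQUESTS:-20}"); do
  curl -s "http://<alb-dns-name>/" | grep -o 'Served by [^<]*'
done | sort | uniq -c
```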
Objective: Validate that the RDS instance can be accessed only from the EC2 instances in the private subnet and that no direct access from the public internet is possible.
- Test Access from EC2:
  - SSH into one of the EC2 instances.
  - Use a database client (psql for PostgreSQL) to connect to the RDS instance using the private endpoint. Example for PostgreSQL:
    psql -h <RDS-Endpoint> -U <db-username> -d <db-name>
- Test Access from the Internet:
  - From a local machine or another instance outside the private subnet, try to access the RDS endpoint. This should be blocked by the security group, which denies public access. Example (this should fail):
    psql -h <RDS-Endpoint> -U <db-username> -d <db-name>
- EC2 instances should be able to access the RDS instance.
- Any attempt to connect to the RDS instance from the public internet should be denied.
- Ensure successful connections to the RDS instance from EC2 instances.
- Verify failed connections from any external or public machine, ensuring the RDS instance is secure and not exposed to the internet.
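If you want a quick connectivity check without database credentials, a plain TCP probe of the PostgreSQL port works too. A small sketch (the endpoint is a placeholder; run the probe once from an EC2 instance and once from your local machine):

```bash
# From an EC2 instance in the VPC: port 5432 should be reachable.
# From your local machine: the same probe should time out or be refused,
# confirming the RDS instance is not exposed to the internet.
timeout 5 bash -c 'cat < /dev/null > /dev/tcp/<RDS-Endpoint>/5432' \
  && echo "reachable" || echo "blocked"
```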
Objective: Verify that static assets are correctly stored in the S3 bucket and accessible publicly in read-only mode.
- Upload Static Assets:
  - Use the AWS CLI or the S3 Console to upload a few static files (e.g., images, HTML files) to the S3 bucket. Example CLI command:
    aws s3 cp ./my-static-file.jpg s3://<bucket-name>/my-static-file.jpg
- Test Public Access:
  - Access the files publicly via the S3 URL to ensure they are accessible in read-only mode. Example:
    curl https://<bucket-name>.s3.amazonaws.com/my-static-file.jpg
- Test Denied Actions:
  - Attempt to delete or upload files using a public access URL. This should be denied, since the bucket only allows public read access.
- Static assets should be accessible publicly via the S3 URL in read-only mode.
- Any attempt to modify (upload/delete) objects via public access should be denied.
- Ensure that files can be read from the S3 bucket publicly.
- Test actions such as file deletion or modification from a public endpoint, which should be denied.
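A hedged sketch of what the denied-action check can look like from the command line, using unauthenticated requests (the bucket name and object keys are placeholders); the read should return HTTP 200 while the write operations should return HTTP 403:

```bash
# Anonymous read should succeed (expect 200).
curl -s -o /dev/null -w "%{http_code}\n" "https://<bucket-name>.s3.amazonaws.com/my-static-file.jpg"

# Anonymous delete and upload should be rejected (expect 403).
curl -s -o /dev/null -w "%{http_code}\n" -X DELETE "https://<bucket-name>.s3.amazonaws.com/my-static-file.jpg"
curl -s -o /dev/null -w "%{http_code}\n" -X PUT --data-binary @./my-static-file.jpg \
  "https://<bucket-name>.s3.amazonaws.com/should-fail.jpg"
```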
Tfsec is a tool that allows us to validate the security measures in our Infrastructure as Code (IaC). I have implemented a small test in our automation pipeline to check the security status of our code.
To run the test, simply execute the following command:
make sec
This will output the security status of our code and highlight any potential improvements we should make to ensure continuous compliance with best security practices.
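If you want to run the scan outside of the make automation, tfsec can also be invoked directly or through its official Docker image, for example:

```bash
# Scan the staging configuration with a locally installed tfsec binary.
tfsec tf-code/stg

# Or run it via Docker without installing anything locally.
docker run --rm -v "$(pwd)/tf-code/stg:/src" aquasec/tfsec /src
```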
- Communication issues with the ALB: Verify that the health checks against the ASG instances are passing; check the target group health check settings, routing rules, security groups, etc.
- Instance type unavailable or invalid: Try switching the EC2 or RDS instance type to another one; otherwise, you may need to wait for AWS to increase your quota before trying again.
- Permission errors when creating a resource: Check the role and permissions assigned to the Terraform user, and add the missing permissions incrementally until the operation succeeds.
- Invalid SSH access to instances: Ensure that your SSH key is correct and is loaded on your machine (e.g., added to your SSH agent).
- If you want to destroy the S3 bucket, make sure it doesn't contain any stored or versioned objects. If it does contain content, first set the variable `s3_force_destroy=true` and apply that change before proceeding with the destroy.
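As an alternative to `s3_force_destroy`, the bucket can be emptied by hand before destroying. A sketch with the AWS CLI (bucket name is a placeholder); note that `aws s3 rm` removes only current objects, so a versioned bucket may still need the force-destroy flag or a separate cleanup of old versions:

```bash
# Remove all current objects from the bucket before running make destroy.
aws s3 rm "s3://<bucket-name>" --recursive
```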