Welcome to the first exercise of our ETL Testing Framework tutorial! In this exercise, you'll set up the necessary AWS infrastructure and local environment to begin working with the ETL testing framework.
Before you begin, make sure you have the following:
- A GitHub account with access to the repository and GitHub Codespaces enabled.
- Basic understanding of Git, Terraform, and AWS services.
[Skip if you are in the tutorial!]
- An AWS account. Temporary users will be provided for the tutorial.
- An IAM role with administrative privileges or specific permissions for S3, IAM, Lambda, DynamoDB, CodeBuild, and CodePipeline.
During this tutorial, each attendee will be provided with temporary AWS credentials (Access Key ID and Secret Access Key) that can be used to set up and deploy the required infrastructure. These credentials are strictly temporary and will be removed immediately after the tutorial.
To get your AWS account for this tutorial, follow these steps:
- Navigate to the AWS user provisioning URL provided for the tutorial.
- This will provide you with temporary AWS credentials, including:
- AWS Access Key ID
- AWS Secret Access Key
- AWS Session Token
- Make sure to copy or download these credentials and keep them secure. You will need them for the next steps.
- Use the Access Key ID, Secret Access Key, and Session Token provided by the URL when configuring your AWS CLI in the following steps.
- If you prefer to use your own AWS account, you are welcome to do so. In this case, create an access key for your IAM user and have it at hand. Also, take note of your AWS username: it is the `<owner>` argument of `setup_infrastructure.sh` (see section 3).
- Go to the main repository on GitHub.
- Launch a Codespace:
- Once you have forked the repository, navigate to your fork.
- Click the "Code" button, open the "Codespaces" tab, and select "Create codespace on master" to create a new Codespace.
- The Codespace will automatically set up your environment based on the repository's configuration (e.g., `.devcontainer`). Important: this process can take a few minutes, so be patient.
- You will know the Codespace is ready when you can see:
To set up the necessary AWS infrastructure for the ETL testing framework, use the `setup_infrastructure.sh` script. This script automates parts of the setup process, including configuring backends, generating the necessary Terraform variable files, and packaging Lambda functions.
- Configures Terraform Backends: Runs the `configure_backend.sh` script to set up Terraform backend configurations based on your user details.
- Generates Terraform Variable Files: Executes the `generate_tfvars.sh` script to create `.tfvars` files with the appropriate parameters for your environment.
- Packages Lambda Functions: Calls the `package_lambdas.sh` script to package Lambda functions and prepare them for deployment.
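The three steps above run in sequence. A minimal sketch of that flow, assuming the helper scripts sit next to `setup_infrastructure.sh` in `scripts/` and each takes the owner as its only argument (the real scripts may take different arguments, and `run_setup_steps` is a hypothetical name, not part of the repository):

```bash
# Hypothetical sketch of the orchestration in setup_infrastructure.sh.
# Assumes each helper script accepts the owner as its only argument.
run_setup_steps() {
  local owner="${1:-}" step
  if [ -z "$owner" ]; then
    echo "usage: run_setup_steps <owner>" >&2
    return 1
  fi
  # Run the steps in order and stop at the first failure.
  for step in configure_backend.sh generate_tfvars.sh package_lambdas.sh; do
    echo "Running $step for $owner"
    ./"$step" "$owner" || return 1
  done
}
```

Stopping at the first failing step avoids, for example, packaging Lambda functions against `.tfvars` files that were never generated.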
- Navigate to the `scripts` directory and execute the setup:
- First, ensure you are in the root directory of your repository, then run:
cd scripts && ./setup_infrastructure.sh <owner>
- Replace `<owner>` with your provided AWS user account (e.g., conference-user-x).
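If you are not sure which directory your terminal is in, you can jump to the repository root with `git rev-parse --show-toplevel` before running the script. A small sketch (`run_setup` is a hypothetical helper, not part of the repository):

```bash
# Hypothetical helper: run setup_infrastructure.sh from the repository root.
run_setup() {
  local owner="${1:-}" root
  if [ -z "$owner" ]; then
    echo "usage: run_setup <owner>   (e.g. run_setup conference-user-x)" >&2
    return 1
  fi
  # Absolute path of the top-level directory of the current Git working tree.
  root="$(git rev-parse --show-toplevel)" || return 1
  # A plain cd (no subshell) leaves you in scripts/ afterwards.
  cd "$root/scripts" || return 1
  ./setup_infrastructure.sh "$owner"
}
```

For example, `run_setup conference-user-x`, with your own user in place of the placeholder. Because it uses a plain `cd`, it leaves you in `scripts`, which is where the later `cd ../iac/backend` step expects you to be.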
- AWS Configuration:
- During the script execution, you may be prompted to configure your AWS credentials. If so, enter your AWS access key, secret access key, default region name, and default output format. This is normally done with the `aws configure` command, but you don't need to run it yourself because `setup_infrastructure.sh` already runs it. When prompted, enter:
- AWS Access Key ID: Your AWS access key ID.
- AWS Secret Access Key: Your AWS secret access key.
- At the end of the execution, make sure to type yes when prompted to continue with the script. This will create a commit, which in turn creates a fork of the repository in your GitHub account.
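The interactive `aws configure` prompts cover only the access key ID, secret access key, region, and output format; they do not ask for the session token that comes with the temporary tutorial credentials. If the script does not already set it for you, here is a sketch of setting it and then checking which identity the CLI is using (`configure_session_token` and `check_aws_identity` are hypothetical helper names; the `aws` subcommands are standard):

```bash
# Hypothetical helpers; the aws subcommands they call are standard AWS CLI commands.
configure_session_token() {
  local token="${1:-}"
  if [ -z "$token" ]; then
    echo "usage: configure_session_token <session-token>" >&2
    return 1
  fi
  # Stores aws_session_token for the default profile in ~/.aws/credentials.
  aws configure set aws_session_token "$token"
}

check_aws_identity() {
  # Prints the ARN of the identity behind the currently configured credentials.
  aws sts get-caller-identity --query Arn --output text
}
```

For example, run `configure_session_token "<your-session-token>"` followed by `check_aws_identity`; the printed ARN should name the user you were given. Note that `aws configure set` stores the token in plain text in `~/.aws/credentials`, alongside the other keys.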
In this section, you will initialize and apply Terraform configurations for different purposes:
- iac/backend: Infrastructure for the Terraform state
- iac/cicd: Infrastructure related to CI/CD pipelines
- iac/etl: Infrastructure related to ETL processes
Before deploying the CI/CD and ETL infrastructures, you need to set up the backend infrastructure where Terraform will store its state remotely in AWS using an S3 bucket and a DynamoDB table.
- Navigate to the Backend Terraform Configuration:
- First, navigate to the `backend` folder within `iac`:
cd ../iac/backend
- This folder contains the Terraform configuration files necessary to set up the S3 bucket and DynamoDB table that will store your Terraform state.
- Review the Terraform Configuration:
- The `main.tf` file creates an S3 bucket to store your Terraform state files and a DynamoDB table to manage state locking and consistency.
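To see at a glance which resources a configuration declares, you can list its `resource` blocks. A rough sketch that relies on the conventional one-line `resource "type" "name" {` header; it is a text search rather than an HCL parser, so unusual formatting will be missed, and `list_tf_resources` is a hypothetical helper:

```bash
# Hypothetical helper: print "type.name" for each resource block in a directory.
# A plain text search, not an HCL parser; it expects the usual one-line header:
#   resource "aws_s3_bucket" "example" {
list_tf_resources() {
  local dir="${1:-.}"
  grep -hE '^resource[[:space:]]+"[^"]+"[[:space:]]+"[^"]+"' "$dir"/*.tf |
    sed -E 's/^resource[[:space:]]+"([^"]+)"[[:space:]]+"([^"]+)".*/\1.\2/'
}
```

Inside `iac/backend`, the output should include an `aws_s3_bucket` entry and an `aws_dynamodb_table` entry, assuming the configuration uses the standard AWS provider resource types.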
- Initialize and Apply the Backend Configuration:
- Initialize and apply the Terraform configuration to create the S3 bucket and DynamoDB table.
terraform init
terraform apply
- Confirm the apply action when prompted by typing yes.
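After the apply finishes, you can confirm that the state bucket and lock table exist. A sketch, assuming you pass in the actual bucket and table names (take them from the `terraform apply` output or the AWS Console; `check_backend` is a hypothetical helper):

```bash
# Hypothetical helper: succeed only if the state bucket and lock table both exist.
check_backend() {
  local bucket="${1:-}" table="${2:-}"
  if [ -z "$bucket" ] || [ -z "$table" ]; then
    echo "usage: check_backend <bucket-name> <table-name>" >&2
    return 1
  fi
  # head-bucket exits non-zero if the bucket is missing or you cannot access it.
  aws s3api head-bucket --bucket "$bucket" || return 1
  # describe-table fails if the table does not exist; otherwise print its status.
  aws dynamodb describe-table --table-name "$table" \
    --query 'Table.TableStatus' --output text
}
```

For example, `check_backend <state-bucket-name> <lock-table-name>` prints `ACTIVE` once the table is ready.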
- Navigate to the CI/CD Terraform Directory:
- Move to the `iac/cicd` directory, where the Terraform files for setting up the CI/CD infrastructure are located.
cd ../cicd
- Initialize Terraform:
- Initialize Terraform in this directory to download the necessary providers and prepare the environment.
terraform init
- Validate the Terraform Configuration:
- Run the following command to check that the Terraform configuration files are syntactically valid and internally consistent.
terraform validate
- Plan the CI/CD Infrastructure:
- Create an execution plan to see what resources Terraform will create or modify.
terraform plan
- Deploy the CI/CD Infrastructure:
- Apply the Terraform configuration for the CI/CD infrastructure. With the `--auto-approve` flag, Terraform will not ask for confirmation before applying the changes.
terraform apply --auto-approve
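The `plan` and `apply` steps above can also be chained through a saved plan file, so that exactly the changes you reviewed are the ones applied. A sketch using Terraform's `-out` option (the file name `tfplan` and the helper name `plan_then_apply` are arbitrary choices):

```bash
# Hypothetical helper: save a plan to a file, then apply exactly that plan.
plan_then_apply() {
  local planfile="${1:-tfplan}"
  terraform init -input=false || return 1
  terraform plan -input=false -out="$planfile" || return 1
  # Applying a saved plan file does not ask for confirmation.
  terraform apply -input=false "$planfile"
}
```

Run it from `iac/cicd`. If the state changes between `plan` and `apply`, Terraform rejects the saved plan as stale and you need to plan again.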
- Set Up the GitHub Connection:
- 3.1. Go to your GitHub repository, open the `Settings` tab, and in the `Security` section expand `Secrets and variables` and select `Actions`.
- 3.2. Create the following secrets by clicking `New repository secret`:
- `ARTIFACT_BUCKET`: The name of the S3 bucket where the artifacts will be stored. To get it, open your resource group in AWS and copy the name of the S3 bucket that starts with `github-actions-artifact-`.
- `AWS_ACCESS_KEY_ID`: Your AWS access key ID from the AutomationSTAR page (see Getting Your AWS Account).
- 3.4. Run the Terraform Plan Check GitHub Action.
- 3.5. Ensure your AWS pipeline is triggered.
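If you prefer the command line to the GitHub web UI, the GitHub CLI (`gh`) can look up and set the same secret; whether `gh` is installed in this Codespace is an assumption worth checking with `gh --version`. A sketch, assuming exactly one bucket name starts with `github-actions-artifact-` and that you run it inside your fork's checkout (`set_artifact_bucket_secret` is a hypothetical helper):

```bash
# Hypothetical helper: find the artifact bucket and store its name as a repo secret.
set_artifact_bucket_secret() {
  local bucket
  # JMESPath starts_with keeps only buckets whose name has the given prefix.
  bucket="$(aws s3api list-buckets \
    --query "Buckets[?starts_with(Name, 'github-actions-artifact-')].Name" \
    --output text)" || return 1
  case "$bucket" in
    "" | None)
      echo "no bucket starting with github-actions-artifact- found" >&2
      return 1 ;;
    *[[:space:]]*)
      echo "more than one matching bucket: $bucket" >&2
      return 1 ;;
  esac
  # Sets the secret on the repository gh resolves from the current checkout.
  gh secret set ARTIFACT_BUCKET --body "$bucket"
}
```

If `gh` targets the wrong repository, add `--repo <your-user>/<repo>`. For `AWS_ACCESS_KEY_ID`, running `gh secret set AWS_ACCESS_KEY_ID` without `--body` prompts for the value, which keeps the key out of your shell history. The token available inside a Codespace may not be allowed to manage repository secrets; if `gh` reports a permission error, use the web UI steps above.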
- Navigate to the ETL Terraform Directory:
- Now, move to the `iac/etl` directory to deploy the ETL infrastructure.
cd ../etl
- Deploy ETL Infrastructure:
- Initialize and apply the Terraform configuration for the ETL infrastructure.
terraform init && terraform apply --auto-approve
- Log in to the AWS Console: Sign in to your AWS account and verify that all resources have been created.
- Check S3 Buckets: Confirm that the S3 buckets for the backend, Lambda functions, raw, clean, and curated data are present.
- Check Other Resources: Verify that the IAM roles, CodeBuild, and CodePipeline have been created.
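The same checks can be done from the terminal. A sketch using read-only AWS CLI list commands; it prints names only, so compare them against what you expect. The optional filter assumes the Terraform code puts your owner name into resource names, which you should confirm, and both helper names are hypothetical:

```bash
# Hypothetical helpers: list resource names the exercise should have created.
# Print one name per line, keeping only names that contain the filter text.
filter_names() {
  tr '\t' '\n' | grep -iF -- "${1:-}"
}

list_exercise_resources() {
  local filter="${1:-}"
  echo "== S3 buckets =="
  aws s3api list-buckets --query 'Buckets[].Name' --output text | filter_names "$filter"
  echo "== IAM roles =="
  aws iam list-roles --query 'Roles[].RoleName' --output text | filter_names "$filter"
  echo "== CodeBuild projects =="
  aws codebuild list-projects --query 'projects' --output text | filter_names "$filter"
  echo "== CodePipeline pipelines =="
  aws codepipeline list-pipelines --query 'pipelines[].name' --output text | filter_names "$filter"
}
```

For example, `list_exercise_resources conference-user-x`. S3 bucket names and IAM roles are listed account-wide, but the CodeBuild and CodePipeline results depend on the region your CLI is configured for.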
- Terraform Init Errors: Ensure your AWS credentials are correctly configured. Use `aws configure` to reset them if necessary.
- Python Environment Issues: If you encounter issues with Python dependencies, ensure you are using the correct Python version and that the virtual environment is activated.
- Resource Verification: Double-check the AWS region specified in your Terraform configuration; resources may be created in a different region if it's not consistent.
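For the region and Python checks above, a quick diagnostic sketch. It only finds regions written as literal `region = "..."` assignments in `.tf`/`.tfvars` files under `iac/`, which may not match how this repository sets the region (a `var.region` reference, for instance, is not shown); `diagnose_env` is a hypothetical helper:

```bash
# Hypothetical diagnostics for the region and Python issues above.
diagnose_env() {
  local iac_dir="${1:-iac}"
  # aws configure get reads only the CLI config files, not environment variables.
  echo "AWS CLI configured region: $(aws configure get region)"
  echo "AWS_REGION=${AWS_REGION:-unset} AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION:-unset}"
  echo "Region assignments under $iac_dir:"
  grep -rhoE --include='*.tf' --include='*.tfvars' \
    'region[[:space:]]*=[[:space:]]*"[^"]+"' "$iac_dir" | sort -u
  echo "Python: $(python3 --version 2>&1)"
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "Virtual environment: $VIRTUAL_ENV"
  else
    echo "Virtual environment: none active"
  fi
}
```

The helper also prints `AWS_REGION` and `AWS_DEFAULT_REGION` because, when set, they override the region in the configuration file.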
Use this checklist to ensure you've completed all the necessary steps for Exercise 1:
- Obtained temporary AWS credentials
- Forked the repository
- Launched a GitHub Codespace
- Checked out the initial branch
- Ran the `setup_infrastructure.sh` script
- Deployed Terraform Backend Infrastructure
- Deployed CI/CD Infrastructure
- Enabled GitHub CodeStar connection
- Deployed ETL Infrastructure
- Verified AWS resources
Once you've completed all these items, you've successfully finished Exercise 1!
Once you have successfully set up your environment and verified the resources, you are ready to move on to Exercise 2: Discovering pytest and boto3. In Exercise 2, you'll learn about pytest and boto3, and implement a fixture to generate test data for your E2E tests.
After completing Exercise 2, you'll be prepared for Exercise 3, where you'll explore how to build Test Reports in Allure.