ablatov/aws-deequ-glue

Serverless Data Quality solution based on AWS Deequ and AWS Glue

Documentation

The original documentation is available at https://aws.amazon.com/blogs/big-data/building-a-serverless-data-quality-and-analysis-framework-with-deequ-and-aws-glue/

Installation & deployment

From local laptop

  1. Set up the AWS CLI locally (temporary AWS credentials for the account can be used)
  2. Install Python >= 3.7 (for Ubuntu 18.04, see https://linuxize.com/post/how-to-install-python-3-7-on-ubuntu-18-04/)
  3. Install Node.js >= 14.7.0
     curl -fsSL https://deb.nodesource.com/setup_current.x | sudo -E bash -
     sudo apt-get install -y nodejs
  4. Install the required dependencies
     cd backend
     make install
     mkdir -p ~/.npm-global
     export PATH=~/.npm-global/bin:$PATH
     export NPM_CONFIG_PREFIX=~/.npm-global
     npm install -g serverless serverless-pseudo-parameters serverless-python-requirements serverless-wsgi
  5. To deploy, run:
     cd ..
     ./deploy.sh -r $YOUR_REGION_HERE -p $YOUR_AMAZON_PROFILE -n $YOUR_STACK_NAME -e $YOUR_ENV_NAME

All of these parameters are optional; the default values are defined in the deploy.sh script.

  $YOUR_REGION_HERE - AWS region to deploy to
  $YOUR_AMAZON_PROFILE - AWS CLI profile to use (set during aws configure)
  $YOUR_STACK_NAME - name under which the stack is displayed in AWS CloudFormation
  $YOUR_ENV_NAME - environment to deploy resources to (dev, uat, or prod)
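
For readers unfamiliar with optional flags in shell scripts, the following is a minimal sketch of how a script can accept -r, -p, -n and -e with fallback defaults. It is not taken from deploy.sh: the function name parse_deploy_args and all default values here are placeholders chosen for illustration, and the real script may parse its arguments differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: the repository's deploy.sh may work differently.

parse_deploy_args() {
  # Placeholder defaults, NOT the values used by the real deploy.sh.
  REGION="us-east-1"
  PROFILE="default"
  STACK_NAME="example-stack"
  ENV_NAME="dev"

  # Reset OPTIND so the function can be called more than once.
  local OPTIND=1 opt
  while getopts "r:p:n:e:" opt; do
    case "$opt" in
      r) REGION="$OPTARG" ;;
      p) PROFILE="$OPTARG" ;;
      n) STACK_NAME="$OPTARG" ;;
      e) ENV_NAME="$OPTARG" ;;
      *) echo "usage: [-r region] [-p profile] [-n stack] [-e env]" >&2
         return 1 ;;
    esac
  done
}

# Only region and environment are given; profile and stack keep their defaults.
parse_deploy_args -r eu-west-1 -e uat
echo "region=$REGION profile=$PROFILE stack=$STACK_NAME env=$ENV_NAME"
```

Any flag that is omitted keeps its default, which is why the deploy command can be run with all, some, or none of the four parameters.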

  6. To test the deployment end to end, follow the walkthrough in the original blog post (the text and links below the architecture diagram): https://aws.amazon.com/blogs/big-data/building-a-serverless-data-quality-and-analysis-framework-with-deequ-and-aws-glue/

!!! TESTING THE SOLUTION WILL COST YOU MONEY ( about 10 US cents :) ) !!!
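
The Python >= 3.7 and Node.js >= 14.7.0 requirements from the installation steps can be checked before installing anything else. The sketch below is not part of this repository; the helper names version_ge and check_tool are made up for illustration, and the comparison assumes GNU sort with the -V option (available on Ubuntu).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the repository.

# version_ge A B: succeeds when version A >= version B (uses GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MINIMUM INSTALLED: prints a one-line verdict.
check_tool() {
  if version_ge "$3" "$2"; then
    echo "$1 $3 OK (>= $2)"
  else
    echo "$1 $3 is too old (need >= $2)"
  fi
}

# `python3 --version` prints "Python X.Y.Z"; keep only the number.
if command -v python3 >/dev/null 2>&1; then
  check_tool "Python" "3.7" "$(python3 --version 2>&1 | awk '{print $2}')"
fi

# `node --version` prints "vX.Y.Z"; strip the leading "v".
if command -v node >/dev/null 2>&1; then
  check_tool "Node.js" "14.7.0" "$(node --version | sed 's/^v//')"
fi
```

A version-aware sort is used because a plain string comparison would rank 3.10 below 3.7.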

From Jenkins server

  1. Build and push the Docker image using pushDockerfile.groovy
  2. Create a new Jenkins item of type Pipeline and configure the job with the Jenkinsfile.
  3. Run the Jenkins job with the required parameters.
