This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/), which builds a concurrent data pipeline using Amazon EMR and Apache Livy. The pipeline is orchestrated by Apache Airflow.
This folder contains the AWS CloudFormation template that provisions the Airflow infrastructure.
This folder contains reusable code for Amazon EMR and Apache Livy.
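As a rough illustration of what reusable Livy helper code can look like, the sketch below builds requests for Livy's REST API (`POST /sessions`, `POST /sessions/{id}/statements`, `GET /sessions/{id}/statements/{statementId}`) using only the Python standard library. It is not the code from this repository: the function names, the `master_dns` parameter, and the example hostname are all hypothetical, and it assumes Livy is listening on its default port 8998 on the EMR master node.

```python
import json
import urllib.request

LIVY_PORT = 8998  # Livy's default port; assumed unchanged on the cluster


def livy_url(master_dns, path):
    """Return the Livy endpoint URL for a path on the (hypothetical) master host."""
    return f"http://{master_dns}:{LIVY_PORT}{path}"


def build_request(url, payload=None):
    """Build, but do not send, a JSON request: POST if a payload is given, else GET."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if payload is not None else "GET",
    )


def create_session_request(master_dns, kind="spark"):
    """Request that asks Livy to start an interactive session (Scala by default)."""
    return build_request(livy_url(master_dns, "/sessions"), {"kind": kind})


def submit_statement_request(master_dns, session_id, code):
    """Request that submits a code snippet to an existing Livy session."""
    path = f"/sessions/{session_id}/statements"
    return build_request(livy_url(master_dns, path), {"code": code})


def statement_status_request(master_dns, session_id, statement_id):
    """Request that fetches the state and output of a submitted statement."""
    path = f"/sessions/{session_id}/statements/{statement_id}"
    return build_request(livy_url(master_dns, path))


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would typically `send(create_session_request(host))`, read the session `id` from the response, wait for the session to become idle, then submit statements and poll their status. Keeping request construction separate from `send` makes the helpers easy to check without a running cluster.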
This folder contains sample Scala transformation code that converts the MovieLens data files from CSV to Parquet.
This script contains the DAG definition, which defines the Airflow pipeline.
This library is licensed under the Apache 2.0 License.