This repository contains code and configuration files for an Extract, Transform, Load (ETL) project using Google Cloud Data Fusion for data extraction, Apache Airflow/Composer for orchestration, and Google BigQuery for data loading.
Refer to the YouTube video for this project.
The project aims to perform the following tasks:
- Data Extraction: Extract data using Python.
- Data Masking: Apply data masking & encoding techniques to sensitive information in Cloud Data Fusion before loading it into BigQuery.
- Data Loading: Load transformed data into Google BigQuery tables.
- Orchestration: Automate the complete data pipeline using Airflow (Cloud Composer).
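
To make the extraction and masking steps above more concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea: in this project the masking and encoding are applied in Cloud Data Fusion, not in Python, and the column names (`employee_id`, `name`, `ssn`, `password`), the sample rows, and all function names below are hypothetical and not taken from the repository.

```python
import base64
import csv
import hashlib
import io

# Hypothetical sample records; the real project's schema may differ.
SAMPLE_ROWS = [
    {"employee_id": "1", "name": "Alice", "ssn": "123-45-6789", "password": "s3cret"},
    {"employee_id": "2", "name": "Bob", "ssn": "987-65-4321", "password": "hunter2"},
]


def extract_to_csv(rows):
    """Serialize rows to CSV text, standing in for the Python extraction step."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def mask_ssn(ssn):
    """Replace every digit except the last four with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    digits_seen = 0
    masked = []
    for ch in ssn:
        if ch.isdigit():
            digits_seen += 1
            masked.append(ch if digits_seen > total_digits - 4 else "X")
        else:
            masked.append(ch)
    return "".join(masked)


def encode_password(password):
    """Return the SHA-256 digest of the password, Base64-encoded."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def mask_rows(csv_text):
    """Apply masking and encoding to the sensitive columns of each CSV row."""
    masked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["ssn"] = mask_ssn(row["ssn"])
        row["password"] = encode_password(row["password"])
        masked.append(row)
    return masked


if __name__ == "__main__":
    for row in mask_rows(extract_to_csv(SAMPLE_ROWS)):
        print(row)
```

The masked rows are what would then be loaded into BigQuery. Keeping the last four digits of an identifier and hashing secrets is one common choice; the actual transformations configured in the Data Fusion pipeline may be different.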