ELT-Visualization-Flow
├── data_extraction
│ ├── every_earthquake.py
│ ├── location_extraction.py
│ ├── severe_climate_alert.py
│ └── weather_data_extraction.py
├── data_loading
│ ├── load_to_bigquery.py
│ └── severe_weather_loading.py
├── data_preprocessing
│ ├── data_segregation.py
│ └── severe_weather_preprocessing.py
├── gcp_pub_sub
│ ├── gcp_publisher.py
│ └── gcp_subscriber.py
│
├── ELT_DBT_Project (DBT project directory)
├── extract_all_data.py
├── README.md
└── requirements.txt
Introduction: In this project, we have developed a robust and automated system to gather, process, and analyze disaster data from various APIs, ensuring a seamless flow of information for informed decision-making. Below is a detailed step-by-step guide outlining the entire project flow:
Step 1: Cron Job Setup
Initiate the project by setting up a cron job to execute a Python script daily. This script serves as the backbone for data extraction from multiple APIs.
Step 2: API Integration
Utilize Python to interact with different APIs, including the Every Earthquake API, the Open-Meteo Weather API, the Flood and Cyclone Alerts API, and the Google Geocoding API. This ensures a comprehensive dataset covering a broad spectrum of environmental factors.
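As an illustration, a minimal extraction call against the public Open-Meteo forecast endpoint could look like the sketch below. The coordinates and requested variables are placeholders; the actual endpoints and parameters used by the scripts in data_extraction/ may differ.

```python
import requests

# Illustrative only: coordinates and variables are placeholders, not
# necessarily what weather_data_extraction.py requests.
OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"

def fetch_weather(latitude: float, longitude: float) -> dict:
    """Fetch hourly weather variables for one location from Open-Meteo."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": "temperature_2m,precipitation",
    }
    response = requests.get(OPEN_METEO_URL, params=params, timeout=30)
    response.raise_for_status()  # fail loudly so the daily cron run surfaces errors
    return response.json()

if __name__ == "__main__":
    payload = fetch_weather(48.78, 9.18)  # placeholder coordinates
    print(payload["hourly"]["temperature_2m"][:5])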
Step 3: Data Cleaning and Standardization
Implement data cleaning procedures in the Python script to ensure consistency and accuracy in the extracted information. Standardize the data according to pre-defined requirements for better downstream processing.
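A minimal sketch of this kind of cleaning with pandas is shown below; the column names are hypothetical placeholders, not the repository's actual schema.

```python
import pandas as pd

def standardize_records(raw_records: list[dict]) -> pd.DataFrame:
    """Clean and standardize raw API records (illustrative column names)."""
    df = pd.DataFrame(raw_records)

    # Normalize column names to a single convention.
    df.columns = [col.strip().lower() for col in df.columns]

    # Coerce types; invalid values become NaT/NaN instead of raising.
    df["recorded_at"] = pd.to_datetime(df["recorded_at"], errors="coerce", utc=True)
    df["magnitude"] = pd.to_numeric(df["magnitude"], errors="coerce")

    # Drop exact duplicates and rows missing fields we cannot work without.
    df = df.drop_duplicates().dropna(subset=["recorded_at", "magnitude"])

    return df
```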
Step 4: Google BigQuery and Pub/Sub Integration
Leverage Google Cloud services to efficiently manage and store the extracted data. Use Pub/Sub to handle the data flow and decrease the load on Google BigQuery, ensuring optimal performance.
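The Pub/Sub and BigQuery pieces roughly follow the pattern sketched below, using the google-cloud-pubsub and google-cloud-bigquery clients. The project, topic, subscription, and table names are placeholders, not the repository's actual configuration.

```python
import json
from google.cloud import bigquery, pubsub_v1

PROJECT_ID = "your-gcp-project"                       # placeholder
TOPIC_ID = "disaster-events"                          # placeholder
SUBSCRIPTION_ID = "disaster-events-sub"               # placeholder
TABLE_ID = "your-gcp-project.disasters.raw_events"    # placeholder

def publish_record(record: dict) -> None:
    """Publisher side: push one cleaned record onto the Pub/Sub topic."""
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
    future = publisher.publish(topic_path, data=json.dumps(record).encode("utf-8"))
    future.result()  # block until the message is accepted by Pub/Sub

def run_subscriber() -> None:
    """Subscriber side: stream messages and append them to BigQuery."""
    bq_client = bigquery.Client()
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        row = json.loads(message.data.decode("utf-8"))
        errors = bq_client.insert_rows_json(TABLE_ID, [row])
        if not errors:
            message.ack()   # only ack once the row is safely in BigQuery
        else:
            message.nack()  # redeliver on failure

    streaming_future = subscriber.subscribe(subscription_path, callback=callback)
    streaming_future.result()  # block, processing messages as they arrive
```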
Step 5: DBT Transformation
Incorporate a dbt (Data Build Tool) project containing SQL transformation code. These transformations turn raw data into a structured and meaningful format tailored to specific analytical needs.
Step 6: Connecting Tableau to BigQuery
Establish a connection between Tableau and Google BigQuery, enabling a seamless transfer of transformed data for analysis. This integration provides a user-friendly interface for visual exploration and reporting.
Step 7: Data Analysis in Tableau
Utilize Tableau's powerful features to conduct in-depth data analysis on the transformed dataset. Leverage various visualization options to extract meaningful insights from the disaster data.
Step 8: TabPy Integration for Enhanced Calculations
Integrate a TabPy server to enhance the analytical capabilities within Tableau. This allows for more complex calculations and statistical analyses, further enriching the depth of insights derived from the data.
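For reference, deploying a Python function to a locally running TabPy server can look like the sketch below; the function and its logic are made up for illustration and are not part of this repository.

```python
from tabpy.tabpy_tools.client import Client

# Assumes a TabPy server is already running locally on its default port.
client = Client("http://localhost:9004/")

def magnitude_zscore(magnitudes):
    """Hypothetical example: z-scores of earthquake magnitudes."""
    import statistics
    mean = statistics.mean(magnitudes)
    stdev = statistics.pstdev(magnitudes) or 1.0
    return [(m - mean) / stdev for m in magnitudes]

client.deploy(
    "magnitude_zscore",
    magnitude_zscore,
    "Returns z-scores for a list of earthquake magnitudes",
    override=True,
)
```

Inside Tableau, a deployed endpoint like this is typically invoked from a calculated field via the SCRIPT_* functions.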
Conclusion: This comprehensive project flow delivers an end-to-end automated pipeline for environmental data analysis. From data extraction and cleaning to transformation and visualization, each step is designed to provide accurate and actionable insights for stakeholders. The integration of these technologies and platforms creates a robust ecosystem for informed decision-making in environmental monitoring and analysis.
Setup Instructions:
- Navigate to the directory where you want to set up this project.
- Open cmd/bash and run the command below:
On Mac/Win: git clone https://github.com/sprakshith/ELT-Visualization-Flow.git
- Now create a virtual environment.
On Mac: python3 -m venv ./venv
Example: python3 -m venv ./venv
On Win: python -m venv "[Path to Project Directory]\[NAME_OF_VIRTUAL_ENV]"
Example: python -m venv "D:\ELT-Visualization-Flow\venv"
- To activate the venv, run the command below.
On Mac: source venv/bin/activate
On Win: venv\Scripts\activate.bat
- To install all the requirements, run the command below. Re-run it whenever the requirements.txt file changes.
On Mac/Win: pip install -r requirements.txt
- Set up a cron job to run the extract_all_data.py script and the dbt run command once a day (see the sketch below).
- Keep the gcp_subscriber.py Python file running at all times so that data can be appended to BigQuery without delay.
- Run the TabPy server.
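One possible way to wire the daily run together is a small wrapper script scheduled by cron, as in the hypothetical sketch below. run_daily_pipeline.py is not part of this repository, and all paths are placeholders.

```python
# run_daily_pipeline.py -- hypothetical wrapper, not part of this repository.
# Example crontab entry (placeholder paths), running the pipeline daily at 06:00:
#   0 6 * * * /path/to/venv/bin/python /path/to/ELT-Visualization-Flow/run_daily_pipeline.py
import subprocess
import sys

PROJECT_DIR = "/path/to/ELT-Visualization-Flow"      # placeholder
DBT_PROJECT_DIR = f"{PROJECT_DIR}/ELT_DBT_Project"   # placeholder

# 1. Extract, clean, and load the latest data.
subprocess.run([sys.executable, f"{PROJECT_DIR}/extract_all_data.py"], check=True)

# 2. Run the dbt transformations on top of the freshly loaded raw data.
subprocess.run(["dbt", "run", "--project-dir", DBT_PROJECT_DIR], check=True)
```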