This project demonstrates an automated ETL (Extract, Transform, Load) pipeline built on Apache Airflow, which uses Directed Acyclic Graphs (DAGs) to orchestrate the workflow. The pipeline collects fashion-related data from multiple sources, integrates it, and analyzes it to predict seasonal trends and generate reports. Trend data is collected for keywords such as *fur jacket*, *cardigan*, *coat*, and so on.
📁 Directory Layout (where to look)
```
ETL-Fashion-Tren-Analysis/
│
├── Analysis/         # Data visualization
├── DAGs/             # Airflow DAG definitions
├── Privacy Policy/   # Privacy policy app for authorizing the Pinterest access token
└── Script ETL/       # Functions run by the automation
```
- **Extract**: Data is collected from two sources:
  - Web scraping from X (formerly Twitter)
  - API calls to Pinterest
- **Transform**:
  - X data: CSV data from X is cleaned and converted into a time-series format of engagement metrics
  - Pinterest data: JSON data from Pinterest is flattened and structured to extract growth-related insights
- **Load**: The transformed and integrated data is stored in a PostgreSQL database (the data warehouse)
- **Analyze and Report**: The data warehouse serves as the foundation for generating trend insights and visualizations to predict upcoming fashion trends
Apache Airflow orchestrates each step of the ETL process through DAGs.
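A DAG for this pipeline could be wired up as sketched below. The `dag_id`, task names, and placeholder callables are illustrative assumptions, not the project's actual code; the `schedule` argument shown requires Airflow 2.4+ (older versions use `schedule_interval`).

```python
# Hypothetical DAG sketch -- task names and callables are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    pass  # scrape X, call the Pinterest API


def transform():
    pass  # clean CSV, flatten JSON, merge into one DataFrame


def load():
    pass  # write the merged DataFrame to PostgreSQL


with DAG(
    dag_id="fashion_trend_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Enforce the E -> T -> L ordering.
    t_extract >> t_transform >> t_load
```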
The pipeline pulls data from two primary sources:
- X (formerly Twitter): Data is extracted via web scraping and exported as a CSV file containing user engagement metrics and other relevant attributes.
- Pinterest: Data is fetched via an API, returning JSON files with growth trends and user-interaction data.

These sources provide complementary datasets for fashion trend analysis.
- Web Scraping (X): Engagement metrics such as likes, retweets, and comments are collected and saved as a CSV file.
- API Call (Pinterest): JSON data is retrieved, including attributes describing growth trends and trend behavior over time (yearly, monthly, weekly)
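A Pinterest-style trend payload might be parsed into flat rows as below. The field names (`keyword`, `weekly`, `date`, `growth`) are illustrative assumptions, not Pinterest's actual response schema.

```python
def parse_trend(payload: dict) -> list[dict]:
    """Flatten one keyword's trend entry into (keyword, week, growth) rows.

    Field names are assumed for illustration only.
    """
    keyword = payload["keyword"]
    return [
        {"keyword": keyword, "week": point["date"], "growth": point["growth"]}
        for point in payload["weekly"]
    ]


# A made-up sample response for one keyword.
sample = {
    "keyword": "fur jacket",
    "weekly": [
        {"date": "2024-01-01", "growth": 12},
        {"date": "2024-01-08", "growth": 15},
    ],
}
rows = parse_trend(sample)
```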
- Cleaning and reformatting
- CSV Data: Engagement data is cleaned and converted into a time-series format for weekly analysis
- JSON Data: Nested structures in JSON are flattened, and key-value pairs are extracted for time-series growth metrics
- Data integration
- The cleaned datasets are merged into a unified pandas DataFrame with consistent time intervals
- Columns such as `time_weekly`, `data_tweet`, and `data_pinterest` are created to combine metrics from both sources
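The integration step might look like the merge below, producing the `time_weekly`, `data_tweet`, and `data_pinterest` columns named above. The sample values are invented for illustration.

```python
import pandas as pd

# Weekly engagement from X (assumed sample values).
x_weekly = pd.DataFrame({
    "time_weekly": pd.to_datetime(["2024-01-07", "2024-01-14"]),
    "data_tweet": [15, 8],
})

# Weekly growth from Pinterest (assumed sample values).
pin_weekly = pd.DataFrame({
    "time_weekly": pd.to_datetime(["2024-01-07", "2024-01-14"]),
    "data_pinterest": [12, 15],
})

# Outer merge keeps weeks present in either source.
combined = x_weekly.merge(pin_weekly, on="time_weekly", how="outer")
```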
- The integrated DataFrame is uploaded into a PostgreSQL database serving as the data warehouse.
- The database provides a centralized storage solution for the processed data, enabling efficient querying and further analysis.
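The load step can be sketched with `DataFrame.to_sql`. In production this would target the PostgreSQL warehouse via a SQLAlchemy engine; an in-memory SQLite connection stands in here so the sketch runs without a database server, and the table name is an assumption.

```python
import sqlite3

import pandas as pd

combined = pd.DataFrame({
    "time_weekly": ["2024-01-07", "2024-01-14"],
    "data_tweet": [15, 8],
    "data_pinterest": [12, 15],
})

# Production would use something like:
#   sqlalchemy.create_engine("postgresql+psycopg2://user:pass@host/dbname")
# SQLite is a stand-in so this sketch is self-contained.
conn = sqlite3.connect(":memory:")
combined.to_sql("fashion_trends", conn, if_exists="replace", index=False)

# Read back to confirm the load.
back = pd.read_sql("SELECT * FROM fashion_trends", conn)
```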
- YouTube: Watch on YouTube
- Article Post: Read the Article Post