Skip to content

This project demonstrates a data engineering pipeline using FastAPI, MySQL, and Jupyter Notebook. It processes raw transactional data, cleans and transforms it using Pandas, and loads it into a structured MySQL database. A FastAPI-based REST API allows querying customer summaries and product sales data efficiently.

Notifications You must be signed in to change notification settings

vighnesh3043/ETL-project1

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 

Repository files navigation

ETL Pipeline and FastAPI API

Overview

This project implements an ETL (Extract, Transform, Load) pipeline using Python, Pandas, and MySQL to process retail transaction data. Additionally, a FastAPI application provides an API to retrieve customer summaries and product sales insights.

Features

  • ETL Pipeline: Cleans and loads data from CSV into MySQL.
  • MySQL Database: Stores structured transaction data.
  • FastAPI Application: Provides RESTful APIs for querying data.

Setup Instructions

1. Prerequisites

Ensure the following are installed:

  • Python 3.12.3
  • MySQL Server
  • MySQL Workbench
  • Required Python dependencies
  • Make sure the dataset is in CSV format for better results

2. Install Dependencies

Run the following command to install required packages:

pip install pandas mysql-connector-python fastapi uvicorn

3. MySQL Setup

a. Install MySQL Server

Download and install MySQL from MySQL Official Site.

b. Create Database and Tables

Create a database using MySQL Workbench or the command line:

CREATE DATABASE yourdatabase;
USE yourdatabase;

Note: The ETL script will automatically create the required database and tables.

4. Running the FastAPI Application

Start the FastAPI app:

uvicorn main:app --reload

Note: Ensure the ETL script and FastAPI app run in the same environment.

5. Access the API

6. Database Configuration

Modify the credentials in your script as needed:

DB_HOST = "e.g:123.0.0.1"
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_NAME = "your_database_name"

API Endpoints

1. Get Customer Summary

  • Endpoint: GET /customer_summary
  • Query Parameter: customer_id (optional, int)
  • Example Request:
    GET http://127.0.0.1:8000/customer_summary?customer_id=12345
  • Response Example:
    [
      {
        "CustomerID": 12345,
        "TotalSpent": 500.75,
        "TotalOrders": 15,
        "LastPurchaseDate": "2025-01-15 14:30:00"
      }
    ]

2. Get Product Sales Overview

  • Endpoint: GET /product_sales
  • Query Parameter: product_code (optional, string)
  • Example Request:
    GET http://127.0.0.1:8000/product_sales?product_code=85123A
  • Response Example:
    [
      {
        "StockCode": "85123",
        "TotalSold": 150,
        "TotalRevenue": 1200.50,
        "LastSaleDate": "2025-02-18 10:00:00"
      }
    ]

Sample Queries

1. Retrieve All Customers Summary

GET http://127.0.0.1:8000/customer_summary

2. Retrieve Data for a Specific Customer

GET http://127.0.0.1:8000/customer_summary?customer_id=12345

3. Retrieve All Product Sales Data

GET http://127.0.0.1:8000/product_sales

4. Retrieve Sales Data for a Specific Product

GET http://127.0.0.1:8000/product_sales?product_code=85123

Conclusion

This project provides an efficient ETL pipeline and a RESTful API for analyzing customer and product sales data. 🚀

About

This project demonstrates a data engineering pipeline using FastAPI, MySQL, and Jupyter Notebook. It processes raw transactional data, cleans and transforms it using Pandas, and loads it into a structured MySQL database. A FastAPI-based REST API allows querying customer summaries and product sales data efficiently.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published