🌟 Workshop_03 - Machine Learning Prediction and Streaming Data 🌟

by Manuel Gruezo


👋 Welcome!

In this project, you'll train a machine learning regression model to predict the happiness score of countries 🌍. You'll work with 5 CSV files containing global happiness data.

💻 Technologies used:

  • Python 🐍
  • Jupyter Notebook 📓
  • PostgreSQL 🐘
  • Apache Kafka 🚀

🎯 Objectives

  1. EDA and ETL: Perform exploratory data analysis and prepare data by cleaning, preprocessing, and selecting relevant features. 🧹
  2. Regression Model Training: Develop a regression model, optimize it, and evaluate its performance. 📊
  3. Real-time Streaming: Use Apache Kafka to handle real-time data processing from EDA/ETL to predictions. 🔄
  4. Database Integration: Store predictions and relevant features in a PostgreSQL database. 📂

📂 Folder Structure

Workshop3
├── data                           # 📁 CSV data files
├── notebooks                      # 📝 Jupyter notebooks
│   ├── 001-EDA.ipynb              # Exploratory Data Analysis
│   ├── 002-model_metrics.ipynb    # Model evaluation and metrics
│   └── model.pkl                  # Trained model in pickle format
├── src                            # 🛠️ Project's source code
│   ├── database                   # Database-related modules
│   │   ├── connection.py          # Database connection script
│   │   └── db_settings.json       # Database configuration
│   ├── models                     # Machine learning models
│   └── utils                      # Utility scripts (e.g., feature selection)
├── .env                           # 🌐 Environment variables
├── docker-compose.yml             # 🐋 Docker Compose file
├── consumer.py                    # Consumer microservice script
├── producer.py                    # Producer microservice script
└── requirements.txt               # 📜 Dependencies list

🌐 Data Source

📥 World Happiness Report Dataset


🚀 How to Run the Project

Prerequisites:

  • Python 3 🐍
  • Docker and Docker Compose 🐋
  • PostgreSQL 🐘

Steps:

1️⃣ Clone this repository:

git clone https://github.com/alej0909/Workshop-3.git

2️⃣ Navigate to the project folder:

cd Workshop-3

3️⃣ Create a virtual environment:

python -m venv venv

4️⃣ Activate the virtual environment:

  • On Windows:
./venv/Scripts/activate
  • On Linux/macOS:
source venv/bin/activate

5️⃣ Configure your database:

  • Create a db_settings.json file under src/database with:
{
  "user": "Your PostgreSQL username",
  "password": "Your PostgreSQL password",
  "host": "Your database host address",
  "port": "Your PostgreSQL port",
  "database": "Your database name"
}
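
For reference, here is a minimal sketch of how src/database/connection.py could read this file and build a SQLAlchemy engine. It is only an illustration of the pattern, not the repo's exact code; the key names match the JSON above.

# Illustrative sketch of connection.py; the actual implementation may differ
import json
from sqlalchemy import create_engine

def get_engine(settings_path="src/database/db_settings.json"):
    # Load the credentials defined in db_settings.json
    with open(settings_path) as f:
        cfg = json.load(f)
    # Build a PostgreSQL connection URL and return a SQLAlchemy engine
    url = (
        f"postgresql://{cfg['user']}:{cfg['password']}"
        f"@{cfg['host']}:{cfg['port']}/{cfg['database']}"
    )
    return create_engine(url)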

6️⃣ Install required libraries:

pip install -r requirements.txt

7️⃣ Set up your environment:

  • Create a .env file and define the WORK_PATH variable.
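
For example, the .env file only needs one line (the path below is a placeholder; point it to your local clone):

WORK_PATH=/path/to/Workshop-3

A short sketch of how the variable can be read with python-dotenv (the notebooks and scripts in this repo may load it differently):

# Illustrative only: read WORK_PATH from the .env file
import os
from dotenv import load_dotenv

load_dotenv()                      # reads .env from the working directory
work_path = os.getenv("WORK_PATH")
print(work_path)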

8️⃣ Set up your database:

  • Create a PostgreSQL database matching the database name in your db_settings.json.
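
If you prefer to create the database from Python, a hedged sketch using sqlalchemy-utils (an extra package, not listed in requirements.txt; the credentials and database name below are placeholders, use the values from your db_settings.json):

# Illustrative only: create the database if it does not exist yet
from sqlalchemy_utils import database_exists, create_database

url = "postgresql://postgres:postgres@localhost:5432/happiness"  # placeholder URL

if not database_exists(url):
    create_database(url)  # creates an empty database with that name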

9️⃣ Start with the Jupyter notebook:

  • Open and run 001-EDA.ipynb.

🌟 Running the Streaming Architecture

🔟 Run Docker:

docker compose up

1️⃣1️⃣ Open a terminal inside the Kafka container:

docker exec -it kafka-test bash

1️⃣2️⃣ Create a Kafka topic:

kafka-topics --bootstrap-server kafka-test:9092 --create --topic predict-happiness

1️⃣3️⃣ Run the producer and consumer:

  • Producer:
python producer.py
  • Consumer:
python consumer.py
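
Both scripts live at the repository root. As a rough idea of the pattern with kafka-python (not the repo's exact code: the data file, feature columns, and bootstrap address below are placeholders; adjust the address to the listener advertised in docker-compose.yml):

# Simplified producer sketch: stream the transformed happiness rows as JSON
import json
import pandas as pd
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder; match docker-compose.yml
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

df = pd.read_csv("data/transformed.csv")  # placeholder file name
for _, row in df.iterrows():
    producer.send("predict-happiness", row.to_dict())
producer.flush()

# Simplified consumer sketch: load model.pkl, predict, then store the record
import json
import pickle
import pandas as pd
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "predict-happiness",
    bootstrap_servers="localhost:9092",  # placeholder; match docker-compose.yml
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

with open("notebooks/model.pkl", "rb") as f:
    model = pickle.load(f)

for message in consumer:
    record = message.value
    features = pd.DataFrame([record])  # column selection depends on the notebook's feature set
    record["prediction"] = float(model.predict(features)[0])
    # ... insert `record` into PostgreSQL, e.g. with the engine from connection.py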

1️⃣4️⃣ Verify your database:

  • Check PostgreSQL for the new table with happiness predictions.
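
A quick way to check from Python (the table name below is a placeholder; use whatever name consumer.py writes to):

# Illustrative only: peek at the predictions table
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/your_database")  # placeholder URL
print(pd.read_sql("SELECT * FROM happiness_predictions LIMIT 5", engine))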

🧪 Evaluate the Model

Run 002-model_metrics.ipynb to analyze the model's performance and metrics 📈.
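
If you want to reproduce the headline numbers outside the notebook, a minimal sketch is shown below. The test file, feature set, and target column name are assumptions; the notebook defines the actual split and feature selection.

# Illustrative only: score the pickled model on a held-out set
import pickle
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

with open("notebooks/model.pkl", "rb") as f:
    model = pickle.load(f)

test = pd.read_csv("data/test.csv")                 # placeholder test set
X_test = test.drop(columns=["happiness_score"])     # assumed target column name
y_test = test["happiness_score"]

y_pred = model.predict(X_test)
print("R2 :", r2_score(y_test, y_pred))
print("MAE:", mean_absolute_error(y_test, y_pred))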


🎉 Congratulations! You're ready to predict happiness in real-time. 💡
