by Manuel Gruezo
In this project, you'll train a regression machine learning model to predict the happiness score of countries 🌍. You'll be working with 5 CSV files containing global happiness data.
💻 Technologies used:
- Python 🐍
- Jupyter Notebook 📓
- PostgreSQL 🐘
- Apache Kafka 🚀
- EDA and ETL: Perform exploratory data analysis and prepare data by cleaning, preprocessing, and selecting relevant features. 🧹
- Regression Model Training: Develop a regression model, optimize it, and evaluate its performance (a minimal training sketch follows this list). 📊
- Real-time Streaming: Use Apache Kafka to handle real-time data processing from EDA/ETL to predictions. 🔄
- Database Integration: Store predictions and relevant features in a PostgreSQL database. 📂
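As a rough illustration of the training step only: the real pipeline, estimator, features, and file names live in the notebooks, so everything below (the `RandomForestRegressor`, the `happiness_clean.csv` file, the `happiness_score` column) is an assumption.

```python
# Hypothetical training sketch -- the authoritative version is in the notebooks.
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestRegressor    # assumed estimator
from sklearn.model_selection import train_test_split

# Assumed: a cleaned dataset with a "happiness_score" target column.
df = pd.read_csv("data/happiness_clean.csv")           # hypothetical file name
X = df.drop(columns=["happiness_score"])
y = df["happiness_score"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
print("R² on held-out data:", model.score(X_test, y_test))

# Persist the trained model the way the project does (notebooks/model.pkl).
with open("notebooks/model.pkl", "wb") as f:
    pickle.dump(model, f)
```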
📂 Project structure:
Workshop3
├── data # 📁 CSV data files
├── notebooks # 📝 Jupyter notebooks
│ ├── 001-EDA.ipynb # Exploratory Data Analysis
│ ├── 002-model_metrics.ipynb # Model evaluation and metrics
│ └── model.pkl # Trained model in pickle format
├── src # 🛠️ Project's source code
│ ├── database # Database-related modules
│ │ ├── connection.py # Database connection script
│ │ └── db_settings.json # Database configuration
│ ├── models # Machine learning models
│ └── utils # Utility scripts (e.g., feature selection)
├── .env # 🌐 Environment variables
├── docker-compose.yml # 🐋 Docker Compose file
├── consumer.py # Consumer microservice script
├── producer.py # Producer microservice script
└── requirements.txt # 📜 Dependencies list
📥 Dataset: World Happiness Report
🛠️ Requirements:
- 🐍 Python: download from python.org
- 🐘 PostgreSQL: download from postgresql.org
- 🐋 Docker: download from docker.com
1️⃣ Clone this repository:
git clone https://github.com/alej0909/Workshop-3.git
2️⃣ Navigate to the project folder:
cd Workshop-3
3️⃣ Create a virtual environment:
python -m venv venv
4️⃣ Activate the virtual environment:
- Windows:
./venv/Scripts/activate
- Linux/macOS:
source venv/bin/activate
5️⃣ Configure your database:
- Create a `db_settings.json` file under `src/database` with:
{
    "user": "Your PostgreSQL username",
    "password": "Your PostgreSQL password",
    "host": "Your database host address",
    "port": "Your PostgreSQL port",
    "database": "Your database name"
}
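For orientation, `src/database/connection.py` presumably turns this file into a live connection; a minimal sketch of that idea with SQLAlchemy (the actual module may be organized differently):

```python
# Sketch of building an engine from db_settings.json -- the real logic
# lives in src/database/connection.py and may differ.
import json
from pathlib import Path

from sqlalchemy import create_engine

cfg = json.loads(Path("src/database/db_settings.json").read_text())

engine = create_engine(
    f"postgresql://{cfg['user']}:{cfg['password']}@{cfg['host']}:{cfg['port']}/{cfg['database']}"
)

with engine.connect() as conn:
    print("Connected to database:", conn.engine.url.database)
```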
6️⃣ Install required libraries:
pip install -r requirements.txt
7️⃣ Set up your environment:
- Create a `.env` file and define the `WORK_PATH` variable.
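`WORK_PATH` is read by the project's scripts; assuming it points at the project root (e.g. `WORK_PATH=/path/to/Workshop-3` in `.env`), the usual way to load it is with `python-dotenv`:

```python
# Reading WORK_PATH from .env -- assumes the python-dotenv package is installed
# (check requirements.txt) and that WORK_PATH is the project root.
import os

from dotenv import load_dotenv

load_dotenv()                        # picks up .env in the current directory
WORK_PATH = os.getenv("WORK_PATH")
print("Working path:", WORK_PATH)
```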
8️⃣ Set up your database:
- Create a PostgreSQL database whose name matches the `database` value in your `db_settings.json`.
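Any tool works for this step (pgAdmin, `createdb`, plain SQL). As one option, a short psycopg2 snippet; the database name `happiness` and the credentials are placeholders, use whatever you put in `db_settings.json`:

```python
# Creating the target database via psycopg2 -- name and credentials are placeholders.
import psycopg2

conn = psycopg2.connect(
    dbname="postgres",       # connect to the default maintenance database first
    user="your_user",
    password="your_password",
    host="localhost",
    port="5432",
)
conn.autocommit = True       # CREATE DATABASE cannot run inside a transaction
with conn.cursor() as cur:
    cur.execute("CREATE DATABASE happiness")
conn.close()
```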
9️⃣ Start with the Jupyter notebook:
- Open and run `notebooks/001-EDA.ipynb`.
🔟 Run Docker:
docker compose up
1️⃣1️⃣ Access Kafka container terminal:
docker exec -it kafka-test bash
1️⃣2️⃣ Create a Kafka topic:
kafka-topics --bootstrap-server kafka-test:9092 --create --topic predict-happiness
1️⃣3️⃣ Run the producer and consumer, each in its own terminal (see the sketches below):
- Producer:
python producer.py
- Consumer:
python consumer.py
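The repository's `producer.py` and `consumer.py` are the source of truth; purely as an orientation sketch (the broker address, CSV file, column handling, and table name are assumptions, only the `predict-happiness` topic comes from step 12), the two sides of the stream typically look like this with `kafka-python`:

```python
# Hypothetical producer sketch -- the real logic is in producer.py.
import json
import time

import pandas as pd
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                      # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

df = pd.read_csv("data/happiness_test.csv")                  # hypothetical file name
for _, row in df.iterrows():
    producer.send("predict-happiness", value=row.to_dict())  # topic from step 12
    time.sleep(1)                                            # simulate a real-time stream
producer.flush()
```

```python
# Hypothetical consumer sketch -- the real logic is in consumer.py.
import json
import pickle

import pandas as pd
from kafka import KafkaConsumer
from sqlalchemy import create_engine

consumer = KafkaConsumer(
    "predict-happiness",
    bootstrap_servers="localhost:9092",                        # assumed broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

model = pickle.load(open("notebooks/model.pkl", "rb"))
engine = create_engine("postgresql://user:password@localhost:5432/happiness")  # placeholder URL

for message in consumer:
    features = pd.DataFrame([message.value])
    features["prediction"] = model.predict(features)[0]        # assumes columns match training
    features.to_sql("happiness_predictions", engine, if_exists="append", index=False)  # assumed table
```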
1️⃣4️⃣ Verify your database:
- Check PostgreSQL for the new table with happiness predictions.
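If you prefer checking from Python instead of `psql`/pgAdmin, a quick query works too; the table name and connection URL below are placeholders, match them to whatever `consumer.py` actually writes:

```python
# Quick sanity check of the predictions table -- names are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/happiness")
print(pd.read_sql("SELECT * FROM happiness_predictions LIMIT 5", engine))
```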
Run `notebooks/002-model_metrics.ipynb` to analyze the model's performance and metrics 📈.
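The notebook is the authoritative evaluation; the usual regression metrics it likely reports can be reproduced along these lines, reusing the hypothetical data split from the training sketch earlier:

```python
# Typical regression metrics -- file name, target column, and split are assumptions.
import pickle

import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/happiness_clean.csv")          # hypothetical file name
X = df.drop(columns=["happiness_score"])
y = df["happiness_score"]
_, X_test, _, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = pickle.load(open("notebooks/model.pkl", "rb"))
y_pred = model.predict(X_test)

print("R²  :", r2_score(y_test, y_pred))
print("MAE :", mean_absolute_error(y_test, y_pred))
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)
```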
🎉 Congratulations! You're ready to predict happiness in real-time. 💡