This repository was created for the Disaster Response Pipeline project in the Data Engineering part of the Udacity Data Scientist Nanodegree Program. The dataset, provided by Figure Eight, contains pre-labelled tweets and messages from real-life disasters. The aim of the project is to build an NLP machine learning pipeline that categorizes emergency messages based on the needs communicated by the sender. Organizations can use the pipeline's predictions through the web app built in this project.
The project consists of three main parts:
- ETL Pipeline: Extracts data from the source files, cleans it, and saves it into a SQLite database.
- Machine Learning Pipeline: Trains a model to classify the messages by category.
- Flask & Plotly Based Web App: Interactive web app that allows users to enter a message and get classification predictions.
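As a rough illustration of the Machine Learning Pipeline part, the sketch below trains a small multi-label text classifier with scikit-learn. The toy messages, the three category names, and the choice of `TfidfVectorizer` with a random forest are assumptions made for this example; they are not taken from train_classifier.py.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Hypothetical category names and toy data, invented for illustration only.
category_names = ["water", "food", "shelter"]
messages = [
    "we need water and food",
    "there is no drinking water here",
    "people are hungry, please send food",
    "my house collapsed, we need shelter",
    "we have no tents or shelter",
    "water and shelter needed urgently",
]
# One row per message, one 0/1 column per category.
labels = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 1],
])


def build_pipeline():
    """Text -> TF-IDF features -> one binary classifier per category."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(
            RandomForestClassifier(n_estimators=50, random_state=0))),
    ])


model = build_pipeline()
model.fit(messages, labels)


def classify(message):
    """Return a {category: 0 or 1} mapping for a single message."""
    prediction = model.predict([message])[0]
    return {name: int(flag) for name, flag in zip(category_names, prediction)}


print(classify("please send drinking water"))
```

Wrapping the vectorizer and the classifier in one `Pipeline` means the same text preprocessing is applied at training time and at prediction time, which is why a single pickled object is enough for a web app to load.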
Udacity_DisasterResponses_Project
|-- app
|   |-- templates
|   |   |-- go.html
|   |   |-- master.html
|   |-- run.py
|   |-- visualizations.py
|-- data
|   |-- disaster_message.csv
|   |-- disaster_categories.csv
|   |-- CleanDataDB.db
|   |-- process_data.py
|-- models
|   |-- model.pkl
|   |-- train_classifier.py
|-- Jupyter_Notebooks
|   |-- ETL Pipeline Preparation.ipynb
|   |-- ETL Pipeline Preparation.html
|   |-- ML Pipeline Preparation.ipynb
|   |-- ML Pipeline Preparation.html
|-- README
- Create a virtual environment and activate it
python3 -m venv env
source env/bin/activate
- Clone the repository into the virtual environment directory
cd env
git clone https://github.com/eermis1/Udacity_DisasterResponses_Project.git
cd Udacity_DisasterResponses_Project
- Install the required libraries
pip install numpy
pip install scipy
pip install pandas
pip install scikit-learn
pip install nltk
pip install SQLAlchemy
pip install flask
pip install plotly
- Go to the app directory
cd app
- Run run.py to start the web app
python run.py
- Open http://0.0.0.0:3001/ in a browser. If that address does not load, try http://localhost:3001/ instead.
If you wish to run process_data.py and train_classifier.py separately, use the commands below (process_data.py is in the data directory and train_classifier.py is in the models directory):
python process_data.py disaster_messages.csv disaster_categories.csv DisasterResponse.db
python train_classifier.py ../data/DisasterResponse.db classifier.pkl
Notes:
- The arguments can be changed based on user requirements.
- The repository already includes the database and model.pkl.
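To illustrate how three positional arguments like the ones passed to process_data.py might be consumed, here is a minimal sketch that uses only the Python standard library. The column names (`id`, `message`, `categories`), the table name `messages`, and the sample rows are hypothetical, and `csv` plus `sqlite3` stand in for whatever process_data.py actually uses.

```python
import argparse
import csv
import os
import sqlite3
import tempfile


def parse_args(argv=None):
    """Read three positional arguments: two CSV inputs and the database path."""
    parser = argparse.ArgumentParser(description="Merge two CSV files into SQLite")
    parser.add_argument("messages_csv")
    parser.add_argument("categories_csv")
    parser.add_argument("database_path")
    return parser.parse_args(argv)


def read_rows(path):
    """Load a CSV file as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def run_etl(messages_csv, categories_csv, database_path):
    # Extract: index both files by id (a repeated id keeps its last row).
    messages = {row["id"]: row["message"] for row in read_rows(messages_csv)}
    categories = {row["id"]: row["categories"] for row in read_rows(categories_csv)}
    # Clean: keep only the ids that appear in both files.
    rows = [(int(key), messages[key], categories[key])
            for key in messages if key in categories]
    # Load: rewrite the (hypothetical) messages table from scratch.
    connection = sqlite3.connect(database_path)
    with connection:  # commits on success, rolls back on error
        connection.execute("DROP TABLE IF EXISTS messages")
        connection.execute(
            "CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT, categories TEXT)")
        connection.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    connection.close()
    return len(rows)


def write_csv(path, header, rows):
    """Helper for the demo below: write a small CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Demo with throwaway files so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as workdir:
    messages_path = os.path.join(workdir, "messages.csv")
    categories_path = os.path.join(workdir, "categories.csv")
    database_path = os.path.join(workdir, "demo.db")
    write_csv(messages_path, ["id", "message"],
              [[1, "we need water"], [2, "send food, please"], [3, "no label for this one"]])
    write_csv(categories_path, ["id", "categories"],
              [[1, "water-1;food-0"], [2, "water-0;food-1"]])

    args = parse_args([messages_path, categories_path, database_path])
    saved_count = run_etl(args.messages_csv, args.categories_csv, args.database_path)

    connection = sqlite3.connect(database_path)
    stored = connection.execute(
        "SELECT id, message, categories FROM messages ORDER BY id").fetchall()
    connection.close()

print(saved_count, stored)
```

Because the paths arrive as arguments rather than being hard-coded, the same script can write to a different database file simply by changing the third argument on the command line.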
The repository has been created by Evren Ermiş