Udacity Data Scientist Nanodegree Program

Disaster Response Pipeline Project


License: MIT

Description

This repository was created for the Data Engineering part of the Udacity Data Scientist Nanodegree Program (Disaster Response Pipeline Project). The dataset, provided by Figure Eight, contains pre-labelled tweets and messages from real-life disasters. The aim of the project is to build an NLP machine learning pipeline that categorizes emergency messages based on the needs communicated by the sender. Organizations can use the pipeline's predictions through the web app that is part of this project.

The project consists of three main parts.

  1. ETL Pipeline: Extract data from the source, clean it, and save it into a SQLite database.
  2. Machine Learning Pipeline: Train a model to classify the messages.
  3. Flask & Plotly Based Web App: Interactive web app that allows users to enter a message and get classification predictions.
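
The extract, clean, and save steps of the ETL part can be sketched as follows. This is a minimal illustration and not the code in process_data.py: the sample rows, the column names (id, message, water, food, storm), the table name messages, and the function run_etl are all assumptions made for the example. It uses only the Python standard library so that it runs without the packages listed under Installation; the project's own scripts presumably rely on pandas and SQLAlchemy instead.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real CSV files and their columns may differ.
MESSAGES_CSV = """id,message
1,  We need water and food
2,Roads are blocked after the storm
2,Roads are blocked after the storm
"""

CATEGORIES_CSV = """id,water,food,storm
1,1,1,0
2,0,0,1
2,0,0,1
"""


def read_rows(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def run_etl(messages_text, categories_text, conn):
    """Join messages with their labels, clean them, and load them into SQLite."""
    messages = read_rows(messages_text)
    categories = {row["id"]: row for row in read_rows(categories_text)}

    # Transform: join on id, trim whitespace, cast labels to integers.
    # Keying by id means a repeated id is stored only once.
    cleaned = {}
    for row in messages:
        labels = categories.get(row["id"])
        if labels is None:
            continue  # skip messages that have no labels
        cleaned[row["id"]] = (
            int(row["id"]),
            row["message"].strip(),
            int(labels["water"]),
            int(labels["food"]),
            int(labels["storm"]),
        )

    # Load: write the cleaned rows into a fresh table.
    conn.execute("DROP TABLE IF EXISTS messages")
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, message TEXT, "
        "water INTEGER, food INTEGER, storm INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?)", list(cleaned.values())
    )
    conn.commit()
    return len(cleaned)


# An in-memory database keeps the example free of side effects.
conn = sqlite3.connect(":memory:")
row_count = run_etl(MESSAGES_CSV, CATEGORIES_CSV, conn)
rows = conn.execute(
    "SELECT id, message, water, food, storm FROM messages ORDER BY id"
).fetchall()
print(row_count, rows)
```

Passing a file path to sqlite3.connect instead of ":memory:" would write the table to a database file on disk.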

Getting Started

Directory Structure

        Udacity_DisasterResponses_Project
          |-- app
                |-- templates
                        |-- go.html
                        |-- master.html
                |-- run.py
                |-- visualizations.py                
          |-- data
                |-- disaster_message.csv
                |-- disaster_categories.csv
                |-- CleanDataDB.db
                |-- process_data.py
          |-- models
                |-- model.pkl
                |-- train_classifier.py
          |-- Jupyter_Notebooks
                |-- ETL Pipeline Preparation.ipynb
                |-- ETL Pipeline Preparation.html
                |-- ML Pipeline Preparation.ipynb
                |-- ML Pipeline Preparation.html                
          |-- README

Installation & Instructions

  1. Create a virtual environment and activate it
    python3 -m venv env
    source env/bin/activate

  2. Clone the repository into the virtual environment directory
    cd env
    git clone https://github.com/eermis1/Udacity_DisasterResponses_Project.git
    cd Udacity_DisasterResponses_Project

  3. Install the required libraries
    pip install numpy
    pip install scipy
    pip install pandas
    pip install scikit-learn
    pip install nltk
    pip install SQLAlchemy
    pip install flask
    pip install plotly

  4. Go to the app directory
    cd app

  5. Run "run.py"
    python run.py

  6. Go to http://0.0.0.0:3001/

Don't Forget!

If you wish to run process_data.py and train_classifier.py separately, use the following commands:

python process_data.py disaster_messages.csv disaster_categories.csv DisasterResponse.db
python train_classifier.py ../data/DisasterResponse.db classifier.pkl

Notes:

  • The arguments can be changed based on user requirements
  • The repository already includes the database and model.pkl
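
Since the arguments are positional, a script can read them in the order shown in the commands above. The sketch below shows one way to do that with argparse for the process_data.py call. The argument names (messages_csv, categories_csv, database_path) and the function parse_args are hypothetical; the actual script may read its arguments differently, for example directly from sys.argv.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments used in the example command."""
    parser = argparse.ArgumentParser(
        description="Clean disaster messages and store them in a SQLite database."
    )
    # Hypothetical argument names, listed in the same order as the command line.
    parser.add_argument("messages_csv", help="path to the messages CSV file")
    parser.add_argument("categories_csv", help="path to the categories CSV file")
    parser.add_argument("database_path", help="path of the SQLite file to write")
    return parser.parse_args(argv)


# Passing a list simulates the command line; parse_args() with no
# argument reads the real command-line arguments instead.
args = parse_args(
    ["disaster_messages.csv", "disaster_categories.csv", "DisasterResponse.db"]
)
print(args.messages_csv, args.categories_csv, args.database_path)
```

With this layout, changing an input or output file only means changing the corresponding value on the command line.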

Author

This repository was created by Evren Ermiş.

Screenshots

Screenshot: message length per ID
Screenshot: Graph2