NLP consulting project: defining a data-driven strategy for the London restaurant Bokan 37

This project has five main steps:

Data Collection
Data Cleaning
Word Embedding
Topic Extraction
Sentiment Analysis

Setup

git clone https://github.com/hehlinge42/nlp_consulting_project.git
cd nlp_consulting_project
pip install -r requirements.txt

Architecture

Session 1: TripAdvisor scraper

Tools to scrape TripAdvisor's UK website (https://www.tripadvisor.co.uk/) for restaurants and the reviews users have left for them.
cd scraper
See dedicated README in the folder.

Session 2: Data cleaner

Tool to clean and tokenize the reviews scraped from TripAdvisor.
cd cleaner
See dedicated README in the folder.
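To make the cleaning and tokenizing step concrete, here is a minimal sketch using only the Python standard library. It is not the code in the cleaner folder: the function name clean_and_tokenize and the STOP_WORDS set are hypothetical, chosen for illustration only.

```python
import re

# Hypothetical stop-word list for illustration; the project's cleaner
# may rely on a different list or on an NLP library.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "it", "to", "of"}


def clean_and_tokenize(review: str) -> list[str]:
    """Lowercase a review, keep letters only, and drop stop words."""
    text = review.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [token for token in text.split() if token not in STOP_WORDS]


if __name__ == "__main__":
    print(clean_and_tokenize("The food was AMAZING, and the view is great!"))
```

A real cleaner would typically also handle accents, emojis and lemmatization, which this sketch leaves out.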

Session 3: Feature Embedder

Tool to embed tokenized reviews into numerical vectors.
cd embedder
See dedicated README in the folder.
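As an illustration of what turning tokenized reviews into numerical vectors can look like, the sketch below computes TF-IDF vectors with the standard library only. It is an assumption-laden example, not the embedder's implementation: the function name tfidf_vectors and the smoothing choice (idf = log(N / df) + 1) are picked for this sketch.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per tokenized document."""
    vocab = sorted({token for doc in docs for token in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    # Assumed idf variant for this sketch: log(N / df) + 1.
    idf = {token: math.log(n_docs / doc_freq[token]) + 1.0 for token in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero on empty reviews
        vectors.append([counts[token] / length * idf[token] for token in vocab])
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = tfidf_vectors([["food", "great"], ["food", "bad"]])
    print(vocab)
    print(vectors)
```

Tokens that appear in every review (here "food") get the lowest weight, while tokens specific to one review are weighted up.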

Session 4: Feature Embedder with Attention Mechanism

Tool to embed tokenized reviews into numerical vectors and predict the associated ratings using a Hierarchical Attention Network (HAN).
cd attention_embedder
See dedicated README in the folder.
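The core idea behind attention in a HAN is to score each word (or sentence) vector against a context vector, normalize the scores with a softmax, and return the weighted sum. The sketch below shows only that pooling step in plain Python. It is a simplification under stated assumptions: the function name attention_pool is hypothetical, the learned projection layer that normally precedes the scoring is omitted, and the context vector is passed in by hand rather than trained.

```python
import math


def attention_pool(
    hidden_states: list[list[float]], context: list[float]
) -> tuple[list[float], list[float]]:
    """Return softmax attention weights and the weighted sum of hidden states."""
    # Score each hidden state by its dot product with the context vector.
    scores = [sum(h * c for h, c in zip(state, context)) for state in hidden_states]
    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(score - max_score) for score in scores]
    total = sum(exps)
    weights = [value / total for value in exps]
    # Weighted sum of the hidden states, dimension by dimension.
    dim = len(hidden_states[0])
    pooled = [
        sum(weight * state[k] for weight, state in zip(weights, hidden_states))
        for k in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
    print(weights, pooled)
```

In a full HAN this pooling is applied twice, first over the words of each sentence and then over the resulting sentence vectors.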

Run Application from Command Line

Run the following command, then set the user-defined parameters in the GUI:

python3 launch_program.py

GUI user-defined settings:

--Save Wordcloud: option to create a word cloud for each restaurant.
--Save TFIDF: option to create a TF-IDF embedding for each restaurant.
--Embedding Technique: embedding technique to use (lsi, word2vec and fasttext are supported).

The script merges data from multiple scraping runs, creates a balanced dataset of reviews (ratings 1-5), cleans the selected reviews and embeds words into vectors using the user-defined embedding technique (lsi, word2vec, fasttext and all are supported).
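One common way to build a balanced dataset is to downsample every rating class to the size of the smallest one. The sketch below illustrates that approach. It is an assumption about how balancing could be done, not a description of launch_program.py: the function name balance_by_rating and the "rating" dictionary key are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_rating(reviews: list[dict], seed: int = 0) -> list[dict]:
    """Downsample so that every rating present has the same number of reviews."""
    by_rating = defaultdict(list)
    for review in reviews:
        by_rating[review["rating"]].append(review)

    # The smallest class sets the number of reviews kept per rating.
    target = min(len(group) for group in by_rating.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    balanced = []
    for rating in sorted(by_rating):
        balanced.extend(rng.sample(by_rating[rating], target))
    return balanced


if __name__ == "__main__":
    sample = [{"rating": 5, "text": f"great {i}"} for i in range(3)]
    sample += [{"rating": 1, "text": "awful"}]
    print(balance_by_rating(sample))
```

Downsampling discards data from the larger classes, which matters here because restaurant reviews tend to be skewed toward high ratings.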

Contributors

Project carried out by @elalamik, @erraya, @hehlinge42, @louistransfer and @MaximeRedstone

