getting-started-with-nlp

This is the material for the intro to nlp workshop by Alexandra Cosseron and Eva Jaumann.

In order to be ready for this workshop, you need to:

Setup your computer
Download Amazon reviews dataset

Setup your computer

Step 1: Requirements

pyenv >= 1.2.13
python >= 3.7.0

Step 2: Clone the workshop repository

git clone https://github.com/pyladieshamburg/getting-started-with-nlp.git
cd getting-started-with-nlp

Step 3: Setup your virtual environment

With pyenv and poetry:

pyenv local 3.7.4
pip install poetry
make setup

Step 4: Open Jupyter Lab

Run the following command in your shell:

poetry run jupyter lab

Download the necessary data

We will use two datasets during the workshop:

SMS Spam Collection dataset, which has been downloaded for you and which you will find in the folder data of this repo. More information about this dataset can be found in the following section;
Amazon Reviews: Unlocked Mobile Phones, which you will need to download by yourself.

Amazon Reviews: Unlocked Mobile Phones is a public dataset made available through the Kaggle platform. You can download it here. You will need to create a free Kaggle account in order to do so. Once the downloading is completed, you can place the file in the folder data (the code we will use assumes that the data files are located there, you will need to adjust the code accordingly if you don't). No need to unzip the file at this stage, it is handled in the code.

Information about the data we will use

Part 1: SMS Spam Collection

As described in its dedicated website:

The SMS Spam Collection v.1 is a public set of SMS labeled messages that have been collected for mobile phone spam research. It has one collection composed by 5,574 English, real and non-enconded messages, tagged according being legitimate (ham) or spam.

Further details about this dataset can be found at http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/ .

The paper of reference for this dataset is:

Almeida, T.A., Gómez Hidalgo, J.M., Yamakami, A. Contributions to the Study of SMS Spam Filtering: New Collection and Results. Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG'11), Mountain View, CA, USA, 2011. preprint

Part 2: Amazon Reviews: Unlocked Mobile Phones

Kaggle dedicated webpage provides the following description:

PromptCloud [a web scraping company, ndrl] extracted 400 thousand reviews of unlocked mobile phones sold on Amazon.com to find out insights with respect to reviews, ratings, price and their relationships.

Given below are the fields:

Product Title

Brand

Price

Rating

Review text

Number of people who found the review helpful

Data was acquired in December, 2016 by the crawlers build to deliver [their] data extraction services.

Troubleshooting

While importing nltk module, if you encounter a ModuleImportError , the following procedure may help:

Close Jupyter lab
Run the following command:

poetry run python -m ipykernel install --user --name=nltk_ws

Reopen Jupyter Lab
Select the kernel nltk_ws

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
slides_and_solution		slides_and_solution
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
part_1_spam_classifier.ipynb		part_1_spam_classifier.ipynb
part_2_sentiment_analysis.ipynb		part_2_sentiment_analysis.ipynb
part_3_toward_improving_spam_classifier.ipynb		part_3_toward_improving_spam_classifier.ipynb
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

getting-started-with-nlp

Setup your computer

Step 1: Requirements

Step 2: Clone the workshop repository

Step 3: Setup your virtual environment

Step 4: Open Jupyter Lab

Download the necessary data

Information about the data we will use

Part 1: SMS Spam Collection

Part 2: Amazon Reviews: Unlocked Mobile Phones

Troubleshooting

About

Releases

Packages

Languages

License

pyladieshamburg/getting-started-with-nlp

Folders and files

Latest commit

History

Repository files navigation

getting-started-with-nlp

Setup your computer

Step 1: Requirements

Step 2: Clone the workshop repository

Step 3: Setup your virtual environment

Step 4: Open Jupyter Lab

Download the necessary data

Information about the data we will use

Part 1: SMS Spam Collection

Part 2: Amazon Reviews: Unlocked Mobile Phones

Troubleshooting

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages