This tutorial on basic NLP techniques for data science was created for the UC Berkeley CDIPS 2016 workshop. Most parts were taken from the Kaggle "Bag of Words Meets Bags of Popcorn" tutorial.
What is NLP? NLP (Natural Language Processing) is a set of techniques for working on problems that involve text. This tutorial walks you through basic NLP techniques. For more advanced material, see Parts 2 and 3 of the tutorial on the Kaggle website. A brief introduction can be found in this repo's NLP_presentation.pdf.
We'll use an IPython notebook to work through Kaggle's excellent introduction to NLP. Open the notebook included in this repo, entitled "Bag_of_words". Follow the tutorial here. It will help you get started with loading and cleaning the IMDB movie reviews. If you like, you can then apply a simple Bag of Words model, which gives surprisingly accurate predictions of whether a review is thumbs-up or thumbs-down.
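To make the cleaning and Bag of Words steps concrete, here is a minimal sketch that uses only the Python standard library. It is not the notebook's or Kaggle's code: it uses plain regular expressions in place of the third-party libraries a full pipeline would typically use, and the stop-word list, function names, and sample reviews are hypothetical and chosen for illustration. Each cleaned review becomes a vector of word counts over a shared vocabulary. Word order is discarded, which is what "bag" refers to.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only; real pipelines
# typically use a fuller list (e.g. NLTK's English stop words).
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "of", "to"}


def clean_review(raw_review):
    """Strip HTML tags, keep letters only, lowercase, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", raw_review)  # remove markup such as <br />
    text = re.sub(r"[^a-zA-Z]", " ", text)      # replace non-letters with spaces
    words = text.lower().split()
    return [w for w in words if w not in STOP_WORDS]


def build_vocabulary(tokenized_reviews, max_features=None):
    """Map the most frequent words across all reviews to column indices."""
    counts = Counter(w for review in tokenized_reviews for w in review)
    kept = [word for word, _ in counts.most_common(max_features)]
    # Sort alphabetically so the column order is deterministic.
    return {word: i for i, word in enumerate(sorted(kept))}


def bag_of_words(tokens, vocabulary):
    """Return a count vector with one entry per vocabulary word."""
    vector = [0] * len(vocabulary)
    for w in tokens:
        idx = vocabulary.get(w)
        if idx is not None:  # ignore words outside the vocabulary
            vector[idx] += 1
    return vector


# Hypothetical sample reviews, not taken from the IMDB dataset.
reviews = [
    "This movie was great! <br /><br />Great acting and a great plot.",
    "Terrible. The plot was boring and the acting was worse.",
]
tokenized = [clean_review(r) for r in reviews]
vocab = build_vocabulary(tokenized)
features = [bag_of_words(t, vocab) for t in tokenized]
print(sorted(vocab))  # ['acting', 'boring', 'great', 'movie', 'plot', 'terrible', 'worse']
print(features)       # [[1, 0, 3, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1, 1]]
```

These count vectors are the features you would feed to a classifier to predict thumbs-up or thumbs-down. Passing `max_features` keeps only the most frequent words, which bounds the vector length on a large corpus such as the IMDB reviews.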