A short tutorial on Natural Language Processing

This tutorial on basic NLP techniques for data science was created for the UC Berkeley CDIPS 2016 workshop. Most parts were taken from the Kaggle tutorial "Bag of Words Meets Bags of Popcorn".

Introduction

What is NLP? NLP (Natural Language Processing) is a set of techniques for approaching text problems. This tutorial walks you through basic NLP techniques. See parts 2 and 3 on the Kaggle website for more advanced tutorials. A brief intro can be found in this repo's NLP_presentation.pdf.

Interactive Tutorial

We'll use an IPython notebook to work through Kaggle's excellent intro to NLP. Use the notebook included in this repo, entitled "Bag_of_words", and follow along with the Kaggle tutorial. This will help you get started with loading and cleaning the IMDB movie reviews. If you like, you can then follow up by applying a simple Bag of Words model to get surprisingly accurate predictions of whether a review is thumbs-up or thumbs-down.
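
To give a feel for what "cleaning" and "Bag of Words" mean before you open the notebook, here is a minimal sketch that uses only the Python standard library. It is not the notebook's code: the function names, the tiny stop-word list, and the two sample reviews are all made up for illustration, and the notebook may rely on different tools. The idea is simply to strip markup and punctuation from each review, then represent each review as a vector of word counts over a shared vocabulary.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# real workflows usually use a much longer list.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "of", "to"}


def clean_review(raw_text):
    """Strip HTML tags, keep letters only, lowercase, and drop stop words."""
    no_tags = re.sub(r"<[^>]+>", " ", raw_text)         # remove tags such as <br />
    letters_only = re.sub(r"[^a-zA-Z]", " ", no_tags)   # replace non-letters with spaces
    words = letters_only.lower().split()
    return [w for w in words if w not in STOP_WORDS]


def build_vocabulary(cleaned_reviews):
    """Map each distinct word to a column index, in alphabetical order."""
    vocab = sorted({w for review in cleaned_reviews for w in review})
    return {word: i for i, word in enumerate(vocab)}


def bag_of_words(cleaned_review, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(cleaned_review)
    return [counts.get(word, 0) for word in vocabulary]


# Two made-up reviews standing in for the IMDB data.
reviews = [
    "<br />This movie was great, great fun!",
    "The plot was boring and the acting was bad.",
]

cleaned = [clean_review(r) for r in reviews]
vocabulary = build_vocabulary(cleaned)
vectors = [bag_of_words(c, vocabulary) for c in cleaned]

print(list(vocabulary))  # column order of the count vectors
for vec in vectors:
    print(vec)
```

Each review becomes a fixed-length list of counts, which is the kind of numeric input a classifier needs in order to predict thumbs-up or thumbs-down. Word order is discarded, which is why the representation is called a "bag" of words.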