Spark Programming - Natural Language Processing and Information Retrieval

INM432 Big Data coursework 2018

Group coursework, completed together with @laibe, as part of the course INM432 Big Data at City, University of London.

This coursework is about classifying e-mail messages as spam or non-spam in Spark. It also introduces a few additional elements, such as NLTK and some of the preprocessing and machine learning functions that come with Spark.