Melody-Meng/Game-of-thrones-analysis

Game-of-thrones-analysis

This is a natural language processing project that analyzes the subtitle files from Game of Thrones Season 7.

1. Natural Language Processing (NLTK)


a. tokenize the words in each document (one subtitle file per episode)
b. filter out stop words and stem the remaining words
c. rank the 100 most frequently used words
d. count word frequency by episode
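
Steps a through d can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `episodes` dictionary, the short `STOP_WORDS` set, and the `preprocess` helper are hypothetical stand-ins, and a regex tokenizer is used so that no NLTK corpus download is needed.

```python
from collections import Counter

from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical stand-in for the subtitle text of each episode.
episodes = {
    "S07E01": "The North remembers. Winter is here, and the dead are coming.",
    "S07E02": "The dragons are coming. The queen remembers the North.",
}

# Small illustrative stop-word list (assumption). A fuller list is available
# from nltk.corpus.stopwords after running nltk.download("stopwords").
STOP_WORDS = {"the", "is", "and", "are", "a", "of", "to", "in"}

tokenizer = RegexpTokenizer(r"[a-z']+")
stemmer = PorterStemmer()


def preprocess(text):
    """Tokenize, drop stop words, and stem what remains (steps a and b)."""
    tokens = tokenizer.tokenize(text.lower())
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]


# Word counts for each episode.
per_episode = {name: Counter(preprocess(text)) for name, text in episodes.items()}

# Step c: the (up to) 100 most frequent stems across all episodes.
overall = sum(per_episode.values(), Counter())
top_words = [word for word, _ in overall.most_common(100)]

# Step d: for each top word, its count in every episode, in episode order.
freq_by_episode = {w: [per_episode[e][w] for e in episodes] for w in top_words}
```

Each entry of `freq_by_episode` is a per-episode frequency vector for one word, which is a convenient input shape for the clustering step.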

2. K-means Clustering (Scikit-learn)


a. K-means clustering to group similar words based on frequencies
b. PCA for dimensionality reduction
c. plot results
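
The clustering steps might look something like the sketch below. Everything in it is assumed for illustration: the word list, the made-up frequency matrix (one row per word, one column per episode), the choice of two clusters, and the `plot_clusters` helper are not taken from the repository.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical stemmed words and made-up counts across seven episodes.
words = ["jon", "daeneri", "cersei", "dragon", "wall", "winter"]
freq = np.array([
    [35, 40, 32, 38, 36, 41, 39],
    [30, 33, 37, 34, 31, 36, 35],
    [32, 36, 30, 35, 38, 33, 37],
    [4, 6, 3, 8, 5, 7, 6],
    [5, 2, 4, 3, 6, 4, 8],
    [3, 5, 6, 2, 4, 3, 5],
])

# Step a: group words whose per-episode frequency profiles are similar.
# The number of clusters is an assumption chosen for this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(freq)

# Step b: project the 7-dimensional frequency vectors onto 2 components.
coords = PCA(n_components=2).fit_transform(freq)


def plot_clusters(coords, labels, words):
    """Step c: scatter plot of the projected words, colored by cluster."""
    # Imported here so the clustering above runs without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=labels)
    for (x, y), word in zip(coords, words):
        ax.annotate(word, (x, y))
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    return fig
```

Calling `plot_clusters(coords, labels, words)` returns a matplotlib figure that can be shown or saved. Clustering here runs on the raw counts; scaling the rows first would be a reasonable alternative if only the shape of each word's profile should matter.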