Game-of-thrones-analysis

This is a natural language processing project that analyzes the subtitle files from Season 7 of Game of Thrones.

1. Natural Language Processing (NLTK)


a. tokenize the words in each document (one subtitle file per episode)
b. filter out stop words and stem the remaining words
c. rank the 100 most frequently used words
d. count the frequency of each word by episode
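
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `episodes` dictionary, the tiny `STOP_WORDS` set, and the helper `stem_counts` are hypothetical stand-ins, and the regex tokenizer is used only because it needs no extra NLTK data downloads (`nltk.word_tokenize` and `nltk.corpus.stopwords` are fuller alternatives that require `nltk.download`).

```python
from collections import Counter

from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical stand-in for the subtitle text of each episode.
episodes = {
    "S07E01": "Winter is here. The dragons are coming.",
    "S07E02": "The dragons burned the ships.",
}

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "is", "are", "a", "an", "and", "of", "to", "here"}

tokenizer = RegexpTokenizer(r"[a-z']+")  # a. tokenize into lowercase words
stemmer = PorterStemmer()


def stem_counts(text):
    """Tokenize, drop stop words, stem the rest, and count the stems."""
    tokens = tokenizer.tokenize(text.lower())
    return Counter(stemmer.stem(t) for t in tokens if t not in STOP_WORDS)


# b. filtered and stemmed counts for every episode
per_episode = {ep: stem_counts(text) for ep, text in episodes.items()}

# c. the 100 most frequent stems across all episodes
overall = sum(per_episode.values(), Counter())
top_words = [word for word, _ in overall.most_common(100)]

# d. frequency of each top word in each episode (one column per episode)
freq_by_episode = {
    word: [per_episode[ep][word] for ep in episodes] for word in top_words
}
```

Each entry of `freq_by_episode` is a per-episode frequency vector for one word, which is the kind of input a clustering step can work on.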

2. K-means Clustering (scikit-learn)


a. K-means clustering to group words with similar per-episode frequencies
b. PCA for dimensionality reduction
c. plot the results
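
A rough sketch of how these three steps might fit together is shown below. The word list and the counts in `freq` are made-up illustrative numbers (one row per word, one column per episode), the choice of two clusters is an assumption, and `plot_clusters` is a hypothetical helper; none of these come from the project itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Made-up frequency matrix: rows are words, columns are the 7 episodes.
words = ["dragon", "king", "north", "wall", "raven", "sword"]
freq = np.array([
    [30, 28, 35, 40, 33, 29, 38],
    [27, 31, 29, 36, 30, 32, 35],
    [32, 26, 30, 38, 31, 27, 36],
    [3, 5, 2, 4, 6, 3, 5],
    [4, 2, 5, 3, 2, 4, 3],
    [5, 4, 3, 2, 5, 6, 4],
])

# a. group words whose per-episode frequencies are similar
#    (n_clusters=2 is an assumed value for this toy data)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(freq)

# b. project the 7-dimensional frequency vectors onto 2 principal components
coords = PCA(n_components=2).fit_transform(freq)


def plot_clusters(coords, labels, words):
    """c. scatter plot of the 2-D projection, colored by cluster label."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=labels)
    for (x, y), word in zip(coords, words):
        ax.annotate(word, (x, y))
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    plt.show()


# plot_clusters(coords, labels, words)  # uncomment to draw the figure
```

Here clustering runs on the full frequency vectors and PCA is used only to get 2-D coordinates for plotting; running PCA first and clustering the reduced data is an equally plausible order, and the source does not say which one the project uses.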