This is a natural language processing project that uses subtitle files from Game of Thrones Season 7.
a. tokenize words from each document
b. filter out stop words and stem the remaining words
c. rank the 100 most frequently used words
d. count word frequency by episode
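
Steps a to d above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's actual code: the stop-word list is a tiny placeholder, `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, and the function names and the two sample "episodes" are made up for the example.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "for", "i", "in", "is",
    "it", "my", "not", "of", "on", "or", "that", "the", "this", "to", "was",
    "we", "with", "you", "your",
}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens.

    Subtitle cue numbers and timestamps contain no letters, so they are skipped.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def naive_stem(word):
    """Strip one common suffix. A stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [naive_stem(tok) for tok in tokenize(text) if tok not in STOP_WORDS]


def counts_by_episode(episodes):
    """Map each episode name to a Counter of its stemmed words."""
    return {name: Counter(preprocess(text)) for name, text in episodes.items()}


def top_words(per_episode, n=100):
    """Return the n most frequent stems across all episodes, with totals."""
    total = Counter()
    for counts in per_episode.values():
        total.update(counts)
    return total.most_common(n)


def frequency_table(per_episode, vocab):
    """For each word in vocab, list its count in every episode (in dict order)."""
    return {w: [counts[w] for counts in per_episode.values()] for w in vocab}


# Made-up placeholder subtitle text, only to exercise the functions.
episodes = {
    "E01": "1\n00:00:01,000 --> 00:00:03,000\nWinter is coming. The dragons are coming.",
    "E02": "1\n00:00:02,000 --> 00:00:04,000\nThe dragon burned the ships.",
}
per_episode = counts_by_episode(episodes)
ranked = top_words(per_episode, n=100)
table = frequency_table(per_episode, [word for word, _ in ranked])
```

Here `table` maps each ranked stem to a list of per-episode counts, which is the kind of word-by-episode matrix the later clustering steps would take as input.
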
a. K-means clustering to group similar words based on frequencies
b. PCA for dimensionality reduction
c. plot results
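
The clustering, PCA, and plotting steps above might look something like the sketch below. It assumes a matrix with one row per word and one column per episode, and it uses plain NumPy (a basic Lloyd's-algorithm K-means and PCA via SVD) rather than whatever library the project actually used. The function names, the cluster count, and the small matrix are all invented for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Basic Lloyd's algorithm: returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every row to every center, shape (n_rows, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it if it has none.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def pca(X, n_components=2):
    """Project mean-centered rows onto the top principal components."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def plot_clusters(points, labels, words):
    """Scatter the 2-D projection, colored by cluster and annotated by word."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(points[:, 0], points[:, 1], c=labels)
    for (x, y), word in zip(points, words):
        ax.annotate(word, (x, y))
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    plt.show()


# Made-up counts: 4 hypothetical words across 3 episodes.
words = ["w1", "w2", "w3", "w4"]
X = np.array([
    [10, 12, 11],
    [11, 13, 10],
    [1, 0, 2],
    [0, 1, 1],
], dtype=float)

labels, centers = kmeans(X, k=2)
points = pca(X, n_components=2)
# plot_clusters(points, labels, words)  # uncomment to display the figure
```

Clustering on the full frequency rows and using PCA only for the 2-D display is one reasonable ordering; running PCA first and clustering the reduced points is an equally plausible reading of the steps.
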