This repo hosts the code (notebooks) and data used for the Text Analysis section of the NCRM Spring School on Computational Communication. The folders correspond to the topics in the course and include:
\preprocessing-lexicon
: This introductory section covers the Python programming language, text preprocessing techniques, and lexicon-based analysis.
\supervised-learning
: This section covers supervised approaches to text classification, using sentiment analysis as our working example.
\topic-models
: This section covers unsupervised approaches (e.g., topic models) for clustering text into topics or themes.
\language-models
: Our last section covers the use of word embeddings and language models to examine semantic similarity between texts.
All of the data used throughout the week will be available in \data. Holler if you have any questions!
You can download the materials for this class (as they become available) at the following link:
https://www.dropbox.com/scl/fo/3r9v1fca39g4rlo16vleg/h?dl=0&rlkey=cm3yo60ql7t6t75vamwz7vl7z