This repository provides materials for a session of the I2DS Tools for Data Science workshop, held at the Hertie School, Berlin, in November 2021. The student-run workshop is part of the course Introduction to Data Science, taught by Simon Munzert at the Hertie School in Fall 2021.
Please click here for our presentation (HTML).
Please click here for the live session exercises (HTML).
This workshop introduces text analysis using the quanteda package. Text analysis is the process of automatically classifying and extracting meaningful information from unstructured text. The R package quanteda supports text analysis as well as a range of related natural language processing tasks, such as corpus management, tokenization, and visualization.
The goals of this session are to (1) introduce you to pre-processing text and managing document-feature matrices, (2) try out basic functions on corpora and tokens, and (3) provide you with practice material and suggestions for further reading.
- Kathryn Malchow
- Federico Mammana
- quanteda: An R package for the quantitative analysis of textual data
- quanteda tutorials by Kohei Watanabe and Stefan Müller
- A beginner's guide to text analysis with quanteda
- Textual data visualization with quanteda
The material in this repository is made available under the MIT license.
Kathryn Malchow prepared the presentation's showcase of functions and tools, and prepared the live tutorial.
Federico Mammana prepared the presentation's introduction to quanteda and its motivation, and prepared the live tutorial.