This repository provides materials for a session of the I2DS Tools for Data Science workshop, held at the Hertie School, Berlin, in November 2021. This student-run workshop is part of the course Introduction to Data Science, taught by Simon Munzert at the Hertie School in Fall 2021.
This session introduces you to a modern data wrangling workflow with data.table, an R package. Data wrangling, the process of cleaning and transforming raw data sets into a format that is ready for analysis, is one of the core steps of the data science workflow. The data.table package offers a fast and memory-efficient file reader and writer (fread() and fwrite()), aggregations, updates, and equi, non-equi, rolling, range, and interval joins, all in a concise and flexible syntax that speeds up development. It is commonly used for data manipulation tasks, from transforming whole datasets to creating and modifying individual variables.
The goals of this session are to (1) equip you with conceptual knowledge about the data.table package and the data wrangling workflow, (2) demonstrate how easy data.table is to use by highlighting the most common data wrangling functions, and (3) provide you with a practice exercise and further resources.
The material in this repository is made available under the MIT license.
Ma. Adelle Gia Arbo prepared the flow and content of the presentation and exercise, and recorded and delivered the video presentation.
Viraaj Akuthota prepared the flow and content of the presentation and exercise.