intro-to-data-science-21-workshop/09-MaAdelleGiaArbo-DataWrangling-data.table

Data wrangling at scale using data.table

Summary

This repository provides materials for a session that is part of the I2DS Tools for Data Science workshop run at the Hertie School, Berlin in November 2021. The student-run workshop is part of the course Introduction to Data Science taught by Simon Munzert at the Hertie School, Berlin, in Fall 2021.

Session contents

This session introduces you to a modern data wrangling workflow with data.table. Data wrangling is one of the core steps in the data science workflow, in particular when transforming raw data sets into a format that is readily analyzable. The data.table package offers a fast and memory-efficient file reader and writer, aggregations, updates, and joins (equi, non-equi, rolling, range, and interval), all in a short and flexible syntax that speeds up development. It is commonly used for data manipulation tasks, including reshaping datasets and creating or modifying variables.
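To make the features listed above concrete, here is a minimal sketch of an update, an aggregation, an equi join, and a file round trip in data.table. The tables `orders` and `customers` and their columns are hypothetical toy data invented for this illustration; they are not taken from the session materials.

```r
library(data.table)

# Hypothetical toy data, invented for illustration only
orders <- data.table(
  customer = c("a", "b", "a", "c", "b"),
  amount   = c(10, 20, 30, 40, 50)
)
customers <- data.table(
  customer = c("a", "b", "c"),
  country  = c("DE", "FR", "DE")
)

# Update by reference: := adds a column without copying the table
orders[, amount_cents := amount * 100]

# Aggregation: total amount per customer
totals <- orders[, .(total = sum(amount)), by = customer]

# Equi join: attach each customer's country to every order row
joined <- customers[orders, on = "customer"]

# File writer and reader: round trip through a temporary CSV file
tmp <- tempfile(fileext = ".csv")
fwrite(joined, tmp)
reloaded <- fread(tmp)
```

All of these operations follow the same general pattern `DT[i, j, by]`: select rows in `i`, compute or update in `j`, and group with `by`.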

Main learning objectives

The goals of this session are to (1) equip you with conceptual knowledge about the data.table package and the data wrangling workflow, (2) demonstrate how easy data.table is to use by highlighting the most common data wrangling functions, and (3) provide you with a practice exercise and further resources.

Instructors

Further resources

License

The material in this repository is made available under the MIT license.

Statement of contributions

Ma. Adelle Gia Arbo prepared the flow and content of the presentation and exercise, and recorded and delivered the video presentation.

Viraaj Akuthota prepared the flow and content of the presentation and exercise.
