CORE Skills Data Science Springboard - Day 4 - Getting to Know the Tools

The aim of today's session is to introduce methods for making sure that you're starting with quality data. Because all data science methods are garbage in/garbage out, you need to be able to explore new datasets quickly to assess whether your approach is viable. We will work towards building a basic exploratory data analysis framework, with a checklist of things you should be looking out for.

You should aim to get familiar with the pandas interface for manipulating (munging) tabular data, learn how to create and interpret basic summary statistics, learn how to identify appropriate QA/QC checks, and gain a basic understanding of 'tidy data' and data formats.
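As a rough illustration of the kind of first-pass checks described above, here is a minimal sketch using pandas. The dataset, its column names, and the `quick_eda` helper are all hypothetical and invented for this example; they are not part of the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration.
df = pd.DataFrame({
    "sample_id": ["A1", "A2", "A3", "A3", "A4"],
    "depth_m": [10.0, 12.5, 8.0, 8.0, np.nan],
    "rock_type": ["basalt", "granite", "basalt", "basalt", None],
})


def quick_eda(frame):
    """Collect a few first-pass data quality checks (hypothetical helper)."""
    return {
        # How many rows and columns are there?
        "shape": frame.shape,
        # Did each column load with the type you expected?
        "dtypes": frame.dtypes.astype(str).to_dict(),
        # How many missing values are in each column?
        "missing_per_column": frame.isna().sum().to_dict(),
        # How many rows exactly repeat an earlier row?
        "duplicate_rows": int(frame.duplicated().sum()),
        # Count, mean, std, min, quartiles and max for numeric columns.
        "numeric_summary": frame.describe(),
    }


report = quick_eda(df)
print(report["missing_per_column"])
print(report["numeric_summary"])
```

None of these checks replaces looking at the data directly, but together they give a quick sense of whether a new dataset is complete and sensibly typed before any modelling starts.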

Pre-session Reading & Resources

This week we're going to look at exploratory data analysis with new datasets. You'll find that this process takes around 60-90% of the time in any data science project, so it's worth (a) getting good at it and (b) looking for ways to make it easier. One approach is to put data into a tidy form as soon as possible.
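To make 'tidy form' concrete, here is a small sketch, assuming the common convention of one variable per column and one observation per row. The table, site names, and `rainfall_mm` values are hypothetical; `DataFrame.melt` is one way (not the only one) to reshape a wide table into that form.

```python
import pandas as pd

# Hypothetical "wide" table: the year variable is spread across column headers.
wide = pd.DataFrame({
    "site": ["north", "south"],
    "2019": [1.2, 3.4],
    "2020": [1.5, 3.1],
})

# Reshape so that each row is a single (site, year) observation.
tidy = wide.melt(id_vars="site", var_name="year", value_name="rainfall_mm")

# The years started life as column labels (strings), so convert them to integers.
tidy["year"] = tidy["year"].astype(int)

# Once the data is tidy, grouped summaries become one-liners.
mean_by_year = tidy.groupby("year")["rainfall_mm"].mean()
print(tidy)
print(mean_by_year)
```

The payoff of tidying early is that filtering, grouping, and plotting can all work from the same column names, rather than needing special handling for each year column.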

We're also going to use some of the more advanced methods that pandas offers. Aim to get as familiar with pandas as you can, as it really is the Swiss Army knife of data munging in Python. If you've used R before, a number of things will feel very familiar. There are a number of good technical resources online for getting your head around pandas, which you might like to stash away for reference: