
jpwhalley/Data_science_in_Python_2022


Data Science In Python

Overview

Collecting, making sense of, and classifying data has never been more important. Using Python, students will learn how to:

  • Collect dispersed but publicly available data through web scraping with tools such as Selenium, and through application programming interfaces (APIs).
  • Analyse data using dataframes and arrays, with special attention to optimising code for big-data problems using compilers such as Numba.
  • Make sense of, and perform quality control on, high-dimensional datasets through dimensionality reduction methods such as Principal Component Analysis (PCA) and multidimensional scaling (MDS).
  • Apply recent advances in Artificial Intelligence (AI) software, such as Keras, to classify data with neural networks.
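To illustrate the Numba point above, here is a minimal sketch (not taken from the course materials) of the kind of explicit-loop array code that Numba's `njit` decorator compiles to machine code. The function `pairwise_distance_sum` is a hypothetical example. If Numba is not installed, a no-op stand-in decorator is used instead, so the same code still runs as ordinary, slower Python.

```python
import numpy as np

# Numba is optional in this sketch: if it is missing, fall back to a
# decorator that returns the function unchanged (plain Python speed).
try:
    from numba import njit
except ImportError:
    def njit(func):
        return func


@njit
def pairwise_distance_sum(points):
    """Sum of Euclidean distances over all unordered pairs of rows in `points`.

    The triple loop is slow in pure Python, but it is the style of code
    that Numba compiles to fast native code.
    """
    n, d = points.shape
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            sq = 0.0
            for k in range(d):
                diff = points[i, k] - points[j, k]
                sq += diff * diff
            total += np.sqrt(sq)
    return total


# Three points forming a 3-4-5 right triangle: distances 5, 4 and 3.
pts = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
print(pairwise_distance_sum(pts))
```

When Numba is present, the first call also includes compilation time. Benchmarks should therefore time a second call, not the first.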

By the end of the module, students will be able to:

  • Refer to and adapt code from a code base of taught examples covering data collection, analysis, dimensionality reduction and classification.
  • Build their own code base around a personally selected problem touching on the areas taught.
  • Demonstrate a good understanding of the ethics and common problems encountered in data collection, processing and classification.

Recommended tutorial and reading

Teaching

  • Day 1: Python, version control and collecting publicly available data
  • Day 2: Data science in Python and machine learning
  • Day 3: Large, multidimensional datasets
  • Day 4: Classification of data using neural networks
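Day 3 covers large, multidimensional datasets. The following is a minimal sketch of the PCA step named in the overview, written in plain NumPy via the singular value decomposition. The course itself may use a library implementation such as scikit-learn's `sklearn.decomposition.PCA`. The function name `pca` and its return values are illustrative assumptions.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto their first principal components.

    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    # PCA is defined on column-centred data.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    # Each component's variance is proportional to its squared singular value.
    var = S ** 2
    ratio = var[:n_components] / var.sum()
    return scores, ratio


# Points lying almost on a line in 3D: the first component should dominate.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))
scores, ratio = pca(X)
print(scores.shape, ratio)
```

For quality control, a scatter plot of the first two columns of `scores` is a quick way to spot outlying samples or batch effects.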

Prerequisites

Please install the following suggested software:

About

For the Data Science in Python module 2022
