Week 1

Welcome to the course! This week we cover:

Basic Python
Data Visualization using Matplotlib
Data Distribution
Numpy

To help you understand each topic, we give a brief description followed by links that range from beginner to advanced material. It is okay if you don't understand everything on the first pass. These topics lay the foundations of Machine Learning, so take your time with them. You don't need to work through everything rigorously right now; you can do that as needed later in the course. Do read through the material, though, and get an overall picture.

Basic Python Tutorials

Before beginning the course, you should have at least a little experience coding in Python. This includes the different data types and data structures in Python, the basic syntax for the different types of loops, defining functions, reading from and writing to files, etc. For basic Python tutorials, refer to the links below:

Playlist #1.
Playlist #2.

For this course you need to watch up to Tutorial #15 (for the first playlist) or up to Tutorial #24 (for the second playlist), but we strongly recommend going through all the tutorials, as it will help you figure things out much faster later on when you try out new things.
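
As a quick self-check, the short sketch below touches each of the prerequisites listed above: built-in data structures, a loop, a function, and file reading and writing. The names, the marks, and the file name `week1_averages.txt` are made up for illustration. If every line makes sense to you, you are probably ready for the rest of the week.

```python
import os
import tempfile

# A dictionary (data structure) mapping names to lists of marks.
scores = {"alice": [82, 91], "bob": [75, 68]}


def average(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)


# A for loop over the dictionary's key/value pairs.
averages = {}
for name, marks in scores.items():
    averages[name] = average(marks)

# Write the results to a text file, one "name,average" pair per line.
# The file name is hypothetical; it is placed in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "week1_averages.txt")
with open(path, "w") as f:
    for name, avg in averages.items():
        f.write(f"{name},{avg}\n")

# Read the file back into a list of stripped lines.
with open(path) as f:
    lines = [line.strip() for line in f]

print(lines)
```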

Once you are comfortable with elementary Python, you should have a basic idea of what Jupyter Notebooks are and how to run code in them. For an introduction to Jupyter Notebooks, refer here.

That's it! You are now ready to get started.

Data Visualization using Matplotlib

Before creating analytical models, a data scientist must develop an understanding of the properties and relationships in a dataset. There are two goals for data exploration and visualization:

To understand the relationships between the data columns.
To identify features that may be useful for predicting labels in machine learning projects. Additionally, redundant, collinear features can be identified.

Thus, visualization for data exploration is an essential data science skill. Here, we'll learn to analyze data using the various types of plots offered by the Matplotlib and seaborn libraries.
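
To make the two goals above concrete, here is a minimal Matplotlib sketch. It uses synthetic data (the columns `hours_studied` and `exam_score` are invented for illustration, not taken from any course dataset) and draws a histogram to inspect one column and a scatter plot to inspect the relationship between two columns.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, made up for this example: exam_score grows roughly
# linearly with hours_studied, plus some random noise.
rng = np.random.default_rng(seed=0)
hours_studied = rng.uniform(0, 10, size=100)
exam_score = 5 * hours_studied + rng.normal(0, 5, size=100)

# One figure with two side-by-side plots.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: how are the values of a single column distributed?
ax_hist.hist(exam_score, bins=20)
ax_hist.set_xlabel("exam_score")
ax_hist.set_ylabel("count")
ax_hist.set_title("Distribution of one column")

# Scatter plot: how are two columns related?
ax_scatter.scatter(hours_studied, exam_score)
ax_scatter.set_xlabel("hours_studied")
ax_scatter.set_ylabel("exam_score")
ax_scatter.set_title("Relationship between two columns")

fig.tight_layout()
```

In a Jupyter Notebook the figure is displayed below the cell; in a plain script, call `plt.show()` at the end. A strong trend in the scatter plot suggests that one column may be a useful feature for predicting the other.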

Useful Resources

Light Introduction
Beginner's Guide
Data Visualisation

Don't worry if you don't understand everything now. It will become clearer once you start making these plots in the assignments.

Data Distribution

The data that we have for our model can come from a variety of distributions. Understanding the data distribution helps us make an informed decision about which model to use. Let us briefly talk about some data distributions:

Bernoulli Distribution - A single trial with only two possible outcomes, for example success with probability p and failure with probability 1 - p.
Uniform Distribution - All outcomes are equally likely.
Normal Distribution - A continuous distribution whose probability density is a symmetric, bell-shaped curve determined by its mean and standard deviation.
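
The three distributions above can be explored by drawing samples with NumPy. This is only a sketch: the parameter values (success probability 0.3, a six-sided die, mean 0 and standard deviation 1) and the sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 100_000  # number of samples per distribution

# Bernoulli: each sample is 1 (success) with probability 0.3, else 0.
# A binomial distribution with a single trial is a Bernoulli distribution.
bernoulli = rng.binomial(n=1, p=0.3, size=n)

# Discrete uniform: a fair die, every face 1..6 is equally likely.
# The upper bound of rng.integers is exclusive, hence 7.
uniform = rng.integers(1, 7, size=n)

# Normal: bell-shaped curve with mean 0 and standard deviation 1.
normal = rng.normal(loc=0.0, scale=1.0, size=n)

# Fraction of samples that landed on each die face (index 0 is unused).
face_fractions = np.bincount(uniform, minlength=7)[1:] / n

print("Bernoulli: fraction of successes =", bernoulli.mean())
print("Uniform: fraction per die face   =", face_fractions)
print("Normal: mean =", normal.mean(), "std =", normal.std())
```

With this many samples, the fraction of successes should be close to 0.3, each die face should appear about one sixth of the time, and the normal samples should have a mean near 0 and a standard deviation near 1.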

Go through this article to go deeper into the various data distributions that are common in Machine Learning.

Numpy

Why Numpy?

NumPy (Numerical Python) is Python's core library for numerical computing and linear algebra. Almost every data science or machine learning package in Python, such as SciPy (Scientific Python), Matplotlib (plotting library), Scikit-learn, etc., depends on it to a great extent.

What is it used for?

NumPy is very useful for performing mathematical and logical operations on arrays. It provides an abundance of useful features for operations on n-dimensional arrays and matrices in Python.
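
A few of these operations are sketched below on two small 2x2 arrays chosen for illustration: element-wise arithmetic, a matrix product, a logical comparison used as a mask, and a reduction along an axis.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

# Mathematical operations act element by element.
elementwise_sum = a + b        # [[11, 22], [33, 44]]

# The @ operator computes the matrix product.
matrix_product = a @ b         # [[70, 100], [150, 220]]

# Logical operations produce boolean arrays ...
mask = a > 2                   # [[False, False], [True, True]]

# ... which can be used to select elements.
selected = a[mask]             # [3, 4]

# Reductions can run along an axis: axis=0 averages down each column.
column_means = a.mean(axis=0)  # [2.0, 3.0]

print(elementwise_sum)
print(matrix_product)
print(selected)
print(column_means)
```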

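One of the main advantages of NumPy is vectorisation: an operation is applied to a whole array at once instead of looping over its elements in Python. Vectorised operations run in optimized, compiled code, which makes them far more time efficient than the equivalent Python loop and hence extremely useful.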
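
As a rough illustration of the speed difference, the sketch below computes a sum of squares twice, once with a plain Python loop and once with a single vectorised NumPy call. The array size is an arbitrary choice, and the measured times depend on your machine, so treat the numbers as indicative only.

```python
import time

import numpy as np

values = np.arange(200_000, dtype=np.float64)

# Plain Python loop: one multiply and one add per iteration,
# each executed by the Python interpreter.
start = time.perf_counter()
loop_total = 0.0
for v in values:
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# Vectorised version: the dot product of the array with itself
# is the same sum of squares, computed in a single NumPy call.
start = time.perf_counter()
vector_total = np.dot(values, values)
vector_seconds = time.perf_counter() - start

print(f"loop:       {loop_total:.0f} in {loop_seconds:.4f} s")
print(f"vectorised: {vector_total:.0f} in {vector_seconds:.4f} s")
```

Both versions produce the same total; on a typical machine the vectorised call finishes much faster than the loop.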

Useful Resources

Quickstart
Numpy Basics

Useful Links

Numpy and Pandas
Ipython Notebooks and Scikit-Learn
