changes in ML track (#43)
* changes in ML track

* changes

* Update soa/tracks/ml/8.md

Co-authored-by: Arjoonn <[email protected]>
kabirnagpal and theSage21 authored Jul 19, 2020
1 parent 73a3e2c commit fb16ea2
Showing 8 changed files with 161 additions and 107 deletions.
40 changes: 22 additions & 18 deletions soa/tracks/ml/1.md
<a href='https://t.me/ml_code_for_100_days'><button>Discuss on telegram</button></a>
# ML Track
Welcome to the ML track. We hope you're really excited for this.
For starters, we'll brush up on your Python skills. This includes your understanding of the following:

- [Numpy](https://numpy.org/)
- [Pandas](https://pandas.pydata.org/)
- [Matplotlib](https://matplotlib.org/)
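
As a quick refresher before the question below, here's a minimal sketch of the three libraries working together (the column names and data here are invented for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Build a small DataFrame from a NumPy array (column names are made up)
data = np.random.rand(5, 3)
df = pd.DataFrame(data, columns=["a", "b", "c"])

# Mean of every column, returned as a pandas Series
print(df.mean())

# A quick Matplotlib line plot of one column
df["a"].plot(kind="line")
plt.show()
```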

**Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%201.ipynb) to view the Jupyter-Notebook.**
If you don't have any Python Environment, you can also try the code in [Google Colab](https://colab.research.google.com/).


We hope you've gone through the code and the other resources provided along with it. Let's wind it up with a quick question.
How do you get the mean of each column in a DataFrame named `df`?
Please write the full command. (The answer is case-sensitive.)

<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s == 'df.mean()'
</code>
</form>
34 changes: 19 additions & 15 deletions soa/tracks/ml/2.md
<a href='https://t.me/ml_code_for_100_days'><button>Discuss on telegram</button></a>
# ML Track - Week 2

We hope you're really excited to get started with actual Machine Learning. But just hold on!
A big problem with machine learning algorithms is that they're not humans: they are just a bunch of formulas applied in a loop of conditional statements.
So they cannot handle certain types of data, like strings, and they cannot handle missing values either.
These concepts were discussed in last week's track, and now is the time to learn them in depth.

This week we'll learn about:

1. One Hot encoding
2. Label Encoding
3. Normalization
4. Dealing with Missing values
5. Introduction to Machine learning
6. Types of Learning (Supervised, Unsupervised and Reinforcement)
7. Application of Machine Learning
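
As a taste of what the notebook covers, here's a minimal, self-contained sketch of these preprocessing steps (the toy data and column names are invented for illustration):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data with a categorical column and missing values (made up)
df = pd.DataFrame({"company": ["tata", "honda", "tata", None],
                   "price": [5.0, 7.5, None, 6.0]})

# Deal with missing values: mode for categories, mean for numbers
df["company"] = df["company"].fillna(df["company"].mode()[0])
df["price"] = df["price"].fillna(df["price"].mean())

# One Hot encoding with pandas
one_hot = pd.get_dummies(df["company"])

# Label Encoding with scikit-learn
df["company_label"] = LabelEncoder().fit_transform(df["company"])

# Min-max normalization of a numeric column
df["price_norm"] = (df["price"] - df["price"].min()) / (df["price"].max() - df["price"].min())
print(df.join(one_hot))
```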

**Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%202.ipynb) to view the Jupyter-Notebook.**
If you don't have any Python Environment, you can also try the code in [Google Colab](https://colab.research.google.com/).


We hope you've gone through the code and the other resources provided along with it. Let's wind it up with a quick question.
Write the command to one-hot encode the column named 'company' using a pandas function on the DataFrame `df`.

<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
47 changes: 26 additions & 21 deletions soa/tracks/ml/3.md
<a href='https://t.me/ml_code_for_100_days'><button>Discuss on telegram</button></a>
# ML Track - Week 3

Congratulations on making it up to here!
Now that we've completed the preprocessing methods, we can start with Machine Learning algorithms.
We'll start with **Regression**.
Regression analysis is a supervised method used to predict a **continuous**, **dependent** variable from independent variables.
This week will require prior knowledge of linear, quadratic, and polynomial equations.

This week we'll learn about:

1. Linear Regression
2. Multiple Linear Regression
3. Polynomial Regression
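
Before opening the notebook, here's a minimal sketch of fitting a linear regression and scoring it with `mean_squared_error` (the toy data is generated purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy data: y is roughly 3x + 2 plus noise (made up for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1, size=50)

model = LinearRegression().fit(X, y)
predictions = model.predict(X)

# mean_squared_error lives in the sklearn.metrics module
print("MSE:", mean_squared_error(y, predictions))
print("coef:", model.coef_, "intercept:", model.intercept_)
```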

**Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%203.ipynb) to view the Jupyter-Notebook.**
If you don't have any Python Environment, you can also try the code in [Google Colab](https://colab.research.google.com/).


We hope you've gone through the code and the other resources provided along with it. Let's wind it up with a quick question.
`mean_squared_error` is a function from which module in Sklearn?

<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'metrics'
</code>
</form>
27 changes: 15 additions & 12 deletions soa/tracks/ml/4.md
<a href='https://t.me/ml_code_for_100_days'><button>Discuss on telegram</button></a>
# ML Track - Week 4
This week we are going to learn about a widely used type of supervised machine learning algorithm: **Classification**.
Classification is the process of predicting the class of given data points.


For example, spam detection in email service providers can be identified as a classification problem. This is a binary classification since there are only two classes: spam and not spam. A classifier utilizes some training data to understand how the given input variables relate to the class.
This week, we will cover the following classifier algorithms:

1. Logistic Regression
2. K-Nearest Neighbours
3. Decision Tree Classifier
4. Random Forest Classifier
5. Voting Classifier
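
As a rough preview of these classifiers, here's a sketch on a small built-in dataset (the hyperparameters here are illustrative, not necessarily the notebook's exact settings):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small built-in dataset so the sketch runs end to end
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A few of this week's classifiers
knn = KNeighborsClassifier(n_neighbors=5)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
logreg = LogisticRegression(max_iter=1000)

# A voting classifier combines the individual models' predictions
voter = VotingClassifier([("knn", knn), ("rf", forest), ("lr", logreg)])
voter.fit(X_train, y_train)
print("accuracy:", voter.score(X_test, y_test))
```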

**Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%204.ipynb) to view the Jupyter-Notebook.**
If you don't have any Python Environment, you can also try the code in [Google Colab](https://colab.research.google.com/).


We hope you've gone through the code and the other resources provided along with it. Let's wind it up with a quick question.
What is the number of estimators used for the Random Forest Classifier?

<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s == '200'
</code>
</form>
30 changes: 15 additions & 15 deletions soa/tracks/ml/5.md
<a href='https://t.me/ml_code_for_100_days'><button>Discuss on telegram</button></a>
# ML Track - Week 5
Congratulations! You have come midway!
Now, let's learn how well (or badly) our model is performing, and why.

Topics covered in this week:

1. Underfitting
2. Overfitting
3. Bias Variance Trade-off
4. Regularization
5. Support Vector Machine
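
Here's a minimal sketch of underfitting, overfitting, and regularization in action (the toy data and polynomial degrees are chosen purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy quadratic data (made up for illustration)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(40, 1)), axis=0)
y = 0.5 * X.ravel() ** 2 + rng.normal(0, 1, size=40)

for degree in (1, 2, 15):  # underfit, good fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"degree {degree}: CV R^2 = {score:.2f}")

# Ridge regularization shrinks coefficients to tame the degree-15 overfit
regularized = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0))
print("regularized:", cross_val_score(regularized, X, y, cv=5).mean())
```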

**Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%205.ipynb) to view the Jupyter-Notebook.**
If you don't have any Python Environment, you can also try the code in [Google Colab](https://colab.research.google.com/).


We hope you've gone through the code and the other resources provided along with it. Let's wind it up with a quick question.
In terms of the bias-variance trade-off, which of the following is substantially more harmful to the test error than to the training error? (Enter the number of the correct option.)

1. Bias
2. Loss
3. Variance
4. Risk


<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s == '3'
</code>
</form>
28 changes: 14 additions & 14 deletions soa/tracks/ml/6.md
<a href='https://t.me/ml_code_for_100_days'><button>Discuss on telegram</button></a>
# ML Track - Week 6
Congratulations!
You have come a long way! Till now we have been working on supervised machine learning, so now gear up for the first chapter of unsupervised machine learning: Clustering.

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.

So this week we are going to dive deep into clustering and cover the following topics:

1. What is clustering?
2. Difference between clustering and classification
3. K-means clustering
4. Silhouette Score
5. Hierarchical clustering
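
As a small preview, here's a sketch of K-means scored with the silhouette coefficient, plus hierarchical (agglomerative) clustering, on synthetic blobs (the data and cluster counts are illustrative):

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic blobs so the sketch runs end to end
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# K-means, scored with the silhouette coefficient (closer to 1 is better)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print("silhouette:", silhouette_score(X, labels))

# Agglomerative clustering; 'ward' linkage merges the pair of clusters
# that least increases within-cluster variance
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
print("agglomerative labels:", agg.fit_predict(X)[:10])
```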


**Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%206.ipynb) to view the Jupyter-Notebook.**
If you don't have any Python Environment, you can also try the code in [Google Colab](https://colab.research.google.com/).


We hope you've gone through the code and the other resources provided along with it. Let's wind it up with a quick question.
What is the name of the linkage that we have used in Agglomerative Clustering?

<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'ward'
</code>
</form>


27 changes: 15 additions & 12 deletions soa/tracks/ml/7.md
<a href='https://t.me/ml_code_for_100_days'><button>Discuss on telegram</button></a>
# ML Track - Week 7

Congratulations on making it up to here!

This week will introduce you to Dimensionality Reduction techniques and Model Selection strategies like K-fold cross-validation, Grid Search, and Stacking.

Dimensionality Reduction means reducing the number of features (columns) in a given dataset. Imagine working with a dataset with nearly 20000 features.
Having so many features makes it problematic to draw insights from the data. It's not feasible to analyze each and every variable at a microscopic level.
Hence, we use Dimensionality Reduction techniques.

Model selection, on the other hand, is the process of selecting one final machine learning model from among a collection of candidate machine learning models for a training dataset.

Let's start then, shall we?
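
As a rough preview of how these pieces fit together, here's a sketch that chains PCA and a classifier inside a grid search with cross-validation (the parameter grid is invented for illustration):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# The 64-feature digits dataset stands in for a high-dimensional problem
X, y = load_digits(return_X_y=True)

# Reduce dimensionality with PCA, then classify; the grid search picks the
# number of components and the SVC's C via 5-fold cross-validation
pipe = Pipeline([("pca", PCA()), ("svc", SVC())])
grid = GridSearchCV(pipe, {"pca__n_components": [16, 32], "svc__C": [1, 10]}, cv=5)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("best CV score:", round(grid.best_score_, 3))
```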

**Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%207.ipynb) to view the Jupyter-Notebook.**
If you don't have any Python Environment, you can also try the code in [Google Colab](https://colab.research.google.com/).


We hope you've gone through the code and the other resources provided along with it. Let's wind it up with a quick question.
Kernel PCA cannot be used for non-linear data. (True / False)

<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'false'
</code>
</form>
35 changes: 35 additions & 0 deletions soa/tracks/ml/8.md
<a href='https://t.me/ml_code_for_100_days'><button>Discuss on telegram</button></a>
# ML Track - Week 8

Now, let us boost our learning.
This week will introduce you to algorithms that can boost the accuracy of your model.
Boosting trains many models sequentially, where each model tries to minimize the error left by the previous model.

We'll learn about:

1. Gradient Boosting Algorithm
2. Extreme Gradient Boosting (XGBoost) Algorithm
3. AdaBoost Algorithm
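
Here's a minimal sketch of two of these boosters using scikit-learn (XGBoost lives in the separate `xgboost` package; the data and settings here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data (made up for illustration)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Each boosting stage fits to the errors left by the previous stages
for model in (GradientBoostingClassifier(n_estimators=100),
              AdaBoostClassifier(n_estimators=100)):
    score = cross_val_score(model, X, y, cv=5).mean()
    print(type(model).__name__, round(score, 3))
```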

**Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%208.ipynb) to view the Jupyter-Notebook.**
If you don't have any Python Environment, you can also try the code in [Google Colab](https://colab.research.google.com/).


We hope you've gone through the code and the other resources provided along with it. Let's wind it up with a quick question.
Which of the following algorithms is not an example of an ensemble learning algorithm?

1. Random Forest
2. Adaboost
3. Extra Trees
4. Gradient Boosting
5. Decision Trees

<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s == '5'
</code>
</form>
