Correct spelling error (#40 linar -> linear) #10

Open · wants to merge 2 commits into master
34 changes: 17 additions & 17 deletions chapter_02/nb_ch02_01.ipynb
@@ -37,7 +37,7 @@
"source": [
"## Banknote classification with fcNN without hidden layer compared to fcNN with hidden layer\n",
"\n",
"**Goal:** In this notebook you will do your first classification. You will see that fully connected networks without a hidden layer can only learn linar decision boundaries, while fully connected networks with hidden layers are able to learn non-linear decision boundaries.\n",
"**Goal:** In this notebook you will do your first classification. You will see that fully-connected networks without a hidden layer can only learn linear decision boundaries, while fully-connected networks with hidden layers are able to learn non-linear decision boundaries.\n",
"\n",
"**Usage:** The idea of the notebook is that you try to understand the provided code. Run it, check the output, and play with it by slightly changing the code. \n",
"\n",
@@ -48,7 +48,7 @@
">4. entropy (continuous feature) \n",
">5. class (binary indicating if the banknote is real or fake) \n",
"\n",
"Don't bother too much how these features exactely came from.\n",
"Don't bother too much exactly where these features came from.\n",
"\n",
"For this analysis we only use 2 features. \n",
"\n",
@@ -268,7 +268,7 @@
"id": "HlDPWop1_zGM"
},
"source": [
"Let's extract the two featues *x1: skewness of wavelet transformed image* and *x2: entropy of wavelet transformed image*. We print the shape and see that we for X we have 1372 oberservations with two featues and for Y there are 1372 binary labels."
"Let's extract the two featues *x1: skewness of wavelet transformed image* and *x2: entropy of wavelet transformed image*. We print the shape and see that for X there are 1372 oberservations with two featues and for Y there are 1372 binary labels."
]
},
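For reference, a rough sketch of the feature-extraction step described in the cell above; the file path and column names are assumptions, not taken from this diff, since the notebook's code cells are not shown here.

```python
# Hypothetical sketch of extracting x1 (skewness) and x2 (entropy);
# the CSV path and column names are placeholders, not the notebook's own.
import pandas as pd

df = pd.read_csv("banknote.csv",
                 names=["variance", "skewness", "curtosis", "entropy", "class"])
X = df[["skewness", "entropy"]].values   # x1: skewness, x2: entropy
Y = df["class"].values                   # binary label: real vs. fake
print(X.shape, Y.shape)                  # expected: (1372, 2) (1372,)
```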
{
@@ -308,7 +308,7 @@
"id": "W2upXCjUBweV"
},
"source": [
"Since the banknotes are described by only 2 features, we can easily visualize the positions of real and fake banknotes in the 2D feature space. You can see that the boundary between the two classes is not separable by a straight line. A curved boundary line will do better. But even then we cannot expect a perfect seperation.\n"
"Since the banknotes are described by only 2 features, we can easily visualize the positions of real and fake banknotes in the 2D feature space. You can see that the boundary between the two classes is not separable by a straight line. A curved boundary line will do better. But even then we cannot expect a perfect separation.\n"
]
},
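A minimal scatter of the two classes in the 2D feature space might look like the sketch below; colors and class labels are illustrative, and `X`, `Y` are assumed to be the arrays from the extraction step.

```python
# Illustrative scatter of the two classes in the (skewness, entropy) plane.
import matplotlib.pyplot as plt

plt.scatter(X[Y == 0, 0], X[Y == 0, 1], s=8, label="class 0")
plt.scatter(X[Y == 1, 0], X[Y == 1, 1], s=8, label="class 1")
plt.xlabel("x1: skewness of wavelet transformed image")
plt.ylabel("x2: entropy of wavelet transformed image")
plt.legend()
plt.show()
```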
{
@@ -374,7 +374,7 @@
},
"source": [
"### fcNN with only one neuron\n",
"Let’s try to use a single neuron with a sigmoid activation function (also known as logistic regression) as classification model to seperate the banknotes. \n",
"Let’s try to use a single neuron with a sigmoid activation function (also known as logistic regression) as a classification model to seperate the banknotes. \n",
"We use the sequential API from keras to build the model. To fit the 3 parameters we use the stochastic gradient descent optimizer with a learning rate of 0.15."
]
},
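A minimal sketch of the single-neuron model this cell describes; the tf.keras import paths and the `learning_rate` argument name are assumptions, and the notebook's hidden code cells may differ in detail.

```python
# One Dense unit with a sigmoid output: logistic regression with 3 parameters
# (2 weights + 1 bias), trained with plain SGD at learning rate 0.15.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

model = Sequential()
model.add(Dense(1, activation="sigmoid", input_shape=(2,)))
model.compile(optimizer=SGD(learning_rate=0.15),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```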
@@ -454,7 +454,7 @@
"id": "AI_YeSyBEHTc"
},
"source": [
"In the next cell, we train the network. In other words, we tune the parameters that were initialized randomly with stochastic gradient descent to minimize our loss function (the binary corssentropy). We set the batchsize to 128 per updatestep and train for 400 epochs."
"In the next cell, we train the network. In other words, we tune the parameters that were initialized randomly with stochastic gradient descent to minimize our loss function (the binary corssentropy). We set the batchsize to 128 per update step and train for 400 epochs."
]
},
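The training call itself would reduce to something like the sketch below, with `X` and `Y` as above; `verbose=0` just silences the per-epoch output.

```python
# Fit the single-neuron model: 400 epochs, mini-batches of 128 examples.
history = model.fit(X, Y, batch_size=128, epochs=400, verbose=0)
```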
{
@@ -481,7 +481,7 @@
"id": "zqE0FrJ9FPM2"
},
"source": [
"Let's look at the so called leraning curve, we plot the accuracy and the loss vs the epochs. You can see that after 100 epochs, we predict around 70% of our data correct and have a loss aorund 0.51 (this values can vary from run to run)."
"Let's look at the so-called learning curve, we plot the accuracy and the loss vs the epochs. You can see that after 100 epochs, we predict around 70% of our data correct and have a loss aorund 0.51 (this values can vary from run to run)."
]
},
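A hedged sketch of such a learning-curve plot from the returned History object; note that the metric key is `"accuracy"` in recent Keras versions and `"acc"` in older ones.

```python
# Accuracy and loss over the training epochs.
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["accuracy"])
ax1.set_xlabel("epoch"); ax1.set_ylabel("accuracy")
ax2.plot(history.history["loss"])
ax2.set_xlabel("epoch"); ax2.set_ylabel("loss")
plt.show()
```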
{
@@ -547,7 +547,7 @@
"source": [
"### Plotting the learned decision boundary\n",
"Let's visualize which decision boundary was learned by the fcNN with only one output neuron (and no hidden layer). \n",
"As you can see the decision boundary is a straight line. This is not a coincidence but a general property of a single artificial neuron with a sigmoid as activation function and no hidden layer, also known as logistic regression.\n"
"As you can see the decision boundary is a straight line. This is not a coincidence but a general property of a single artificial neuron with a sigmoid as an activation function and no hidden layer, also known as logistic regression.\n"
]
},
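One common way to draw such a boundary is to evaluate the model on a dense grid over the feature space and plot the 0.5-probability contour; the grid limits below are assumptions, not values from the notebook.

```python
# Probability surface of the single-neuron model over a grid of (x1, x2)
# values; the 0.5 contour is the learned (linear) decision boundary.
import numpy as np
import matplotlib.pyplot as plt

x1g, x2g = np.meshgrid(np.linspace(-15, 15, 200), np.linspace(-15, 15, 200))
grid = np.c_[x1g.ravel(), x2g.ravel()]
probs = model.predict(grid).reshape(x1g.shape)

plt.contourf(x1g, x2g, probs, levels=[0.0, 0.5, 1.0], alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=Y, s=8)
plt.xlabel("x1: skewness"); plt.ylabel("x2: entropy")
plt.show()
```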
{
@@ -647,7 +647,7 @@
"source": [
"### fcNN with one hidden layer \n",
"\n",
"We know that the boundary between the two classes is not descriped very good by a line. Therefore a single neuron is not appropriate to model the probability for a fake banknote based on its two features. To get a more flexible model, we introduce an additional layer between input layer and output layer. This is called hidden layer. Here we use a hidden layer with 8 neurons. We also change the ouputnodes form 1 to 2, to get two ouputs for the probability of real and fake banknote. Because we now have 2 outputs, we use the *softmax* activation function in the output layer. The softmax activation ensures that the output can be interpreted as a probability (see book for details)"
"We know that the boundary between the two classes is not described very well by a line. Therefore a single neuron is not appropriate to model the probability for a fake banknote based on its two features. To get a more flexible model, we introduce an additional layer between input layer and output layer. This is called a hidden layer. Here we use a hidden layer with 8 neurons. We also change the ouputnodes from 1 to 2, to get two ouputs for the probability of real and fake banknote. Because we now have 2 outputs, we use the *softmax* activation function in the output layer. The softmax activation ensures that the output can be interpreted as a probability (see book for details)."
]
},
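A sketch of this two-layer architecture; the hidden-layer activation (sigmoid here) and the optimizer settings are assumptions carried over from the earlier model, since the notebook's code cells are not part of this diff.

```python
# fcNN with one hidden layer of 8 neurons and a 2-unit softmax output.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

model_h = Sequential()
model_h.add(Dense(8, activation="sigmoid", input_shape=(2,)))  # hidden layer
model_h.add(Dense(2, activation="softmax"))                    # output layer
model_h.compile(optimizer=SGD(learning_rate=0.15),
                loss="categorical_crossentropy",
                metrics=["accuracy"])
model_h.summary()
```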
{
@@ -687,9 +687,9 @@
"id": "aY3XX2qHOQQz"
},
"source": [
"In this is output summary we see that we now have a lot more trainable paramters then before. \n",
"24 = inputdim · outpuntdim + outputbias= 2 · 8 + 8 \n",
"18 = inputdim · outpuntdim + outputbias= 8 · 2 + 2 "
"In this output summary we see that we now have a lot more trainable paramters than before. \n",
"24 = inputdim · outputdim + outputbias= 2 · 8 + 8 \n",
"18 = inputdim · outputdim + outputbias= 8 · 2 + 2 "
]
},
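The counts follow directly from weights plus biases per Dense layer; a tiny check, not taken from the notebook:

```python
# A Dense layer with n_in inputs and n_units units has n_in * n_units weights
# plus n_units biases.
def dense_params(n_in, n_units):
    return n_in * n_units + n_units

print(dense_params(2, 8))  # 24 -> hidden layer
print(dense_params(8, 2))  # 18 -> output layer
```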
{
@@ -774,7 +774,7 @@
"id": "Soz6r8cGRAFh"
},
"source": [
"In the next cell, train the network. In other words, we tune the parameters that were initialized randomly with stochastic gradient descent to minimize our loss function (the categorical crossentropy). We set the batchsize to 128 per updatestep and train for 400 epochs."
"In the next cell, we train the network. In other words, we tune the parameters that were initialized randomly with stochastic gradient descent to minimize our loss function (the categorical crossentropy). We set the batchsize to 128 per update step and train for 400 epochs."
]
},
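Because the loss is now the categorical crossentropy over two output units, the labels have to be one-hot encoded before fitting; a hedged sketch using `to_categorical`, the standard Keras utility, which is assumed here rather than read from the notebook.

```python
# One-hot encode the binary labels and train the two-output model.
from tensorflow.keras.utils import to_categorical

Y_onehot = to_categorical(Y, num_classes=2)   # (1372,) -> (1372, 2)
history_h = model_h.fit(X, Y_onehot, batch_size=128, epochs=400, verbose=0)
```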
{
@@ -801,7 +801,7 @@
"id": "8-w1vdq-R0Wv"
},
"source": [
"Let's look again at the leraning curve, we plot the accuracy and the loss vs the epochs. You can see that after 100 epochs, we predict around 86% of our data correct and have a loss aorund 0.29 (this values can vary from run to run). This is already alot better than the model without a hidden layer."
"Let's look again at the learning curve, we plot the accuracy and the loss vs. the epochs. You can see that after 100 epochs, we predict around 86% of our data correct and have a loss aorund 0.29 (these values can vary from run to run). This is already a lot better than the model without a hidden layer."
]
},
{
@@ -866,8 +866,8 @@
},
"source": [
"### Plotting the learned decision boundary\n",
"Let's visualize which decision boundary was learned by the fcNN with the hidden layer\n",
"As you can see the decision boundary is a now curved and not straight anymore. The model (with the hidden layer in the middle) separates the the two classes in the training data better and is able to learn non-linear decision boundaries. \n",
"Let's visualize which decision boundary was learned by the fcNN with the hidden layer. \n",
"As you can see the decision boundary is now curved and not straight anymore. The model (with the hidden layer in the middle) separates the two classes in the training data better and is able to learn non-linear decision boundaries. \n",
"\n"
]
},
@@ -923,4 +923,4 @@
]
}
]
}
}