Correct spelling error (#40 linar -> linear) #10

Open · wants to merge 2 commits into master
34 changes: 17 additions & 17 deletions chapter_02/nb_ch02_01.ipynb
@@ -37,7 +37,7 @@
"source": [
"## Banknote classification with fcNN without hidden layer compared to fcNN with hidden layer\n",
"\n",
"**Goal:** In this notebook you will do your first classification. You will see that fully connected networks without a hidden layer can only learn linar decision boundaries, while fully connected networks with hidden layers are able to learn non-linear decision boundaries.\n",
"**Goal:** In this notebook you will do your first classification. You will see that fully-connected networks without a hidden layer can only learn linear decision boundaries, while fully-connected networks with hidden layers are able to learn non-linear decision boundaries.\n",
"\n",
"**Usage:** The idea of the notebook is that you try to understand the provided code. Run it, check the output, and play with it by slightly changing the code. \n",
"\n",
@@ -48,7 +48,7 @@
">4. entropy (continuous feature) \n",
">5. class (binary indicating if the banknote is real or fake) \n",
"\n",
"Don't bother too much how these features exactely came from.\n",
"Don't bother too much exactly where these features came from.\n",
"\n",
"For this analysis we only use 2 features. \n",
"\n",
@@ -268,7 +268,7 @@
"id": "HlDPWop1_zGM"
},
"source": [
"Let's extract the two featues *x1: skewness of wavelet transformed image* and *x2: entropy of wavelet transformed image*. We print the shape and see that we for X we have 1372 oberservations with two featues and for Y there are 1372 binary labels."
"Let's extract the two featues *x1: skewness of wavelet transformed image* and *x2: entropy of wavelet transformed image*. We print the shape and see that for X there are 1372 oberservations with two featues and for Y there are 1372 binary labels."
]
},
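For reference, a rough sketch of the feature-extraction step described in the cell above; the file path and column names are assumptions, not taken from this diff, since the notebook's code cells are not shown here.

```python
# Hypothetical sketch of extracting x1 (skewness) and x2 (entropy);
# the CSV path and column names are placeholders, not the notebook's own.
import pandas as pd

df = pd.read_csv("banknote.csv",
                 names=["variance", "skewness", "curtosis", "entropy", "class"])
X = df[["skewness", "entropy"]].values   # x1: skewness, x2: entropy
Y = df["class"].values                   # binary label: real vs. fake
print(X.shape, Y.shape)                  # expected: (1372, 2) (1372,)
```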
{
@@ -308,7 +308,7 @@
"id": "W2upXCjUBweV"
},
"source": [
"Since the banknotes are described by only 2 features, we can easily visualize the positions of real and fake banknotes in the 2D feature space. You can see that the boundary between the two classes is not separable by a straight line. A curved boundary line will do better. But even then we cannot expect a perfect seperation.\n"
"Since the banknotes are described by only 2 features, we can easily visualize the positions of real and fake banknotes in the 2D feature space. You can see that the boundary between the two classes is not separable by a straight line. A curved boundary line will do better. But even then we cannot expect a perfect separation.\n"
]
},
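A minimal scatter of the two classes in the 2D feature space might look like the sketch below; colors and class labels are illustrative, and `X`, `Y` are assumed to be the arrays from the extraction step.

```python
# Illustrative scatter of the two classes in the (skewness, entropy) plane.
import matplotlib.pyplot as plt

plt.scatter(X[Y == 0, 0], X[Y == 0, 1], s=8, label="class 0")
plt.scatter(X[Y == 1, 0], X[Y == 1, 1], s=8, label="class 1")
plt.xlabel("x1: skewness of wavelet transformed image")
plt.ylabel("x2: entropy of wavelet transformed image")
plt.legend()
plt.show()
```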
{
@@ -374,7 +374,7 @@
},
"source": [
"### fcNN with only one neuron\n",
"Let’s try to use a single neuron with a sigmoid activation function (also known as logistic regression) as classification model to seperate the banknotes. \n",
"Let’s try to use a single neuron with a sigmoid activation function (also known as logistic regression) as a classification model to seperate the banknotes. \n",
"We use the sequential API from keras to build the model. To fit the 3 parameters we use the stochastic gradient descent optimizer with a learning rate of 0.15."
]
},
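A minimal sketch of the single-neuron model this cell describes; the tf.keras import paths and the `learning_rate` argument name are assumptions, and the notebook's hidden code cells may differ in detail.

```python
# One Dense unit with a sigmoid output: logistic regression with 3 parameters
# (2 weights + 1 bias), trained with plain SGD at learning rate 0.15.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

model = Sequential()
model.add(Dense(1, activation="sigmoid", input_shape=(2,)))
model.compile(optimizer=SGD(learning_rate=0.15),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```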
@@ -454,7 +454,7 @@
"id": "AI_YeSyBEHTc"
},
"source": [
"In the next cell, we train the network. In other words, we tune the parameters that were initialized randomly with stochastic gradient descent to minimize our loss function (the binary corssentropy). We set the batchsize to 128 per updatestep and train for 400 epochs."
"In the next cell, we train the network. In other words, we tune the parameters that were initialized randomly with stochastic gradient descent to minimize our loss function (the binary corssentropy). We set the batchsize to 128 per update step and train for 400 epochs."
]
},
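The training call itself would reduce to something like the sketch below, with `X` and `Y` as above; `verbose=0` just silences the per-epoch output.

```python
# Fit the single-neuron model: 400 epochs, mini-batches of 128 examples.
history = model.fit(X, Y, batch_size=128, epochs=400, verbose=0)
```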
{
@@ -481,7 +481,7 @@
"id": "zqE0FrJ9FPM2"
},
"source": [
"Let's look at the so called leraning curve, we plot the accuracy and the loss vs the epochs. You can see that after 100 epochs, we predict around 70% of our data correct and have a loss aorund 0.51 (this values can vary from run to run)."
"Let's look at the so-called learning curve, we plot the accuracy and the loss vs the epochs. You can see that after 100 epochs, we predict around 70% of our data correct and have a loss aorund 0.51 (this values can vary from run to run)."
]
},
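A hedged sketch of such a learning-curve plot from the returned History object; note that the metric key is `"accuracy"` in recent Keras versions and `"acc"` in older ones.

```python
# Accuracy and loss over the training epochs.
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["accuracy"])
ax1.set_xlabel("epoch"); ax1.set_ylabel("accuracy")
ax2.plot(history.history["loss"])
ax2.set_xlabel("epoch"); ax2.set_ylabel("loss")
plt.show()
```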
{
@@ -547,7 +547,7 @@
"source": [
"### Plotting the learned decision boundary\n",
"Let's visualize which decision boundary was learned by the fcNN with only one output neuron (and no hidden layer). \n",
"As you can see the decision boundary is a straight line. This is not a coincidence but a general property of a single artificial neuron with a sigmoid as activation function and no hidden layer, also known as logistic regression.\n"
"As you can see the decision boundary is a straight line. This is not a coincidence but a general property of a single artificial neuron with a sigmoid as an activation function and no hidden layer, also known as logistic regression.\n"
]
},
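One common way to draw such a boundary is to evaluate the model on a dense grid over the feature space and plot the 0.5-probability contour; the grid limits below are assumptions, not values from the notebook.

```python
# Probability surface of the single-neuron model over a grid of (x1, x2)
# values; the 0.5 contour is the learned (linear) decision boundary.
import numpy as np
import matplotlib.pyplot as plt

x1g, x2g = np.meshgrid(np.linspace(-15, 15, 200), np.linspace(-15, 15, 200))
grid = np.c_[x1g.ravel(), x2g.ravel()]
probs = model.predict(grid).reshape(x1g.shape)

plt.contourf(x1g, x2g, probs, levels=[0.0, 0.5, 1.0], alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=Y, s=8)
plt.xlabel("x1: skewness"); plt.ylabel("x2: entropy")
plt.show()
```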
{
@@ -647,7 +647,7 @@
"source": [
"### fcNN with one hidden layer \n",
"\n",
"We know that the boundary between the two classes is not descriped very good by a line. Therefore a single neuron is not appropriate to model the probability for a fake banknote based on its two features. To get a more flexible model, we introduce an additional layer between input layer and output layer. This is called hidden layer. Here we use a hidden layer with 8 neurons. We also change the ouputnodes form 1 to 2, to get two ouputs for the probability of real and fake banknote. Because we now have 2 outputs, we use the *softmax* activation function in the output layer. The softmax activation ensures that the output can be interpreted as a probability (see book for details)"
"We know that the boundary between the two classes is not described very well by a line. Therefore a single neuron is not appropriate to model the probability for a fake banknote based on its two features. To get a more flexible model, we introduce an additional layer between input layer and output layer. This is called a hidden layer. Here we use a hidden layer with 8 neurons. We also change the ouputnodes from 1 to 2, to get two ouputs for the probability of real and fake banknote. Because we now have 2 outputs, we use the *softmax* activation function in the output layer. The softmax activation ensures that the output can be interpreted as a probability (see book for details)."
]
},
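A sketch of this two-layer architecture; the hidden-layer activation (sigmoid here) and the optimizer settings are assumptions carried over from the earlier model, since the notebook's code cells are not part of this diff.

```python
# fcNN with one hidden layer of 8 neurons and a 2-unit softmax output.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

model_h = Sequential()
model_h.add(Dense(8, activation="sigmoid", input_shape=(2,)))  # hidden layer
model_h.add(Dense(2, activation="softmax"))                    # output layer
model_h.compile(optimizer=SGD(learning_rate=0.15),
                loss="categorical_crossentropy",
                metrics=["accuracy"])
model_h.summary()
```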
{
@@ -687,9 +687,9 @@
"id": "aY3XX2qHOQQz"
},
"source": [
"In this is output summary we see that we now have a lot more trainable paramters then before. \n",
"24 = inputdim · outpuntdim + outputbias= 2 · 8 + 8 \n",
"18 = inputdim · outpuntdim + outputbias= 8 · 2 + 2 "
"In this output summary we see that we now have a lot more trainable paramters than before. \n",
"24 = inputdim · outputdim + outputbias= 2 · 8 + 8 \n",
"18 = inputdim · outputdim + outputbias= 8 · 2 + 2 "
]
},
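The counts follow directly from weights plus biases per Dense layer; a tiny check, not taken from the notebook:

```python
# A Dense layer with n_in inputs and n_units units has n_in * n_units weights
# plus n_units biases.
def dense_params(n_in, n_units):
    return n_in * n_units + n_units

print(dense_params(2, 8))  # 24 -> hidden layer
print(dense_params(8, 2))  # 18 -> output layer
```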
{
@@ -774,7 +774,7 @@
"id": "Soz6r8cGRAFh"
},
"source": [
"In the next cell, train the network. In other words, we tune the parameters that were initialized randomly with stochastic gradient descent to minimize our loss function (the categorical crossentropy). We set the batchsize to 128 per updatestep and train for 400 epochs."
"In the next cell, we train the network. In other words, we tune the parameters that were initialized randomly with stochastic gradient descent to minimize our loss function (the categorical crossentropy). We set the batchsize to 128 per update step and train for 400 epochs."
]
},
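Because the loss is now the categorical crossentropy over two output units, the labels have to be one-hot encoded before fitting; a hedged sketch using `to_categorical`, the standard Keras utility, which is assumed here rather than read from the notebook.

```python
# One-hot encode the binary labels and train the two-output model.
from tensorflow.keras.utils import to_categorical

Y_onehot = to_categorical(Y, num_classes=2)   # (1372,) -> (1372, 2)
history_h = model_h.fit(X, Y_onehot, batch_size=128, epochs=400, verbose=0)
```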
{
@@ -801,7 +801,7 @@
"id": "8-w1vdq-R0Wv"
},
"source": [
"Let's look again at the leraning curve, we plot the accuracy and the loss vs the epochs. You can see that after 100 epochs, we predict around 86% of our data correct and have a loss aorund 0.29 (this values can vary from run to run). This is already alot better than the model without a hidden layer."
"Let's look again at the learning curve, we plot the accuracy and the loss vs. the epochs. You can see that after 100 epochs, we predict around 86% of our data correct and have a loss aorund 0.29 (these values can vary from run to run). This is already a lot better than the model without a hidden layer."
]
},
{
@@ -866,8 +866,8 @@
},
"source": [
"### Plotting the learned decision boundary\n",
"Let's visualize which decision boundary was learned by the fcNN with the hidden layer\n",
"As you can see the decision boundary is a now curved and not straight anymore. The model (with the hidden layer in the middle) separates the the two classes in the training data better and is able to learn non-linear decision boundaries. \n",
"Let's visualize which decision boundary was learned by the fcNN with the hidden layer. \n",
"As you can see the decision boundary is now curved and not straight anymore. The model (with the hidden layer in the middle) separates the two classes in the training data better and is able to learn non-linear decision boundaries. \n",
"\n"
]
},
@@ -923,4 +923,4 @@
]
}
]
}
}