
Fixed some typos
rnckp committed Nov 5, 2022
1 parent 35a5922 commit 76e392b
Showing 10 changed files with 33 additions and 36 deletions.
2 changes: 1 addition & 1 deletion ch02/ch02.ipynb
@@ -1332,7 +1332,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.8.12"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion ch03/README.md
@@ -9,7 +9,7 @@
- Learning the weights of the logistic cost function
- Converting an Adaline implementation into an algorithm for logistic regression
- Training a logistic regression model with scikit-learn
- Tackling over tting via regularization
- Tackling overfitting via regularization
- Maximum margin classification with support vector machines
- Maximum margin intuition
- Dealing with a nonlinearly separable case using slack variables
2 changes: 1 addition & 1 deletion ch03/ch03.ipynb
@@ -1789,7 +1789,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.8.12"
},
"toc": {
"nav_menu": {},
4 changes: 2 additions & 2 deletions ch04/ch04.ipynb
@@ -1854,7 +1854,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Partitioning a dataset into a seperate training and test set"
"# Partitioning a dataset into a separate training and test set"
]
},
{
@@ -2837,7 +2837,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.8.12"
},
"toc": {
"nav_menu": {},
24 changes: 12 additions & 12 deletions ch05/ch05.ipynb
@@ -98,7 +98,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Unsupervised dimensionality reduction via principal component analysis 128](#Unsupervised-dimensionality-reduction-via-principal-component-analysis-128)\n",
"- [Unsupervised dimensionality reduction via principal component analysis](#Unsupervised-dimensionality-reduction-via-principal-component-analysis)\n",
" - [The main steps behind principal component analysis](#The-main-steps-behind-principal-component-analysis)\n",
" - [Extracting the principal components step-by-step](#Extracting-the-principal-components-step-by-step)\n",
" - [Total and explained variance](#Total-and-explained-variance)\n",
@@ -421,9 +421,9 @@
"\n",
"**Note**\n",
"\n",
"Accidentally, I wrote `X_test_std = sc.fit_transform(X_test)` instead of `X_test_std = sc.transform(X_test)`. In this case, it wouldn't make a big difference since the mean and standard deviation of the test set should be (quite) similar to the training set. However, as remember from Chapter 3, the correct way is to re-use parameters from the training set if we are doing any kind of transformation -- the test set should basically stand for \"new, unseen\" data.\n",
"Accidentally, I wrote `X_test_std = sc.fit_transform(X_test)` instead of `X_test_std = sc.transform(X_test)`. In this case, it wouldn't make a big difference since the mean and standard deviation of the test set should be (quite) similar to the training set. However, as you remember from Chapter 3, the correct way is to re-use parameters from the training set if we are doing any kind of transformation -- the test set should basically stand for \"new, unseen\" data.\n",
"\n",
"My initial typo reflects a common mistake is that some people are *not* re-using these parameters from the model training/building and standardize the new data \"from scratch.\" Here's simple example to explain why this is a problem.\n",
"My initial typo reflects a common mistake which is that some people are *not* re-using these parameters from the model training/building and standardize the new data \"from scratch.\" Here is a simple example to explain why this is a problem.\n",
"\n",
"Let's assume we have a simple training set consisting of 3 examples with 1 feature (let's call this feature \"length\"):\n",
"\n",
@@ -445,17 +445,17 @@
"- new_5: 6 cm -> class ?\n",
"- new_6: 7 cm -> class ?\n",
"\n",
"If we look at the \"unstandardized \"length\" values in our training datast, it is intuitive to say that all of these examples are likely belonging to class_2. However, if we standardize these by re-computing standard deviation and and mean you would get similar values as before in the training set and your classifier would (probably incorrectly) classify examples 4 and 5 as class 2.\n",
"If we look at the \"unstandardized \"length\" values in our training datast, it is intuitive to say that all of these examples are likely belonging to class_2. However, if we standardize these by re-computing standard deviation and mean you would get similar values as before in the training set and your classifier would (probably incorrectly) classify examples 4 and 5 as class_2.\n",
"\n",
"- new_std_4: -1.21 -> class 2\n",
"- new_std_5: 0 -> class 2\n",
"- new_std_6: 1.21 -> class 1\n",
"- new_std_4: -1.21 -> class_2\n",
"- new_std_5: 0 -> class_2\n",
"- new_std_6: 1.21 -> class_1\n",
"\n",
"However, if we use the parameters from your \"training set standardization,\" we'd get the values:\n",
"\n",
"- example5: -18.37 -> class 2\n",
"- example6: -17.15 -> class 2\n",
"- example7: -15.92 -> class 2\n",
"- example5: -18.37 -> class_2\n",
"- example6: -17.15 -> class_2\n",
"- example7: -15.92 -> class_2\n",
"\n",
"The values 5 cm, 6 cm, and 7 cm are much lower than anything we have seen in the training set previously. Thus, it only makes sense that the standardized features of the \"new examples\" are much lower than every standardized feature in the training set.\n",
"\n",
@@ -719,7 +719,7 @@
"source": [
"**NOTE**\n",
"\n",
"The following four code cells has been added in addition to the content to the book, to illustrate how to replicate the results from our own PCA implementation in scikit-learn:"
"The following four code cells have been added in addition to the content to the book, to illustrate how to replicate the results from our own PCA implementation in scikit-learn:"
]
},
{
@@ -1821,7 +1821,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.8.12"
},
"toc": {
"nav_menu": {},
2 changes: 1 addition & 1 deletion ch06/ch06.ipynb
@@ -1894,7 +1894,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.8.12"
},
"toc": {
"nav_menu": {},
4 changes: 2 additions & 2 deletions ch07/ch07.ipynb
@@ -1815,7 +1815,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using XGboost "
"## Using XGBoost "
]
},
{
@@ -1950,7 +1950,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.8.12"
},
"toc": {
"nav_menu": {},
25 changes: 11 additions & 14 deletions ch08/ch08.ipynb
@@ -153,7 +153,7 @@
"The IMDB movie review set can be downloaded from [http://ai.stanford.edu/~amaas/data/sentiment/](http://ai.stanford.edu/~amaas/data/sentiment/).\n",
"After downloading the dataset, decompress the files.\n",
"\n",
"A) If you are working with Linux or MacOS X, open a new terminal windowm `cd` into the download directory and execute \n",
"A) If you are working with Linux or MacOS X, open a new terminal window, `cd` into the download directory and execute \n",
"\n",
"`tar -zxf aclImdb_v1.tar.gz`\n",
"\n",
@@ -522,14 +522,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see from executing the preceding command, the vocabulary is stored in a Python dictionary, which maps the unique words that are mapped to integer indices. Next let us print the feature vectors that we just created:"
"As we can see from executing the preceding command, the vocabulary is stored in a Python dictionary, which maps the unique words to integer indices. Next let us print the feature vectors that we just created:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each index position in the feature vectors shown here corresponds to the integer values that are stored as dictionary items in the CountVectorizer vocabulary. For example, the rst feature at index position 0 resembles the count of the word and, which only occurs in the last document, and the word is at index position 1 (the 2nd feature in the document vectors) occurs in all three sentences. Those values in the feature vectors are also called the raw term frequencies: *tf (t,d)*—the number of times a term t occurs in a document *d*."
"Each index position in the feature vectors shown here corresponds to the integer values that are stored as dictionary items in the CountVectorizer vocabulary. For example, the first feature at index position 0 resembles the count of the word \"and\", which only occurs in the last document, and the word \"is\" at index position 1 (the 2nd feature in the document vectors) occurs in all three sentences. Those values in the feature vectors are also called the raw term frequencies: *tf (t,d)*—the number of times a term t occurs in a document *d*."
]
},
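A minimal, self-contained sketch of this bag-of-words step; the three example sentences below are assumed stand-ins for the chapter's toy documents:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = np.array(['The sun is shining',
                 'The weather is sweet',
                 'The sun is shining, the weather is sweet, and one and one is two'])

count = CountVectorizer()
bag = count.fit_transform(docs)

print(count.vocabulary_)  # maps each unique word to an integer index, e.g. 'and' -> 0, 'is' -> 1
print(bag.toarray())      # raw term frequencies tf(t, d), one row per document
```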
{
@@ -578,7 +578,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When we are analyzing text data, we often encounter words that occur across multiple documents from both classes. Those frequently occurring words typically don't contain useful or discriminatory information. In this subsection, we will learn about a useful technique called term frequency-inverse document frequency (tf-idf) that can be used to downweight those frequently occurring words in the feature vectors. The tf-idf can be de ned as the product of the term frequency and the inverse document frequency:\n",
"When we are analyzing text data, we often encounter words that occur across multiple documents from both classes. Those frequently occurring words typically don't contain useful or discriminatory information. In this subsection, we will learn about a useful technique called term frequency-inverse document frequency (tf-idf) that can be used to downweigh those frequently occurring words in the feature vectors. The tf-idf can be defined as the product of the term frequency and the inverse document frequency:\n",
"\n",
"$$\\text{tf-idf}(t,d)=\\text{tf (t,d)}\\times \\text{idf}(t,d)$$\n",
"\n",
@@ -621,16 +621,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we saw in the previous subsection, the word is had the largest term frequency in the 3rd document, being the most frequently occurring word. However, after transforming the same feature vector into tf-idfs, we see that the word is is\n",
"now associated with a relatively small tf-idf (0.45) in document 3 since it is\n",
"also contained in documents 1 and 2 and thus is unlikely to contain any useful, discriminatory information.\n"
"As we saw in the previous subsection, the word \"is\" had the largest term frequency in the 3rd document, being the most frequently occurring word. However, after transforming the same feature vector into tf-idfs, we see that the word \"is\" is now associated with a relatively small tf-idf (0.45) in document 3 since it is also contained in documents 1 and 2 and thus is unlikely to contain any useful, discriminatory information.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, if we'd manually calculated the tf-idfs of the individual terms in our feature vectors, we'd have noticed that the `TfidfTransformer` calculates the tf-idfs slightly differently compared to the standard textbook equations that we de ned earlier. The equations for the idf and tf-idf that were implemented in scikit-learn are:"
"However, if we'd manually calculated the tf-idfs of the individual terms in our feature vectors, we'd have noticed that the `TfidfTransformer` calculates the tf-idfs slightly differently compared to the standard textbook equations that we defined earlier. The equations for the idf and tf-idf that were implemented in scikit-learn are:"
]
},
{
@@ -649,10 +647,9 @@
"\n",
"$$v_{\\text{norm}} = \\frac{v}{||v||_2} = \\frac{v}{\\sqrt{v_{1}^{2} + v_{2}^{2} + \\dots + v_{n}^{2}}} = \\frac{v}{\\big (\\sum_{i=1}^{n} v_{i}^{2}\\big)^\\frac{1}{2}}$$\n",
"\n",
"To make sure that we understand how TfidfTransformer works, let us walk\n",
"through an example and calculate the tf-idf of the word is in the 3rd document.\n",
"To make sure that we understand how `TfidfTransformer` works, let us walk through an example and calculate the tf-idf of the word \"is\" in the 3rd document.\n",
"\n",
"The word is has a term frequency of 3 (tf = 3) in document 3 ($d_3$), and the document frequency of this term is 3 since the term is occurs in all three documents (df = 3). Thus, we can calculate the idf as follows:\n",
"The word \"is\" has a term frequency of 3 (tf = 3) in document 3 ($d_3$), and the document frequency of this term is 3 since the term \"is\" occurs in all three documents (df = 3). Thus, we can calculate the idf as follows:\n",
"\n",
"$$\\text{idf}(\"is\", d_3) = log \\frac{1+3}{1+3} = 0$$\n",
"\n",
@@ -686,7 +683,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If we repeated these calculations for all terms in the 3rd document, we'd obtain the following tf-idf vectors: [3.39, 3.0, 3.39, 1.29, 1.29, 1.29, 2.0 , 1.69, 1.29]. However, we notice that the values in this feature vector are different from the values that we obtained from the TfidfTransformer that we used previously. The nal step that we are missing in this tf-idf calculation is the L2-normalization, which can be applied as follows:"
"If we repeated these calculations for all terms in the 3rd document, we'd obtain the following tf-idf vectors: [3.39, 3.0, 3.39, 1.29, 1.29, 1.29, 2.0 , 1.69, 1.29]. However, we notice that the values in this feature vector are different from the values that we obtained from the `TfidfTransformer` that we used previously. The final step that we are missing in this tf-idf calculation is the L2-normalization, which can be applied as follows:"
]
},
{
Expand Down Expand Up @@ -1286,7 +1283,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, the result above is consistent with the average score computed the `cross_val_score`."
"As we can see, the result above is consistent with the average score computed with `cross_val_score`."
]
},
{
@@ -1841,7 +1838,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.8.12"
},
"toc": {
"nav_menu": {},
2 changes: 1 addition & 1 deletion ch09/README.md
@@ -13,7 +13,7 @@
- Solving regression for regression parameters with gradient descent
- Estimating the coefficient of a regression model via scikit-learn
- Fitting a robust regression model using RANSAC
- Evaluating the performance of linear regression modelss)
- Evaluating the performance of linear regression models
- Using regularized methods for regression
- Turning a linear regression model into a curve - polynomial regression
- Modeling nonlinear relationships in the Ames Housing dataset
2 changes: 1 addition & 1 deletion ch19/README.md
@@ -19,7 +19,7 @@
- Dynamic programming using the Bellman equation
- Reinforcement learning algorithms
- Dynamic programming
- Policy evaluation – predicting the value function with dynamic programmin
- Policy evaluation – predicting the value function with dynamic programming
- Improving the policy using the estimated value function
- Policy iteration
- Value iteration
