diff --git a/.github/workflows/notebook-pr.yaml b/.github/workflows/notebook-pr.yaml
index 5c1b72339..5d3a3f64c 100644
--- a/.github/workflows/notebook-pr.yaml
+++ b/.github/workflows/notebook-pr.yaml
@@ -86,7 +86,7 @@ jobs:
           python ci/verify_exercises.py $nbs --c "$COMMIT_MESSAGE"
           python ci/make_pr_comment.py $nbs --branch $branch --o comment.txt
 
-      # This package is outdated and no longer maintained
+      # This package is outdated and no longer maintained.
       # - name: Add PR comment
       #   if: "!contains(env.COMMIT_MESSAGE, 'skip ci')"
       #   uses: machine-learning-apps/pr-comment@1.0.0
diff --git a/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/W3D1_Tutorial2.ipynb b/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/W3D1_Tutorial2.ipynb
index a8f3f481e..cb1582b9a 100644
--- a/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/W3D1_Tutorial2.ipynb
+++ b/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/W3D1_Tutorial2.ipynb
@@ -301,7 +301,7 @@
     "\n",
     "In classical transformer systems, a core principle is encoding and decoding. We can encode an input sequence as a vector (that implicitly codes what we just read). And we can then take this vector and decode it, e.g., as a new sentence. So a sequence-to-sequence (e.g., sentence translation) system may read a sentence (made out of words embedded in a relevant space) and encode it as an overall vector. It then takes the resulting encoding of the sentence and decodes it into a translated sentence.\n",
     "\n",
-    "In modern transformer systems, such as GPT, all words are used parallelly. In that sense, the transformers generalize the encoding/decoding idea. Examples of this strategy include all the modern large language models (such as GPT)."
+    "In modern transformer systems, such as GPT, all words are used in parallel. In that sense, the transformers generalize the encoding/decoding idea. Examples of this strategy include all the modern large language models (such as GPT)."
    ]
   },
   {
diff --git a/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/instructor/W3D1_Tutorial2.ipynb b/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/instructor/W3D1_Tutorial2.ipynb
index 547e0add7..deefbfd34 100644
--- a/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/instructor/W3D1_Tutorial2.ipynb
+++ b/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/instructor/W3D1_Tutorial2.ipynb
@@ -301,7 +301,7 @@
     "\n",
     "In classical transformer systems, a core principle is encoding and decoding. We can encode an input sequence as a vector (that implicitly codes what we just read). And we can then take this vector and decode it, e.g., as a new sentence. So a sequence-to-sequence (e.g., sentence translation) system may read a sentence (made out of words embedded in a relevant space) and encode it as an overall vector. It then takes the resulting encoding of the sentence and decodes it into a translated sentence.\n",
     "\n",
-    "In modern transformer systems, such as GPT, all words are used parallelly. In that sense, the transformers generalize the encoding/decoding idea. Examples of this strategy include all the modern large language models (such as GPT)."
+    "In modern transformer systems, such as GPT, all words are used in parallel. In that sense, the transformers generalize the encoding/decoding idea. Examples of this strategy include all the modern large language models (such as GPT)."
    ]
   },
   {
diff --git a/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/student/W3D1_Tutorial2.ipynb b/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/student/W3D1_Tutorial2.ipynb
index 170e78797..453db45fa 100644
--- a/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/student/W3D1_Tutorial2.ipynb
+++ b/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/student/W3D1_Tutorial2.ipynb
@@ -301,7 +301,7 @@
     "\n",
     "In classical transformer systems, a core principle is encoding and decoding. We can encode an input sequence as a vector (that implicitly codes what we just read). And we can then take this vector and decode it, e.g., as a new sentence. So a sequence-to-sequence (e.g., sentence translation) system may read a sentence (made out of words embedded in a relevant space) and encode it as an overall vector. It then takes the resulting encoding of the sentence and decodes it into a translated sentence.\n",
     "\n",
-    "In modern transformer systems, such as GPT, all words are used parallelly. In that sense, the transformers generalize the encoding/decoding idea. Examples of this strategy include all the modern large language models (such as GPT)."
+    "In modern transformer systems, such as GPT, all words are used in parallel. In that sense, the transformers generalize the encoding/decoding idea. Examples of this strategy include all the modern large language models (such as GPT)."
    ]
   },
   {
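As illustration only (not part of the patch or the notebooks): a minimal NumPy sketch of what "all words are used in parallel" means in a GPT-style transformer. All names and shapes here are made up for the example. A single self-attention pass scores every token against every other token and updates all positions at once, rather than decoding one summary vector step by step.

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """One scaled dot-product self-attention pass over every token at once."""
    T, _ = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                   # queries, keys, values: (T, d) each
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (T, T): every token scored against every token
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # GPT-style causal mask: no peeking at later words
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence axis
    return weights @ V                                 # (T, d): all positions updated in a single pass

rng = np.random.default_rng(0)
T, d = 5, 8                                            # a toy "sentence" of 5 tokens, 8-dim embeddings
X = rng.normal(size=(T, d))                            # stand-in word embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(causal_self_attention(X, Wq, Wk, Wv).shape)      # (5, 8) -- the whole sequence in one pass
```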