jalammar · blais · Nov 16, 2024
diff --git a/_posts/2018-06-27-illustrated-transformer.md b/_posts/2018-06-27-illustrated-transformer.md
@@ -383,7 +383,7 @@ This goes for the sub-layers of the decoder as well. If we're to think of a Tran
 ## The Decoder Side
 Now that we've covered most of the concepts on the encoder side, we basically know how the components of decoders work as well. But let's take a look at how they work together.
 
-The encoder start by processing the input sequence. The output of the top encoder is then transformed into a set of attention vectors K and V. These are to be used by each decoder in its "encoder-decoder attention" layer which helps the decoder focus on appropriate places in the input sequence:
+The encoder starts by processing the input sequence. The output of the top encoder is then transformed into a set of attention vectors K and V. These are to be used by each decoder in its "encoder-decoder attention" layer which helps the decoder focus on appropriate places in the input sequence:
 
 
 <div class="img-div-any-width" markdown="0">