diff --git a/content/concepts.md b/content/concepts.md
index c35e024..6b6d4a8 100644
--- a/content/concepts.md
+++ b/content/concepts.md
@@ -42,7 +42,7 @@ This process is common in fine-tuning tasks because it ensures that the model's
 
 ## Embeddings
 
-**Embeddings** are numerical representations of words or tokens in a continuous vector space. They are crucial in enabling models to understand and process language. Unlike one-hot encoding, which represents words as discrete, high-dimensional vectors, embeddings capture semantic relationships between words in a dense, lower-dimensional space.
+**Embeddings** are numerical representations of words or tokens in a continuous vector space. They are crucial in enabling models to understand and process language. Unlike one-hot encoding, which represents words as discrete, high-dimensional vectors, embeddings capture semantic relationships1 between words in a dense, lower-dimensional space.
 
 ### How Embeddings Work
 - **Dense Vectors**: Each word is represented by a dense vector of fixed size, where each dimension captures a different aspect of the word's meaning.
@@ -83,3 +83,13 @@ Image from: https://machinelearningmastery.com/a-gentle-introduction-to-position
 [Complete Guide to Subword Tokenization Methods in the Neural Era](https://blog.octanove.org/guide-to-subword-tokenization/)
 [Summary of the tokenizers](https://huggingface.co/docs/transformers/tokenizer_summary)
 [Word Embedding: Basics](https://medium.com/@hari4om/word-embedding-d816f643140)
+
+
+
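
The paragraph touched by the first hunk contrasts one-hot encodings with dense embeddings. As a rough illustration of that contrast, here is a minimal sketch assuming PyTorch; the toy vocabulary, the embedding size of 8, and the printed similarity are demonstration-only assumptions, not content from concepts.md.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy vocabulary (illustrative only).
vocab = {"king": 0, "queen": 1, "apple": 2}

# One-hot encoding: discrete, vocabulary-sized vectors with a single 1;
# no two words are any closer to each other than any other pair.
one_hot_king = F.one_hot(torch.tensor([vocab["king"]]), num_classes=len(vocab)).float()

# Dense embedding: each token id maps to a fixed-size, learnable vector (8 dims here).
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
king_vec = embedding(torch.tensor([vocab["king"]]))
queen_vec = embedding(torch.tensor([vocab["queen"]]))

# Semantic relationships show up as vector similarity once the embedding is trained;
# with this random initialization the number is meaningless and only shows the mechanics.
similarity = F.cosine_similarity(king_vec, queen_vec)
print(one_hot_king.shape, king_vec.shape, similarity.item())
```

The key contrast the sketch makes concrete: the one-hot vector grows with the vocabulary and encodes no similarity, while the embedding lookup returns a small dense vector whose geometry can encode relationships after training.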