Commit 0b36dc7 (1 parent: 51071f1)
Showing 15 changed files with 89 additions and 19 deletions.
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "hal9"
-version = "2.1.4"
+version = "2.1.5"
 description = ""
 authors = ["Javier Luraschi <[email protected]>"]
 readme = "README.md"

Two binary files changed (not shown); four files renamed without changes.
This file was deleted.
@@ -0,0 +1,42 @@

---
sidebar_position: 2
---

import Autoencoder from './llm-autoencoder.png';
import Transformer from './llm-transformer.png';
import GPT1 from './llm-gpt-1.png';

# Large Language Models

## Embeddings

An Autoencoder is a type of [DNN](intro-ai#) that does not require classification labels; instead, it performs unsupervised learning by asking the DNN to classify the inputs of the network as the outputs. For example, when classifying the image of a cat, the pixels of that cat would be the input and the classification label would also be all the pixels of the cat.

<center><a href="https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798"><img src={Autoencoder} style={{width: 500}} /></a></center>

This can seem pretty pointless: why would we spend so many compute resources training neural networks that produce the same output as the given input? Interestingly, it was discovered that the middle layer, which contains an array (vector) of only a few numbers, has very interesting properties; we will refer to this middle layer as the **embedding**.

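To make this concrete, here is a minimal autoencoder sketch in PyTorch; the framework choice, layer sizes, and the 32-dimensional embedding are illustrative assumptions rather than anything prescribed by this documentation. The encoder squeezes each input into a small embedding vector and the decoder tries to rebuild the original input from it.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, embedding_dim=32):
        super().__init__()
        # Encoder: compress the input down to a small embedding (the middle layer)
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, embedding_dim),
        )
        # Decoder: try to reconstruct the original input from the embedding
        self.decoder = nn.Sequential(
            nn.Linear(embedding_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        embedding = self.encoder(x)
        return self.decoder(embedding), embedding

model = Autoencoder()
x = torch.rand(16, 784)                            # a batch of 16 flattened 28x28 images
reconstruction, embedding = model(x)
loss = nn.functional.mse_loss(reconstruction, x)   # the input is also the training target
```
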
It was found that such embeddings generalize and build an intuitive understanding of the underlying data. For example, when using embeddings with text as input (as opposed to images), one can use them to ask a question like "What is the term for a king that is not a man?". Such a question can be answered by simply adding and subtracting [King – Man + Woman](https://www.technologyreview.com/2015/09/17/166211/king-man-woman-queen-the-marvelous-mathematics-of-computational-linguistics/) and finding that the resulting embedding is actually the vector for Queen, which is surprising given that this was learned by the autoencoder itself. This is arguably an early example of emergent abilities, as in, an unexpected behavior the model was not designed to accomplish.

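As a toy illustration of that arithmetic, consider the sketch below; the three-dimensional vectors are made up for the example (real embeddings have hundreds of dimensions learned from data), but the add-and-subtract idea is the same.

```python
import numpy as np

# Made-up word vectors, just to illustrate that king - man + woman lands near queen
vectors = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.7, 0.1, 0.1]),
    "woman": np.array([0.7, 0.1, 0.9]),
    "queen": np.array([0.8, 0.9, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = vectors["king"] - vectors["man"] + vectors["woman"]
closest = max(vectors, key=lambda word: cosine(vectors[word], target))
print(closest)  # with these toy vectors, the nearest word is "queen"
```
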
## Transformers

We can use a DNN to predict the next word from a given text; for example, we can train a DNN that, given 3 embeddings, tells us the next token, so we could ask the DNN to find the next word after `['King', 'wife', 'is']`. The initial text used is referred to as the text **prompt**. Using a plain DNN, we would get decent completions from training over several books and would be able to get reasonable guesses like 'Queen' for that example.

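A rough sketch of that fixed-window setup is shown below; the vocabulary size, embedding size, and token ids are placeholder assumptions. The embeddings of the last 3 tokens are concatenated and a plain feed-forward network scores every word in the vocabulary as the possible next word.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim, context = 10_000, 64, 3

embed = nn.Embedding(vocab_size, embedding_dim)
predictor = nn.Sequential(
    nn.Linear(context * embedding_dim, 256), nn.ReLU(),
    nn.Linear(256, vocab_size),                # one score per word in the vocabulary
)

token_ids = torch.tensor([[17, 42, 7]])        # placeholder ids for ['King', 'wife', 'is']
x = embed(token_ids).flatten(start_dim=1)      # concatenate the 3 token embeddings
next_word_id = predictor(x).argmax(dim=-1)     # pick the highest-scoring next word
```
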
However, a standard (feed-forward) DNN turns out to not really show signs of intelligence. If we were to use the prompt "Rose is the Queen. Who is the King's wife?", we would likely get a reply like "The Queen" or, even worse, the name of a Queen seen in the training text, such as "Queen Elizabeth" and the like.

To solve that problem, variations of DNNs were explored, like Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and the like. Those showed improvements, but real progress had to wait until the **transformer** was presented in the [Attention Is All You Need](https://arxiv.org/abs/1706.03762) paper.

<center><a href="https://arxiv.org/abs/1706.03762"><img src={Transformer} style={{width: 380}} /></a></center>

You can think of attention as using a DNN to figure out where in the text to pay attention: even if the reference to "Rose is the Queen" appears early in the prompt, the attention mechanism tells the DNN that, to answer this question, it should also look at those other parts of the text.

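A minimal sketch of the scaled dot-product attention at the core of the transformer is shown below; the shapes are illustrative, and real models add learned query/key/value projections, multiple heads, and masking.

```python
import torch
import torch.nn.functional as F

def attention(query, key, value):
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / d_k ** 0.5  # how relevant is every token to every other token?
    weights = F.softmax(scores, dim=-1)                   # normalize the scores into attention weights
    return weights @ value                                # mix the values according to those weights

tokens = torch.rand(1, 9, 64)             # 9 token embeddings of size 64, a toy prompt
out = attention(tokens, tokens, tokens)   # self-attention: the prompt attends to itself
```
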
## Generative Pretrained Transformers

Transformers also surprised us with another level of emergent abilities, related to answering very basic questions, in early 2018 with [GPT-1 by OpenAI](https://openai.com/index/language-unsupervised/). Over time, we found out that there are further [emergent abilities in larger transformer models](https://arxiv.org/abs/2206.07682) and started referring to pre-trained large transformer models as **Generative Pretrained Transformers** (**GPT**): models that use more data, more compute (GPUs), and more parameters to train complex DNNs with backpropagation. To leave room for other kinds of models that go beyond transformers, we refer to large GPT models as **Large Language Models** (**LLM**).

<center><a href="https://openai.com/index/language-unsupervised/"><img src={GPT1} style={{width: 500}} /></a></center>

The *Generative* term in GPT comes from the ability to generate text (embeddings) and from the focus on applications that generate content for question answering, summarization, and many of the other emergent abilities a GPT shows.

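For example, a pre-trained GPT-style model can be asked to continue a prompt in a few lines of code. The sketch below uses the open-source Hugging Face `transformers` library and the small `gpt2` checkpoint as one possible choice, not something this documentation prescribes.

```python
from transformers import pipeline

# Load a small pre-trained GPT-style model and generate a continuation for a prompt
generator = pipeline("text-generation", model="gpt2")
prompt = "Rose is the Queen. Who is the King's wife?"
result = generator(prompt, max_new_tokens=20)
print(result[0]["generated_text"])  # the prompt followed by the model's continuation
```
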
Refer to [Advancements in Generative AI](https://arxiv.org/abs/2311.10242) for additional details or hop into the [Prompting](prompts.md) section to learn techniques to maximize the practical use of LLMs.

@@ -0,0 +1,4 @@

# Prompt Engineering

Under construction; in the meantime, check [A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications](https://arxiv.org/pdf/2402.07927).
