Commit: sync

bitwiseops committed Mar 28, 2024
1 parent 84a8601 commit fa1dd0b
Showing 17 changed files with 159 additions and 97 deletions.
6 changes: 5 additions & 1 deletion 2024-03-28/Chapter 0 - Introduction.md
@@ -4,7 +4,9 @@

## Who am I?

<!-- .slide: class="align-center" -->

<div class="pdf"><!-- { "pdf": "assets/CV_Flavio.pdf" } --></div>

--

@@ -27,13 +29,14 @@

# Introduction


--


## Definition


A **Large Language Model** (LLM) is a *type of artificial intelligence (AI) algorithm* that uses deep learning (DL) and natural language processing (NLP) techniques over massively large data sets to understand, summarize, generate and predict new textual content.

--

@@ -218,3 +221,4 @@ In the context of cognitive science, our brains understand and learn about the w
Notes:
The analogy here is that just as grimoires were the repositories of arcane knowledge and power in their time, LLMs are the contemporary digital equivalents, holding vast amounts of human knowledge. However, instead of spells and magical rites, LLMs contain the collective textual data of humanity, capable of generating insights, answers, and even creating new content based on this data.

--
17 changes: 7 additions & 10 deletions 2024-03-28/Chapter 1 - Deep Learning.md
@@ -112,29 +112,26 @@ Notes:

Notes:



--

<!-- .slide: class="align-center" -->

## Dense / Fully Connected Neural Networks

> A dense neural network, often referred to as a fully connected network, is a type of artificial neural network where **each neuron in one layer is connected to every neuron in the next layer**. These connections allow the network to *learn complex patterns and relationships from the input* data through a process of weighted inputs, biases, and activation functions.
<small style="font-size:xx-small"> [Dense NN Example](https://colab.research.google.com/drive/1-0RsBldZ0KlCeP6O4BeuPbB4Zi6tZn73?usp=sharing) </small>
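
A minimal sketch of such a network in PyTorch (assuming `torch` is installed; the layer sizes are illustrative, chosen for 28x28 input images):

```python
import torch.nn as nn

# Every unit in one layer feeds every unit in the next ("fully connected").
dense_net = nn.Sequential(
    nn.Flatten(),         # e.g. a 28x28 image becomes 784 input values
    nn.Linear(784, 128),  # weighted inputs plus biases
    nn.ReLU(),            # activation function
    nn.Linear(128, 10),   # one output per class
)
print(dense_net)
```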

--

<!-- .slide: class="align-center" -->

## Convolutional Neural Networks

<img src="assets/cnn.png" width="70%">
> A Convolutional Neural Network (CNN) is a type of artificial neural network designed to **process data with a grid-like topology**, such as images. CNNs are particularly powerful for tasks that involve spatial data, like image and video recognition, image classification, and also applications in areas beyond vision, such as audio processing and natural language processing.
<small style="font-size:xx-small"> [Introduction to Convolutional Neural Networks (CNN)](https://www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks-cnn/) </small>

<small style="font-size:xx-small"> [CNN Explainer](https://poloclub.github.io/cnn-explainer/) </small>
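
A minimal sketch in PyTorch (assuming `torch` is installed; the shapes are illustrative, for 28x28 grayscale images):

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # slide 16 small filters over the image grid
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample, keeping the strongest responses
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),                 # classify into 10 classes
)
print(cnn(torch.zeros(1, 1, 28, 28)).shape)      # torch.Size([1, 10])
```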


Notes:
- Specialized kind of neural network for processing data with a grid-like topology
119 changes: 77 additions & 42 deletions 2024-03-28/Chapter 2 - Natural Language Processing.md
@@ -35,17 +35,6 @@ The dialogue above is from ELIZA, an **early natural language processing system**

--


<!-- .slide: class="align-center" -->

## Regular Expressions
@@ -60,12 +49,19 @@

--

<!-- .slide: class="align-center" -->
## Rule-based Chatbots

Rule-based chatbots like ELIZA work by applying a series or **cascade of regular expression substitutions**, each of which matches and changes some part of the input line.

```text
s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/
s/.* YOU ARE (depressed|sad) .*/WHY DO YOU THINK YOU ARE \1/
s/.* all .*/IN WHAT WAY/
s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
```

<small style="font-size:xx-small"> [Rule Based Chatbot Example](https://colab.research.google.com/drive/1yph2YtXs-6a08gwf4MymHBVlPEebva_y?usp=sharing) </small>
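
A minimal sketch of such a cascade in plain Python (the rule set mirrors the substitutions above; the linked notebook is the fuller version):

```python
import re

# Ordered cascade of (pattern, response) rules; the first match wins.
RULES = [
    (r".*\bYOU ARE (DEPRESSED|SAD)\b.*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".*\bALL\b.*", "IN WHAT WAY"),
    (r".*\bALWAYS\b.*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]

def reply(line: str) -> str:
    line = line.upper()
    for pattern, response in RULES:
        if re.match(pattern, line):
            return re.sub(pattern, response, line)
    return "PLEASE GO ON"  # default when no rule fires

print(reply("I think you are sad"))   # I AM SORRY TO HEAR YOU ARE SAD
print(reply("He always ignores me"))  # CAN YOU THINK OF A SPECIFIC EXAMPLE
```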


--

@@ -92,6 +88,13 @@
</li>
</ol>

Notes:
The Natural Language Processing (NLP) pipeline refers to a **series of systematically arranged processes or steps** that are **followed to perform tasks involving the understanding, interpretation, and generation of human language** by computers. An NLP pipeline translates raw text into a form that machines can understand and analyze, facilitating various applications such as sentiment analysis, language translation, and question-answering systems.
1. Text Acquisition: Gathering text data, which could come from various sources like websites, books, social media, etc.
2. Pre-processing: Cleaning and normalizing the text. This includes tasks like removing unnecessary characters, correcting typos, converting text to lowercase, etc.
3. Feature Extraction: Converting tokens to numerical representations through embeddings, where words with similar meanings are mapped to points close to each other in a geometrical space. This facilitates the machine's understanding of semantic similarities between words.
4. Modelling: Using the processed data to train machine learning or deep learning models for specific tasks (e.g., classification, regression). After training, the model is evaluated to determine its accuracy, precision, recall, and other metrics. A minimal end-to-end sketch of these steps follows below.
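
A minimal end-to-end sketch of the four steps, assuming scikit-learn is installed and using an invented toy dataset:

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1. Text acquisition: a tiny invented corpus with sentiment labels.
docs = ["This movie is great!", "Terrible plot...", "I loved it", "Boring and slow."]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# 2. Pre-processing: lowercase and strip non-letter characters.
clean = [re.sub(r"[^a-z ]", "", d.lower()) for d in docs]

# 3 + 4. Feature extraction (TF-IDF) and modelling (logistic regression).
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(clean, labels)
print(model.predict(["a great movie", "slow and boring"]))  # e.g. [1 0]
```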

--

<!-- .slide: class="align-center" -->
@@ -113,57 +116,89 @@ Notes:

--

<!-- .slide: class="align-center" -->

# Feature Extraction

- **Bag of Words (BoW)**: Represents text as an unordered collection of words and associates a frequency with each word (see the sketch below).
- **Term Frequency-Inverse Document Frequency (TF-IDF)**: Reflects the importance of a word to a document in a collection.
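
A minimal sketch of both representations, assuming scikit-learn is installed (the three toy reviews are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

reviews = [
    "This movie is very scary and long",
    "This movie is not scary and is slow",
    "This movie is spooky and good",
]

# Bag of Words: raw term counts per review.
bow = CountVectorizer()
print(bow.fit_transform(reviews).toarray())
print(bow.get_feature_names_out())

# TF-IDF: counts reweighted by how informative each term is across reviews.
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(reviews).toarray().round(2))
```

BoW keeps raw counts, while TF-IDF downweights words like "this" and "movie" that appear in every review.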

Notes:
- **Word Embeddings**: Dense representations of words in a continuous vector space (e.g., Word2Vec, GloVe).
- While both Bag-of-Words and TF-IDF have been popular in their own regard, there still remained **a void where understanding the context of words** was concerned: detecting the similarity between the words ‘spooky’ and ‘scary’, or translating documents into another language, requires much richer information about how words are used.

--

<!-- .slide: class="align-center" -->

## Feature Extraction - Word Embeddings

<p class="fragment fade-out" data-fragment-index=0 >
Training Data: "Troll 2 is great!" and "Gymkata is great!"
</p>
<div class="r-stack">
<img class="fragment fade-out" src="assets/word-emb0.png" data-fragment-index=0>
<img class="fragment fade-in-then-out" src="assets/word-emb1.png" data-fragment-index=0 >
<img class="fragment fade-in-then-out" src="assets/word-emb3.png" >
<img class="fragment fade-in-then-out" src="assets/word-emb2.png" >
</div>

<small style="font-size:xx-small"> [Word Embedding and Word2Vec, Clearly Explained!!!](https://www.youtube.com/watch?v=viZrOnJclY0) </small>



Notes:
- Word embeddings are a type of word representation that allows words to be represented as vectors in a continuous vector space.
- Unlike BoW and TF-IDF, embeddings **capture semantic meaning and relationships between words**.
- Improve performance of NLP models.
- Reduce dimensionality compared to sparse representations.
- A **simple neural network** can automate the assignment of numbers to words, taking into account their context and usage. This network, through a process involving weights and activation functions, **learns to predict the next word in a sentence**. **The weights, adjusted through backpropagation, become the embeddings** that capture semantic relationships between words.
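
A minimal sketch of this idea in PyTorch (assuming `torch` is installed; the toy corpus comes from the slide, the hyperparameters are invented):

```python
import torch
import torch.nn as nn

# Toy corpus from the slide; the task is next-word prediction.
corpus = [["troll", "2", "is", "great"], ["gymkata", "is", "great"]]
vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}

pairs = [(idx[s[i]], idx[s[i + 1]]) for s in corpus for i in range(len(s) - 1)]
inputs = torch.tensor([p[0] for p in pairs])
targets = torch.tensor([p[1] for p in pairs])

# 2-dimensional embeddings so they can be plotted directly.
model = nn.Sequential(nn.Embedding(len(vocab), 2), nn.Linear(2, len(vocab)))
opt = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):  # tiny dataset, so a short training loop suffices
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    opt.step()

# The trained first-layer weights ARE the word embeddings.
for w in vocab:
    print(w, model[0].weight[idx[w]].detach().numpy().round(2))
```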

--

<!-- .slide: class="align-center" -->


## Word Embeddings - Word2Vec

<p class="fragment fade-out" data-fragment-index=0 >
Training Data: Wikipedia, Books, ...
</p>
<div class="r-stack">
<img class="fragment fade-out" src="assets/word-emb4.png" data-fragment-index=0>
<img class="fragment fade-in-then-out" src="assets/word-emb5.png" width="70%" data-fragment-index=0>
<img class="fragment fade-in-then-out" src="assets/word-emb6.png" width="70%" >
</div>

<small style="font-size:xx-small"> [Word Embedding and Word2Vec, Clearly Explained!!!](https://www.youtube.com/watch?v=viZrOnJclY0) </small>
<small style="font-size:xx-small"> [Word2Vec - Skipgram and CBOW](https://www.youtube.com/watch?app=desktop&v=UqRCEmrv1gQ) </small>


Notes:
- Word embeddings are learned from text data.
- The idea is to place words that have similar meanings close together in the vector space.
- Can be trained using models like Word2Vec, GloVe, or FastText.
- Training involves adjusting the vector representations based on word contexts.
- Word2Vec is a popular tool that utilizes neural networks to create word embeddings. It employs two main strategies to enrich the context in which words are understood: Continuous Bag-of-Words (CBOW) and Skip-Gram. **CBOW predicts a word based on its context**, while **Skip-Gram predicts the context from a word**.
- Word2Vec typically uses high-dimensional embeddings (often 100 dimensions or more) for each word, trained on extensive datasets like the entirety of Wikipedia. This approach results in a vast vocabulary and a high number of weights to optimize, making training resource-intensive.
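
A small sketch with gensim (assuming the library is installed; the corpus and hyperparameters are invented), switching strategy via the `sg` flag:

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# sg=0 -> CBOW (predict a word from its context);
# sg=1 -> Skip-Gram (predict the context from a word).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["cat"][:5])                   # first few embedding dimensions
print(model.wv.most_similar("cat", topn=3))  # nearest words in the vector space
```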


--

## What is a Language Model?

> A language model is a *statistical and computational algorithm* that enables a computer to understand, interpret, and generate human language based on the likelihood of occurrence of words and sequences of words.
--

## Statistical Language Models

These earlier models rely on the **statistical properties** of language, using the probabilities of sequences of words (n-grams) to predict the likelihood of the next word in a sequence.

<small style="font-size:xx-small"> [Bigrams Example](https://colab.research.google.com/drive/1ikJuNYOOliuy8tTl9csKuWDlVdHJhVQg?usp=sharing) </small>
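
A minimal bigram sketch in plain Python (the toy corpus is invented; the linked notebook is the fuller version):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def p_next(prev, nxt):
    """P(next | prev) estimated from bigram counts."""
    total = sum(bigrams[prev].values())
    return bigrams[prev][nxt] / total if total else 0.0

print(p_next("the", "cat"))           # 0.5: "the" is followed by cat, mat, cat, fish
print(bigrams["cat"].most_common(1))  # the most likely word after "cat"
```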


--


## Neural Language Models

These models use **neural networks** to predict the likelihood of a sequence of words, learning and representing language in high-dimensional spaces.

<small style="font-size:xx-small"> [NLM Example](https://colab.research.google.com/drive/1ON9CO6LUtX1mbDmYIq3Pt5mSqoxzGxPr?usp=sharing) </small>


68 changes: 24 additions & 44 deletions 2024-03-28/Chapter 3 - Large Language Models.md
@@ -2,44 +2,29 @@

--

## What is a *Large* Language Model?

A Large Language Model is a Neural Language Model
- which is trained on very big datasets
- where its underlying neural network uses billions of parameters

Notes:
A large language model is a type of artificial intelligence algorithm designed to understand, generate, and work with human language in a way that mimics human-like understanding and production. These models are "large" both in terms of the size of the neural network architecture they are based on and the amount of data they are trained on.
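
To make "billions of parameters" concrete, a back-of-the-envelope sketch, assuming the common approximation of roughly 12·d² parameters per Transformer block plus the token-embedding matrix (GPT-3-scale numbers shown for illustration):

```python
def approx_params(n_layers, d_model, vocab_size):
    """Rough decoder-only Transformer size: embeddings + ~12*d^2 per block."""
    embedding = vocab_size * d_model
    blocks = n_layers * 12 * d_model**2  # attention (~4*d^2) + MLP (~8*d^2)
    return embedding + blocks

# GPT-3-scale configuration: 96 layers, d_model = 12288, ~50k vocabulary.
print(f"{approx_params(96, 12288, 50257):,}")  # about 175 billion
```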

--

## (Wikipedia) Definition

> A **large language model** (LLM) is a language model notable for its *ability to achieve general-purpose language generation and other natural language processing tasks* such as classification. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process. LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word.

--

## Evolution of LLMs

<iframe width="100%" height="500" src="https://informationisbeautiful.net/visualizations/the-rise-of-generative-ai-large-language-models-llms-like-chatgpt/"> </iframe>

<small style="font-size:xx-small"> [The Rise and Rise of A.I. Large Language Models (LLMs)](https://informationisbeautiful.net/visualizations/the-rise-of-generative-ai-large-language-models-llms-like-chatgpt/) </small>

--

@@ -50,60 +35,55 @@ A large language model is a type of artificial intelligence algorithm designed t
<div class="pdf"><!-- { "pdf": "assets/1706.03762.pdf" } --></div>

Notes:
- This work introduced the **Transformer architecture**, which is the foundation upon which GPT and many other subsequent models are built.
- Before the Transformer, most NLP models relied on recurrent neural networks (RNNs) or convolutional neural networks (CNNs) to process text. The Transformer model introduced a **novel architecture based entirely on attention mechanisms**, specifically self-attention, allowing the model to **weigh the importance of different words within a sentence regardless of their positional distance from each other**.
- The Transformer architecture has made it **feasible to pre-train models** on large corpora of text **and then fine-tune** them for specific tasks.
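
A minimal NumPy sketch of the scaled dot-product self-attention at the core of the architecture (single head, random toy inputs):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """softmax(Q K^T / sqrt(d)) V: every token attends to every other token."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
seq_len, d = 4, 8  # 4 tokens, 8-dimensional representations
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one updated vector per token
```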

--

<!-- .slide: class="align-center" -->

## Transformers

<iframe width="560" height="315" src="https://www.youtube.com/embed/ZXiruGOCn9s?si=WQyUV4YN9HWNffVj" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

--

## Transformers - Details

[LLM Visualization Project](https://bbycroft.net/llm)

--

## Real World Examples

Large language models (LLMs) can also be categorized based on their availability as either open source, where the model architecture and weights are publicly accessible, or closed source, where the model details are proprietary and access is restricted. A short usage sketch follows the list below.

- Closed source
  - OpenAI's GPT-3 / GPT-4
  - ...

- Open source
  - [OpenAI's GPT-2](https://github.com/openai/gpt-2)
  - Google's BERT models
  - [Hugging Face’s Transformers](https://huggingface.co/) (repository of open source models)
  - ...

- Mixed open/closed source
  - [Meta's LLaMA](https://github.com/Meta-Llama/llama)
  - The company has provided some level of access to the research community but still maintains control over the distribution and usage of the model.

<!-- .element: style="font-size:large" -->
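
A minimal sketch of trying an open-source model locally, assuming Hugging Face's `transformers` library is installed (the first run downloads the GPT-2 weights):

```python
from transformers import pipeline

# GPT-2's weights are openly released, so this runs locally with no API key.
generator = pipeline("text-generation", model="gpt2")
out = generator("A large language model is", max_new_tokens=20, num_return_sequences=1)
print(out[0]["generated_text"])
```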

--

<!-- .slide: class="align-center" -->

## LLaMA 2


<div class="pdf"><!-- { "pdf": "assets/llama2.pdf" } --></div>

--

<!-- .slide: class="align-center" data-background-image="assets/matrix.gif" -->

## LLaMA 2 - Hands On!
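
One possible starting point, a sketch assuming `transformers` and `accelerate` are installed, approved access to the gated LLaMA 2 weights on Hugging Face, and a GPU with enough memory:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated: requires requesting access from Meta
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "[INST] Explain the Transformer architecture in one sentence. [/INST]"
inputs = tok(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(output[0], skip_special_tokens=True))
```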

--


## Extras

- [LLM Visualization](https://bbycroft.net/llm)
- ["Spreadsheets are all you need" Project](https://spreadsheets-are-all-you-need.ai)
- [Navigating the World of Large Language Models](https://www.bentoml.com/blog/navigating-the-world-of-large-language-models)