Introduction to Large Language Models

Author: Archit Vasan, including materials on LLMs by Varuni Sastri and Carlo Graziani at Argonne, and discussion/editorial work by Taylor Childers, Bethany Lusch, and Venkat Vishwanath (Argonne)

Inspired by the blog posts "The Illustrated Transformer" and "The Illustrated GPT2" by Jay Alammar, both highly recommended reading.

This tutorial covers some fundamental concepts necessary for the study of large language models (LLMs).

Brief overview

  • Scientific applications for language models
  • General overview of Transformers
  • Tokenization
  • Model Architecture
  • Pipeline using HuggingFace
  • Model loading

Sophia Setup

  1. If you are using ALCF, first log in. From a terminal run the following command:
  2. Although we already cloned the repo before, you'll want the updated version. To be reminded of the instructions for syncing your fork, click here.

  3. Now that we have the updated notebooks, we can open them. If you are using ALCF JupyterHub or Google Colab, you can be reminded of the steps here.

  4. Reminder: Change the notebook's kernel to datascience/conda-2024-08-08 (you may need to change kernel each time you open a notebook for the first time):

    1. select Kernel in the menu bar
    2. select Change kernel...
    3. select datascience/conda-2024-08-08 from the drop-down menu
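After switching kernels, a quick check in the first notebook cell can confirm which Python interpreter is active and whether the libraries you expect are importable. The sketch below is only an illustration: the helper name `describe_environment` is hypothetical, and the package names `torch` and `transformers` are assumptions based on the HuggingFace topics listed in the overview, not a statement of what the kernel actually ships.

```python
import importlib.util
import sys


def describe_environment(packages=("torch", "transformers")):
    """Report the interpreter path and whether each package can be found.

    The default package names are assumptions for illustration only.
    """
    report = {"python": sys.executable}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


# Print one line per entry, e.g. the interpreter path and True/False per package
for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the interpreter path does not point into the conda environment you selected, or a package shows `False`, re-select the kernel and rerun the cell.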

Google Colab setup

In case you have trouble accessing Sophia, all notebook material can be run in Google Colab.

Just:

  1. Go to this link: Colab
  2. Click on File/Open notebook
  3. Navigate to the GitHub tab and find argonne-lcf/ai-science-training-series
  4. Click on 04_intro_to_llms/IntroLLMs.ipynb

References:

I strongly recommend reading "The Illustrated Transformer" by Jay Alammar. Alammar also has a useful post dedicated more generally to sequence-to-sequence modeling, "Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)", which illustrates the attention mechanism in the context of a more generic language translation model.