Glossary of AI Jargon and Terminology

  • Activation Function: A function that determines the output of a neuron in a neural network. It introduces non-linearity to the network. See ReLU.
  • Adaption Tuning: See: Fine-Tuning
  • AGI: Artificial General Intelligence. Highly autonomous systems that outperform humans at most economically valuable work. See also ASI.
  • AI winter: A period when interest in and funding for AI R&D are significantly reduced due to the field failing to meet expectations.
  • Algorithmic Bias: The presence of systematic and unfair biases in the outcomes produced by algorithms, typically due to biases present in the training data.
  • Alignment: The challenge of ensuring that an AI system's goals and behavior align with human values and intentions.
  • Andrej Karpathy: Key figure in CNNs and computer vision. Founding member of OpenAI and former Director of AI at Tesla.
  • Andrew Ng: Key figure in Machine Learning and Deep Learning. Founder of DeepLearning.AI.
  • ANN: Artificial Neural Network. A computational model inspired by the structure of biological neural networks, consisting of an interconnected network of nodes or neurons.
  • ASI: Artificial SuperIntelligence. Artificial intelligence where machines surpass human intelligence in virtually every aspect.
  • Attention, Attention Mechanism: A mechanism in neural networks, particularly in Transformer-based models, to capture contextual relationships between words in an input sequence.
  • Attention Head: In the context of neural networks, attention heads refer to the individual components responsible for attending to different parts of the input sequence.
  • Autodiff, Automatic Differentiation: Techniques for evaluating the partial derivative of a function. Used for implementing Backpropagation in Neural Networks. See also Chain Rule.
  • Autoencoder: A type of neural network architecture used for unsupervised learning and dimensionality reduction.
  • Autograd: See Autodiff.
  • Autoregression, Autoregressive: A statistical model that predicts the next value based on the previous values.
  • Backpropagation: An algorithm used in neural networks to calculate the gradient of the loss function with respect to the parameters of the network. See Geoffrey Hinton.
  • Beam Search: A search algorithm used in natural language processing tasks, such as machine translation or text generation, that finds a likely sequence of words by keeping only a fixed number of the best candidate sequences at each step.
  • BERT: Bidirectional Encoder Representations from Transformers: An architecture based on Transformers used in natural language processing.
  • Biases: In an artificial neural network, parameters that add a constant value to a neuron's weighted sum of inputs. See also Weights. Not to be confused with Algorithmic Bias.
  • The Bitter Lesson: TBD
  • Black Box: A model whose internal workings are not easily understood by humans.
  • BPE: Byte Pair Encoding. A sub-word tokenization technique used in natural language processing and machine learning. Originally a data compression algorithm, it was adapted for tokenization by Sennrich et al. in their paper "Neural Machine Translation of Rare Words with Subword Units" (2016).
  • Chain Rule: A rule in calculus for differentiating composite functions. Autodiff uses it to compute the derivatives that Backpropagation requires.
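  • ChatGPT: A conversational chatbot released by OpenAI in November 2022 and built on GPT models. It brought LLMs that generate human-like text to widespread public attention.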
  • Classifier: A machine learning model that assigns input data to one of a set of predefined categories (classes).
  • CNN: Convolutional Neural Network: A type of neural network architecture commonly used for image and video processing.
  • Context Window: The span of input tokens that a model can take into account at once when predicting the next token.
  • Deep Contrastive Network: TBD
  • Deep Learning: A subfield of machine learning that focuses on the development and application of artificial neural networks with multiple layers.
  • Diffusion, Diffusion Model: A technique used in Generative AI in which a model learns to remove noise from data such as images. New samples are generated by starting from random noise and denoising it step by step.
  • Domain Adaption: See: Fine-Tuning
  • Doomer, Doomerism: A disparaging term used by AI optimists to describe AI pessimists.
  • Eliezer Yudkowsky: A prominent figure in Alignment.
  • ELIZA: A very early and simplistic natural language processing chatbot from the 1960s that nonetheless fooled many people into thinking it was human.
  • Embedding: See also Latent Space, Latent Variable, Word Embedding
  • Feedforward Neural Network: A type of artificial neural network where information flows from the input layer, through any hidden layers, to the output layer with no feedback.
  • Fine-tuning: The process of further training a pre-trained model on a specific task or dataset to improve its performance, updating its parameters based on the new data while retaining the knowledge learned during pre-training. Also known as: Adaption Tuning, Domain Adaption
  • FOOM: TBD. See also Hard Takeoff, Eliezer Yudkowsky.
  • Foundation Model: The category of model of which LLMs are the best-known members. Foundation Models are not limited to text, but cover all modalities and work by segmenting data into Tokens or Patches.
  • GAN: Generative Adversarial Network: A type of machine learning model involving a generator AI and a discriminator AI. The generator tries to produce output realistic enough to fool the discriminator, while the discriminator tries to detect whether its input is real or generated.
  • Generative AI: Algorithms or models that can create new content, including text, audio, or images.
  • Geoffrey Hinton: Best known for his work on developing and popularizing Backpropagation, a key breakthrough in deep learning.
  • Glitch Token: A type of token in an LLM that can cause anomalous, unexpected, or nonsensical output apparently unrelated to the prompt. For example "SolidGoldMagikarp".
  • GPT: Generative Pre-trained Transformer. A type of LLM that utilizes the Transformer architecture and is trained on a large corpus of text data. GPT models have been successful in various natural language processing tasks, including text generation, language translation, and question-answering.
  • Gradient Descent: An optimization algorithm used in machine learning to minimize the loss function of a model by repeatedly adjusting its parameters in the direction opposite to the gradient of the loss.
  • Guardrails: A nontechnical umbrella term for various safety measures that attempt to counter toxicity, bias, etc.
  • Hallucination: A nontechnical term for generated LLM output that is not based on fact.
  • Hard Takeoff: A scenario in which AGI rapidly surpasses human intelligence, potentially leading to an uncontrollable impact on society.
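  • Hidden Layer: A layer in a neural network that sits between the input layer and the output layer, so its values are neither the network's inputs nor its outputs.
  • Hyperparameter: A setting that is chosen before training rather than learned from the data, such as the learning rate, the batch size, or the number of layers.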
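  • Ilya Sutskever: Key figure in Deep Learning, co-author of AlexNet and of sequence-to-sequence learning. Co-founder and former Chief Scientist of OpenAI.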
  • In-Context Learning, ICL: The ability of Large Language Models to learn from information in the input context without updating model parameters. One proposed explanation is that the model updates the state of latent variables based on the context and conditions on them when predicting the next output.
  • Inference: The process of using a model to make predictions on unseen data.
  • Inference Kernel: TBD
  • Inner Alignment: TBD
  • Instruction Tuning: TBD
  • Interpretability: The degree to which we can understand how a model such as an LLM arrives at its output.
  • Jailbreak, Jailbreaking: Circumventing the Guardrails of an LLM with cleverly designed prompts.
  • Language Model: A computational model of human language built from statistical data.
  • Latent Space: TBD. See also Embedding.
  • Latent Variable: TBD. See also Embedding.
  • Layer: A set of artificial neurons that are not connected to each other but take input from the previous layer and pass their output to the next layer. Each layer may be seen as a level of generalization or abstraction.
  • LLM: Large Language Model. A type of Language Model that uses the Transformer architecture and is trained on a large corpus of text data. The most well-known category of Foundation Model.
  • Logit: TBD
  • Loss Function: In training a neural network, a function that measures how far the network's output is from the desired output.
  • LSTM: Long Short-Term Memory. A type of neural network architecture that is commonly used for sequence data processing.
  • Machine Learning: TBD
  • Machine Translation: Methods to translate text from one human language to another. These may include NLP, statistics and probability, or more advanced Deep Learning techniques such as LLMs.
  • Maximum Likelihood Estimation: TBD
  • Memorization: See also Overfitting. TBD
  • Modality: The type of data which a model uses, such as text, images, audio, and video. See also Multimodal, Multimodality.
  • Model Collapse: TBD.
  • Multi-Head Attention: TBD. See also Attention, Attention Head.
  • Multimodal, Multimodality: Refers to AI technologies that can be trained on and make inferences on multiple kinds of data, such as images, audio, and video in addition to text.
  • Neural Network: TBD
  • NLP: Natural Language Processing. TBD
  • One Shot Learning: TBD
  • OOD, Out-of-Distribution: TBD
  • OpenAI: AI company prominent for GPTs.
  • Outer Alignment: TBD
  • Overfitting: A situation in which the model learns the training data too well and fails to generalize to new data. Less technically also referred to as Memorization.
  • Parameter: TBD
  • Patch: The equivalent of the Tokens of an LLM for other Modalities of Foundation Model such as Audio, Speech, Video, etc.
  • P(doom): A person's subjective estimate of the probability that AI will cause an existential catastrophe for humanity.
  • Perceptron: TBD
  • Positive transfer: TBD
  • Pre-training: A stage in which a model is trained on a large corpus of text data before being fine-tuned on a specific task or dataset.
  • Prompt: The text given to an LLM in the form of a question or command that the model will generate a response to.
  • Prompt Engineer, Prompt Engineering: A person who designs effective prompts for an LLM, and the process of doing so.
  • Prompt Injection: TBD
  • Q-learning: TBD
  • Q*: TBD
  • Reinforcement Learning, RL: TBD
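  • ReLU: Rectified Linear Unit. An Activation Function that outputs its input when the input is positive and zero otherwise, i.e. max(0, x). It is used in neural networks to introduce non-linearity.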
  • Retrieval-Augmented Generation, RAG: TBD
  • Reward Function: TBD
  • RLHF: Reinforcement Learning from Human Feedback: A Fine-Tuning technique, often counted among the Guardrails, that uses human preference judgments to align a trained model with human values and preferences. See Alignment.
  • RNN: Recurrent Neural Network: A type of neural network architecture commonly used for sequential data processing such as audio and text.
  • Safeguards: See: Guardrails
  • Scaling: TBD
  • Scaling Hypothesis: TBD
  • Self-Attention: See: Attention, Attention Mechanism
  • SGD, Stochastic Gradient Descent: TBD
  • The Singularity: The posited point in the future when AI will surpass human intelligence.
  • Softmax: TBD
  • Sparse Autoencoder: A type of Autoencoder inspired by the Sparse Coding Hypothesis in neuroscience, in which only a small number of neurons are activated at a time.
  • State space model (SSM): TBD
  • Stochastic Parrot: Coined by Emily M. Bender et al. in 2021. A disparaging term for LLMs, used to argue that they only recombine patterns from their training data and have no real understanding or World Model.
  • Style Transfer: In image generation, a technique where a style image is used to modify an input image.
  • Supervised Learning: TBD
  • Synthetic Data: Algorithmically generated data used for training and validating models.
  • System Prompt: TBD
  • Temperature: TBD
  • Token: A unit of information in an LLM that roughly corresponds to a word in the vocabulary but is very often only part of a word. See also Patch.
  • Transfer Learning: TBD
  • Transformer: A neural network architecture introduced in the paper "Attention is All You Need" by Vaswani et al. (2017). It has become a popular model for various natural language processing tasks. The Transformer architecture utilizes self-attention mechanisms to capture contextual relationships between words in an input sequence, enabling effective modeling of long-range dependencies.
  • Unsupervised Learning: TBD
  • Weights: TBD. See also Biases
  • Word Embedding: A representation of a word or token as a set of numbers in a vector space.
  • World Model: Implicit representation of the world encoded in the weights and biases of a neural network captured through patterns in the data during training.
  • Yann LeCun: Key figure in Deep Learning and CNNs. Chief AI Scientist at Facebook (now Meta).
  • Yoshua Bengio: Key figure in Deep Learning and Neural Networks.
  • Zero-Shot Learning: A machine learning paradigm where a model is trained to recognize and classify objects or concepts that it has never seen before. It uses auxiliary information about the unseen classes to generalize from the known classes.
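
Several of the entries above (Activation Function, Weights, Biases, Loss Function, Chain Rule, Gradient Descent) describe parts of one training loop. The following is a minimal sketch of how they fit together for a single artificial neuron, not a reference implementation. The function names, the toy data, the starting values, and the learning rate are all assumptions chosen for illustration, and the gradients are written out by hand instead of using an Autodiff library.

```python
def relu(x):
    """ReLU activation function: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def mean_loss(samples, w, b):
    """Loss function: mean squared error of the neuron over all samples."""
    return sum((relu(w * x + b) - t) ** 2 for x, t in samples) / len(samples)


def train_neuron(samples, w=0.5, b=0.1, lr=0.05, epochs=200):
    """Fit one neuron y = relu(w * x + b) with per-sample gradient descent.

    w is the weight, b is the bias; lr (learning rate) and epochs are
    hyperparameters. All default values here are illustrative assumptions.
    """
    for _ in range(epochs):
        for x, target in samples:
            z = w * x + b                  # weighted input plus bias
            y = relu(z)                    # activation
            # Chain rule: dloss/dw = dloss/dy * dy/dz * dz/dw
            dloss_dy = 2.0 * (y - target)  # derivative of (y - target)**2
            dy_dz = relu_grad(z)
            grad_w = dloss_dy * dy_dz * x
            grad_b = dloss_dy * dy_dz
            # Gradient descent: step against the gradient
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical toy data where the target is twice the input.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
loss_before = mean_loss(samples, 0.5, 0.1)
w, b = train_neuron(samples)
loss_after = mean_loss(samples, w, b)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

Because each sample triggers its own update, this loop is closer to Stochastic Gradient Descent than to full-batch Gradient Descent. In practice the hand-written derivatives would be replaced by Autodiff.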