diff --git a/index.html b/index.html index 27c0a53..754e0ce 100644 --- a/index.html +++ b/index.html @@ -1,7 +1,7 @@ - + Risks (and Benefits) of Generative AI and Large Language Models @@ -112,6 +112,12 @@ Recent Posts +
+ Week 12: Regulating Dangerous Technologies + + +
+
Week 11: Watermarking on Generative Models @@ -164,12 +170,6 @@ Week 2: Alignment -
- -
- Week 1: Introduction - -
@@ -235,12 +235,43 @@ + + + +

Week 12: Regulating Dangerous Technologies

+
+ + +
+ +
+

The slides are here: Regulating Dangerous Technologies (I’ve included some slides in the posted slides that I didn’t present in class but you might find interesting, including some excerpts from a talk I gave in 2018 on Mutually Assured Destruction and the Impending AI Apocalypse.)

+

Since one of the groups made the analogy to tobacco products, I also will take the liberty of pointing to a talk I gave at Google making a similar analogy: The Dragon in the Room.

+

Stephanie made the point after class about how important individuals +making brave decisions are to things working out, in particular with +humanity (so far!) avoiding annihilating ourselves with nuclear +weapons. Stanislav Petrov may well have been the single person between +us and nuclear destruction in 1983, when he prevented an alert (which +he correctly determined was a false alarm) produced by the Soviet +detection system from going up the chain. Here’s one (of many) +articles on this: ‘I Had A Funny Feeling in My +Gut’, +Washington Post, 10 Feb 1999. There is still a lot of uncertainty and +skepticism about whether we should fear any kind of out-of-control AI risk, +but it is not so hard to imagine scenarios where our fate will +similarly come down to an individual’s decision at a critical juncture.

+ +
+
+

Week 11: Watermarking on Generative Models

@@ -970,7 +1001,7 @@

Question: Why is there a discrepancy between crowdworker (and GPT-4) preferences evaluation and automatic benchmark evaluation?

-

Authors’ conclusion: Limitation models can learn style, but not factuality.

+

Authors' conclusion: Imitation models can learn style, but not factuality.

@@ -1331,14 +1362,14 @@

Discussion Questions

In The Curse of Recursion: Training on Generated Data Makes Models Forget, the authors rely on several assumptions to support their arguments. How strong those assumptions are and do you think these assumptions limit its applicability to broader contexts?

-
+

    -
  1. +
  2. Gudibande, A., Wallace, E., Snell, C., Geng, X., Liu, H., Abbeel, P., Levine, S. and Song, D., 2023. The false promise of imitating proprietary llms. arXiv preprint arXiv:2305.15717. ↩︎

-
+

@@ -1744,7 +1775,7 @@

Sparse Autoencoders

  • Input Bias: Introduces an approach of adding an input bias to the representations in autoencoders, which demonstrates a significant boost in performance for the models used in toy examples.
  • -

    The purpose of sparse autoencoders is to extract meaningful features from neural network activations. To avhice a good decomposition, where the features extracted should be interpretable and able to describe the activations’ context requires the ability to describe activations, interpret downstream effects of changes, and cover a significant portion of functionality within the data.

    +

    The purpose of sparse autoencoders is to extract meaningful features from neural network activations. Achieving a good decomposition, in which the extracted features are interpretable and can describe the activations' context, requires the ability to describe activations, interpret the downstream effects of changes, and cover a significant portion of the functionality within the data.
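    To make this decomposition concrete, here is a minimal sketch (in PyTorch) of a sparse autoencoder trained on recorded activation vectors; the layer sizes, the ReLU encoder with an L1 sparsity penalty, and the learned input bias (echoing the "Input Bias" point above) are illustrative assumptions rather than the exact architecture from the paper:

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        """Decompose d_act-dimensional activations into d_feat sparse features."""

        def __init__(self, d_act: int, d_feat: int, l1_coeff: float = 1e-3):
            super().__init__()
            # Learned "input bias", subtracted before encoding and added back after decoding.
            self.input_bias = nn.Parameter(torch.zeros(d_act))
            self.encoder = nn.Linear(d_act, d_feat)
            self.decoder = nn.Linear(d_feat, d_act)
            self.l1_coeff = l1_coeff

        def forward(self, acts: torch.Tensor):
            features = torch.relu(self.encoder(acts - self.input_bias))
            recon = self.decoder(features) + self.input_bias
            # Reconstruction error plus an L1 penalty that drives most feature
            # activations to zero, which is what makes the code sparse.
            loss = ((recon - acts) ** 2).mean() + self.l1_coeff * features.abs().mean()
            return recon, features, loss

    # Usage sketch: fit to a batch of activations recorded from the model under study.
    sae = SparseAutoencoder(d_act=512, d_feat=4096)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
    acts = torch.randn(64, 512)  # stand-in for real activations
    opt.zero_grad()
    _, features, loss = sae(acts)
    loss.backward()
    opt.step()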

    Are these features “interpretable”

    @@ -1754,7 +1785,7 @@

    Are these features “interpretabl


    Feature Activation Sampling Bias: In previous evaluations, there was a bias due to just considering the top-activation neurons which might inaccurately appear monosemantic due to their higher activations. To mitigate this bias, the approach involves sampling uniformly across all possible activations for each given feature.

    -

    Evaluation of Interpretable Features: The authors used an evaluation process where human-based assessments are used to determine the interpretability of the features extracted. The criteria for interpretability are based on the authors’ distributed-based evaluation, where a score above eight is considered sufficiently interpretable.

    +

    Evaluation of Interpretable Features: The authors used an evaluation process in which human assessments determine the interpretability of the extracted features. The criteria are based on the authors' distribution-based evaluation, where a score above eight is considered sufficiently interpretable.
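    The uniform-sampling mitigation described in the "Feature Activation Sampling Bias" item above could be sketched roughly as follows; the number of bins and samples per bin are illustrative assumptions, not the authors' settings:

    import numpy as np

    def sample_across_activation_range(acts, n_bins=10, per_bin=5, seed=0):
        """Pick example indices for one feature spread uniformly over its activation
        range, instead of only its top activations (which can make a feature look
        misleadingly monosemantic)."""
        rng = np.random.default_rng(seed)
        acts = np.asarray(acts, dtype=float)
        nonzero = np.flatnonzero(acts > 0)
        if nonzero.size == 0:
            return []
        edges = np.linspace(acts[nonzero].min(), acts[nonzero].max(), n_bins + 1)
        chosen = []
        for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
            # Half-open bins, except the last bin which includes the maximum.
            mask = (acts[nonzero] >= lo) & ((acts[nonzero] < hi) | (i == n_bins - 1))
            in_bin = nonzero[mask]
            if in_bin.size:
                k = min(per_bin, in_bin.size)
                chosen.extend(rng.choice(in_bin, size=k, replace=False).tolist())
        return chosen

    # Usage sketch: acts[j] is this feature's activation on dataset example j.
    example_ids = sample_across_activation_range(np.random.rand(10_000))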

    Automated Evaluation

    @@ -1911,7 +1942,7 @@

    Monday,

    Here is an example of pseudocode from the activity:

    -
    Sentence = "The students like to read interesting books."
    +
    Sentence = "The students like to read interesting books."
     # The bilingual dictionary from English to Chinese: Eng_chinese_dict
     Translation = []
     for word in Sentence.split():
    @@ -1919,7 +1950,7 @@ 

        Translation.append(Eng_chinese_dict[word])
    else:
        Translation.append(word)
-Translated_sentence = " ".join(Translation)
+Translated_sentence = " ".join(Translation)
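    The hunk above only captures the tail of the loop; for reference, a runnable version of the whole word-by-word substitution, filling in the dictionary-membership test implied by the else branch and using a small placeholder dictionary (the activity assumed an English-to-Chinese dictionary named Eng_chinese_dict), might look like:

    # Placeholder bilingual dictionary; real entries would come from Eng_chinese_dict.
    Eng_chinese_dict = {"students": "学生", "read": "读", "books": "书"}

    Sentence = "The students like to read interesting books."
    Translation = []
    for word in Sentence.split():
        if word in Eng_chinese_dict:      # known word: substitute its translation
            Translation.append(Eng_chinese_dict[word])
        else:                             # unknown word: keep the English word
            Translation.append(word)
    Translated_sentence = " ".join(Translation)
    print(Translated_sentence)
    # -> "The 学生 like to 读 interesting books."
    # Note that "books." keeps its trailing period, so the naive split misses it;
    # this previews the challenges listed below.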

    After the activity discussion, here are the challenges encountered when translating from English to another language:

    • Variations in Word Order: Different languages have varying word orders, affecting sentence structure.
    • @@ -2821,7 +2852,7 @@

      Monday, September 18

    Figure 1 (Image source)
    -

    LLMs and fine-tuned models perform better on different tasks. According to a study from the paper Benchmarking Large Language Models for News Summarization 1, LLMs perform better than fine-tuned models on text summarization according to the preferences of human raters. Additionally, LLMs perform better on tasks that require large amounts of knowledge from a variety of domains2. In machine translation, however, fine-tuned models generally do better than LLMs, although they only do slightly better in low-resource settings2. Additionally, the presenters explained that fine-tuned models and LLMs have similar performance when the task only requires very specific knowledge.

    +

    LLMs and fine-tuned models perform better on different tasks. According to a study from the paper Benchmarking Large Language Models for News Summarization 1, LLMs perform better than fine-tuned models on text summarization according to the preferences of human raters. Additionally, LLMs perform better on tasks that require large amounts of knowledge from a variety of domains2. In machine translation, however, fine-tuned models generally do better than LLMs, although they only do slightly better in low-resource settings2. Additionally, the presenters explained that fine-tuned models and LLMs have similar performance when the task only requires very specific knowledge.

    The presenters then posed the following question to the class: “How could we enhance LLM in the scenario where the required knowledge does not match their learned knowledge?” The class formed four groups to discuss the question. Each group then shared a summary of what they had discussed:

    Group 1: Discussed enhancements in both training and testing. For testing, use intentional prompts to get the knowledge you want from the model. For training, adding more training data, using a knowledge graph to classify knowledge into different clusters for ease of search, and using a plug-in, as mentioned in the GitHub discussion.

    Group 2: Advocated for enhancement through the model’s ability to retrieve information from external sources and undergo fine-tuning.

    @@ -2985,13 +3016,13 @@

    Discussion for Wednesday:

    The paper mentions the importance of safety and minimizing bias in LLM-generated medical information, and the optional reading reports on some experiments that show biases in GPT’s medical diagnoses. Should models be tuned to ignore protected attributes? Should we prevent models from being used in medical applications until these problems can be solved?

    -
    +

      -
    1. +
    2. Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto. Benchmarking large language models for news summarization, 2023. https://arxiv.org/abs/2301.13848 ↩︎

    3. -
    4. +
    5. Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek @@ -3004,13 +3035,13 @@

      Discussion for Wednesday:

      drew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. Palm: Scaling -language modeling with pathways, 2022. https://arxiv.org/abs/2204.02311 ↩︎ ↩︎

      +language modeling with pathways, 2022. https://arxiv.org/abs/2204.02311 ↩︎

    6. -
    7. +
    8. OpenAI. GPT-4 Technical Report. March 2023. https://arxiv.org/abs/2303.08774 ↩︎

    -
    +

    @@ -3243,17 +3274,17 @@

    By Tuesday: Questions about

    The authors recommend transparency of bias mitigation methods, citing the benefit it could provide to researchers and practitioners. Specifically, how might researchers benefit from this? Can you foresee any negative consequences (either to researchers or the general users of these models) of this transparency?

    -
    +

      -
    1. +
    2. Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh. “Calibrate before use: Improving few-shot performance of language models.” International Conference on Machine Learning. PMLR, 2021. ↩︎

    3. -
    4. +
    5. Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman. “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.” arXiv preprint arXiv:2305.04388, 2023. ↩︎

    -
    +
    @@ -3380,7 +3411,7 @@

    Introduction to AI Align (Feng et al.) show that famous models like BERT and ChatGPT do appear to have socioeconomic political leanings (of course, there is no true -neutral'' or center’’ position, these are just defined by where +neutral'' or center'' position, these are just defined by where the expected distribution of beliefs lies).

    Figure 1 shows the political leanings of famous LLMs.

    @@ -3465,10 +3496,10 @@

    Discussion Questions

    However, a developer’s responsibility doesn’t culminate once the AI product hits the market. The journey is continuous. Post-deployment, it’s crucial for developers to monitor the system’s alignment with human values and rectify any deviations. It’s an ongoing commitment to refinement and recalibration. Moreover, transparency is key. Developers should be proactive in highlighting potential concerns related to their models and fostering a culture where the public is not just a passive victim but an active participant in the model alignment process.

    To round off, it’s essential for developers to adopt a forward-thinking mindset. The decisions made today in the AI labs and coding chambers will shape the world of tomorrow. Thus, every developer should think about the long-term consequences of their work, always aiming to ensure that AI not only dazzles with its brilliance but also remains beneficial for generations to come.

    -

    How might AI developers’ responsibility evolve?

    +

    How might AI developers' responsibility evolve?

    It’s impossible to catch all edge cases. As AI systems grow in complexity, predicting every potential outcome or misalignment becomes a herculean task. Developers, in the future, might need to shift from a perfectionist mindset to one that emphasizes robustness and adaptability. While it’s essential to put in rigorous engineering effort to minimize errors, it’s equally crucial to understand and communicate that no system can be flawless.

    -

    Besides, given that catching all cases isn’t feasible, developers’ roles might evolve to include more dynamic and real-time monitoring of AI systems. This would involve continuously learning from real-world interactions, gathering feedback, and iterating on the model to ensure better alignment with human values.

    +

    Besides, given that catching all cases isn’t feasible, developers' roles might evolve to include more dynamic and real-time monitoring of AI systems. This would involve continuously learning from real-world interactions, gathering feedback, and iterating on the model to ensure better alignment with human values.

    The Alignment Problem from a Deep Learning Perspective

    In this part of today’s seminar, the whole class was divided into 3 groups to discuss the possible alignment problems from a deep learning perspective. Specifically, three groups were focusing on the alignment problems regarding different categories of Deep Learning methods, which are:

      @@ -3820,241 +3851,6 @@

      Discussion Questions


      -

      Week 1: Introduction

      - - -
      -

      (see bottom for assigned readings and questions)

      -

      Attention, Transformers, and BERT

      -

      Monday, 28 August

      -

      Transformers1 are a class of deep learning models that have -revolutionized the field of natural language processing (NLP) and -various other domains. The concept of transformers originated as an -attempt to address the limitations of traditional recurrent neural -networks (RNNs) in sequential data processing. Here’s an overview of -transformers’ evolution and significance.

      -

      Background and Origin

      -

      RNNs2 were one of the earliest models used for sequence-based tasks in machine learning. They processed input tokens one after another and used their internal memory to capture dependencies in the sequence. The following figure gives an illustration of the RNN architecture.

      -
      -
      -

      RNN (Image Source)

      -
      -

      Limitations of RNNs. -Despite many improvements over this basic architecture, RNNs have the following shortcomings:

      -
        -
      • RNNs struggle with long sequences. It only keeps recent information but looses long-term memory.
      • -
      • RNNs suffer from vanishing gradients3. In this, the gradients that are used to update the model become very small during back propagation, leading the RNNs to learn nothing from training.
      • -
      -

      Introduction of LSTMs. -Long Short-Term Memory (LSTM)4 networks were then introduced to address the vanishing gradient problem in RNNs. LSTMs had memory cells and gating mechanisms that allowed them to capture long-term memories more effectively. While LSTMs improved memory retention, they were still computationally expensive and slow to train, especially on large datasets.

      -

      Attention Mechanism. -The attention mechanism561 was introduced as a way to help models focus on relevant parts of the input sequence when generating output. This addressed the memory issues that plagued previous models. -Attention mechanisms allowed models to weigh the importance of different input tokens when making predictions or encoding information. In essence, it enables the model to focus selectively on relevant parts of the input sequence while disregarding less pertinent ones. In practice, attention mechanism can be categorized into self-attention and multi-head attention based on the number of heads used in the attention structure.

      -

      The Transformer Model

      -

      The transformer architecture, introduced by Vaswani et al. (2017) 1, marked a significant advance in NLP. It used self-attention mechanisms to process input tokens in parallel and capture contextual information more effectively. -Transformers broke down sentences into smaller parts and learned statistical relationships between these parts to understand meaning and generate responses. -The model utilized input embeddings to represent words and positional encodings to address the lack of inherent sequence information. -The core innovation was the self-attention mechanism, which allowed tokens to consider their relationships with all other tokens in the sequence

      -

      Benefits of Transformers. -Transformers can capture complex contextual relationships in language, making them highly effective for a wide range of NLP tasks. -The parallel processing capabilities of transformers, enabled by self-attention, drastically improved training efficiency and reduced the vanishing gradient problem.

      -

      Mathematical Foundations. -Transformers involve mathematical representations of words and their relationships. The model learns to establish connections between words based on their contextual importance.

      -

      Crucial Role in NLP. -Transformers play a crucial role in capturing the meaning of words and sentences78, allowing for more accurate and contextually relevant outputs in various NLP tasks. -In summary, transformers, with their innovative attention mechanisms, have significantly advanced the field of NLP by enabling efficient processing of sequences, capturing context effectively, and achieving state-of-the-art performance on a variety of tasks.

      -

      Advancements in Transformers. -One significant advancement of transformers over previous models like LSTMs and RNNs is their ability to handle long-range dependencies and capture contextual information more effectively. Transformers achieve this through self-attention and multi-head attention. This allows them to process input tokens in parallel, rather than sequentially, leading to improved efficiency and performance. However, a drawback could be increased computational complexity due to the parallel processing, especially in multi-head attention.

      -

      Positional Encodings. -The use of positional encodings in transformers helps address the lack of inherent positional information in their architecture. This enables transformers to handle sequential data effectively without relying solely on the order of tokens. The benefits include scalability and the ability to handle longer sequences, but a potential drawback is that these positional encodings might not fully capture complex positional relationships in very long sequences.

      -

      Self-Attention and Multi-Head Attention. -Self-attention is a useful mechanism that allows each token to consider the relationships between all other tokens in a sequence. While it provides a more nuanced understanding of input, it can be computationally expensive. The use of multi-head attention further enhances the model’s ability to capture different types of dependencies in the data. The number of attention heads (e.g., 8 in BERT) is a balance between performance and complexity. Too few or too many heads can result in suboptimal performance. More details about self-attention and multi-head attention can be found in 9.

      -

      Context and Answers in Activities. -Let’s do some activity now!

      -
      I used to ___ 
      -
      -Yesterday, I went to ___
      -
      -It is raining ___
      -

      The context given in the activities influences the answers provided. More context leads to more accurate responses. This highlights how models like BERT benefit from bidirectional attention, as they can consider both preceding and succeeding words when generating predictions.

      -

      BERT: Bidirectional Transformers

      -

      BERT’s Design and Limitations. -BERT10 uses bidirectional attention and masking to enable it to capture context from both sides of a word. The masking during training helps the model learn to predict words in context, simulating its real-world usage. While BERT’s design was successful, it does require a substantial amount of training data and resources. Its application may be more focused on tasks such as sentiment analysis, named entity recognition, and Question answering, while GPT is better at handling tasks such as content creation, text summarization, and machine translation11.

      -
      -
      -

      Image Source

      -
      -

      Future Intent of BERT Authors. -The authors of BERT might not have fully anticipated its exact future use and impact. While they likely foresaw its usefulness, the swift and extensive adoption of language models across diverse applications likely surpassed their expectations. The increasing accessibility and scalability of technology likely contributed to this rapid adoption. As mentioned by the professor in class, the decision to publish something in industry (and at Google in particular) often depends on its perceived commercial value. If Google were aware of the future commercial value of transformers and the methods introduced by BERT, they may not have published these papers openly (although this is purely speculation without any knowledge of the internal process that might have been followed to publish these papers).

      -

      Discussion Questions

      -
      -

      Q: What makes language models different from transformers?

      -
      -

      A language model encompasses various models that understand language, whereas transformers represent a specific architecture. Language models are tailored for natural languages, while transformers have broader applications. For example, transformers can be utilized in tasks beyond language processing, such as predicting protein structures from genomic sequences (as done by AlphaFold).

      -
      -

      Q: Why was BERT published in 2019, inspiring large language models, and why have GPT models continued to improve while BERT’s advancements seem comparatively limited?

      -
      -

      Decoder models, responsible for generating content, boast applications that are both visible and instantly captivating to the public. Examples like chatbots, story generators, and models from the GPT series showcase this ability by producing human-like text. This immediate allure likely fuels increased research and investment. Due to the inherent challenges in producing coherent and contextually appropriate outputs, generative tasks have garnered significant research attention. Additionally, decoder models, especially transformers like GPT-212 and GPT-313, excel in transfer learning, allowing pre-trained models to be fine-tuned for specific tasks, highlighting their remarkable adaptability.

      -
      -

      Q: Why use 8-headers in the transformer architecture?

      -
      -

      The decision to use 8 attention heads is a deliberate choice that strikes a balance between complexity and performance. Having more attention heads can capture more intricate relationships but increases computational demands, whereas fewer heads might not capture as much detail.

      -
      -

      Q: BERT employs bidirectional context to pretrain its embeddings, but there is debate about whether this approach genuinely captures the entirety of language context?

      -
      -

      The debate arises from the fact that while bidirectional context is powerful, it might not always capture more complex contextual relationships, such as those involving long-range dependencies or nuanced interactions between distant words. Some argue that models with other architectures or training techniques might better capture such intricate language nuances.

      -

      Wednesday: Training LLMs, Risks and Rewards

      -

      In the second class discussion, the team talked about LLMs and tried to make sense of how they’re trained, where they get their knowledge, and where they’re used. Here’s what they found out.

      -

      How do LLMs become so clever?

      -

      Before LLMs become language wizards, they need to be trained. The crucial question is where they acquire their knowledge.

      -

      LLMs need lots and lots of information to learn from. They look at stuff like internet articles, books, and even Wikipedia. But there’s a catch. They have a clean-up crew called “C4” to make sure the information is tidy and reliable.

      -

      Training LLMs requires potent computational resources, such as Graphics Processing Units (GPUs). Computationally-expensive large-scale training, while crucial for enhancing their capabilities, involves substantial energy consumption, which, depending on how it is produces may emit large amounts of carbon dioxide.

      -

      Transitioning to the practical applications of these language models, LLMs excel in diverse domains14. They can undergo meticulous fine-tuning to perform specialized tasks, ranging from aiding in customer service to content generation for websites. Furthermore, these models exhibit the ability to adapt and learn from feedback, mirroring human learning processes.

      -

      Risks and Rewards

      -

      In our class discussion, we had a friendly debate about LLMs. Some students thought they were fantastic because they can boost productivity, assist with learning, and bridge gaps between people. They even saw LLMs as potential problem solvers for biases in the human world.

      -

      But others had concerns. They worried about things like LLMs being too mysterious (like a black box), how they could influence the way people think, and the risks of false information and deep fakes. Some even thought that LLMs might detrimentally impact human intelligence and creativity.

      -

      In our debate, there were some interesting points made:

      -

      Benefits Group.

      -
        -
      • LLMs can enhance creativity and accelerate tasks.
      • -
      • They have the potential to facilitate understanding and learning.
      • -
      • Utilizing LLMs may streamline the search for ideas.
      • -
      • LLMs offer a tool for uncovering and rectifying biases within our human society. Unlike human biases, there are technical approaches to mitigate biases in models.
      • -
      -

      Risks Group.

      -
        -
      • Concerns were expressed regarding LLMs’ opacity and complexity, making them challenging to comprehend.
      • -
      • Apprehensions were raised about LLMs potentially exerting detrimental influences on human cognition and societal dynamics.
      • -
      • LLMs are ripe for potential abuses in their ability to generate convincing false information cheaply.
      • -
      • The potential impact of LLMs on human intelligence and creativity was a topic of contemplation.
      • -
      -

      After the debate, both sides had a chance to respond:

      -

      Benefits Group Rebuttals.

      -
        -
      • Advocates pointed out that ongoing research aims to enhance the transparency of LLMs, reducing their resemblance to black boxes.
      • -
      • They highlighted collaborative efforts directed at the improvement of LLMs.
      • -
      • The significance and potential of LLMs in domains such as medicine and engineering was emphasized.
      • -
      • Although the ability of generative AI to produce art in the style of an artist is damaging to the career of that artist, it is overall beneficial to society, enabling many others to create desired images.
      • -
      • Addressing economic concerns, proponents saw LLMs as catalysts for the creation of new employment opportunities and enhancers of human creativity.
      • -
      -

      Risks Group Rebuttals.

      -
        -
      • They noted the existence of translation models and the priority of fairness in AI.
      • -
      • Advocates asserted that LLMs can serve as tools to identify and mitigate societal biases.
      • -
      • The point was made that AI can complement, rather than supplant, human creativity.
      • -
      • Although generating AI art may have immediate benefits to its users, it has long term risks to our culture and society if individuals are no longer able to make a living as artists or find the motivation to learn difficult skills.
      • -
      -

      Wrapping It Up. -So, there you have it, a peek into the world of Large Language Models and the lively debate about their pros and cons. As you explore the world of LLMs, remember that they have the power to be amazing tools, but they also come with responsibilities. Use them wisely, consider their impact on our world, and keep the discussion going!

      -

      Readings

      -

      Introduction to Large Language Models (from Stanford course)

      -

      Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Attention Is All You Need. https://arxiv.org/abs/1706.03762. NeurIPS 2017.

      -

      These two blog posts by Jay Alammar are not required readings but may be helpful for understanding attention and Transformers:

      - -

      Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ACL 2019.

      -

      Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel. Ethical and social risks of harm from Language -Models DeepMind, 2021. https://arxiv.org/abs/2112.04359

      -

      Optional Additional Readings:

      -

      Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). On the Opportunities and Risks of Foundation Models

      -

      Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. Conference of the North American Chapter of the Association for Computational Linguistics, 2018.

      -

      GPT1: Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. 2018.

      -

      GPT2: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners, 2019.

      -

      GPT3: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners, 2020.

      -

      Discussion Questions

      -

      Before 5:29pm on Sunday, August 27, everyone who is not in either the lead or blogging team for the week should post (in the comments below) an answer to at least one of these three questions in the first section (1–3) and one of the questions in the section section (4–7), or a substantive response to someone else’s comment, or something interesting about the readings that is not covered by these questions.

      -

      Don’t post duplicates - if others have already posted, you should read their responses before adding your own.

      -

      Questions about “Attention is All You Need” and “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”:

      -
        -
      1. -

        Many things in the paper (especially “Attention is All You Need”) seem mysterious and arbitrary. Identify one design decision described in the paper that seems arbitrary, and possible alternatives. If you can, hypothesize on why the one the authors made was selected and worked.

        -
      2. -
      3. -

        What were the key insights that led to the Transformers/BERT design?

        -
      4. -
      5. -

        What is something you don’t understand in the paper?

        -
      6. -
      -

      ===

      -

      Questions about “Ethical and social risks of harm from Language Models”

      -
        -
      1. -

        The paper identifies six main risk areas and 21 specific risks. Do you agree with their choices? What are important risks that are not included in their list?

        -
      2. -
      3. -

        The authors are at a company (DeepMind, part of Google/Alphabet). How might their company setting have influenced the -way they consider risks?

        -
      4. -
      5. -

        This was written in December 2021 (DALL-E was released in January 2021; ChatGPT was released in November 2022; GPT-4 was released in March 2023). What has changed since then that would have impacted perception of these risks?

        -
      6. -
      7. -

        Because training and operating servers typically requires fresh water and fossil fuels, how should we think about the environmental harms associated with LLMs?

        -
      8. -
      9. -

        The near and long-term impact of LLMs on employment is hard to predict. What jobs do you think are vulnerable to LLMs beyond the (seemingly) obvious ones mentioned in the paper? What are some jobs you think will be most resilient to advances in AI?

        -
      10. -
      -
      -
      -
        -
      1. -

        Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30. ↩︎ ↩︎ ↩︎

        -
      2. -
      3. -

        Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. ↩︎

        -
      4. -
      5. -

        Pascanu, R., Mikolov, T., & Bengio, Y. (2013, May). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310-1318). Pmlr. ↩︎

        -
      6. -
      7. -

        Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780. ↩︎

        -
      8. -
      9. -

        Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. Advances in neural information processing systems, 27. ↩︎

        -
      10. -
      11. -

        Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. ↩︎

        -
      12. -
      13. -

        Lin, T., Wang, Y., Liu, X., & Qiu, X. (2022). A survey of transformers. AI Open. ↩︎

        -
      14. -
      15. -

        Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in vision: A survey. ACM computing surveys (CSUR), 54(10s), 1-41. ↩︎

        -
      16. -
      17. -

        Karim, R. (2023, January 2). Illustrated: Self-attention. Medium. https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a ↩︎

        -
      18. -
      19. -

        Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. ↩︎

        -
      20. -
      21. -

        Ahmad, K. (2023b, April 26). GPT vs. Bert: What are the differences between the two most popular language models?. MUO. https://www.makeuseof.com/gpt-vs-bert/ ↩︎

        -
      22. -
      23. -

        Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9. ↩︎

        -
      24. -
      25. -

        Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901. ↩︎

        -
      26. -
      27. -

        Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., … & Hu, X. (2023). Harnessing the power of llms in practice: A survey on chatgpt and beyond. arXiv preprint arXiv:2304.13712. ↩︎

        -
      28. -
      -
      - -
      -
      - -
      diff --git a/index.xml b/index.xml index a5e99e3..c16511e 100644 --- a/index.xml +++ b/index.xml @@ -8,7 +8,17 @@ en-us evans@virginia.edu (David Evans) evans@virginia.edu (David Evans) - Mon, 13 Nov 2023 00:00:00 +0000 + Mon, 20 Nov 2023 00:00:00 +0000 + + Week 12: Regulating Dangerous Technologies + https://llmrisks.github.io/week12/ + Mon, 20 Nov 2023 00:00:00 +0000 + evans@virginia.edu (David Evans) + https://llmrisks.github.io/week12/ + The slides are here: Regulating Dangerous Technologies (I’ve included some slides in the posted slides that I didn’t present in class but you might find interesting, including some excerpts from a talk I gave in 2018 on Mutually Assured Destruction and the Impending AI Apocalypse.) +Since one of the groups made the analogy to tobacco products, I also will take the liberty of pointing to a talk I gave at Google making a similar analogy: The Dragon in the Room. + + Week 11: Watermarking on Generative Models https://llmrisks.github.io/week11/ @@ -27,7 +37,8 @@ Monday, November 6: Watermarking LLM Outputs Recent instances of AI-generated te evans@virginia.edu (David Evans) https://llmrisks.github.io/week10/ (see bottom for assigned readings and questions) -Presenting Team: Haolin Liu, Xueren Ge, Ji Hyun Kim, Stephanie Schoch Blogging Team: Aparna Kishore, Elena Long, Erzhen Hu, Jingping Wan +Presenting Team: Haolin Liu, Xueren Ge, Ji Hyun Kim, Stephanie Schoch +Blogging Team: Aparna Kishore, Elena Long, Erzhen Hu, Jingping Wan Monday, 30 October: Data Selection for Fine-tuning LLMs Question: Would more models help? We’ve discussed so many risks and issues of GenAI so far and one question is that it can be difficult for us to come up with a possible solution to these problems. @@ -40,7 +51,7 @@ Monday, 30 October: Data Selection for Fine-tuning LLMs Question: Would more mod (see bottom for assigned readings and questions) Presenting Team: Anshuman Suri, Jacob Christopher, Kasra Lekan, Kaylee Liu, My Dinh Blogging Team: Hamza Khalid, Liu Zhe, Peng Wang, Sikun Guo, Yinhan He, Zhepei Wei -Monday, 23 October: Interpretability: Overview, Limitations, & Challenges Definition of Interpretability Interpretability in the context of artificial intelligence (AI) and machine learning refers to the extent to which a model’s decisions, predictions, or internal workings can be understood and explained by humans. +Monday, 23 October: Interpretability: Overview, Limitations, & Challenges Definition of Interpretability Interpretability in the context of artificial intelligence (AI) and machine learning refers to the extent to which a model’s decisions, predictions, or internal workings can be understood and explained by humans. @@ -62,8 +73,10 @@ Monday, 16 Oct: Diving into the History of Machine Translation Let’s k evans@virginia.edu (David Evans) https://llmrisks.github.io/week7/ (see bottom for assigned readings and questions) -Presenting Team: Aparna Kishore, Elena Long, Erzhen Hu, Jingping Wan Blogging Team: Haochen Liu, Haolin Liu, Ji Hyun Kim, Stephanie Schoch, Xueren Ge Monday, 9 October: Generative Adversarial Networks and DeepFakes Today's topic is how to utilize generative adversarial networks to create fake images and how to identify the images generated by these models. -Generative Adversarial Network (GAN) is a revolutionary deep learning framework that pits two neural networks against each other in a creative showdown. 
+Presenting Team: Aparna Kishore, Elena Long, Erzhen Hu, Jingping Wan +Blogging Team: Haochen Liu, Haolin Liu, Ji Hyun Kim, Stephanie Schoch, Xueren Ge +Monday, 9 October: Generative Adversarial Networks and DeepFakes Today's topic is how to utilize generative adversarial networks to create fake images and how to identify the images generated by these models. + Generative Adversarial Network (GAN) is a revolutionary deep learning framework that pits two neural networks against each other in a creative showdown. @@ -75,7 +88,7 @@ Generative Adversarial Network (GAN) is a revolutionary deep learning framework (see bottom for assigned readings and questions) Hallucination (Week 5) Presenting Team: Liu Zhe, Peng Wang, Sikun Guo, Yinhan He, Zhepei Wei Blogging Team: Anshuman Suri, Jacob Christopher, Kasra Lekan, Kaylee Liu, My Dinh -Wednesday, September 27th: Intro to Hallucination People Hallucinate Too Hallucination Definition There are three types of hallucinations according to the “Siren's Song in the AI Ocean” paper: Input-conflict: This subcategory of hallucinations deviates from user input. Context-conflict: Context-conflict hallucinations occur when a model generates contradicting information within a response. +Wednesday, September 27th: Intro to Hallucination People Hallucinate Too Hallucination Definition There are three types of hallucinations according to the “Siren's Song in the AI Ocean” paper: Input-conflict: This subcategory of hallucinations deviates from user input. @@ -97,8 +110,9 @@ Monday, September 18 Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qiz evans@virginia.edu (David Evans) https://llmrisks.github.io/week3/ (see bottom for assigned readings and questions) -Prompt Engineering (Week 3) Presenting Team: Haolin Liu, Xueren Ge, Ji Hyun Kim, Stephanie Schoch Blogging Team: Aparna Kishore, Erzhen Hu, Elena Long, Jingping Wan -(Monday, 09/11/2023) Prompt Engineering Warm-up questions What is Prompt Engineering? How is prompt-based learning different from traditional supervised learning? In-context learning and different types of prompts What is the difference between prompts and fine-tuning? When is the best to use prompts vs fine-tuning? +Prompt Engineering (Week 3) Presenting Team: Haolin Liu, Xueren Ge, Ji Hyun Kim, Stephanie Schoch +Blogging Team: Aparna Kishore, Erzhen Hu, Elena Long, Jingping Wan + (Monday, 09/11/2023) Prompt Engineering Warm-up questions What is Prompt Engineering? How is prompt-based learning different from traditional supervised learning? In-context learning and different types of prompts What is the difference between prompts and fine-tuning? When is the best to use prompts vs fine-tuning? @@ -108,7 +122,7 @@ Prompt Engineering (Week 3) Presenting Team: Haolin Liu, Xueren Ge, Ji Hyun Kim, evans@virginia.edu (David Evans) https://llmrisks.github.io/week2/ (see bottom for assigned readings and questions) -Table of Contents (Monday, 09/04/2023) Introduction to Alignment Introduction to AI Alignment and Failure Cases Discussion Questions The Alignment Problem from a Deep Learning Perspective Group of RL-based methods Group of LLM-based methods Group of Other ML methods (Wednesday, 09/06/2023) Alignment Challenges and Solutions Opening Discussion Introduction to Red-Teaming In-class Activity (5 groups) How to use Red-Teaming? Alignment Solutions LLM Jailbreaking - Introduction LLM Jailbreaking - Demo Observations Potential Improvement Ideas Closing Remarks (by Prof. 
+Table of Contents (Monday, 09/04/2023) Introduction to Alignment Introduction to AI Alignment and Failure Cases Discussion Questions The Alignment Problem from a Deep Learning Perspective Group of RL-based methods Group of LLM-based methods Group of Other ML methods (Wednesday, 09/06/2023) Alignment Challenges and Solutions Opening Discussion Introduction to Red-Teaming In-class Activity (5 groups) How to use Red-Teaming? @@ -119,7 +133,7 @@ Table of Contents (Monday, 09/04/2023) Introduction to Alignment Introduction to https://llmrisks.github.io/week1/ (see bottom for assigned readings and questions) Attention, Transformers, and BERT Monday, 28 August -Transformers1 are a class of deep learning models that have revolutionized the field of natural language processing (NLP) and various other domains. The concept of transformers originated as an attempt to address the limitations of traditional recurrent neural networks (RNNs) in sequential data processing. Here’s an overview of transformers’ evolution and significance. +Transformers1 are a class of deep learning models that have revolutionized the field of natural language processing (NLP) and various other domains. The concept of transformers originated as an attempt to address the limitations of traditional recurrent neural networks (RNNs) in sequential data processing. Here’s an overview of transformers' evolution and significance. Background and Origin RNNs2 were one of the earliest models used for sequence-based tasks in machine learning. @@ -143,7 +157,7 @@ Once you’ve accepted the invitation, you should be able to visit https https://llmrisks.github.io/class0/ I’ve updated the Schedule and Bi-Weekly Schedule based on the discussions today. The plan is below: -Week Lead Team Blogging Team Everyone Else Two Weeks Before Come up with idea for the week and planned readings, send to me by 5:29pm on Tuesday (2 weeks - 1 day before) - - Week Before Post plan and questions in github discussions by no later than 9am Wednesday; prepare for leading meetings Prepare plan for blogging (how you will divide workload, collaborative tools for taking notes and writing) Read/do materials and respond to preparation questions in github discussions (by 5:29pm Sunday) Week of Leading Meetings Lead interesting, engaging, and illuminating meetings! + Week Lead Team Blogging Team Everyone Else Two Weeks Before Come up with idea for the week and planned readings, send to me by 5:29pm on Tuesday (2 weeks - 1 day before) - - Week Before Post plan and questions in github discussions by no later than 9am Wednesday; prepare for leading meetings Prepare plan for blogging (how you will divide workload, collaborative tools for taking notes and writing) Read/do materials and respond to preparation questions in github discussions (by 5:29pm Sunday) Week of Leading Meetings Lead interesting, engaging, and illuminating meetings! 
@@ -153,7 +167,7 @@ Week Lead Team Blogging Team Everyone Else Two Weeks Before Come up with idea fo evans@virginia.edu (David Evans) https://llmrisks.github.io/weeklyschedule/ This is the regular bi-weekly schedule: -Week Lead Team Blogging Team Everyone Else Two Weeks Before Come up with idea for the week and planned readings, send to me by 5:29pm on Tuesday (2 weeks - 1 day before) - - Week Before Post plan and questions in github discussions by no later than 9am Wednesday; prepare for leading meetings Prepare plan for blogging (how you will divide workload, collaborative tools for taking notes and writing) Read/do materials and respond to preparation questions in github discussions (by 5:29pm Sunday) Week of Leading Meetings Lead interesting, engaging, and illuminating meetings! + Week Lead Team Blogging Team Everyone Else Two Weeks Before Come up with idea for the week and planned readings, send to me by 5:29pm on Tuesday (2 weeks - 1 day before) - - Week Before Post plan and questions in github discussions by no later than 9am Wednesday; prepare for leading meetings Prepare plan for blogging (how you will divide workload, collaborative tools for taking notes and writing) Read/do materials and respond to preparation questions in github discussions (by 5:29pm Sunday) Week of Leading Meetings Lead interesting, engaging, and illuminating meetings! @@ -177,7 +191,7 @@ Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training https://llmrisks.github.io/schedule/ The schedule details will be filled in as the semester progresses (and future weeks are subject to change, but as much as is known is documented here). See Weekly Schedule for the bi-weekly expectations for each team. -Week Lead Team Blog Team Topic 0: 23 Aug Dave Starting the Seminar 1: 28/30 Aug 14 2: 4/6 Sep 25 3: 11/13 Sep 36 4: 18/20 Sep 41 5: 25/27 Sep 52 6: 4 Oct TBD (2 Oct is Fall Classes Break) 7: 9/11 Oct 63 8: 16/18 Oct 14 9: 23/25 Oct 25 10: 30 Oct/1 Nov 36 11: 6/8 Nov 41 14: 13/15 Nov 5 2 15: 20 Nov TBD (22 Nov is Thanksgiving Break) 16: 27/29 Nov 6 3 17: 4 Dec TBD (Last meeting is 4 December) Leading Team Schedule As the leading team, your job is to select a worthwhile topic, decide on a reading assignment (which can include things other than reading and is not limited to typical research papers) for the class, write questions that the class should write responses to in preparation for the discussion, and lead an interesting, engaging, and illuminating class! + Week Lead Team Blog Team Topic 0: 23 Aug Dave Starting the Seminar 1: 28/30 Aug 14 2: 4/6 Sep 25 3: 11/13 Sep 36 4: 18/20 Sep 41 5: 25/27 Sep 52 6: 4 Oct TBD (2 Oct is Fall Classes Break) 7: 9/11 Oct 63 8: 16/18 Oct 14 9: 23/25 Oct 25 10: 30 Oct/1 Nov 36 11: 6/8 Nov 41 14: 13/15 Nov 5 2 15: 20 Nov TBD (22 Nov is Thanksgiving Break) 16: 27/29 Nov 6 3 17: 4 Dec TBD (Last meeting is 4 December) Leading Team Schedule As the leading team, your job is to select a worthwhile topic, decide on a reading assignment (which can include things other than reading and is not limited to typical research papers) for the class, write questions that the class should write responses to in preparation for the discussion, and lead an interesting, engaging, and illuminating class! 
@@ -187,7 +201,8 @@ Week Lead Team Blog Team Topic 0: 23 Aug Dave Starting the Seminar 1: 28/30 Aug evans@virginia.edu (David Evans) https://llmrisks.github.io/updates/ Some materials have been posted on the course site: -Syllabus Schedule (you will find out which team you are on at the first class Wednesday) Readings and Topics (a start on a list of some potential readings and topics that we might want to cover) Dall-E Prompt: "comic style drawing of a phd seminar on AI" + Syllabus Schedule (you will find out which team you are on at the first class Wednesday) Readings and Topics (a start on a list of some potential readings and topics that we might want to cover) +Dall-E Prompt: "comic style drawing of a phd seminar on AI" @@ -197,8 +212,8 @@ Syllabus Schedule (you will find out which team you are on at the first class We evans@virginia.edu (David Evans) https://llmrisks.github.io/survey/ Please submit this welcome survey before 8:59pm on Monday, August 21: -https://forms.gle/dxhFmJH7WRs32s1ZA -Your answers won’t be shared publicly, but I will use the responses to the survey to plan the seminar, including forming teams, and may share some aggregate and anonymized results and anonymized quotes from the surveys. + https://forms.gle/dxhFmJH7WRs32s1ZA + Your answers won’t be shared publicly, but I will use the responses to the survey to plan the seminar, including forming teams, and may share some aggregate and anonymized results and anonymized quotes from the surveys. @@ -208,7 +223,7 @@ Your answers won’t be shared publicly, but I will use the responses to evans@virginia.edu (David Evans) https://llmrisks.github.io/welcome/ Full Transcript -Seminar Plan The actual seminar won’t be fully planned by GPT-4, but more information on it won’t be available until later. + Seminar Plan The actual seminar won’t be fully planned by GPT-4, but more information on it won’t be available until later. I’m expecting the structure and format to that combines aspects of this seminar on adversarial machine learning and this course on computing ethics, but with a topic focused on learning as much as we can about the potential for both good and harm from generative AI (including large language models) and things we can do (mostly technically, but including policy) to mitigate the harms. @@ -232,7 +247,7 @@ Expected Background: Students are not required to have prior background in machi https://llmrisks.github.io/blogging/ Here are some suggestions for how to create the class blog posts for your assigned classes. I believe each team has at least a few members with enough experience using git and web contruction tools that following these instructions won’t be a big burden, but if you have other ways you want to build your blog page for a topic let me know and we can discuss alternative options. -Install Hugo. + Install Hugo. diff --git a/post/index.html b/post/index.html index 3723695..7756352 100644 --- a/post/index.html +++ b/post/index.html @@ -83,6 +83,21 @@
      +

      Week 12: Regulating Dangerous Technologies

      + + + +The slides are here: Regulating Dangerous Technologies (I’ve included some slides in the posted slides that I didn’t present in class but you might find interesting, including some excerpts from a talk I gave in 2018 on Mutually Assured Destruction and the Impending AI Apocalypse.) +Since one of the groups made the analogy to tobacco products, I also will take the liberty of pointing to a talk I gave at Google making a similar analogy: The Dragon in the Room. +

      Read More…

      + + +

      Week 11: Watermarking on Generative Models

    -
    +

      -
    1. -

      Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30. ↩︎ ↩︎ ↩︎

      +
    2. +

      Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30. ↩︎

    3. -
    4. +
    5. Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. ↩︎

    6. -
    7. +
    8. Pascanu, R., Mikolov, T., & Bengio, Y. (2013, May). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310-1318). Pmlr. ↩︎

    9. -
    10. +
    11. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780. ↩︎

    12. -
    13. +
    14. Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. Advances in neural information processing systems, 27. ↩︎

    15. -
    16. +
    17. Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. ↩︎

    18. -
    19. +
    20. Lin, T., Wang, Y., Liu, X., & Qiu, X. (2022). A survey of transformers. AI Open. ↩︎

    21. -
    22. +
    23. Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in vision: A survey. ACM computing surveys (CSUR), 54(10s), 1-41. ↩︎

    24. -
    25. +
    26. Karim, R. (2023, January 2). Illustrated: Self-attention. Medium. https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a ↩︎

    27. -
    28. +
    29. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. ↩︎

    30. -
    31. +
    32. Ahmad, K. (2023b, April 26). GPT vs. Bert: What are the differences between the two most popular language models?. MUO. https://www.makeuseof.com/gpt-vs-bert/ ↩︎

    33. -
    34. +
    35. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9. ↩︎

    36. -
    37. +
    38. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901. ↩︎

    39. -
    40. +
    41. Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., … & Hu, X. (2023). Harnessing the power of llms in practice: A survey on chatgpt and beyond. arXiv preprint arXiv:2304.13712. ↩︎

    -
    + diff --git a/week10/index.html b/week10/index.html index 0b95bd8..a0afc76 100644 --- a/week10/index.html +++ b/week10/index.html @@ -258,7 +258,7 @@

    Question: Why is there a discrepancy between crowdworker (and GPT-4) preferences evaluation and automatic benchmark evaluation?

    -

    Authors’ conclusion: Limitation models can learn style, but not factuality.

    +

    Authors' conclusion: Imitation models can learn style, but not factuality.

    @@ -619,18 +619,18 @@

    Discussion Questions

    In The Curse of Recursion: Training on Generated Data Makes Models Forget, the authors rely on several assumptions to support their arguments. How strong those assumptions are and do you think these assumptions limit its applicability to broader contexts?

    -
    +

      -
    1. +
    2. Gudibande, A., Wallace, E., Snell, C., Geng, X., Liu, H., Abbeel, P., Levine, S. and Song, D., 2023. The false promise of imitating proprietary llms. arXiv preprint arXiv:2305.15717. ↩︎

    -
    +
    - + diff --git a/week11/index.html b/week11/index.html index 9388667..1fbd495 100644 --- a/week11/index.html +++ b/week11/index.html @@ -642,6 +642,8 @@

    Questions:

  • « Previous page: Week 10: Data Selection for LLMs
  • +
  • Next page: Week 12: Regulating Dangerous Technologies »
  • + diff --git a/week12/index.html b/week12/index.html new file mode 100644 index 0000000..26812e1 --- /dev/null +++ b/week12/index.html @@ -0,0 +1,180 @@ + + + + + Week 12: Regulating Dangerous Technologies | Risks (and Benefits) of Generative AI and Large Language Models + + + + + + + + + + + + + + + + + + + + + + + + +
    + + +
    + +
    + +
    +
    +
    + +

    Week 12: Regulating Dangerous Technologies

    + + +
    +

    The slides are here: Regulating Dangerous Technologies (I’ve included some slides in the posted slides that I didn’t present in class but you might find interesting, including some excerpts from a talk I gave in 2018 on Mutually Assured Destruction and the Impending AI Apocalypse.)

    +

    Since one of the groups made the analogy to tobacco products, I also will take the liberty of pointing to a talk I gave at Google making a similar analogy: The Dragon in the Room.

    +

    Stephanie made the point after class about how important individuals +making brave decisions is to things working out, in particular with +humanity (so far!) avoiding annihilating ourselves with nuclear +weapons. Stanislav Petrov may well have been the single person between +us and nuclear destruction in 1983, when he prevented an alert (which +he correctly determined was a false alarm) produced by the Soviet +detection system from going up the chain. Here’s one (of many) +articles on this: ‘I Had A Funny Feeling in My +Gut’, +Washington Post, 10 Feb 1999. There is still a lot of uncertainty and +skepticism if we should be fearing any kind of out-of-control AI risk, +but it is not so hard to imagine scenarios where our fate will +similarly come down to an individual’s decision at a critical juncture.

    + +
    + + + + +
    + + + +
    +
    + +
    + + + + + + +
    +
    + + + + + + + + + + + + + + + diff --git a/week2/index.html b/week2/index.html index c78f48e..66b5705 100644 --- a/week2/index.html +++ b/week2/index.html @@ -203,7 +203,7 @@

    Introduction to AI Align (Feng et al.) show that famous models like BERT and ChatGPT do appear to have socioeconomic political leanings (of course, there is no true -neutral'' or center’’ position, these are just defined by where +neutral'' or center'' position, these are just defined by where the expected distribution of beliefs lies).

    Figure 1 shows the political leanings of famous LLMs.

    @@ -288,10 +288,10 @@

    Discussion Questions

    However, a developer’s responsibility doesn’t culminate once the AI product hits the market. The journey is continuous. Post-deployment, it’s crucial for developers to monitor the system’s alignment with human values and rectify any deviations. It’s an ongoing commitment to refinement and recalibration. Moreover, transparency is key. Developers should be proactive in highlighting potential concerns related to their models and fostering a culture where the public is not just a passive victim but an active participant in the model alignment process.

    To round off, it’s essential for developers to adopt a forward-thinking mindset. The decisions made today in the AI labs and coding chambers will shape the world of tomorrow. Thus, every developer should think about the long-term consequences of their work, always aiming to ensure that AI not only dazzles with its brilliance but also remains beneficial for generations to come.

    -

    How might AI developers’ responsibility evolve?

    +

    How might AI developers' responsibility evolve?

    It’s impossible to catch all edge cases. As AI systems grow in complexity, predicting every potential outcome or misalignment becomes a herculean task. Developers, in the future, might need to shift from a perfectionist mindset to one that emphasizes robustness and adaptability. While it’s essential to put in rigorous engineering effort to minimize errors, it’s equally crucial to understand and communicate that no system can be flawless.

    -

    Besides, given that catching all cases isn’t feasible, developers’ roles might evolve to include more dynamic and real-time monitoring of AI systems. This would involve continuously learning from real-world interactions, gathering feedback, and iterating on the model to ensure better alignment with human values.

    +

    Besides, given that catching all cases isn’t feasible, developers' roles might evolve to include more dynamic and real-time monitoring of AI systems. This would involve continuously learning from real-world interactions, gathering feedback, and iterating on the model to ensure better alignment with human values.

    The Alignment Problem from a Deep Learning Perspective

    In this part of today’s seminar, the whole class was divided into 3 groups to discuss the possible alignment problems from a deep learning perspective. Specifically, three groups were focusing on the alignment problems regarding different categories of Deep Learning methods, which are:

      diff --git a/week3/index.html b/week3/index.html index 522dba4..08fd03a 100644 --- a/week3/index.html +++ b/week3/index.html @@ -309,17 +309,17 @@

      By Tuesday: Questions about

      The authors recommend transparency of bias mitigation methods, citing the benefit it could provide to researchers and practitioners. Specifically, how might researchers benefit from this? Can you foresee any negative consequences (either to researchers or the general users of these models) of this transparency?

    -
    +

      -
    1. +
    2. Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh. “Calibrate before use: Improving few-shot performance of language models.” International Conference on Machine Learning. PMLR, 2021. ↩︎

    3. -
    4. +
    5. Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman. “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.” arXiv preprint arXiv:2305.04388, 2023. ↩︎

    -
    + diff --git a/week4/index.html b/week4/index.html index 5bf5ddf..dec34d2 100644 --- a/week4/index.html +++ b/week4/index.html @@ -106,7 +106,7 @@

    Monday, September 18

    Figure 1 (Image source) -

    LLMs and fine-tuned models perform better on different tasks. According to a study from the paper Benchmarking Large Language Models for News Summarization 1, LLMs perform better than fine-tuned models on text summarization according to the preferences of human raters. Additionally, LLMs perform better on tasks that require large amounts of knowledge from a variety of domains2. In machine translation, however, fine-tuned models generally do better than LLMs, although they only do slightly better in low-resource settings2. Additionally, the presenters explained that fine-tuned models and LLMs have similar performance when the task only requires very specific knowledge.

    +

    LLMs and fine-tuned models perform better on different tasks. According to a study from the paper Benchmarking Large Language Models for News Summarization 1, LLMs perform better than fine-tuned models on text summarization according to the preferences of human raters. Additionally, LLMs perform better on tasks that require large amounts of knowledge from a variety of domains2. In machine translation, however, fine-tuned models generally do better than LLMs, although they only do slightly better in low-resource settings2. Additionally, the presenters explained that fine-tuned models and LLMs have similar performance when the task only requires very specific knowledge.

    The presenters then posed the following question to the class: “How could we enhance LLM in the scenario where the required knowledge does not match their learned knowledge?” The class formed four groups to discuss the question. Each group then shared a summary of what they had discussed:

    Group 1: Discussed enhancements in both training and testing. For testing, use intentional prompts to get the knowledge you want from the model. For training, adding more training data, using a knowledge graph to classify knowledge into different clusters for ease of search, and using a plug-in, as mentioned in the GitHub discussion.

    Group 2: Advocated for enhancement through the model’s ability to retrieve information from external sources and undergo fine-tuning.

    @@ -270,13 +270,13 @@

    Discussion for Wednesday:

    The paper mentions the importance of safety and minimizing bias in LLM-generated medical information, and the optional reading reports on some experiments that show biases in GPT’s medical diagnoses. Should models be tuned to ignore protected attributes? Should we prevent models from being used in medical applications until these problems can be solved?

    -
    +

      -
    1. +
    2. Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto. Benchmarking large language models for news summarization, 2023. https://arxiv.org/abs/2301.13848 ↩︎

    3. -
    4. +
5. Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek
@@ -289,13 +289,13 @@

      Discussion for Wednesday:

      drew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. Palm: Scaling -language modeling with pathways, 2022. https://arxiv.org/abs/2204.02311 ↩︎ ↩︎

      +language modeling with pathways, 2022. https://arxiv.org/abs/2204.02311 ↩︎

    6. -
    7. +
    8. OpenAI. GPT-4 Technical Report. March 2023. https://arxiv.org/abs/2303.08774 ↩︎

    -
    + diff --git a/week8/index.html b/week8/index.html index c466052..1b7833f 100644 --- a/week8/index.html +++ b/week8/index.html @@ -104,7 +104,7 @@

    Monday,

    Here is an example of pseudocode from the activity:

    -
    Sentence = "The students like to read interesting books."
    +
    Sentence = "The students like to read interesting books."
     # The bilingual dictionary from English to Chinese: Eng_chinese_dict
     Translation = []
     for word in Sentence.split():
    @@ -112,7 +112,7 @@ 

    Translation.append(Eng_chinese_dict[word]) else: Translation.append(word) -Translated_sentence = " ".join(Translation) +Translated_sentence = " ".join(Translation)

    After the activity discussion, here are the challenges encountered when translating from English to another language:

    • Variations in Word Order: Different languages have varying word orders, affecting sentence structure.
    • diff --git a/week9/index.html b/week9/index.html index 9c3d19d..b2fb12e 100644 --- a/week9/index.html +++ b/week9/index.html @@ -482,7 +482,7 @@

      Sparse Autoencoders

    • Input Bias: Introduces an approach of adding an input bias to the representations in autoencoders, which demonstrates a significant boost in performance for the models used in toy examples.
    • -

The purpose of sparse autoencoders is to extract meaningful features from neural network activations. Achieving a good decomposition, in which the extracted features are interpretable and can describe the activations’ context, requires the ability to describe activations, interpret the downstream effects of changes, and cover a significant portion of the functionality within the data.

      +

The purpose of sparse autoencoders is to extract meaningful features from neural network activations. Achieving a good decomposition, in which the extracted features are interpretable and can describe the activations' context, requires the ability to describe activations, interpret the downstream effects of changes, and cover a significant portion of the functionality within the data.
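To make the decomposition and the input-bias idea mentioned above more concrete, here is a minimal PyTorch-style sketch of a sparse autoencoder over activation vectors. It is an illustration under assumed names and hyperparameters (d_act, d_feat, the b_dec bias, the L1 coefficient), not the authors' implementation.

# Minimal sketch of a sparse autoencoder for decomposing model activations.
# Illustrative only: names and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_act, d_feat):
        super().__init__()
        self.b_dec = nn.Parameter(torch.zeros(d_act))     # learned input/decoder bias
        self.encoder = nn.Linear(d_act, d_feat)
        self.decoder = nn.Linear(d_feat, d_act, bias=False)

    def forward(self, x):
        f = torch.relu(self.encoder(x - self.b_dec))      # sparse, non-negative feature activations
        x_hat = self.decoder(f) + self.b_dec              # reconstruction of the input activation
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that pushes most features toward zero.
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().sum(dim=-1).mean()

The L1 term is what encourages sparsity; the hope is that each feature that does fire corresponds to a single interpretable property of the input, which is what the evaluations below try to check.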

      Are these features “interpretable”

      @@ -492,7 +492,7 @@

      Are these features “interpretabl


Feature Activation Sampling Bias: In previous evaluations, there was a bias from considering only the top-activating neurons, which might inaccurately appear monosemantic because of their higher activations. To mitigate this bias, the approach involves sampling uniformly across all possible activations for each given feature.
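A hedged sketch of what such uniform sampling could look like is below; the binning scheme and the n_bins and per_bin values are illustrative choices, not the authors' exact procedure.

# Illustrative only: choose dataset examples spread across a feature's whole
# activation range, instead of just its top-activating examples.
import numpy as np

def sample_across_activation_range(activations, n_bins=10, per_bin=5, seed=0):
    """activations: 1-D array of one feature's activation on each dataset example."""
    rng = np.random.default_rng(seed)
    active = np.nonzero(activations > 0)[0]    # indices of examples where the feature fires
    if active.size == 0:
        return []
    vals = activations[active]
    edges = np.linspace(0.0, vals.max(), n_bins + 1)
    chosen = []
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        upper = (vals <= hi) if i == n_bins - 1 else (vals < hi)   # include the max in the last bin
        in_bin = active[(vals >= lo) & upper]
        if in_bin.size > 0:
            take = min(per_bin, in_bin.size)
            chosen.extend(rng.choice(in_bin, size=take, replace=False).tolist())
    return chosen    # example indices covering low, medium, and high activation levels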

      -

Evaluation of Interpretable Features: The authors used an evaluation process in which human assessments determine the interpretability of the extracted features. The criteria for interpretability are based on the authors’ distribution-based evaluation, where a score above eight is considered sufficiently interpretable.

      +

Evaluation of Interpretable Features: The authors used an evaluation process in which human assessments determine the interpretability of the extracted features. The criteria for interpretability are based on the authors' distribution-based evaluation, where a score above eight is considered sufficiently interpretable.

      Automated Evaluation

      @@ -625,7 +625,7 @@

      Discussion Questions

      - +