From 93365a75edd05eafc5e4be7c8fb4565a30e733dd Mon Sep 17 00:00:00 2001
From: David Evans <>
Date: Thu, 2 Nov 2023 14:02:11 -0400
Subject: [PATCH] Rebuilt site

---
 index.html              | 70 +++++++++++++++----------------
 index.xml               | 10 ++---
 post/index.html         |  2 +-
 post/index.xml          | 10 ++---
 readings/index.html     |  7 ++--
 sitemap.xml             | 92 +++++++++++------------------------
 src/content/readings.md |  3 ++
 tags/index.xml          |  8 +---
 topics/index.xml        |  8 +---
 week1/index.html        | 36 ++++++++--------
 week2/index.html        |  6 +--
 week3/index.html        |  6 +--
 week4/index.html        |  8 ++--
 week5/index.html        | 12 +++---
 week8/index.html        |  2 +-
 week9/index.html        |  4 +-
 16 files changed, 114 insertions(+), 170 deletions(-)

diff --git a/index.html b/index.html
index 65f13da..19cad8c 100644
--- a/index.html
+++ b/index.html
@@ -1,7 +1,7 @@
- +The purpose of sparse autoencoders is to extract meaningful features from neural network activations. To avhice a good decomposition, where the features extracted should be interpretable and able to describe the activations’ context requires the ability to describe activations, interpret downstream effects of changes, and cover a significant portion of functionality within the data.
+The purpose of sparse autoencoders is to extract meaningful features from neural network activations. Achieving a good decomposition, in which the extracted features are interpretable and describe the activations' context, requires the ability to describe activations, interpret the downstream effects of changes, and cover a significant portion of the functionality within the data.
Feature Activation Sampling Bias: In previous evaluations, there was a bias due to considering only the top-activation neurons, which might inaccurately appear monosemantic because of their higher activations. To mitigate this bias, the approach samples uniformly across all possible activation levels for each feature.
-Evaluation of Interpretable Features: The authors used an evaluation process where human-based assessments are used to determine the interpretability of the features extracted. The criteria for interpretability are based on the authors’ distributed-based evaluation, where a score above eight is considered sufficiently interpretable.
+Evaluation of Interpretable Features: The authors used an evaluation process in which human assessments determine the interpretability of the extracted features. The criteria for interpretability are based on the authors' distribution-based evaluation, where a score above eight is considered sufficiently interpretable.
-_Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models_
+Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models
-_Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models_
+Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models
For inference, one strategy is to reduce the snowballing of hallucinations by designing a dynamic p-value. The p-value should start off large and shrink as more tokens are generated. Furthermore, introducing new or external knowledge can be done at two different positions: before and after generation. @@ -1579,7 +1579,7 @@
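To make this concrete, a minimal sketch of decoding with a shrinking nucleus threshold is shown below, reading the dynamic p-value as the top-p sampling cutoff; the decay schedule and function names are illustrative assumptions rather than the survey's actual procedure.

import numpy as np

def dynamic_top_p(step, p_start=0.95, p_min=0.3, decay=0.05):
    # Assumed schedule: start permissive, shrink p as more tokens are generated
    return max(p_min, p_start - decay * step)

def sample_top_p(probs, p):
    # Standard nucleus sampling over a next-token probability vector
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1      # smallest set of tokens covering mass p
    keep = order[:cutoff]
    renormalized = probs[keep] / probs[keep].sum()
    return np.random.choice(keep, p=renormalized)

# Usage at decoding step t: next_token = sample_top_p(model_probs, dynamic_top_p(t))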
_DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models_
+DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Based on evolving trends, the concept of contrastive decoding is introduced. For example, one might ask, "How do we decide between Seattle and Olympia?" Treating the last layer as the mature layer, it is beneficial to contrast it with the preceding layers, which can be deemed premature. For each premature layer, the difference between its next-token probability distribution and the mature layer's distribution can be computed using the Jensen-Shannon Divergence. Such an approach amplifies the factual knowledge the model has acquired, thereby improving its generated output. @@ -1592,7 +1592,7 @@
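A hedged sketch of the contrast step is given below; it assumes the per-layer next-token distributions are already available and simplifies away DoLa's layer bucketing and plausibility constraint, so the names and details are illustrative rather than the authors' implementation.

import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon Divergence between two probability distributions
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def contrast_layers(layer_probs):
    # layer_probs: list of next-token distributions, one per layer (last = mature layer)
    mature = layer_probs[-1]
    jsds = [js_divergence(mature, p) for p in layer_probs[:-1]]
    premature = layer_probs[int(np.argmax(jsds))]    # premature layer differing most from the mature one
    scores = np.log(mature + 1e-12) - np.log(premature + 1e-12)   # boost what the mature layer added
    contrasted = np.exp(scores)
    return contrasted / contrasted.sum()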
_In-Context Retrieval-Augmented Language Models_
+In-Context Retrieval-Augmented Language Models
The model parameters are kept frozen. Instead of directly inputting text into the model, the approach first uses retrieval to search for relevant documents from external sources. The findings from these sources are then concatenated with the original text. Re-ranking results from the retrieval model also provides benefits; the exact perplexities can be referred to in the slide. It has been observed that smaller strides can enhance performance, albeit at the cost of increased runtime. The authors have noticed that the information at the end of a query is typically more relevant for output generation. In general, shorter queries tend to outperform longer ones.
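A minimal sketch of this retrieve-and-prepend loop follows; retriever.search and model.generate are hypothetical placeholder APIs, and the stride and query-window sizes are illustrative rather than the paper's settings.

def generate_with_retrieval(model, retriever, prompt, max_tokens=128, stride=16, top_k=1):
    # Every `stride` tokens, re-retrieve documents using the most recent text
    # (the end of the query tends to matter most) and prepend them to the frozen model's input.
    output = prompt
    for _ in range(0, max_tokens, stride):
        query = output[-512:]                                     # recent context as the retrieval query
        docs = retriever.search(query, k=top_k)                   # hypothetical retriever API
        context = "\n".join(docs) + "\n" + output                 # concatenate retrieved text with the input
        output += model.generate(context, max_new_tokens=stride)  # hypothetical generation API
    return output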
@@ -1607,7 +1607,7 @@Group 2: Discussed two papers from this week's reading which highlighted the use of semantic search and the introduction of external context to aid the model. This approach, while useful for diminishing hallucination, heavily depends on external information, which is not effective in generic cases. @@ -1628,7 +1628,7 @@
One advantage discussed was that hallucinations "train" users to not blindly trust the model outputs. If such models are blindly trusted, there is a much greater risk associated with their use. If users can conclusively discern, however, that the produced information is fictitious, it could assist in fostering new ideas or fresh perspectives on a given topic. @@ -1883,7 +1883,7 @@
Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto. Benchmarking large language models for news summarization, 2023. https://arxiv.org/abs/2301.13848 ↩︎
+Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto. Benchmarking large language models for news summarization, 2023. https://arxiv.org/abs/2301.13848 ↩︎
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav @@ -1898,10 +1898,10 @@
OpenAI. GPT-4 Technical Report. March 2023. https://arxiv.org/abs/2303.08774 ↩︎
+OpenAI. GPT-4 Technical Report. March 2023. https://arxiv.org/abs/2303.08774 ↩︎
Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh. “Calibrate before use: Improving few-shot performance of language models.” International Conference on Machine Learning. PMLR, 2021. ↩︎
+Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh. “Calibrate before use: Improving few-shot performance of language models.” International Conference on Machine Learning. PMLR, 2021. ↩︎
Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman. “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.” arXiv preprint arXiv:2305.04388, 2023. ↩︎
+Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman. “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.” arXiv preprint arXiv:2305.04388, 2023. ↩︎
neutral'' or
center’’ position, these are just defined by where
+"neutral" or "center" position, these are just defined by where
the expected distribution of beliefs lies).
Figure 1 shows the political leanings of famous LLMs.
However, a developer’s responsibility doesn’t culminate once the AI product hits the market. The journey is continuous. Post-deployment, it’s crucial for developers to monitor the system’s alignment with human values and rectify any deviations. It’s an ongoing commitment to refinement and recalibration. Moreover, transparency is key. Developers should be proactive in highlighting potential concerns related to their models and fostering a culture where the public is not just a passive victim but an active participant in the model alignment process.
To round off, it’s essential for developers to adopt a forward-thinking mindset. The decisions made today in the AI labs and coding chambers will shape the world of tomorrow. Thus, every developer should think about the long-term consequences of their work, always aiming to ensure that AI not only dazzles with its brilliance but also remains beneficial for generations to come.
-How might AI developers’ responsibility evolve?
+How might AI developers' responsibility evolve?
It’s impossible to catch all edge cases. As AI systems grow in complexity, predicting every potential outcome or misalignment becomes a herculean task. Developers, in the future, might need to shift from a perfectionist mindset to one that emphasizes robustness and adaptability. While it’s essential to put in rigorous engineering effort to minimize errors, it’s equally crucial to understand and communicate that no system can be flawless.
-Besides, given that catching all cases isn’t feasible, developers’ roles might evolve to include more dynamic and real-time monitoring of AI systems. This would involve continuously learning from real-world interactions, gathering feedback, and iterating on the model to ensure better alignment with human values.
+Besides, given that catching all cases isn’t feasible, developers' roles might evolve to include more dynamic and real-time monitoring of AI systems. This would involve continuously learning from real-world interactions, gathering feedback, and iterating on the model to ensure better alignment with human values.
In this part of today’s seminar, the whole class was divided into 3 groups to discuss the possible alignment problems from a deep learning perspective. Specifically, three groups were focusing on the alignment problems regarding different categories of Deep Learning methods, which are:
RNNs2 were one of the earliest models used for sequence-based tasks in machine learning. They processed input tokens one after another and used their internal memory to capture dependencies in the sequence. The following figure gives an illustration of the RNN architecture.
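Alongside the figure, a minimal sketch of a single recurrent step makes the idea of internal memory concrete; the shapes and names below are illustrative, not tied to any particular library.

import numpy as np

def rnn_forward(token_embeddings, W_xh, W_hh, b_h):
    # Process tokens one after another, carrying a hidden state (the network's memory)
    hidden = np.zeros(W_hh.shape[0])
    states = []
    for x in token_embeddings:                       # x: embedding vector for one token
        hidden = np.tanh(W_xh @ x + W_hh @ hidden + b_h)
        states.append(hidden)
    return states                                    # one hidden state per position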
Context and Answers in Activities. Let’s do some activity now!
-I used to ___
+I used to ___
Yesterday, I went to ___
@@ -2822,7 +2822,7 @@ Risks and Rewards
Risks Group.
-- Concerns were expressed regarding LLMs’ opacity and complexity, making them challenging to comprehend.
+- Concerns were expressed regarding LLMs' opacity and complexity, making them challenging to comprehend.
- Apprehensions were raised about LLMs potentially exerting detrimental influences on human cognition and societal dynamics.
- LLMs are ripe for potential abuses in their ability to generate convincing false information cheaply.
- The potential impact of LLMs on human intelligence and creativity was a topic of contemplation.
@@ -2901,46 +2901,46 @@ Discussion Questions
-
-
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30. ↩︎
+Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30. ↩︎
-
-
Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. ↩︎
+Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. ↩︎
-
-
Pascanu, R., Mikolov, T., & Bengio, Y. (2013, May). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310-1318). Pmlr. ↩︎
+Pascanu, R., Mikolov, T., & Bengio, Y. (2013, May). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310-1318). Pmlr. ↩︎
-
-
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780. ↩︎
+Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780. ↩︎
-
-
Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. Advances in neural information processing systems, 27. ↩︎
+Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. Advances in neural information processing systems, 27. ↩︎
-
-
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. ↩︎
+Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. ↩︎
-
-
Lin, T., Wang, Y., Liu, X., & Qiu, X. (2022). A survey of transformers. AI Open. ↩︎
+Lin, T., Wang, Y., Liu, X., & Qiu, X. (2022). A survey of transformers. AI Open. ↩︎
-
-
Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in vision: A survey. ACM computing surveys (CSUR), 54(10s), 1-41. ↩︎
+Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in vision: A survey. ACM computing surveys (CSUR), 54(10s), 1-41. ↩︎
-
-
Karim, R. (2023, January 2). Illustrated: Self-attention. Medium. https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a ↩︎
+Karim, R. (2023, January 2). Illustrated: Self-attention. Medium. https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a ↩︎
-
-
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. ↩︎
+Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. ↩︎
-
-
Ahmad, K. (2023b, April 26). GPT vs. Bert: What are the differences between the two most popular language models?. MUO. https://www.makeuseof.com/gpt-vs-bert/ ↩︎
+Ahmad, K. (2023b, April 26). GPT vs. Bert: What are the differences between the two most popular language models?. MUO. https://www.makeuseof.com/gpt-vs-bert/ ↩︎
-
-
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9. ↩︎
+Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9. ↩︎
-
-
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901. ↩︎
+Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901. ↩︎
-
-
Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., … & Hu, X. (2023). Harnessing the power of llms in practice: A survey on chatgpt and beyond. arXiv preprint arXiv:2304.13712. ↩︎
+Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., … & Hu, X. (2023). Harnessing the power of llms in practice: A survey on chatgpt and beyond. arXiv preprint arXiv:2304.13712. ↩︎
diff --git a/index.xml b/index.xml
index 6c451a4..9bafb10 100644
--- a/index.xml
+++ b/index.xml
@@ -8,11 +8,7 @@
en-us
evans@virginia.edu (David Evans)
evans@virginia.edu (David Evans)
- Wed, 23 Aug 2023 00:00:00 +0000
-
-
-
-
+ Mon, 30 Oct 2023 00:00:00 +0000
-
Week 9: Interpretability
https://llmrisks.github.io/week9/
@@ -104,7 +100,7 @@ Table of Contents (Monday, 09/04/2023) Introduction to Alignment Introduction
https://llmrisks.github.io/week1/
(see bottom for assigned readings and questions)
Attention, Transformers, and BERT Monday, 28 August
-Transformers1 are a class of deep learning models that have revolutionized the field of natural language processing (NLP) and various other domains. The concept of transformers originated as an attempt to address the limitations of traditional recurrent neural networks (RNNs) in sequential data processing. Here’s an overview of transformers’ evolution and significance.
+Transformers1 are a class of deep learning models that have revolutionized the field of natural language processing (NLP) and various other domains. The concept of transformers originated as an attempt to address the limitations of traditional recurrent neural networks (RNNs) in sequential data processing. Here’s an overview of transformers' evolution and significance.
Background and Origin RNNs2 were one of the earliest models used for sequence-based tasks in machine learning.
@@ -222,4 +218,4 @@ I believe each team has at least a few members with enough experience using git
-
\ No newline at end of file
+
diff --git a/post/index.html b/post/index.html
index 3838945..853726e 100644
--- a/post/index.html
+++ b/post/index.html
@@ -212,7 +212,7 @@ Week 1: Introduction
(see bottom for assigned readings and questions)
Attention, Transformers, and BERT Monday, 28 August
-Transformers1 are a class of deep learning models that have revolutionized the field of natural language processing (NLP) and various other domains. The concept of transformers originated as an attempt to address the limitations of traditional recurrent neural networks (RNNs) in sequential data processing. Here’s an overview of transformers’ evolution and significance.
+Transformers1 are a class of deep learning models that have revolutionized the field of natural language processing (NLP) and various other domains. The concept of transformers originated as an attempt to address the limitations of traditional recurrent neural networks (RNNs) in sequential data processing. Here’s an overview of transformers' evolution and significance.
Background and Origin RNNs2 were one of the earliest models used for sequence-based tasks in machine learning.
diff --git a/post/index.xml b/post/index.xml
index c0d2963..45afcf7 100644
--- a/post/index.xml
+++ b/post/index.xml
@@ -8,11 +8,7 @@
en-us
evans@virginia.edu (David Evans)
evans@virginia.edu (David Evans)
- Mon, 30 Oct 2023 00:00:00 +0000
-
-
-
-
+ Mon, 30 Oct 2023 00:00:00 +0000
-
Week 9: Interpretability
https://llmrisks.github.io/week9/
@@ -104,7 +100,7 @@ Table of Contents (Monday, 09/04/2023) Introduction to Alignment Introduction
https://llmrisks.github.io/week1/
(see bottom for assigned readings and questions)
Attention, Transformers, and BERT Monday, 28 August
-Transformers1 are a class of deep learning models that have revolutionized the field of natural language processing (NLP) and various other domains. The concept of transformers originated as an attempt to address the limitations of traditional recurrent neural networks (RNNs) in sequential data processing. Here’s an overview of transformers’ evolution and significance.
+Transformers1 are a class of deep learning models that have revolutionized the field of natural language processing (NLP) and various other domains. The concept of transformers originated as an attempt to address the limitations of traditional recurrent neural networks (RNNs) in sequential data processing. Here’s an overview of transformers' evolution and significance.
Background and Origin RNNs2 were one of the earliest models used for sequence-based tasks in machine learning.
@@ -165,4 +161,4 @@ I’m expecting the structure and format to that combines aspects of thi
-
\ No newline at end of file
+
diff --git a/readings/index.html b/readings/index.html
index 4678700..8c07586 100644
--- a/readings/index.html
+++ b/readings/index.html
@@ -139,18 +139,19 @@ Abuses of LLMs
Nicholas Carlini. A LLM Assisted Exploitation of AI-Guardian.
Andy Zou, Zifan Wang, J. Zico Kolter, Matt Fredrikson. Universal and Transferable Adversarial Attacks on Aligned Language Models. https://arxiv.org/abs/2307.15043.
Project Website: https://llm-attacks.org/.
+Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz. Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. https://arxiv.org/abs/2302.12173.
Fairness and Bias
Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov. From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. ACL 2023.
Myra Cheng, Esin Durmus, Dan Jurafsky. Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. ACL 2023.
“Alignment”
-Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li. Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment. https://arxiv.org/abs/2308.05374.
+Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li. Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment. https://arxiv.org/abs/2308.05374.
AGI
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang. Sparks of Artificial General Intelligence: Early experiments with GPT-4. Microsoft, March 2023. https://arxiv.org/abs/2303.12712
Yejin Choi. The Curious Case of Commonsense Intelligence. Daedalus, Spring 2022.
Konstantine Arkoudas. GPT-4 Can’t Reason. https://arxiv.org/abs/2308.03762.
Natalie Shapira, Mosh Levy, Seyed Hossein Alavi, Xuhui Zhou, Yejin Choi, Yoav Goldberg, Maarten Sap, Vered Shwartz. Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models.
Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia
-Tsvetkov. Minding Language Models’ (Lack of) Theory of Mind: A
+Tsvetkov. Minding Language Models' (Lack of) Theory of Mind: A
Plug-and-Play Multi-Character Belief
Tracker. ACL 2023
Boaz Barak. The shape of AGI: Cartoons and back of envelope. July 2023.
@@ -196,7 +197,7 @@ More Sources
-
+
diff --git a/sitemap.xml b/sitemap.xml
index a87c1e7..13e3c8b 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -1,112 +1,68 @@
-
https://llmrisks.github.io/post/
2023-10-30T00:00:00+00:00
-
-
-
+
+ https://llmrisks.github.io/
+ 2023-10-30T00:00:00+00:00
+
https://llmrisks.github.io/week9/
2023-10-30T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/week8/
2023-10-22T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/week7/
2023-10-16T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/week5/
2023-10-04T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/week4/
2023-09-25T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/week3/
2023-09-18T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/week2/
2023-09-11T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/week1/
2023-09-03T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/discussions/
2023-08-25T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/class0/
2023-08-23T00:00:00+00:00
-
-
-
- https://llmrisks.github.io/
- 2023-08-23T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/weeklyschedule/
2023-08-23T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/readings/
2023-08-21T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/schedule/
2023-08-21T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/updates/
2023-08-21T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/survey/
2023-08-17T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/welcome/
2023-05-26T00:00:00+00:00
-
-
-
+
https://llmrisks.github.io/syllabus/
0
-
-
-
+
https://llmrisks.github.io/blogging/
-
-
-
+
https://llmrisks.github.io/tags/
-
-
-
+
https://llmrisks.github.io/topics/
-
-
\ No newline at end of file
+
diff --git a/src/content/readings.md b/src/content/readings.md
index b832533..88c3d98 100644
--- a/src/content/readings.md
+++ b/src/content/readings.md
@@ -102,6 +102,9 @@ Nicholas Carlini. [_A LLM Assisted Exploitation of AI-Guardian_](https://arxiv.o
Andy Zou, Zifan Wang, J. Zico Kolter, Matt Fredrikson. [_Universal and Transferable Adversarial Attacks on Aligned Language Models_](https://arxiv.org/abs/2307.15043). [https://arxiv.org/abs/2307.15043](https://arxiv.org/abs/2307.15043).
[Project Website: https://llm-attacks.org/](https://llm-attacks.org/).
+Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz. [_Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection_](https://arxiv.org/abs/2302.12173). [https://arxiv.org/abs/2302.12173](https://arxiv.org/abs/2302.12173).
+
+
## Fairness and Bias
Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov. [_From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models_](https://arxiv.org/abs/2305.08283). ACL 2023.
diff --git a/tags/index.xml b/tags/index.xml
index f28ad5b..6883615 100644
--- a/tags/index.xml
+++ b/tags/index.xml
@@ -7,10 +7,6 @@
Hugo -- gohugo.io
en-us
evans@virginia.edu (David Evans)
- evans@virginia.edu (David Evans)
-
-
-
-
+ evans@virginia.edu (David Evans)
-
\ No newline at end of file
+
diff --git a/topics/index.xml b/topics/index.xml
index 5ad8dad..313e6f4 100644
--- a/topics/index.xml
+++ b/topics/index.xml
@@ -7,10 +7,6 @@
Hugo -- gohugo.io
en-us
evans@virginia.edu (David Evans)
- evans@virginia.edu (David Evans)
-
-
-
-
+ evans@virginia.edu (David Evans)
-
\ No newline at end of file
+
diff --git a/week1/index.html b/week1/index.html
index 602f032..dfe53ce 100644
--- a/week1/index.html
+++ b/week1/index.html
@@ -99,7 +99,7 @@ Attention, Transformers, and BERT
various other domains. The concept of transformers originated as an
attempt to address the limitations of traditional recurrent neural
networks (RNNs) in sequential data processing. Here’s an overview of
-transformers’ evolution and significance.
+transformers' evolution and significance.
Background and Origin
RNNs2 were one of the earliest models used for sequence-based tasks in machine learning. They processed input tokens one after another and used their internal memory to capture dependencies in the sequence. The following figure gives an illustration of the RNN architecture.
@@ -138,7 +138,7 @@ The Transformer Model
Self-attention is a useful mechanism that allows each token to consider the relationships between all other tokens in a sequence. While it provides a more nuanced understanding of input, it can be computationally expensive. The use of multi-head attention further enhances the model’s ability to capture different types of dependencies in the data. The number of attention heads (e.g., 12 in BERT-base) is a balance between performance and complexity. Too few or too many heads can result in suboptimal performance. More details about self-attention and multi-head attention can be found in 9.
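As a small illustration of the mechanism described above, here is a toy NumPy sketch of scaled dot-product self-attention and multi-head attention; the projection matrices are assumed inputs and the dimensions are illustrative.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Each token attends to every token in the sequence, weighted by query-key similarity
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # scaled dot-product
    return softmax(scores) @ V

def multi_head_attention(X, heads, Wo):
    # heads: a list of (Wq, Wk, Wv) triples, one per attention head
    outputs = [self_attention(X, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outputs, axis=-1) @ Wo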
Context and Answers in Activities.
Let’s do some activity now!
-I used to ___
+I used to ___
Yesterday, I went to ___
@@ -190,7 +190,7 @@ Risks and Rewards
Risks Group.
-- Concerns were expressed regarding LLMs’ opacity and complexity, making them challenging to comprehend.
+- Concerns were expressed regarding LLMs' opacity and complexity, making them challenging to comprehend.
- Apprehensions were raised about LLMs potentially exerting detrimental influences on human cognition and societal dynamics.
- LLMs are ripe for potential abuses in their ability to generate convincing false information cheaply.
- The potential impact of LLMs on human intelligence and creativity was a topic of contemplation.
@@ -269,53 +269,53 @@ Discussion Questions
-
-
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30. ↩︎
+Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30. ↩︎
-
-
Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. ↩︎
+Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. ↩︎
-
-
Pascanu, R., Mikolov, T., & Bengio, Y. (2013, May). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310-1318). Pmlr. ↩︎
+Pascanu, R., Mikolov, T., & Bengio, Y. (2013, May). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310-1318). Pmlr. ↩︎
-
-
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780. ↩︎
+Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780. ↩︎
-
-
Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. Advances in neural information processing systems, 27. ↩︎
+Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. Advances in neural information processing systems, 27. ↩︎
-
-
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. ↩︎
+Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. ↩︎
-
-
Lin, T., Wang, Y., Liu, X., & Qiu, X. (2022). A survey of transformers. AI Open. ↩︎
+Lin, T., Wang, Y., Liu, X., & Qiu, X. (2022). A survey of transformers. AI Open. ↩︎
-
-
Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in vision: A survey. ACM computing surveys (CSUR), 54(10s), 1-41. ↩︎
+Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in vision: A survey. ACM computing surveys (CSUR), 54(10s), 1-41. ↩︎
-
-
Karim, R. (2023, January 2). Illustrated: Self-attention. Medium. https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a ↩︎
+Karim, R. (2023, January 2). Illustrated: Self-attention. Medium. https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a ↩︎
-
-
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. ↩︎
+Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. ↩︎
-
-
Ahmad, K. (2023b, April 26). GPT vs. Bert: What are the differences between the two most popular language models?. MUO. https://www.makeuseof.com/gpt-vs-bert/ ↩︎
+Ahmad, K. (2023b, April 26). GPT vs. Bert: What are the differences between the two most popular language models?. MUO. https://www.makeuseof.com/gpt-vs-bert/ ↩︎
-
-
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9. ↩︎
+Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9. ↩︎
-
-
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901. ↩︎
+Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901. ↩︎
-
-
Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., … & Hu, X. (2023). Harnessing the power of llms in practice: A survey on chatgpt and beyond. arXiv preprint arXiv:2304.13712. ↩︎
+Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., … & Hu, X. (2023). Harnessing the power of llms in practice: A survey on chatgpt and beyond. arXiv preprint arXiv:2304.13712. ↩︎
-
+
diff --git a/week2/index.html b/week2/index.html
index c78f48e..66b5705 100644
--- a/week2/index.html
+++ b/week2/index.html
@@ -203,7 +203,7 @@ Introduction to AI Align
(Feng et al.) show
that famous models like BERT and ChatGPT do appear to have
socioeconomic political leanings (of course, there is no true
-neutral'' or
center’’ position, these are just defined by where
+"neutral" or "center" position, these are just defined by where
the expected distribution of beliefs lies).
Figure 1 shows the political leanings of famous LLMs.
@@ -288,10 +288,10 @@ Discussion Questions
However, a developer’s responsibility doesn’t culminate once the AI product hits the market. The journey is continuous. Post-deployment, it’s crucial for developers to monitor the system’s alignment with human values and rectify any deviations. It’s an ongoing commitment to refinement and recalibration. Moreover, transparency is key. Developers should be proactive in highlighting potential concerns related to their models and fostering a culture where the public is not just a passive victim but an active participant in the model alignment process.
To round off, it’s essential for developers to adopt a forward-thinking mindset. The decisions made today in the AI labs and coding chambers will shape the world of tomorrow. Thus, every developer should think about the long-term consequences of their work, always aiming to ensure that AI not only dazzles with its brilliance but also remains beneficial for generations to come.
-How might AI developers’ responsibility evolve?
+How might AI developers' responsibility evolve?
It’s impossible to catch all edge cases. As AI systems grow in complexity, predicting every potential outcome or misalignment becomes a herculean task. Developers, in the future, might need to shift from a perfectionist mindset to one that emphasizes robustness and adaptability. While it’s essential to put in rigorous engineering effort to minimize errors, it’s equally crucial to understand and communicate that no system can be flawless.
-Besides, given that catching all cases isn’t feasible, developers’ roles might evolve to include more dynamic and real-time monitoring of AI systems. This would involve continuously learning from real-world interactions, gathering feedback, and iterating on the model to ensure better alignment with human values.
+Besides, given that catching all cases isn’t feasible, developers' roles might evolve to include more dynamic and real-time monitoring of AI systems. This would involve continuously learning from real-world interactions, gathering feedback, and iterating on the model to ensure better alignment with human values.
The Alignment Problem from a Deep Learning Perspective
In this part of today’s seminar, the whole class was divided into 3 groups to discuss the possible alignment problems from a deep learning perspective. Specifically, three groups were focusing on the alignment problems regarding different categories of Deep Learning methods, which are:
diff --git a/week3/index.html b/week3/index.html
index 29196db..08fd03a 100644
--- a/week3/index.html
+++ b/week3/index.html
@@ -313,17 +313,17 @@ By Tuesday: Questions about
-
-
Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh. “Calibrate before use: Improving few-shot performance of language models.” International Conference on Machine Learning. PMLR, 2021. ↩︎
+Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh. “Calibrate before use: Improving few-shot performance of language models.” International Conference on Machine Learning. PMLR, 2021. ↩︎
-
-
Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman. “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.” arXiv preprint arXiv:2305.04388, 2023. ↩︎
+Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman. “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.” arXiv preprint arXiv:2305.04388, 2023. ↩︎
-
+
diff --git a/week4/index.html b/week4/index.html
index be673ce..dec34d2 100644
--- a/week4/index.html
+++ b/week4/index.html
@@ -274,7 +274,7 @@ Discussion for Wednesday:
-
-
Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto. Benchmarking large language models for news summarization, 2023. https://arxiv.org/abs/2301.13848 ↩︎
+Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto. Benchmarking large language models for news summarization, 2023. https://arxiv.org/abs/2301.13848 ↩︎
-
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav
@@ -289,17 +289,17 @@
Discussion for Wednesday:
drew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz,
Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy
Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. Palm: Scaling
-language modeling with pathways, 2022. https://arxiv.org/abs/2204.02311 ↩︎
+language modeling with pathways, 2022. https://arxiv.org/abs/2204.02311 ↩︎
-
-
OpenAI. GPT-4 Technical Report. March 2023. https://arxiv.org/abs/2303.08774 ↩︎
+OpenAI. GPT-4 Technical Report. March 2023. https://arxiv.org/abs/2303.08774 ↩︎
-
+
diff --git a/week5/index.html b/week5/index.html
index 8cf940c..ed1d3c7 100644
--- a/week5/index.html
+++ b/week5/index.html
@@ -142,7 +142,7 @@ Wednesday, September 27
Sources of Hallucination
-_Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models_
+Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models
@@ -173,7 +173,7 @@ Wednesday, October 4th: H
-_Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models_
+Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models
For inference, one strategy is to reduce the snowballing of hallucinations by designing a dynamic p-value. The p-value should start off large and shrink as more tokens are generated. Furthermore, introducing new or external knowledge can be done at two different positions: before and after generation. @@ -188,7 +188,7 @@
_DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models_
+DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Based on evolving trends, the concept of contrastive decoding is introduced. For example, one might ask, "How do we decide between Seattle and Olympia?" Treating the last layer as the mature layer, it is beneficial to contrast it with the preceding layers, which can be deemed premature. For each premature layer, the difference between its next-token probability distribution and the mature layer's distribution can be computed using the Jensen-Shannon Divergence. Such an approach amplifies the factual knowledge the model has acquired, thereby improving its generated output. @@ -201,7 +201,7 @@
_In-Context Retrieval-Augmented Language Models_
+In-Context Retrieval-Augmented Language Models
The model parameters are kept frozen. Instead of directly inputting text into the model, the approach first uses retrieval to search for relevant documents from external sources. The findings from these sources are then concatenated with the original text. Re-ranking results from the retrieval model also provides benefits; the exact perplexities can be referred to in the slide. It has been observed that smaller strides can enhance performance, albeit at the cost of increased runtime. The authors have noticed that the information at the end of a query is typically more relevant for output generation. In general, shorter queries tend to outperform longer ones.
@@ -216,7 +216,7 @@Group 2: Discussed two papers from this week's reading which highlighted the use of semantic search and the introduction of external context to aid the model. This approach, while useful for diminishing hallucination, heavily depends on external information, which is not effective in generic cases. @@ -237,7 +237,7 @@
One advantage discussed was that hallucinations "train" users to not blindly trust the model outputs. If such models are blindly trusted, there is a much greater risk associated with their use. If users can conclusively discern, however, that the produced information is fictitious, it could assist in fostering new ideas or fresh perspectives on a given topic.
diff --git a/week8/index.html b/week8/index.html
index 4df87b6..1b7833f 100644
--- a/week8/index.html
+++ b/week8/index.html
@@ -104,7 +104,7 @@
Here is an example of pseudocode from the activity:
-Sentence = "The students like to read interesting books."
+Sentence = "The students like to read interesting books."
# The bilingual dictionary from English to Chinese: Eng_chinese_dict
Translation = []
for word in Sentence.split():
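The loop body is not shown above; a hedged, self-contained version of the word-by-word translation might look like the following, with a toy dictionary assumed for illustration (not from the original post).

Eng_chinese_dict = {"the": "", "students": "学生", "like": "喜欢", "to": "",
                    "read": "读", "interesting": "有趣的", "books": "书"}
Sentence = "The students like to read interesting books."
Translation = []
for word in Sentence.split():
    key = word.strip(".,!?").lower()                      # normalize punctuation and case
    Translation.append(Eng_chinese_dict.get(key, word))   # fall back to the original word
print("".join(Translation))                               # 学生喜欢读有趣的书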
diff --git a/week9/index.html b/week9/index.html
index 03ce6d3..fc8a061 100644
--- a/week9/index.html
+++ b/week9/index.html
@@ -482,7 +482,7 @@ Sparse Autoencoders
Input Bias: Introduces an approach of adding an input bias to the representations in autoencoders, which demonstrates a significant boost in performance for the models used in toy examples.
The purpose of sparse autoencoders is to extract meaningful features from neural network activations. To avhice a good decomposition, where the features extracted should be interpretable and able to describe the activations’ context requires the ability to describe activations, interpret downstream effects of changes, and cover a significant portion of functionality within the data.
+The purpose of sparse autoencoders is to extract meaningful features from neural network activations. Achieving a good decomposition, in which the extracted features are interpretable and describe the activations' context, requires the ability to describe activations, interpret the downstream effects of changes, and cover a significant portion of the functionality within the data.
Feature Activation Sampling Bias: In previous evaluations, there was a bias due to considering only the top-activation neurons, which might inaccurately appear monosemantic because of their higher activations. To mitigate this bias, the approach samples uniformly across all possible activation levels for each feature.
-Evaluation of Interpretable Features: The authors used an evaluation process where human-based assessments are used to determine the interpretability of the features extracted. The criteria for interpretability are based on the authors’ distributed-based evaluation, where a score above eight is considered sufficiently interpretable.
+Evaluation of Interpretable Features: The authors used an evaluation process in which human assessments determine the interpretability of the extracted features. The criteria for interpretability are based on the authors' distribution-based evaluation, where a score above eight is considered sufficiently interpretable.
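To ground the decomposition described above, here is a minimal sketch of one common sparse autoencoder formulation, with a pre-encoder (input) bias and an L1 sparsity penalty; the names and coefficient are illustrative assumptions, not the authors' exact setup.

import numpy as np

def sparse_autoencoder_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coef=1e-3):
    # Encode an activation vector into a wider, sparse feature space
    f = np.maximum(0.0, W_enc @ (x - b_dec) + b_enc)   # input bias subtracted before encoding, ReLU features
    x_hat = W_dec @ f + b_dec                          # reconstruct the original activation
    reconstruction = np.sum((x - x_hat) ** 2)
    sparsity = l1_coef * np.sum(np.abs(f))             # L1 penalty keeps only a few features active
    return reconstruction + sparsity, f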