---
title: 'Adversarial Attacks and Defenses in Large Language Models: Old and New Threats'
abstract: 'Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains vastly unsolved. Here, one major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations. Flawed robustness evaluations necessitate rectifications in subsequent works, dangerously slowing down the research and providing a false sense of security. In this context, we will face substantial challenges associated with an impending adversarial arms race in natural language processing, specifically with closed-source Large Language Models (LLMs), such as ChatGPT, Google Bard, or Anthropic’s Claude. We provide a first set of prerequisites to improve the robustness assessment of new approaches and reduce the amount of faulty evaluations. Additionally, we identify embedding space attacks on LLMs as another viable threat model for the purposes of generating malicious content in open-sourced models. Finally, we demonstrate on a recently proposed defense that, without LLM-specific best practices in place, it is easy to overestimate the robustness of a new approach.'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: schwinn23a
month: 0
tex_title: 'Adversarial Attacks and Defenses in Large Language Models: Old and New Threats'
firstpage: 103
lastpage: 117
page: 103-117
order: 103
cycles: false
bibtex_author: Schwinn, Leo and Dobre, David and G{\"u}nnemann, Stephan and Gidel, Gauthier
author:
- given: Leo
  family: Schwinn
- given: David
  family: Dobre
- given: Stephan
  family: Günnemann
- given: Gauthier
  family: Gidel
date: 2023-04-24
address:
container-title: 'Proceedings on "I Can''t Believe It''s Not Better: Failure Modes in the Age of Foundation Models" at NeurIPS 2023 Workshops'
volume: 239
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 4
  - 24
pdf:
extras:
---