Dreadnode Research

This is a general repository for research projects, reference code, and related material from the work we perform at dreadnode.

Mistral - Adversarial Suffix

Implementation of "Universal and Transferable Adversarial Attacks on Aligned Language Models" for Mistral 7B.

Mistral - BEAST Beam Attack

Implementation of "Fast Adversarial Attacks on Language Models In One GPU Minute" for Mistral 7B. At the time of release, the authors had not posted the reference code from the paper, so this implementation is likely incorrect.

Llama PGD

Implementation of "Attacking Large Language Models with Projected Gradient Descent" for Llama model variants with LitGPT. At the time of release, the authors had not posted any reference code, so treat this implementation with caution.

Needle Triage/Fix

Research in partnership with OpenSSF for the AIxCC event.