This repository contains all the papers presented at the ACM Summer School on Generative AI for Text 2024.
👺WARNING❗: This repo contains several unethical and sensitive statements
🌟🌟 New! See Useful Links to access the tutorial slides 🤗
- 🎯 Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee. How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries. 👉 Paper [Under Review]
- Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran. ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs. 👉 Paper [ACL 2024]
- Divij Handa, Advait Chirmule, Bimal Gajera, Chitta Baral. Jailbreaking Proprietary Large Language Models using Word Substitution Cipher. 👉 Paper [Under Review]
- Yue Deng, Wenxuan Zhang, Sinno Jialin Pan, Lidong Bing. Multilingual Jailbreak Challenges in Large Language Models. 👉 Paper [ICLR 2024]
- Javier Rando, Florian Tramèr. Universal Jailbreak Backdoors from Poisoned Human Feedback. 👉 Paper [ICLR 2024]
- 🎯 Rima Hazra, Sayan Layek, Somnath Banerjee, Soujanya Poria. Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language Models. 👉 Paper [ACL 2024]
- Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson. Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! 👉 Paper [ICLR 2024]
- 🎯 Rima Hazra, Sayan Layek, Somnath Banerjee, Soujanya Poria. Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations. 👉 Paper [Under Review]
- 🎯 Somnath Banerjee, Soham Tripathy, Sayan Layek, Shanu Kumar, Animesh Mukherjee, Rima Hazra. SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models. 👉 Paper [Under Review]
- NicheHazardQA 👉 download 🎯
- TechHazardQA 👉 download 🎯
- DangerousQA 👉 download
- AdvBench 👉 download
- Anthropic HH dataset 👉 download
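Benchmarks like the ones above are commonly distributed as CSV/JSON files of harmful-query prompts. A minimal sketch of loading the prompt column for evaluation, assuming a CSV with a `question` column (the file layout and column name are illustrative; check each dataset's actual schema):

```python
import csv
import io

def load_questions(csv_text: str, column: str = "question") -> list[str]:
    """Parse a CSV of benchmark rows and return the non-empty prompt column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if row.get(column)]

# Illustrative rows only; real datasets define their own columns and categories.
sample = (
    "question,category\n"
    "How do transformers work?,benign\n"
    "Explain gradient descent.,benign\n"
)
print(load_questions(sample))
```

For files on disk, pass `open(path).read()` (or adapt the function to take a file handle directly).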
- Simple jailbreaking with a naive prompt - Safe_Unsafe_Examples.ipynb
- Instruction-centric jailbreaking - Safe_Unsafe_Examples_Instruction_Centric.ipynb
- ⭐️ If you find the GitHub resources helpful and our papers and datasets (🎯) interesting, please encourage us by starring the repo, and by upvoting and sharing our papers and datasets! 😊