Update leaderboards-on-the-hub-haizelab.md (huggingface#1858)
clefourrier authored Feb 23, 2024
1 parent 7152fc6 commit cbeb6fd
Showing 1 changed file with 1 addition and 1 deletion.
leaderboards-on-the-hub-haizelab.md: 2 changes (1 addition, 1 deletion)
@@ -20,7 +20,7 @@ LLM research is moving fast. Indeed, some might say too fast.
While researchers in the field continue to rapidly expand and improve LLM performance, there is growing concern over whether these models are capable of realizing increasingly more undesired and unsafe behaviors. In recent months, there has been no shortage of [legislation](https://www.usnews.com/news/business/articles/2024-01-29/ai-companies-will-need-to-start-reporting-their-safety-tests-to-the-us-government) and [direct calls](https://openai.com/safety/preparedness) from industry labs calling for additional scrutiny on models – not as a means to hinder this technology’s progress but as a means to ensure it is responsibly deployed for the world to use.


- To this end, Haize Labs is thrilled to announce the [Red Teaming Resistance Benchmark][https://huggingface.co/spaces/HaizeLabs/red-teaming-resistance-benchmark), built with generous support from the Hugging Face team. In this benchmark, we thoroughly probe the robustness of frontier models under extreme red teaming efforts. That is, we systematically challenge and test these models with craftily constructed prompts to uncover their failure modes and vulnerabilities – revealing where precisely these models are susceptible to generating problematic outputs.
+ To this end, Haize Labs is thrilled to announce the [Red Teaming Resistance Benchmark](https://huggingface.co/spaces/HaizeLabs/red-teaming-resistance-benchmark), built with generous support from the Hugging Face team. In this benchmark, we thoroughly probe the robustness of frontier models under extreme red teaming efforts. That is, we systematically challenge and test these models with craftily constructed prompts to uncover their failure modes and vulnerabilities – revealing where precisely these models are susceptible to generating problematic outputs.

## Measuring Robustness to Realistic, Human-Like Attacks
