diff --git a/.nojekyll b/.nojekyll
index a26fdbf..930340a 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-a65188df
\ No newline at end of file
+520d524d
\ No newline at end of file
diff --git a/index.html b/index.html
index c5a66f0..143e615 100644
--- a/index.html
+++ b/index.html
@@ -143,7 +143,7 @@
-
+ -
+ -
+

diff --git a/posts/TDC2023.html b/posts/TDC2023.html
index c186d18..60f19f6 100644
--- a/posts/TDC2023.html
+++ b/posts/TDC2023.html
@@ -301,7 +301,7 @@

Summary of Our Major Takeaways

\(-\textrm{mm}_{\omega} (-\log p(t_k | t_0, …, t_{k-1}), …, -\log p(t_{k+n} | t_0, …, t_{k+n-1}))\)
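
The objective above can be sketched in a few lines of Python. This is a minimal illustration, not the post's implementation: it assumes the usual mellowmax definition, mm_ω(x) = (1/ω) · log((1/n) · Σ exp(ω · x_i)), and the names `mellowmax`, `mellowmax_objective`, and `token_log_probs` are hypothetical. The input is the list of per-token log-probabilities log p(t_j | t_0, …, t_{j-1}) for the forced tokens.

```python
import math


def mellowmax(values, omega):
    """Mellowmax under the assumed definition: (1/omega) * log(mean(exp(omega * x_i))).

    The log-mean-exp is computed with the max subtracted for numerical stability.
    """
    if omega == 0:
        raise ValueError("omega must be non-zero")
    scaled = [omega * v for v in values]
    peak = max(scaled)
    log_mean_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled) / len(scaled))
    return log_mean_exp / omega


def mellowmax_objective(token_log_probs, omega):
    """Evaluate -mm_omega(-log p_k, ..., -log p_{k+n}) from per-token log-probabilities."""
    negative_log_probs = [-lp for lp in token_log_probs]
    return -mellowmax(negative_log_probs, omega)
```

For a large positive ω the mellowmax approaches the largest negative log-probability, so the objective is dominated by the least likely forced token; as ω shrinks toward zero it approaches the plain average.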

  • Hyperparameter tuning of GCG was very useful. Compared to the default hyperparameters used in Zou et al. 2023, we reduced our average optimizer runtime by ~7x. The average time to force an output sequence on a single A100 40GB went from 120 seconds to 17 seconds.

  • -
  • Presented benchmarks in some recent red-teaming & optimization papers can be misleading. Attacks with GCG performed well, better than we had expected.

    +
  • The benchmarks in some recent red-teaming & optimization papers can be misleading. Attacks with GCG performed well, better than we had expected.

    Papers will often select a model/task combination that is very easy to red-team. Recent black-box adversarial attack papers that use GCG as a comparator method often use poor GCG hyperparameters, count computational costs unfairly, or select too-easy baselines.

  • - +
diff --git a/posts/catalog.out.ipynb b/posts/catalog.out.ipynb
index 44b8e2f..0b5f67a 100644
--- a/posts/catalog.out.ipynb
+++ b/posts/catalog.out.ipynb
@@ -297,7 +297,7 @@
 "Pythia-12B is miscalibrated on 20% of the bigrams and 45% of the\n",
 "trigrams when we ask for prediction of $p \\geq 0.45$."
 ],
- "id": "f8435ac4-ab16-42b5-97e5-61de950e8321"
+ "id": "72ea1b0f-9434-4c67-903f-d78c5236b4bc"
 },
 {
 "cell_type": "code",
@@ -313,7 +313,7 @@
 }
 ],
 "source": [],
- "id": "4fb84275-0c73-444d-9a06-724d3e3fe2da"
+ "id": "30d01822-25dd-4005-8691-c6d39286b28a"
 },
 {
 "cell_type": "markdown",
@@ -377,7 +377,7 @@
 "The dataset is available on Huggingface:\n",
 "[pile_scan_4](https://huggingface.co/datasets/Confirm-Labs/pile_scan_4)"
 ],
- "id": "11074994-84b4-49df-a833-f6e03b064d7e"
+ "id": "390a8236-ac9a-4571-84bd-a904d3b687d8"
 },
 {
 "cell_type": "code",
@@ -391,7 +391,7 @@
 }
 ],
 "source": [],
- "id": "3fe4554b-080a-4453-8a59-60cafaec7875"
+ "id": "91ecfef7-31c7-40fc-b0b8-80e33bba3244"
 },
 {
 "cell_type": "markdown",
@@ -423,7 +423,7 @@
 "Computational Linguistics, May 2022, pp. 95–136. doi:\n",
 "[10.18653/v1/2022.bigscience-1.9](https://doi.org/10.18653/v1/2022.bigscience-1.9)."
 ],
- "id": "c1001ff8-ea42-40b8-92b1-91a0a39d6dbe"
+ "id": "e0211f42-cf01-42c7-b000-dde9e6d41909"
 }
 ],
 "nbformat": 4,
diff --git a/sitemap.xml b/sitemap.xml
index d78d305..905df04 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,18 +2,18 @@
 https://confirmlabs.org/posts/catalog.html
- 2024-01-12T11:49:20.565Z
+ 2024-01-12T11:50:28.740Z
 https://confirmlabs.org/posts/TDC2023.html
- 2024-01-12T11:49:17.469Z
+ 2024-01-12T11:50:25.556Z
 https://confirmlabs.org/index.html
- 2024-01-12T11:49:16.053Z
+ 2024-01-12T11:50:24.100Z
 https://confirmlabs.org/posts/fight_the_illusion.html
- 2024-01-12T11:49:18.133Z
+ 2024-01-12T11:50:26.244Z