Commit

Built site for gh-pages
Quarto GHA Workflow Runner committed Jan 12, 2024
1 parent cbd10bd commit 742b692
Showing 6 changed files with 17 additions and 17 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
a65188df
520d524d
6 changes: 3 additions & 3 deletions index.html
@@ -143,7 +143,7 @@

<div class="quarto-listing quarto-listing-container-grid" id="listing-listing">
<div class="list grid quarto-listing-cols-3">
<div class="g-col-1" data-index="0" data-listing-date-sort="1704326400000" data-listing-file-modified-sort="1705060145477" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="25">
<div class="g-col-1" data-index="0" data-listing-date-sort="1704326400000" data-listing-file-modified-sort="1705060212544" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="25">
<a href="./posts/TDC2023.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top"><img src="posts/TDC2023-sample-instances.png" style="height: 150px;" class="thumbnail-image card-img"/></p>
@@ -166,7 +166,7 @@ <h5 class="no-anchor card-title listing-title">
</div>
</a>
</div>
<div class="g-col-1" data-index="1" data-listing-date-sort="1701302400000" data-listing-file-modified-sort="1705060145497" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<div class="g-col-1" data-index="1" data-listing-date-sort="1701302400000" data-listing-file-modified-sort="1705060212564" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<a href="./posts/fight_the_illusion.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<div class="listing-item-img-placeholder card-img-top" style="height: 150px;">&nbsp;</div>
@@ -189,7 +189,7 @@ <h5 class="no-anchor card-title listing-title">
</div>
</a>
</div>
<div class="g-col-1" data-index="2" data-listing-date-sort="1687651200000" data-listing-file-modified-sort="1705060145497" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<div class="g-col-1" data-index="2" data-listing-date-sort="1687651200000" data-listing-file-modified-sort="1705060212564" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<a href="./posts/catalog.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top"><img src="posts/catalog_files/figure-html/cell-9-output-1.png" style="height: 150px;" class="thumbnail-image card-img"/></p>
6 changes: 3 additions & 3 deletions posts/TDC2023.html
@@ -301,7 +301,7 @@ <h1>Summary of Our Major Takeaways</h1>
</p>
<p><span class="math inline">\(-\textrm{mm}_{\omega} (-\log p(t_k | t_0, …, t_{i-1}), …, -\log p(t_{k+n} | t_0, …, t_{k+n-1}))\)</span></p></li>
<li><p><strong>Hyperparameter tuning of GCG was very useful. Compared to the default hyperparameters used in Zou et al.&nbsp;2023, we reduced our average optimizer runtime by ~7x. The average time to force an output sequence on a single A100 40GB went from 120 seconds to 17 seconds.</strong></p></li>
<li><p><strong>Presented benchmarks in some recent red-teaming &amp; optimization papers can be misleading. Attacks with GCG performed well, better than we had expected.</strong></p>
<li><p><strong>The benchmarks in some recent red-teaming &amp; optimization papers can be misleading. Attacks with GCG performed well, better than we had expected.</strong></p>
<p>Papers will often select a model/task combination that is very easy to red-team. Recent black-box adversarial attacks papers in the literature using GCG as a comparator method would often use poor GCG hyper-parameters, count computational costs unfairly, or select too-easy baselines.</p>
<ul>
<li>For example, the gradient-based AutoDAN-Zhu (Zhu et al 2023) benchmarks appear favorable at a glance, but they omit well-safety-trained models like Llama-2-chat and mention in the appendix that their method struggles on it. Llama-2-chat seems to be one of the hardest models to crack.</li>
@@ -398,7 +398,7 @@ <h4 class="anchored" data-anchor-id="although-we-struggled-to-use-activation-eng
<p>to compare/rank different sequences of tokens. Since <span class="math inline">\(u_i\)</span> is now a scalar for each x, given a collection of such x’s we can construct a z-score for our dataset as <span class="math inline">\((u_i - mean(u_i))/std(u_i)\)</span>, and rank them.</p>
<div class="quarto-figure quarto-figure-left">
<figure class="figure">
<p><a href="TDC2023-sample-instances.png" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="The Z-scores of activation vector similarity for the provided sample instances"><img src="TDC2023-sample-instances.png" class="img-fluid figure-img" style="width:60.0%"></a></p>
<p><a href="TDC2023-sample-instances.png" class="lightbox" title="The Z-scores of activation vector similarity for the provided sample instances" data-gallery="quarto-lightbox-gallery-1"><img src="TDC2023-sample-instances.png" class="img-fluid figure-img" style="width:60.0%"></a></p>
<figcaption class="figure-caption">The Z-scores of activation vector similarity for the provided sample instances</figcaption>
</figure>
</div>
@@ -718,7 +718,7 @@ <h4 class="anchored" data-anchor-id="tricks-that-we-found-to-improve-performance
});
</script>
</div> <!-- /content -->
<script>var lightboxQuarto = GLightbox({"selector":".lightbox","openEffect":"zoom","closeEffect":"zoom","descPosition":"bottom","loop":true});</script>
<script>var lightboxQuarto = GLightbox({"loop":true,"selector":".lightbox","openEffect":"zoom","descPosition":"bottom","closeEffect":"zoom"});</script>



2 changes: 1 addition & 1 deletion posts/catalog.html
@@ -814,7 +814,7 @@ <h2 class="anchored" data-anchor-id="github">GitHub</h2>
});
</script>
</div> <!-- /content -->
<script>var lightboxQuarto = GLightbox({"openEffect":"zoom","descPosition":"bottom","loop":true,"closeEffect":"zoom","selector":".lightbox"});</script>
<script>var lightboxQuarto = GLightbox({"closeEffect":"zoom","descPosition":"bottom","openEffect":"zoom","loop":true,"selector":".lightbox"});</script>



10 changes: 5 additions & 5 deletions posts/catalog.out.ipynb
@@ -297,7 +297,7 @@
"Pythia-12B is miscalibrated on 20% of the bigrams and 45% of the\n",
"trigrams when we ask for prediction of $p \\geq 0.45$."
],
"id": "f8435ac4-ab16-42b5-97e5-61de950e8321"
"id": "72ea1b0f-9434-4c67-903f-d78c5236b4bc"
},
{
"cell_type": "code",
@@ -313,7 +313,7 @@
}
],
"source": [],
"id": "4fb84275-0c73-444d-9a06-724d3e3fe2da"
"id": "30d01822-25dd-4005-8691-c6d39286b28a"
},
{
"cell_type": "markdown",
@@ -377,7 +377,7 @@
"The dataset is available on Huggingface:\n",
"[pile_scan_4](https://huggingface.co/datasets/Confirm-Labs/pile_scan_4)"
],
"id": "11074994-84b4-49df-a833-f6e03b064d7e"
"id": "390a8236-ac9a-4571-84bd-a904d3b687d8"
},
{
"cell_type": "code",
@@ -391,7 +391,7 @@
}
],
"source": [],
"id": "3fe4554b-080a-4453-8a59-60cafaec7875"
"id": "91ecfef7-31c7-40fc-b0b8-80e33bba3244"
},
{
"cell_type": "markdown",
@@ -423,7 +423,7 @@
"Computational Linguistics, May 2022, pp. 95–136. doi:\n",
"[10.18653/v1/2022.bigscience-1.9](https://doi.org/10.18653/v1/2022.bigscience-1.9).</span>"
],
"id": "c1001ff8-ea42-40b8-92b1-91a0a39d6dbe"
"id": "e0211f42-cf01-42c7-b000-dde9e6d41909"
}
],
"nbformat": 4,
8 changes: 4 additions & 4 deletions sitemap.xml
@@ -2,18 +2,18 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://confirmlabs.org/posts/catalog.html</loc>
<lastmod>2024-01-12T11:49:20.565Z</lastmod>
<lastmod>2024-01-12T11:50:28.740Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/posts/TDC2023.html</loc>
<lastmod>2024-01-12T11:49:17.469Z</lastmod>
<lastmod>2024-01-12T11:50:25.556Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/index.html</loc>
<lastmod>2024-01-12T11:49:16.053Z</lastmod>
<lastmod>2024-01-12T11:50:24.100Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/posts/fight_the_illusion.html</loc>
<lastmod>2024-01-12T11:49:18.133Z</lastmod>
<lastmod>2024-01-12T11:50:26.244Z</lastmod>
</url>
</urlset>
