Commit

Built site for gh-pages
Quarto GHA Workflow Runner committed Jan 13, 2024
1 parent 2968fe9 commit 105f524
Showing 6 changed files with 17 additions and 18 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
-dafc4add
+0e5b2c24
6 changes: 3 additions & 3 deletions index.html
@@ -143,7 +143,7 @@

<div class="quarto-listing quarto-listing-container-grid" id="listing-listing">
<div class="list grid quarto-listing-cols-3">
-<div class="g-col-1" data-index="0" data-listing-date-sort="1705104000000" data-listing-file-modified-sort="1705146893489" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="25">
+<div class="g-col-1" data-index="0" data-listing-date-sort="1705104000000" data-listing-file-modified-sort="1705146960541" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="25">
<a href="./posts/TDC2023.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top"><img src="posts/TDC2023-sample-instances.png" style="height: 150px;" class="thumbnail-image card-img"/></p>
@@ -166,7 +166,7 @@ <h5 class="no-anchor card-title listing-title">
</div>
</a>
</div>
-<div class="g-col-1" data-index="1" data-listing-date-sort="1701302400000" data-listing-file-modified-sort="1705146893509" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
+<div class="g-col-1" data-index="1" data-listing-date-sort="1701302400000" data-listing-file-modified-sort="1705146960561" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<a href="./posts/fight_the_illusion.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<div class="listing-item-img-placeholder card-img-top" style="height: 150px;">&nbsp;</div>
@@ -189,7 +189,7 @@ <h5 class="no-anchor card-title listing-title">
</div>
</a>
</div>
-<div class="g-col-1" data-index="2" data-listing-date-sort="1687651200000" data-listing-file-modified-sort="1705146893509" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
+<div class="g-col-1" data-index="2" data-listing-date-sort="1687651200000" data-listing-file-modified-sort="1705146960561" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<a href="./posts/catalog.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top"><img src="posts/catalog_files/figure-html/cell-9-output-1.png" style="height: 150px;" class="thumbnail-image card-img"/></p>
7 changes: 3 additions & 4 deletions posts/TDC2023.html
@@ -347,9 +347,8 @@ <h4 class="anchored" data-anchor-id="the-models-may-have-developed-internal-geom
<p>Assume we are trying to find triggers for some payload <span class="math inline">\(s_2\)</span>. Take a completely unrelated known trigger-payload pair <span class="math inline">\((p_1, s_1)\)</span>, such that trigger <span class="math inline">\(p_1\)</span> yields a different payload <span class="math inline">\(s_1\)</span>. Then, while optimizing for payload <span class="math inline">\(s_2\)</span>, initialize the optimization at the point <span class="math inline">\(x = p_1\)</span>. This turns out to speed up the process of finding a trigger for <span class="math inline">\(s_2\)</span>, often with far fewer iterations than if we had initialized with random tokens or text from the Pile.</p>
<p>Somehow, GCG’s first-order approximation (which it uses to select candidate mutations) is accurate enough to rapidly descend in this setting. In some cases, payload <span class="math inline">\(s_2\)</span> could be produced with <em>only 1-3 optimizer iterations</em> starting from trigger <span class="math inline">\(p_1\)</span>. We were very surprised by this. Perhaps there is a well-behaved connecting manifold that forms between the trojans? <strong>If we were to continue attempting to reverse engineer trojan insertion, understanding this phenomenon is where we would start.</strong></p>
</section>
-<section id="section" class="level4">
-<h4 class="anchored" data-anchor-id="section">5.</h4>
-<p>For some additional details on our investigations, see <a href="https://zygi.me/blog/adventures-in-trojan-detection/#open-questions">Zygi’s personal site</a></p>
+<section id="for-some-additional-details-on-our-investigations-see-zygis-personal-site" class="level4">
+<h4 class="anchored" data-anchor-id="for-some-additional-details-on-our-investigations-see-zygis-personal-site">5. For some additional details on our investigations, see <a href="https://zygi.me/blog/adventures-in-trojan-detection/#open-questions">Zygi’s personal site</a></h4>
</section>
</section>
<section id="red-teaming-track-takeaways" class="level1">
@@ -722,7 +721,7 @@ <h4 class="anchored" data-anchor-id="tricks-that-we-found-to-improve-performance
});
</script>
</div> <!-- /content -->
-<script>var lightboxQuarto = GLightbox({"closeEffect":"zoom","loop":true,"selector":".lightbox","openEffect":"zoom","descPosition":"bottom"});</script>
+<script>var lightboxQuarto = GLightbox({"selector":".lightbox","closeEffect":"zoom","loop":true,"descPosition":"bottom","openEffect":"zoom"});</script>



2 changes: 1 addition & 1 deletion posts/catalog.html
@@ -814,7 +814,7 @@ <h2 class="anchored" data-anchor-id="github">GitHub</h2>
});
</script>
</div> <!-- /content -->
-<script>var lightboxQuarto = GLightbox({"closeEffect":"zoom","selector":".lightbox","openEffect":"zoom","descPosition":"bottom","loop":true});</script>
+<script>var lightboxQuarto = GLightbox({"openEffect":"zoom","loop":true,"closeEffect":"zoom","selector":".lightbox","descPosition":"bottom"});</script>



10 changes: 5 additions & 5 deletions posts/catalog.out.ipynb
@@ -297,7 +297,7 @@
"Pythia-12B is miscalibrated on 20% of the bigrams and 45% of the\n",
"trigrams when we ask for prediction of $p \\geq 0.45$."
],
-"id": "1ec3c3b4-b659-498e-bc68-f2a8bbe54777"
+"id": "062529bd-af2a-4a3e-b7ed-35bdc7bc1382"
},
{
"cell_type": "code",
@@ -313,7 +313,7 @@
}
],
"source": [],
-"id": "0e4f5b15-f23e-45cc-abf7-944935ed0841"
+"id": "5b05fa42-f1e3-450f-a99c-6b483129088a"
},
{
"cell_type": "markdown",
@@ -377,7 +377,7 @@
"The dataset is available on Huggingface:\n",
"[pile_scan_4](https://huggingface.co/datasets/Confirm-Labs/pile_scan_4)"
],
-"id": "5ed0ab19-3c14-441e-adf5-ee50406f84cf"
+"id": "f7ec37af-d06f-4c5f-89ce-610af6a75527"
},
{
"cell_type": "code",
@@ -391,7 +391,7 @@
}
],
"source": [],
-"id": "38552396-a4f9-4da2-8ee5-71390ae67081"
+"id": "5415a94a-58f3-40b3-a3bd-1907a8154425"
},
{
"cell_type": "markdown",
@@ -423,7 +423,7 @@
"Computational Linguistics, May 2022, pp. 95–136. doi:\n",
"[10.18653/v1/2022.bigscience-1.9](https://doi.org/10.18653/v1/2022.bigscience-1.9).</span>"
],
-"id": "6fc1d3c5-b97f-4c22-ae18-e8eb76855546"
+"id": "be2e28c3-b4fe-4345-abac-4bbf28f5bfe5"
}
],
"nbformat": 4,
8 changes: 4 additions & 4 deletions sitemap.xml
@@ -2,18 +2,18 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://confirmlabs.org/posts/catalog.html</loc>
-<lastmod>2024-01-13T11:55:08.657Z</lastmod>
+<lastmod>2024-01-13T11:56:28.113Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/posts/TDC2023.html</loc>
-<lastmod>2024-01-13T11:55:05.549Z</lastmod>
+<lastmod>2024-01-13T11:56:24.957Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/index.html</loc>
-<lastmod>2024-01-13T11:55:04.121Z</lastmod>
+<lastmod>2024-01-13T11:56:23.513Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/posts/fight_the_illusion.html</loc>
-<lastmod>2024-01-13T11:55:06.221Z</lastmod>
+<lastmod>2024-01-13T11:56:25.629Z</lastmod>
</url>
</urlset>
