Commit

Built site for gh-pages
Quarto GHA Workflow Runner committed Nov 29, 2023
1 parent 8610267 commit f7b5f19
Showing 7 changed files with 15 additions and 15 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
-fd316aa3
+a5fa4430
2 changes: 1 addition & 1 deletion index.html
@@ -143,7 +143,7 @@

<div class="quarto-listing quarto-listing-container-grid" id="listing-listing">
<div class="list grid quarto-listing-cols-3">
-<div class="g-col-1" data-index="0" data-listing-date-sort="1687651200000" data-listing-file-modified-sort="1701229445824" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
+<div class="g-col-1" data-index="0" data-listing-date-sort="1687651200000" data-listing-file-modified-sort="1701229501419" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<a href="./posts/catalog.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top"><img src="posts/catalog_files/figure-html/cell-9-output-1.png" style="height: 150px;" class="thumbnail-image card-img"/></p>
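The listing sort keys in this hunk look like Unix timestamps in milliseconds (an assumption inferred from their magnitude; the diff itself does not say). A small Python sketch, using a hypothetical helper `ms_to_utc`, decodes them so the change is readable: only the file-modified time moved, by about 56 seconds.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the data-listing-*-sort values are milliseconds since the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ms_to_utc(ms: int) -> str:
    """Convert a millisecond Unix timestamp to an ISO-8601 UTC string."""
    # Integer timedelta arithmetic avoids float rounding in the millisecond part.
    return (EPOCH + timedelta(milliseconds=ms)).isoformat(timespec="milliseconds")

print(ms_to_utc(1687651200000))  # 2023-06-25T00:00:00.000+00:00  (data-listing-date-sort)
print(ms_to_utc(1701229445824))  # 2023-11-29T03:44:05.824+00:00  (old file-modified-sort)
print(ms_to_utc(1701229501419))  # 2023-11-29T03:45:01.419+00:00  (new file-modified-sort)
```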
2 changes: 1 addition & 1 deletion posts/catalog.html
@@ -798,7 +798,7 @@ <h2 class="anchored" data-anchor-id="github">GitHub</h2>
});
</script>
</div> <!-- /content -->
-<script>var lightboxQuarto = GLightbox({"selector":".lightbox","closeEffect":"zoom","loop":true,"openEffect":"zoom","descPosition":"bottom"});</script>
+<script>var lightboxQuarto = GLightbox({"selector":".lightbox","descPosition":"bottom","closeEffect":"zoom","loop":true,"openEffect":"zoom"});</script>



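The two `GLightbox(...)` lines in this hunk differ only in the order of keys in the options object. Because that object literal also happens to be valid JSON (quoted keys, lowercase `true`), a short Python sketch can confirm that both lines pass the same options. The helper `same_options` is illustrative, not part of GLightbox or Quarto.

```python
import json

# Options objects copied from the first (old) and second (new) GLightbox lines in the hunk.
OLD = '{"selector":".lightbox","closeEffect":"zoom","loop":true,"openEffect":"zoom","descPosition":"bottom"}'
NEW = '{"selector":".lightbox","descPosition":"bottom","closeEffect":"zoom","loop":true,"openEffect":"zoom"}'

def same_options(a: str, b: str) -> bool:
    """True if two JSON objects hold identical keys and values; dict equality ignores key order."""
    return json.loads(a) == json.loads(b)

print(same_options(OLD, NEW))                          # True
print(list(json.loads(OLD)) == list(json.loads(NEW)))  # False: only the key order differs
```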
10 changes: 5 additions & 5 deletions posts/catalog.out.ipynb
@@ -297,7 +297,7 @@
"Pythia-12B is miscalibrated on 20% of the bigrams and 45% of the\n",
"trigrams when we ask for prediction of $p \\geq 0.45$."
],
-"id": "8166b4e1-fe28-4a80-b13b-40e262561d90"
+"id": "10170b94-e26d-44e3-a77a-aaf9dc15a2c5"
},
{
"cell_type": "code",
@@ -313,7 +313,7 @@
}
],
"source": [],
-"id": "9a5d22eb-e826-4e06-8620-b2e9b3e9812e"
+"id": "819430fe-bdf0-409c-ab58-fa645952a489"
},
{
"cell_type": "markdown",
@@ -375,7 +375,7 @@
"The dataset is available on Huggingface:\n",
"[pile_scan_4](https://huggingface.co/datasets/Confirm-Labs/pile_scan_4)"
],
-"id": "c2618577-5464-4db9-a381-ebcb860c4b24"
+"id": "9f0f2e6b-2702-4164-85da-3d4910a53bee"
},
{
"cell_type": "code",
@@ -389,7 +389,7 @@
}
],
"source": [],
-"id": "d49b8ca6-83a7-4d0e-93a6-c071bdeb33d2"
+"id": "94ad5779-f7f6-45e9-a312-4fc7fcb0b068"
},
{
"cell_type": "markdown",
@@ -419,7 +419,7 @@
"Charles Foster, Jason Phang, et al. 2020. “The Pile: An 800GB Dataset of\n",
"Diverse Text for Language Modeling.” *arXiv Preprint arXiv:2101.00027*."
],
-"id": "9be8445b-e0de-44ee-add5-30d2114b7acc"
+"id": "42230dda-9a8a-4433-ab56-d17fab69aead"
}
],
"nbformat": 4,
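Every change in `posts/catalog.out.ipynb` is a regenerated cell `"id"`; no cell source or output changes. One way to confirm that kind of churn, sketched in Python with a hypothetical helper `without_cell_ids` (not an nbformat or Quarto API), is to drop the ids before comparing. The cell below is trimmed to the fields and source lines visible in the hunk.

```python
import copy

def without_cell_ids(nb: dict) -> dict:
    """Return a deep copy of a notebook dict with every cell's "id" field removed."""
    clean = copy.deepcopy(nb)
    for cell in clean.get("cells", []):
        cell.pop("id", None)
    return clean

# Trimmed-down cell: only "cell_type", the visible "source" lines, and "id";
# real cells also carry "metadata" and possibly more source lines.
old_nb = {"cells": [{
    "cell_type": "markdown",
    "source": ["The dataset is available on Huggingface:\n",
               "[pile_scan_4](https://huggingface.co/datasets/Confirm-Labs/pile_scan_4)"],
    "id": "c2618577-5464-4db9-a381-ebcb860c4b24",
}]}
new_nb = copy.deepcopy(old_nb)
new_nb["cells"][0]["id"] = "9f0f2e6b-2702-4164-85da-3d4910a53bee"

print(old_nb == new_nb)                                      # False
print(without_cell_ids(old_nb) == without_cell_ids(new_nb))  # True
```

Since `.ipynb` files are JSON, the same comparison applies to dicts loaded from the old and new files with `json.load`.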
2 changes: 1 addition & 1 deletion posts/fight_the_illusion.html
@@ -167,7 +167,7 @@ <h1 class="title">6 Ways to Fight the Interpretability Illusion</h1>
<li><a href="https://www.lesswrong.com/posts/RFtkRXHebkwxygDe2/an-interpretability-illusion-for-activation-patching-of">An Interpretability Illusion for Activation Patching of Arbitrary Subspaces</a>.</li>
<li>The corresponding <a href="https://openreview.net/forum?id=Ebt7JgMHv1">ICLR paper, “Is This the Subspace You Are Looking For?</a></li>
</ul>
-<p>__</p>
+<hr>
<p>This post is motivated by Lange, Makelov, and Nanda’s LessWrong post <a href="https://www.lesswrong.com/posts/RFtkRXHebkwxygDe2/an-interpretability-illusion-for-activation-patching-of">Interpretability Illusion for Activation Patching</a> and <a href="https://openreview.net/forum?id=Ebt7JgMHv1">ICLR paper</a>. They study <a href="https://arxiv.org/abs/2303.02536">Geiger et al’s DAS</a> method, which uses optimization to identify an abstracted causal model with a small subset of dimensions in a neural network’s residual stream or internal MLP layer. Their results show that DAS can, depending on the situation, turn up both “correct” and “spurious” findings on the train-set. From the investigations in the <a href="https://openreview.net/forum?id=Ebt7JgMHv1">ICLR paper</a> and conversations with a few researchers, my understanding is these “spurious” directions have not performed well on held-out generalization sets, so in practice it is easy to distinguish the “illusions” from “real effects”. But, I am interested in developing even stronger optimize-to-interpret methods. With more powerful optimizers, illusion effects should be even stronger, and competition from spurious signals may make true signals harder to locate in training. So, here are 6 possible ways to fight against the interpretability illusion. Most of them can be tried in combination.</p>
<ol type="1">
<li><strong>The causal model still holds, and may still be what we want.</strong>: We call it an interpretability <em>illusion</em> because we are failing to describe the model’s normal functioning. But unusual functioning is fine for some goals! Applications include:
8 changes: 4 additions & 4 deletions posts/fight_the_illusion.ipynb
@@ -9,7 +9,7 @@
"Michael Sklar \n",
"2023-11-28"
],
-"id": "424fbf11-ccd4-4568-ab4e-fddffb1f3ad2"
+"id": "91cbd9f0-1b56-481f-9797-9e9e4ea86ef9"
},
{
"cell_type": "raw",
@@ -38,7 +38,7 @@
"* Source doc: 6 ways to fight the Interpretability illusion\n",
"----->"
],
-"id": "84ddb8d9-6186-4cfe-910e-6a7a3d14517f"
+"id": "f0e12637-658b-4d98-880f-9d1548981580"
},
{
"cell_type": "markdown",
@@ -53,7 +53,7 @@
"- The corresponding [ICLR paper, “Is This the Subspace You Are Looking\n",
" For?](https://openreview.net/forum?id=Ebt7JgMHv1)”\n",
"\n",
-"\\_\\_\n",
+"------------------------------------------------------------------------\n",
"\n",
"This post is motivated by Lange, Makelov, and Nanda’s LessWrong post\n",
"[Interpretability Illusion for Activation\n",
@@ -199,7 +199,7 @@
"Thanks to Atticus Geiger, Jing Huang, Ben Thompson, Zygimantas\n",
"Straznickas and others for conversations and feedback on earlier drafts."
],
-"id": "3498b614-95a8-4bef-8255-56b161af2e4d"
+"id": "e3dba39b-5650-40d7-ab8d-92969230deb7"
}
],
"nbformat": 4,
4 changes: 2 additions & 2 deletions sitemap.xml
@@ -2,10 +2,10 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://confirmlabs.org/index.html</loc>
-<lastmod>2023-11-29T03:44:16.316Z</lastmod>
+<lastmod>2023-11-29T03:45:11.999Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/posts/catalog.html</loc>
-<lastmod>2023-11-29T03:44:19.160Z</lastmod>
+<lastmod>2023-11-29T03:45:14.831Z</lastmod>
</url>
</urlset>
