diff --git a/content/publication/franken2024sami.md b/content/publication/franken2024sami.md
index eefe8e5..6ba0065 100644
--- a/content/publication/franken2024sami.md
+++ b/content/publication/franken2024sami.md
@@ -7,7 +7,7 @@ # 5 -> 'Thesis'
 title = "Self-supervised alignment with mutual information: Learning to follow principles without preference labels"
-date = "2024-04-22"
+date = "2024-10-08"
 authors = ["J. Fränken","E. Zelikman","R. Rafailov","K. Gandhi","T. Gerstenberg","N. D. Goodman"]
 publication_types = ["3"]
 publication_short = "_Advances in Neural Information Processing Systems_"
diff --git a/docs/index.xml b/docs/index.xml
index 69171ad..ebb4add 100644
--- a/docs/index.xml
+++ b/docs/index.xml
@@ -127,20 +127,20 @@
-      Self-supervised alignment with mutual information: Learning to follow principles without preference labels
-      https://cicl.stanford.edu/publication/franken2024sami/
-      Mon, 22 Apr 2024 00:00:00 +0000
+      Procedural dilemma generation for evaluating moral reasoning in humans and language models
+      https://cicl.stanford.edu/publication/franken2024rails/
+      Wed, 17 Apr 2024 00:00:00 +0000
-      https://cicl.stanford.edu/publication/franken2024sami/
+      https://cicl.stanford.edu/publication/franken2024rails/
-      Procedural dilemma generation for evaluating moral reasoning in humans and language models
-      https://cicl.stanford.edu/publication/franken2024rails/
-      Wed, 17 Apr 2024 00:00:00 +0000
+      STaR-GATE: Teaching Language Models to Ask Clarifying Questions
+      https://cicl.stanford.edu/publication/andukuri2024stargate/
+      Sun, 31 Mar 2024 00:00:00 +0000
-      https://cicl.stanford.edu/publication/franken2024rails/
+      https://cicl.stanford.edu/publication/andukuri2024stargate/
diff --git a/docs/member/tobias_gerstenberg/index.html b/docs/member/tobias_gerstenberg/index.html
index 1638089..1561dbd 100644
--- a/docs/member/tobias_gerstenberg/index.html
+++ b/docs/member/tobias_gerstenberg/index.html
@@ -967,53 +967,6 @@

Publications

-          (2024). Self-supervised alignment with mutual information: Learning to follow principles without preference labels.
-          Advances in Neural Information Processing Systems.
-          Preprint
-          PDF
-          Github
diff --git a/docs/publication/index.html b/docs/publication/index.html
index f39141b..df4b561 100644
--- a/docs/publication/index.html
+++ b/docs/publication/index.html
@@ -1597,19 +1597,6 @@

Publications

@@ -2517,65 +2504,6 @@

Publications

-          (2024). Self-supervised alignment with mutual information: Learning to follow principles without preference labels.
-          Advances in Neural Information Processing Systems.
-          Preprint
-          PDF
-          Github
diff --git a/docs/publication/index.xml b/docs/publication/index.xml
index 11bc447..b0d1e21 100644
--- a/docs/publication/index.xml
+++ b/docs/publication/index.xml
@@ -129,15 +129,6 @@
-      Self-supervised alignment with mutual information: Learning to follow principles without preference labels
-      https://cicl.stanford.edu/publication/franken2024sami/
-      Mon, 22 Apr 2024 00:00:00 +0000
-      https://cicl.stanford.edu/publication/franken2024sami/
       Procedural dilemma generation for evaluating moral reasoning in humans and language models
       https://cicl.stanford.edu/publication/franken2024rails/
diff --git a/docs/publication_types/3/index.html b/docs/publication_types/3/index.html
index 5a605e6..80f7a06 100644
--- a/docs/publication_types/3/index.html
+++ b/docs/publication_types/3/index.html
@@ -311,19 +311,19 @@

Resource-rat

-          Self-supervised alignment with mutual information: Learning to follow principles without preference labels
+          Procedural dilemma generation for evaluating moral reasoning in humans and language models
-          When prompting a language model (LM), users frequently expect the model to adhere to a set of behavioral principles across diverse tasks, such as producing insightful content while avoiding harmful or biased language. Instilling such principles into …
+          As AI systems like language models are increasingly integrated into decision-making processes affecting people's lives, it's critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic …
-          Procedural dilemma generation for evaluating moral reasoning in humans and language models
+          Anticipating the risks and benefits of counterfactual world simulation models
-          As AI systems like language models are increasingly integrated into decision-making processes affecting people's lives, it's critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic …
+          This paper examines the transformative potential of Counterfactual World Simulation Models (CWSMs). CWSMs use pieces of multi-modal evidence, such as the CCTV footage or sound recordings of a road accident, to build a high-fidelity 3D reconstruction …
diff --git a/docs/publication_types/3/index.xml b/docs/publication_types/3/index.xml
index 1c53827..14e9ad6 100644
--- a/docs/publication_types/3/index.xml
+++ b/docs/publication_types/3/index.xml
@@ -84,15 +84,6 @@
-      Self-supervised alignment with mutual information: Learning to follow principles without preference labels
-      https://cicl.stanford.edu/publication/franken2024sami/
-      Mon, 22 Apr 2024 00:00:00 +0000
-      https://cicl.stanford.edu/publication/franken2024sami/
       Procedural dilemma generation for evaluating moral reasoning in humans and language models
       https://cicl.stanford.edu/publication/franken2024rails/
diff --git a/docs/publication_types/3/page/2/index.html b/docs/publication_types/3/page/2/index.html
index 1465bb3..57ef31f 100644
--- a/docs/publication_types/3/page/2/index.html
+++ b/docs/publication_types/3/page/2/index.html
@@ -238,15 +238,6 @@

3

-          Anticipating the risks and benefits of counterfactual world simulation models
-          This paper examines the transformative potential of Counterfactual World Simulation Models (CWSMs). CWSMs use pieces of multi-modal evidence, such as the CCTV footage or sound recordings of a road accident, to build a high-fidelity 3D reconstruction …

Off The Rails: Procedural Dilemma Generation for Moral Reasoning

@@ -328,6 +319,15 @@

You are what y

+          A Semantics for Causing, Enabling, and Preventing Verbs Using Structural Causal Models
+          When choosing how to describe what happened, we have a number of causal verbs at our disposal. In this paper, we develop a model-theoretic formal semantics for nine causal verbs that span the categories of CAUSE, ENABLE, and PREVENT. We use …