fp-16 comment
mikesklar committed Jan 8, 2024
1 parent bd7de1b commit 8a65f9f
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion posts/TDC2023.md
@@ -146,7 +146,7 @@ If you want to get up to speed, we recommend this [Lil’log post](https://lilia

#### 1. **Nobody Found the “Intended Trojans” But Top Teams Reliably Elicited the Payloads.**

- Using GCG, we successfully elicited 100% of the payloads. Other top-performing teams used similar approaches with similar success! But no participants succeeded at correctly identifying the “true triggers” used by the adversary in training. Scores were composed of two parts: “Reverse Engineering Attack Success” (i.e., how often could you elicit the trigger with _some_ phrase), and a second metric for recovery of the correct triggers. Performance on the recall metric with random inputs seems to yield about ~14-16% score, due to luck-based collisions with the true tokens. [Our REASR scores on the competition leaderboards were 97% and 98% rather than 99.9 - 100% on our side. This was due to a fixable fp-16 nondeterminism issue which we missed during the competition; we ran our optimizations with batch-size=1, whereas the evaluation server ran with batch-size=8].
+ Using GCG, we successfully elicited 100% of the payloads. Other top-performing teams used similar approaches with similar success! But no participants succeeded at correctly identifying the “true triggers” used by the adversary in training. Scores were composed of two parts: “Reverse Engineering Attack Success Rate” (REASR) (i.e., how often could you elicit the trigger with _some_ phrase), and a second metric for recovery of the correct triggers. Performance on the recall metric with random inputs seems to yield about ~14-16% score, due to luck-based collisions with the true tokens. [Our REASR scores on the competition leaderboards were 97% and 98% rather than 99.9 - 100% on our side. This was due to a fixable fp-16 nondeterminism issue, which we missed because test server scores were hidden until after the competition].

#### 2. **Reverse Engineering Trojans "In Practice" Seems Quite Hard.**
