fp-16 comment

Confirm-Solutions · Jan 8, 2024 · 8a65f9f
1 parent bd7de1b
commit 8a65f9f
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/posts/TDC2023.md b/posts/TDC2023.md
@@ -146,7 +146,7 @@ If you want to get up to speed, we recommend this [Lil’log post](https://lilia
 
 #### 1. **Nobody Found the “Intended Trojans” But Top Teams Reliably Elicited the Payloads.**
 
- Using GCG, we successfully elicited 100% of the payloads. Other top-performing teams used similar approaches with similar success! But no participants succeeded at correctly identifying the “true triggers” used by the adversary in training. Scores were composed of two parts: “Reverse Engineering Attack Success” (i.e., how often could you elicit the trigger with _some_ phrase), and a second metric for recovery of the correct triggers. Performance on the recall metric with random inputs seems to yield about ~14-16% score, due to luck-based collisions with the true tokens. [Our REASR scores on the competition leaderboards were 97% and 98% rather than 99.9 - 100% on our side. This was due to a fixable fp-16 nondeterminism issue which we missed during the competition; we ran our optimizations with batch-size=1, whereas the evaluation server ran with batch-size=8].
+ Using GCG, we successfully elicited 100% of the payloads. Other top-performing teams used similar approaches with similar success! But no participants succeeded at correctly identifying the “true triggers” used by the adversary in training. Scores were composed of two parts: “Reverse Engineering Attack Success Rate” (REASR) (i.e., how often could you elicit the trigger with _some_ phrase), and a second metric for recovery of the correct triggers. Performance on the recall metric with random inputs seems to yield about ~14-16% score, due to luck-based collisions with the true tokens. [Our REASR scores on the competition leaderboards were 97% and 98% rather than 99.9 - 100% on our side. This was due to a fixable fp-16 nondeterminism issue, which we missed because test server scores were hidden until after the competition].
 
 #### 2. **Reverse Engineering Trojans "In Practice" Seems Quite Hard.**
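
Note on the fp-16 issue touched by this change: the removed wording attributes the score gap to running with batch-size=1 locally versus batch-size=8 on the evaluation server. The post does not say what the underlying mechanism was. One common source of such gaps, offered here only as an assumption, is that half-precision addition is not associative, so a different reduction order can produce a different result. The sketch below uses only the Python standard library; `to_fp16` and `fp16_sum` are hypothetical helper names, not code from the competition.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_sum(values):
    """Sum left to right, rounding to fp16 after every addition."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


# Same three numbers, two different summation orders.
in_order = fp16_sum([2048.0, 1.0, 1.0])
reordered = fp16_sum([1.0, 1.0, 2048.0])

# Near 2048 the spacing between fp16 values is 2, so adding 1 to 2048
# rounds back to 2048, while adding 2 in one step lands on 2050.
print(in_order, reordered)
```

A small difference like this in a logit can flip a greedy-decoded token, which may be enough to turn a locally successful trigger into a failure under a different evaluation setup.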