For paper replication purposes: the README states that "Every metric was collected by running the experiment 10 times separately and calculating the average value." Does this apply only to the training/inference speed and GPU-usage measurements, or also to the reported task-specific accuracy scores (e.g., ARC-e, BoolQ)?
We collected the performance metrics (speed and GPU usage) by running the evaluation 10 separate times and averaging the results. For the accuracy scores, multiple trials are unnecessary: our code is reproducible when the random seed is fixed in the same environment and on the same device.
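For anyone replicating the accuracy numbers, the sketch below shows the kind of seeding that "reproducible with a fixed seed in the same environment and on the same device" typically requires in a PyTorch codebase. This is a generic illustration, not the repository's actual code; the `set_seed` helper and the default seed value are hypothetical.

```python
import os
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Fix every relevant RNG source so repeated runs give identical results.

    Hypothetical helper for illustration; the repository may seed differently.
    """
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # PyTorch RNG on every visible GPU
    # Force deterministic cuDNN kernels; disables auto-tuning, which can
    # otherwise pick faster but non-deterministic algorithms per run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Some CUDA ops additionally need this env var for full determinism.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```

Note that even with all of the above, determinism generally holds only within a single environment and device, which is why the maintainers qualify their reproducibility claim that way.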