The bug I encountered is similar to #203. I'm trying to reproduce the evaluation results of the ablation model trained on FineWeb, using LightEval at commit a98210fd3a2d1e8bface1c32b72ebd5017173a4c.

The MMLU results of steps 5000/10000/15000/19000/24000 (i.e., 5 checkpoints covering the first 50B consumed tokens) are shown below:

I don't know what causes this gap. While debugging, I noticed that the last token of the prepared_batch is missing. Does this mean the evaluation results in the FineWeb blog post are inaccurate?

But when I delete the [:-1] in lighteval/src/lighteval/models/base_model.py (line 851 at commit aaa8bbf), the evaluation results become totally random guesses for all checkpoints. I suppose there are more lines to modify, or something else is causing the gap in my reproduction results.
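For reference, here is a minimal sketch (my own illustration, assuming a standard Hugging Face causal LM; the function and model names are placeholders, not LightEval internals) of why the input is typically sliced with [:-1] when scoring a continuation: the logits at position i predict token i+1, so the last token is only ever a target and never needs to be fed as input. Deleting the slice without re-aligning the targets shifts everything by one position, which would explain the random-guess accuracy:

```python
# Minimal sketch, NOT LightEval's actual code: scoring log P(continuation | context)
# with a causal LM. "score_continuation" and the "gpt2" model are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def score_continuation(model, tokenizer, context: str, continuation: str) -> float:
    """Return log P(continuation | context) under a causal LM."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids[0]
    cont_ids = tokenizer(continuation, add_special_tokens=False,
                         return_tensors="pt").input_ids[0]
    full = torch.cat([ctx_ids, cont_ids])

    # Logits at position i predict token i+1, so the final token is only a target,
    # never an input: feeding full[:-1] loses no information.
    with torch.no_grad():
        logits = model(full[:-1].unsqueeze(0)).logits[0]  # (len(full) - 1, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)

    # The continuation tokens are the targets; the positions that predict them
    # start one step earlier because of the shift.
    cont_positions = torch.arange(len(ctx_ids) - 1, len(full) - 1)
    cont_logprobs = log_probs[cont_positions, cont_ids]
    return cont_logprobs.sum().item()

if __name__ == "__main__":
    name = "gpt2"  # placeholder model for illustration
    model = AutoModelForCausalLM.from_pretrained(name).eval()
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(score_continuation(model, tokenizer, "Question: 2+2=? Answer:", " 4"))
```

If this reading is right, the "missing" last token in prepared_batch is expected behavior rather than a bug, and the gap in my reproduction must come from somewhere else.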
Using the latest repo (commit 7261d80), I evaluated the same 5 checkpoints again (red line in the figure). The results are still below the official ones, and when examining the prepared_batch, the last token still seems to be missing.