
[BUG] Question on batch preparation in MMLU evaluation #288

Open
JefferyChen453 opened this issue Sep 4, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@JefferyChen453 commented Sep 4, 2024

The bug I ran into is similar to #203. I'm trying to reproduce the evaluation results of the ablation model trained on FineWeb, using LightEval at commit a98210fd3a2d1e8bface1c32b72ebd5017173a4c.

The MMLU results at steps 5000/10000/15000/19000/24000 (i.e., 5 checkpoints from the first 50B consumed tokens) are as follows:
[Screenshot: table of MMLU results for the 5 checkpoints]

I don't know what causes this gap. While debugging, I discovered that the last token of the prepared_batch is missing:

[Screenshot: debugger view of prepared_batch]

Does this mean the evaluation results in the FineWeb blogpost are inaccurate?

But when I delete the [:-1] in

request.tokenized_context + request.tokenized_continuation[:-1] for request in batch

the evaluation results become totally random guesses for all checkpoints. I suppose there are more lines to modify, or something else caused the gap in my reproduction results.
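For context on why removing the [:-1] breaks scoring: with a causal LM, the logits at position i predict the token at position i+1, so the final continuation token never needs to appear in the model *input*; its probability is read from the logits of the preceding position. The sketch below is an illustrative assumption about how loglikelihood inputs are typically built (the helper name `build_loglikelihood_inputs` is hypothetical, not LightEval's actual API), showing the index alignment:

```python
def build_loglikelihood_inputs(context, continuation):
    """Return (model_input, target_positions) for scoring `continuation`
    under a causal LM. Hypothetical helper for illustration only."""
    # Feed the context plus all continuation tokens except the last one:
    # the last token is only ever a *target*, never an input.
    model_input = context + continuation[:-1]
    # Logits at these input positions predict the continuation tokens:
    # position len(context)-1 predicts continuation[0], and so on.
    first = len(context) - 1
    target_positions = list(range(first, first + len(continuation)))
    return model_input, target_positions

ctx = [10, 11, 12]   # tokenized prompt
cont = [20, 21]      # tokenized answer choice
inp, pos = build_loglikelihood_inputs(ctx, cont)
print(inp)  # [10, 11, 12, 20] -- the last continuation token (21) is absent
print(pos)  # [2, 3] -- logits at these positions predict tokens 20 and 21
```

So a "missing" last token in prepared_batch can be expected behavior rather than a bug, provided the target positions are shifted accordingly; removing the [:-1] without adjusting the targets would misalign every logit with its label, which matches the random-guess results described above.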

@JefferyChen453 JefferyChen453 added the bug Something isn't working label Sep 4, 2024
@JefferyChen453 (Author)

I've tried adding the parameter add_special_tokens=True in the config file, but the last token is still missing.

@JefferyChen453 (Author) commented Sep 9, 2024

Using the latest repo (commit 7261d80), I evaluated the same 5 checkpoints again (red line in the figure). The results are still below the official ones.

[Plot: MMLU acc_norm across checkpoints, reproduction vs. official]

And when examining the prepared_batch, the last token still appears to be missing.

My command:

accelerate launch --num_processes=1 -m \
    lighteval accelerate \
    --model_args="pretrained=/mnt/data/user/tc_agi/caijie/fineweb_models/ablation-model-fineweb-v1_5000,trust_remote_code=True" \
    --override_batch_size 128 \
    --custom_tasks "/data/fineweb-pipeline/lighteval-main/lighteval_tasks.py" \
    --output_dir "/data/fineweb-pipeline/lighteval-main/evals/" \
    --tasks "custom|mmlu:abstract_algebra|0|1"

@clefourrier (Member) commented Sep 14, 2024

Thanks for the report, we'll investigate!
cc @hynky1999 and @guipenedo for the FineWeb aspect.

3 participants: @clefourrier, @JefferyChen453, and others