The bug I encountered is similar to #203. I'm trying to reproduce the evaluation results of the ablation model trained on FineWeb, using LightEval at commit a98210fd3a2d1e8bface1c32b72ebd5017173a4c.

The MMLU results of steps 5000/10000/15000/19000/24000 (i.e., 5 checkpoints covering the first 50B consumed tokens) are shown below:

I don't know what causes this gap. While debugging, I noticed that the last token of the prepared_batch is missing. Does this mean the evaluation results in the FineWeb blog post are inaccurate?

But when I delete the [:-1] in lighteval/src/lighteval/models/base_model.py (line 851 at commit aaa8bbf), the evaluation results become totally random guesses for all checkpoints. I suppose there are more lines to modify, or something else is causing the gap in my reproduction results.
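For reference, here is a minimal sketch (my own illustration, assuming a standard Hugging Face causal LM; the function and model names are placeholders, not LightEval internals) of why the input is typically sliced with [:-1] when scoring a continuation: the logits at position i predict token i+1, so the last token is only ever a target and never needs to be fed as input. Deleting the slice without re-aligning the targets shifts everything by one position, which would explain the random-guess accuracy:

```python
# Minimal sketch, NOT LightEval's actual code: scoring log P(continuation | context)
# with a causal LM. "score_continuation" and the "gpt2" model are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def score_continuation(model, tokenizer, context: str, continuation: str) -> float:
    """Return log P(continuation | context) under a causal LM."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids[0]
    cont_ids = tokenizer(continuation, add_special_tokens=False,
                         return_tensors="pt").input_ids[0]
    full = torch.cat([ctx_ids, cont_ids])

    # Logits at position i predict token i+1, so the final token is only a target,
    # never an input: feeding full[:-1] loses no information.
    with torch.no_grad():
        logits = model(full[:-1].unsqueeze(0)).logits[0]  # (len(full) - 1, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)

    # The continuation tokens are the targets; the positions that predict them
    # start one step earlier because of the shift.
    cont_positions = torch.arange(len(ctx_ids) - 1, len(full) - 1)
    cont_logprobs = log_probs[cont_positions, cont_ids]
    return cont_logprobs.sum().item()

if __name__ == "__main__":
    name = "gpt2"  # placeholder model for illustration
    model = AutoModelForCausalLM.from_pretrained(name).eval()
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(score_continuation(model, tokenizer, "Question: 2+2=? Answer:", " 4"))
```

If this reading is right, the "missing" last token in prepared_batch is expected behavior rather than a bug, and the gap in my reproduction must come from somewhere else.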
Using the latest repo (commit 7261d80), I evaluated the same 5 checkpoints again (red line in the figure). The results are still below the official ones, and when examining the prepared_batch, the last token still seems to be missing.