How can I evaluate specific MME subtasks #490

whyisverysmart · 2025-01-06T08:08:44Z

Hi!

This evaluation framework is awesome.

For the MME benchmark, can I get the scores of specific subtasks, e.g. OCR or count? Is this planned as a future enhancement? Otherwise I can only recalculate the scores from the evaluation results...

Thanks and look forward to your suggestions.

kcz358 · 2025-01-06T15:55:30Z

Hi, I think this is hard to do at the moment because the aggregation step can only return a single float. Probably the only workaround is to log the per-subtask scores directly during aggregation. You can check videomme for an example. We may look into whether this can be improved.
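To illustrate the "log during aggregation" workaround, here is a minimal sketch. It is not the framework's actual MME or videomme code. The function names, the logger name, and the `category` / `score` record fields are assumptions made for the example, and the plain per-subtask mean stands in for whatever scoring rule the task really uses.

```python
import logging
from collections import defaultdict

# Hypothetical logger name; a real task would use the framework's own logger.
logger = logging.getLogger("subtask_scores")


def per_subtask_accuracy(results):
    """Group per-sample records by subtask and return the mean score of each.

    Each record is assumed to be a dict with a "category" (subtask name)
    and a numeric "score" (1 for correct, 0 for wrong).
    """
    buckets = defaultdict(list)
    for record in results:
        buckets[record["category"]].append(float(record["score"]))
    return {name: sum(scores) / len(scores) for name, scores in buckets.items()}


def aggregate_with_logging(results):
    """Return one float, as the aggregation step requires, and log each subtask."""
    by_subtask = per_subtask_accuracy(results)
    if not by_subtask:
        return 0.0
    for name, acc in sorted(by_subtask.items()):
        # The per-subtask breakdown only appears in the log output.
        logger.info("%s: %.2f", name, 100.0 * acc)
    # Overall score: unweighted mean over subtasks, as a percentage.
    return 100.0 * sum(by_subtask.values()) / len(by_subtask)
```

With logging configured at `INFO` level, calling `aggregate_with_logging` on the per-sample records prints one line per subtask (for example OCR and count) while still returning the single float the aggregation interface expects.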

whyisverysmart · 2025-01-07T02:29:23Z

> Hi, I think this is hard to do at the moment because the aggregation step can only return a single float. Probably the only workaround is to log the per-subtask scores directly during aggregation. You can check videomme for an example. We may look into whether this can be improved.

Thanks a lot! I will try this approach.
