bench: Add a benchmark for vLM: MMMU #3562
Hi @mickqian, I left some comments, all of them related to paths. I also tried to run bench_hf.py, but it seems to cause OOM; I am not sure whether that is expected.
Here are the qwen2vl and qwen2.5vl results in my env:
qwen2vl
{"Overall-Art and Design": {"num": 120, "acc": 0.317}, "Art": {"num": 30, "acc": 0.4}, "Art_Theory": {"num": 30, "acc": 0.367}, "Design": {"num": 30, "acc": 0.3}, "Music": {"num": 30, "acc": 0.2}, "Overall-Business": {"num": 150, "acc": 0.32}, "Accounting": {"num": 30, "acc": 0.333}, "Economics": {"num": 30, "acc": 0.3}, "Finance": {"num": 30, "acc": 0.2}, "Manage": {"num": 30, "acc": 0.267}, "Marketing": {"num": 30, "acc": 0.5}, "Overall-Science": {"num": 150, "acc": 0.333}, "Biology": {"num": 30, "acc": 0.367}, "Chemistry": {"num": 30, "acc": 0.167}, "Geography": {"num": 30, "acc": 0.333}, "Math": {"num": 30, "acc": 0.433}, "Physics": {"num": 30, "acc": 0.367}, "Overall-Health and Medicine": {"num": 150, "acc": 0.38}, "Basic_Medical_Science": {"num": 30, "acc": 0.433}, "Clinical_Medicine": {"num": 30, "acc": 0.433}, "Diagnostics_and_Laboratory_Medicine": {"num": 30, "acc": 0.133}, "Pharmacy": {"num": 30, "acc": 0.567}, "Public_Health": {"num": 30, "acc": 0.333}, "Overall-Humanities and Social Science": {"num": 120, "acc": 0.35}, "History": {"num": 30, "acc": 0.367}, "Literature": {"num": 30, "acc": 0.367}, "Sociology": {"num": 30, "acc": 0.267}, "Psychology": {"num": 30, "acc": 0.4}, "Overall-Tech and Engineering": {"num": 210, "acc": 0.267}, "Agriculture": {"num": 30, "acc": 0.233}, "Architecture_and_Engineering": {"num": 30, "acc": 0.3}, "Computer_Science": {"num": 30, "acc": 0.333}, "Electronics": {"num": 30, "acc": 0.167}, "Energy_and_Power": {"num": 30, "acc": 0.267}, "Materials": {"num": 30, "acc": 0.367}, "Mechanical_Engineering": {"num": 30, "acc": 0.2}, "Overall": {"num": 900, "acc": 0.323}}
qwen2.5vl
{"Overall-Art and Design": {"num": 120, "acc": 0.242}, "Art": {"num": 30, "acc": 0.2}, "Art_Theory": {"num": 30, "acc": 0.267}, "Design": {"num": 30, "acc": 0.3}, "Music": {"num": 30, "acc": 0.2}, "Overall-Business": {"num": 150, "acc": 0.3}, "Accounting": {"num": 30, "acc": 0.467}, "Economics": {"num": 30, "acc": 0.333}, "Finance": {"num": 30, "acc": 0.1}, "Manage": {"num": 30, "acc": 0.233}, "Marketing": {"num": 30, "acc": 0.367}, "Overall-Science": {"num": 150, "acc": 0.2}, "Biology": {"num": 30, "acc": 0.133}, "Chemistry": {"num": 30, "acc": 0.133}, "Geography": {"num": 30, "acc": 0.2}, "Math": {"num": 30, "acc": 0.3}, "Physics": {"num": 30, "acc": 0.233}, "Overall-Health and Medicine": {"num": 150, "acc": 0.267}, "Basic_Medical_Science": {"num": 30, "acc": 0.233}, "Clinical_Medicine": {"num": 30, "acc": 0.167}, "Diagnostics_and_Laboratory_Medicine": {"num": 30, "acc": 0.2}, "Pharmacy": {"num": 30, "acc": 0.367}, "Public_Health": {"num": 30, "acc": 0.367}, "Overall-Humanities and Social Science": {"num": 120, "acc": 0.242}, "History": {"num": 30, "acc": 0.167}, "Literature": {"num": 30, "acc": 0.333}, "Sociology": {"num": 30, "acc": 0.2}, "Psychology": {"num": 30, "acc": 0.267}, "Overall-Tech and Engineering": {"num": 210, "acc": 0.276}, "Agriculture": {"num": 30, "acc": 0.2}, "Architecture_and_Engineering": {"num": 30, "acc": 0.167}, "Computer_Science": {"num": 30, "acc": 0.233}, "Electronics": {"num": 30, "acc": 0.267}, "Energy_and_Power": {"num": 30, "acc": 0.367}, "Materials": {"num": 30, "acc": 0.433}, "Mechanical_Engineering": {"num": 30, "acc": 0.267}, "Overall": {"num": 900, "acc": 0.257}}
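As a quick sanity check on these numbers, the "Overall" entry should be recoverable as the sample-weighted mean of the six "Overall-<category>" entries. A minimal sketch, assuming the results are loaded as a plain dict in the shape shown above; the helper name `weighted_overall` is hypothetical and not part of this PR:

```python
# Sketch only: `weighted_overall` is a hypothetical helper, not code from this PR.
# It recomputes the overall accuracy from the per-category "Overall-*" entries.

def weighted_overall(results):
    """Return {"num", "acc"} as the sample-weighted mean over "Overall-<category>" keys."""
    # The plain "Overall" key has no trailing hyphen, so it is excluded here.
    categories = {k: v for k, v in results.items() if k.startswith("Overall-")}
    total = sum(v["num"] for v in categories.values())
    # num * acc approximates the number of correct answers in each category.
    correct = sum(v["num"] * v["acc"] for v in categories.values())
    return {"num": total, "acc": round(correct / total, 3)}


# Category-level rows copied from the qwen2vl and qwen2.5vl results above.
qwen2vl = {
    "Overall-Art and Design": {"num": 120, "acc": 0.317},
    "Overall-Business": {"num": 150, "acc": 0.32},
    "Overall-Science": {"num": 150, "acc": 0.333},
    "Overall-Health and Medicine": {"num": 150, "acc": 0.38},
    "Overall-Humanities and Social Science": {"num": 120, "acc": 0.35},
    "Overall-Tech and Engineering": {"num": 210, "acc": 0.267},
    "Overall": {"num": 900, "acc": 0.323},
}
qwen25vl = {
    "Overall-Art and Design": {"num": 120, "acc": 0.242},
    "Overall-Business": {"num": 150, "acc": 0.3},
    "Overall-Science": {"num": 150, "acc": 0.2},
    "Overall-Health and Medicine": {"num": 150, "acc": 0.267},
    "Overall-Humanities and Social Science": {"num": 120, "acc": 0.242},
    "Overall-Tech and Engineering": {"num": 210, "acc": 0.276},
    "Overall": {"num": 900, "acc": 0.257},
}

for name, res in [("qwen2vl", qwen2vl), ("qwen2.5vl", qwen25vl)]:
    recomputed = weighted_overall(res)
    print(name, recomputed, "matches reported:", recomputed == res["Overall"])
```

Both reported "Overall" values are consistent with their category rows under this check, so the aggregation in the benchmark output looks internally coherent for these two runs.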
I think we can update the "How to Support a New vLM" section in support_models.md. Each time we add a new VLM, we need to run this benchmark and compare the results with HF.
Yes, it also leads to OOM in my case. It seems that it's not easy to apply TP to HF models without introducing third-party libraries. Any suggestions?
updated
I think it is caused by too large
After the HF OOM has been solved, this PR can be merged. @zhaochenyang20, can you take a look at the docs?
Sure, on this now.
LGTM
Will merge this today. @mickqian @yizhang2077