[Info] Models that natively support audio and video #446

Open
gzhihongwei opened this issue Dec 5, 2024 · 2 comments

@gzhihongwei

Hi there,

Are there any models besides those available through the Gemini API that natively support both audio and video? I'm aware that VideoMME uses captions to assess models that support video and text, but it seems that no model besides Gemini Pro supports audio and video natively. If there aren't any others, are there any models that you suggest benchmarking (i.e., besides the ones in VideoMME)?

Thanks,
George

@gzhihongwei
Author

Also, are there any plans to add support for models that handle both audio and video?

@kcz358
Collaborator

kcz358 commented Dec 25, 2024

There is a plan to add this in the future, but support will be added slowly. I think the most recent progress is #461, which supports the first video+audio benchmark and model (Gemini).
