Enable qwen2vl video #2756
base: main
Conversation
I think it would help if, at this point, we already sampled the video at 1 fps and resized any frames larger than 360x420 down to that size.
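A minimal sketch of the sampling and resizing suggested above. The 1 fps rate and the 360x420 bound come from the comment; the helper names and the "fit within bounds, never upscale" interpretation are assumptions, not the PR's actual implementation:

```python
def sample_indices(total_frames: int, src_fps: float, target_fps: float = 1.0) -> list:
    """Pick frame indices so we keep roughly target_fps frames per second."""
    step = src_fps / target_fps  # e.g. 30 fps source, 1 fps target -> every 30th frame
    return [int(i * step) for i in range(int(total_frames / step))]


def fit_within(width: int, height: int, max_w: int = 420, max_h: int = 360) -> tuple:
    """Scale (width, height) down to fit inside max_w x max_h; never upscale."""
    scale = min(max_w / width, max_h / height, 1.0)
    return int(width * scale), int(height * scale)


# A 3-second 30 fps clip keeps frames 0, 30, 60:
print(sample_indices(90, 30.0))        # -> [0, 30, 60]
# An 840x720 frame is halved to fit the cap; a small frame is untouched:
print(fit_within(840, 720))            # -> (420, 360)
print(fit_within(200, 100))            # -> (200, 100)
```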
This 1 fps sampling forces us to figure out the frame rate, which I think is something we definitely want to do. Do you think we can depend on ffmpeg and call ffprobe for this?
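One way the ffprobe call could look, as a sketch: shell out to `ffprobe` (which must be on `PATH`) and parse the `avg_frame_rate` fraction it reports for the first video stream. The function names here are hypothetical:

```python
import json
import subprocess


def parse_rate(rate: str) -> float:
    """ffprobe reports frame rates as fractions like '30000/1001'."""
    num, den = rate.split("/")
    return int(num) / int(den)


def probe_fps(path: str) -> float:
    """Ask ffprobe for the average frame rate of the first video stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=avg_frame_rate", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_rate(json.loads(out)["streams"][0]["avg_frame_rate"])


# NTSC video reports '30000/1001', i.e. ~29.97 fps:
print(parse_rate("30000/1001"))
```

Shelling out keeps the dependency soft (no Python binding needed), at the cost of requiring the ffmpeg binaries in the serving image.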
We only convert to tokens the frames that we end up consuming for inference. Qwen has this specific 'smart' frame-selection logic, and I think other models will have other logic. Where do you think is the best place for Qwen's selection logic? I don't find it bad for it to live in validation already, if we can later launch other video models with Qwen's frame-selection logic; that could actually be interesting.
If we do it that way, once fetch_video has selected the number of frames, estimating the number of tokens is much simpler and we no longer depend on estimating the frame rate.
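To illustrate why token estimation gets simpler once the frame count is fixed, here is a sketch following Qwen2-VL's published patching scheme (14-pixel patches, 2x2 spatial merge, temporal patch of 2). The function and its defaults are illustrative assumptions, not code from this PR:

```python
def estimate_video_tokens(num_frames: int, height: int, width: int,
                          patch: int = 14, merge: int = 2,
                          temporal_patch: int = 2) -> int:
    """Rough visual-token count for a video, given an already-fixed frame count.

    Qwen2-VL splits each frame into patch x patch tiles, groups frames in
    pairs along time (temporal_patch), and merges merge x merge spatial
    patches into one token.
    """
    grid = (height // patch) * (width // patch)   # patches per frame
    return (num_frames // temporal_patch) * grid // (merge * merge)


# A minimal 28x28 frame pair collapses to a single token:
print(estimate_video_tokens(2, 28, 28))    # -> 1
# 16 frames at the 360x420 cap discussed above:
print(estimate_video_tokens(16, 360, 420))
```

With the frame count decided by fetch_video, this is pure arithmetic; no frame-rate estimate is involved.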
b9707b9
to
32438fc
Compare
This PR is a work in progress that explores adding support for video inputs with Qwen2-VL. Thank you @mfarre for getting this effort started.
TODOs
video_url
supdate*
start server
send request