Processing flow of MovieChat-1K_train #54
For each video in the MovieChat-1K_train dataset, we uniformly sample 8192 frames, encode them with eva_clip_g with image_size set to 224, and store the features in HDF5. Our feature extraction data is as follows:
However, we did not use the extracted features to run MovieChat. I think the main differences lie in how frames are read in inference.py and how they are encoded in moviechat.py. Hope this helps! :)
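The sampling step described above (8192 evenly spaced frames per video, encoded and then stored in HDF5) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the maintainers' script: the midpoint-of-segment index rule, the helper names `uniform_frame_indices`, `extract_video_features` and `save_features`, and the `encode_fn` callback (standing in for an eva_clip_g encoder at 224×224) are all hypothetical. Only the HDF5 step needs the third-party `h5py` package.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def uniform_frame_indices(num_frames: int, num_samples: int = 8192) -> List[int]:
    """Return num_samples frame indices spread evenly over [0, num_frames).

    Assumption: each index is the midpoint of one of num_samples equal-width
    segments, so videos shorter than num_samples frames get repeated indices
    instead of fewer samples.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    step = num_frames / num_samples
    # min() guards against float rounding pushing an index past the last frame.
    return [min(int((i + 0.5) * step), num_frames - 1) for i in range(num_samples)]


def extract_video_features(
    frames: Sequence[T],
    encode_fn: Callable[[List[T]], R],
    num_samples: int = 8192,
) -> R:
    """Sample frames uniformly and pass them to a (hypothetical) image encoder.

    encode_fn is a placeholder for an eva_clip_g forward pass on frames
    already resized to 224x224; it is not an API from the MovieChat repo.
    """
    idx = uniform_frame_indices(len(frames), num_samples)
    return encode_fn([frames[i] for i in idx])


def save_features(path: str, video_id: str, features) -> None:
    """Write one video's feature array into an HDF5 file, keyed by video id."""
    import h5py  # third-party; imported here so the sampling helpers need only the stdlib

    with h5py.File(path, "a") as f:
        if video_id in f:
            del f[video_id]  # overwrite a previous extraction for this video
        f.create_dataset(video_id, data=features, compression="gzip")
```

For example, `uniform_frame_indices(10, 5)` picks frames 1, 3, 5, 7 and 9. Taking segment midpoints rather than segment starts avoids always including frame 0 and skipping the tail of the video. The exact rule used for MovieChat-1K_train is not stated in this thread.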
@Espere-1119-Song
Hi @Espere-1119-Song!
Thank you for pointing out the issue! We apologize for any inconvenience. We are currently uploading the raw videos to Hugging Face and expect to finish by the weekend.
Thanks!
We have uploaded the raw videos of the training set. :)
Hi!
Could you provide the processing script or procedure for the MovieChat-1K_train dataset? We plan to fine-tune our model on this dataset and need to ensure that the pre-training phase follows the same processing procedure.