Has anyone been able to reproduce the MSVD-QA result reported in the paper (top-1 accuracy of 60%)? #27
Comments
I feel like I might have missed something somewhere; let me take a closer look.
I can reach roughly 60% accuracy on MSVD.
But I can only get ~42% on MSRVTT.
I think part of the reason is the way the dataset is processed. Are you using the annotations provided by the author?
Yes, I use the annotations provided by the author. Maybe the problem is related to this.
Many thanks for your
You can simply reduce the total num_frames by 1 or 2 in dataset.py for each dataset.
Following #3 (comment), you can update the "frame_length" field in the annotation file to the number of frames you actually extracted for each video.
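For anyone who wants to script that fix, here is a minimal sketch. It is not the repository's own code. It assumes the annotations are a list of dicts with a "frame_length" field, that each entry identifies its video under a key named "video" (hypothetical, adjust to your file), and that the extracted frames live in one sub-directory per video under a common root. The helper names are made up for illustration.

```python
import os


def count_frames(frames_dir, exts=(".jpg", ".jpeg", ".png")):
    """Count image files in one video's frame directory."""
    return sum(1 for name in os.listdir(frames_dir) if name.lower().endswith(exts))


def sync_frame_lengths(annotations, frames_root, key="video"):
    """Return a copy of the annotations with "frame_length" set to the
    number of frames actually found on disk.

    Assumptions (not taken from the repository): each entry is a dict,
    entry[key] names a sub-directory of frames_root, and that directory
    holds one image file per extracted frame.
    """
    updated = []
    for entry in annotations:
        entry = dict(entry)  # copy so the caller's data is left untouched
        frames_dir = os.path.join(frames_root, str(entry[key]))
        if os.path.isdir(frames_dir):
            entry["frame_length"] = count_frames(frames_dir)
        # Entries whose frames are missing keep their original value.
        updated.append(entry)
    return updated
```

You would load the annotation file with `json.load`, pass the list through `sync_frame_lengths`, and write the result back with `json.dump`. Entries whose frame directory is missing are left unchanged, so it is worth checking for those separately.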
Hello, I ran into the same problem. Have you managed to reproduce the value in the paper yet?
I just used an A800 and changed the batch size to 32. The other parameters are consistent with the appendix of the paper. Why can I only achieve 53%?