Charades Dataset Loading #17
Hello Nemio Lipi,

Sorry for the late reply; I didn't notice the GitHub notification. Yes, 45 segments instead of 128 would affect the performance. To reproduce the results of the paper, you have to randomly sample new frames each epoch. Please note that you need to sample features before training each epoch. Look here to see how to sample the frames. Uniform (equidistant) sampling is done for test videos only, while random sampling is done for training videos, and you have to sample segments anew before each epoch. Sample only segments, but don't sample frames within each segment: each segment should contain 8 successive frames. And here is how to extract the features.

So, to answer your question directly: yes, if you train on pre-defined features the performance drops significantly, because Timeception layers need to see features of new segments each training epoch. However, there is a trick that might alleviate this overhead. Do the following:
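The sampling scheme described above (random segment starts redrawn each training epoch, equidistant starts at test time, 8 successive frames per segment) can be sketched as follows. This is a minimal illustration with hypothetical names, not the repo's actual function:

```python
import numpy as np

def sample_segments(n_frames, n_segments=128, frames_per_segment=8, is_training=True):
    """Return frame indices of shape (n_segments, frames_per_segment).

    Training: random segment start positions, to be redrawn before each epoch.
    Test: uniform (equidistant) segment start positions.
    Only segment starts are sampled; each segment keeps 8 successive frames.
    """
    max_start = max(n_frames - frames_per_segment, 0)
    if is_training:
        # random starts, sorted to preserve temporal order across segments
        starts = np.sort(np.random.randint(0, max_start + 1, size=n_segments))
    else:
        # equidistant starts for test videos
        starts = np.linspace(0, max_start, num=n_segments).astype(int)
    # expand each start into 8 successive frame indices
    return starts[:, None] + np.arange(frames_per_segment)[None, :]
```

Before each training epoch you would call this again with `is_training=True` to get fresh segments, then re-extract (or re-index) the I3D features for those frames.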
Thanks a lot for the response. As the number of frames may be very large, wouldn't the last trick you mentioned cause OOM problems?

What do you mean by OOM problem?
Hi @noureldien, it's really nice work, and a good presentation at CVPR-19 by Efstratios Gavves. Thanks for sharing the code. I have a couple of queries regarding the Timeception paper and data loading:

Thanks,
Hi,

Thanks for sharing your code. Have you sampled all the videos of the Charades dataset to 1024 frames before loading? This procedure may take a lot of memory. Isn't it possible to upsample the resulting feature maps of the original 25 fps videos, as produced by the provided pretrained I3D, to get (128, 7, 7, 1024) instead of e.g. (45, 7, 7, 1024)? Would it affect the performance of Timeception afterwards?
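For reference, the upsampling asked about here amounts to repeating feature maps along the temporal axis. A minimal nearest-neighbor sketch (hypothetical helper, not code from the repo; as noted in the reply above, training on such fixed features hurts performance):

```python
import numpy as np

def upsample_segments(feats, target=128):
    """Nearest-neighbor upsampling along the temporal (first) axis,
    e.g. (45, 7, 7, 1024) -> (128, 7, 7, 1024)."""
    n = feats.shape[0]
    # pick `target` source indices spread evenly over the n available segments
    idx = np.linspace(0, n - 1, num=target).round().astype(int)
    return feats[idx]
```

This preserves the first and last segments and duplicates intermediate ones, so Timeception would see the same features every epoch rather than freshly sampled segments.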