How to download raw videos from LVU dataset? #31

Open
huaiyi66 opened this issue Aug 8, 2024 · 7 comments

Comments


huaiyi66 commented Aug 8, 2024

Hello, thank you for your excellent work.

When I tried to download the dataset from the official LVU link, I found that it does not provide the raw videos, and many of the YouTube links are no longer available. How can I download the raw videos of the LVU dataset?

I would appreciate it if you could provide the LVU dataset or a download method.

@YingYellow

+1

Owner

boheumd commented Aug 10, 2024

Hi, I used youtube-dl to download the LVU dataset, and some of the videos are unavailable.
I have uploaded the videos I downloaded to Google Drive, and you can download the LVU raw videos through this link.
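
For anyone who wants to retry the download themselves, a minimal sketch of a youtube-dl batch download might look like the following. It is an illustration under assumptions, not the script actually used here: the helper names `build_command` and `download_all` are hypothetical, the input is assumed to be a plain list of YouTube video IDs taken from the dataset annotations, and a non-zero exit code from youtube-dl is treated as "video unavailable".

```python
import subprocess
from pathlib import Path


def build_command(video_id, out_dir):
    """Build a youtube-dl command line for one YouTube video ID (hypothetical helper)."""
    url = f"https://www.youtube.com/watch?v={video_id}"
    # %(id)s and %(ext)s are youtube-dl output-template fields.
    template = str(Path(out_dir) / "%(id)s.%(ext)s")
    return ["youtube-dl", "-f", "mp4", "-o", template, url]


def download_all(video_ids, out_dir, run=subprocess.run):
    """Try every ID and return (downloaded, unavailable) lists of IDs."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    downloaded, unavailable = [], []
    for video_id in video_ids:
        result = run(build_command(video_id, out_dir))
        # Assumption: a non-zero exit code means the video could not be fetched.
        if result.returncode == 0:
            downloaded.append(video_id)
        else:
            unavailable.append(video_id)
    return downloaded, unavailable
```

Calling `download_all(ids, "lvu_videos")` returns the IDs that failed, which makes it easy to report how many videos are no longer available.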

@YingYellow

Thank you for your reply. Did you also use youtube-dl to download the COIN dataset? I found that some of those videos are also unavailable.

Owner

boheumd commented Aug 11, 2024

> Thank you for your reply. Did you also use youtube-dl to download the COIN dataset? I found that some of those videos are also unavailable.

Yes, I also used youtube-dl to download the COIN dataset, and only around 10,500 videos are available.


jchsun1 commented Sep 10, 2024

Thanks for your excellent work.
I have encountered some problems and hope to get your help.

  1. The paper describes two compression methods (frame-level and token-level). How is frame-level compression implemented? How is the similarity between two frames computed when each frame contains multiple tokens? Is it the average of the token-wise similarities between adjacent frames?
  2. According to the paper, temporal ordering information is injected into the frame-level features by a position embedding layer, but I found that it has no effect because the weights are set to 0.
    blip2_vicuna_instruct.py - lines 113 and 114

Owner

boheumd commented Sep 12, 2024

> Thanks for your excellent work. I have encountered some problems and hope to get your help.
>
>   1. The paper describes two compression methods (frame-level and token-level). How is frame-level compression implemented? How is the similarity between two frames computed when each frame contains multiple tokens? Is it the average of the token-wise similarities between adjacent frames?
>   2. According to the paper, temporal ordering information is injected into the frame-level features by a position embedding layer, but I found that it has no effect because the weights are set to 0.
>     blip2_vicuna_instruct.py - lines 113 and 114

  1. Frame-level compression is not included in the published code. The idea is to first flatten the features of a frame's multiple tokens into a single dimension and then compute the cosine similarity between the two frames.
  2. The temporal embedding weights are initialized to 0 and are updated during model training.
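
Since frame-level compression is not in the published code, the flatten-then-cosine idea described in point 1 can be sketched roughly as follows. This is only an illustration under assumptions: the function names `flatten`, `frame_similarity`, and `adjacent_similarities` are hypothetical, each frame is assumed to be a list of token feature vectors of equal size, and plain Python lists stand in for the tensors a real model would use.

```python
import math


def flatten(frame):
    """Concatenate all token feature vectors of one frame into a single vector."""
    return [value for token in frame for value in token]


def frame_similarity(frame_a, frame_b):
    """Cosine similarity between two frames after flattening their tokens."""
    a, b = flatten(frame_a), flatten(frame_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Guard against all-zero features to avoid dividing by zero.
    return dot / norm if norm > 0 else 0.0


def adjacent_similarities(frames):
    """Similarity of every pair of temporally adjacent frames."""
    return [frame_similarity(f, g) for f, g in zip(frames, frames[1:])]
```

Under this reading, a frame is compared as one long vector rather than by averaging token-wise similarities, so the pair of adjacent frames with the highest score would be the candidate for merging.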


jchsun1 commented Sep 12, 2024

Thanks for your reply. I ran the model successfully, but the temporal embedding weights do not work. Could you tell me how I can get the pre-trained temporal embedding weights?

4 participants