How to download raw videos from LVU dataset? #31

huaiyi66 · 2024-08-08T06:28:47Z

Hello, thank you for your excellent work.

When I tried to download the dataset from the official LVU link, I found that it does not provide the raw videos, and many of the YouTube links are no longer available. How can I download the raw videos of the LVU dataset?

I would appreciate it if you could provide the LVU dataset or a download method.

YingYellow · 2024-08-08T22:58:45Z

+1

boheumd · 2024-08-10T14:55:23Z

Hi, I used youtube-dl to download the LVU dataset, and some of the videos are unavailable.
I have uploaded the videos I downloaded to Google Drive, and you can download the LVU raw videos through this link.
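
For anyone who wants to retry the download themselves, here is a minimal sketch of a batch download that records which videos fail. It assumes the `youtube-dl` command-line tool is installed and that you already have a list of YouTube video IDs; the helper names `build_command` and `download_all`, the `mp4` format choice, and the output layout are illustrative assumptions, not the setup used for this repository.

```python
import subprocess
from pathlib import Path


def build_command(video_id, out_dir):
    """Return the youtube-dl argument list for one YouTube video ID."""
    url = f"https://www.youtube.com/watch?v={video_id}"
    # Save each video as <out_dir>/<video id>.<extension>.
    template = str(Path(out_dir) / "%(id)s.%(ext)s")
    return ["youtube-dl", "-f", "mp4", "-o", template, url]


def download_all(video_ids, out_dir, runner=subprocess.run):
    """Try every ID and return the IDs whose download failed."""
    failed = []
    for video_id in video_ids:
        result = runner(build_command(video_id, out_dir))
        # A non-zero exit code usually means the video is private or removed.
        if result.returncode != 0:
            failed.append(video_id)
    return failed
```

Calling `download_all(ids, "videos")` returns the IDs that could not be fetched, which makes it easy to see how much of the dataset is still reachable.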

YingYellow · 2024-08-10T20:05:14Z

Thank you for your reply. Did you also use youtube-dl to download the COIN dataset? I found that some of those videos are unavailable as well.

boheumd · 2024-08-11T00:51:59Z

> Thank you for your reply. Did you also use youtube-dl to download the COIN dataset? I found that some of those videos are unavailable as well.

Yes, I also used youtube-dl to download the COIN dataset, and only around 10,500 videos are available.

jchsun1 · 2024-09-10T03:17:24Z

Thanks for your excellent work.
I have encountered some problems and hope to get your help.

There are two compression methods (frame-level and token-level) in your paper. How is the frame-level compression implemented? How do you calculate the similarity of two frames that each contain multiple tokens? By averaging the similarity of all tokens in adjacent frames?
In the paper, temporal ordering information is injected into the frame-level features by a position embedding layer, but I found that it does not take effect because the weights are set to 0 (blip2_vicuna_instruct.py, lines 113 and 114).

boheumd · 2024-09-12T01:10:15Z

> Thanks for your excellent work. I have encountered some problems and hope to get your help.
>
> There are two compression methods (frame-level and token-level) in your paper. How is the frame-level compression implemented? How do you calculate the similarity of two frames that each contain multiple tokens? By averaging the similarity of all tokens in adjacent frames?
>
> In the paper, temporal ordering information is injected into the frame-level features by a position embedding layer, but I found that it does not take effect because the weights are set to 0 (blip2_vicuna_instruct.py, lines 113 and 114).

The frame-based compression is not included in the published code. The idea is to first flatten each frame's token features into a single vector and then compute the cosine similarity between the two frames.
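
Since that code is not published, the following is only a sketch of the idea described above, written in plain Python for readability. It assumes each frame is a list of token feature vectors and that no frame is all zeros; the function names are hypothetical and do not come from the repository.

```python
import math


def flatten(frame):
    """Concatenate a frame's token feature vectors into one flat vector."""
    return [value for token in frame for value in token]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def adjacent_frame_similarities(frames):
    """Similarity between each frame and the next, one value per adjacent pair."""
    flat = [flatten(frame) for frame in frames]
    return [cosine_similarity(flat[i], flat[i + 1]) for i in range(len(flat) - 1)]
```

Note that this differs from averaging per-token similarities: flattening compares whole frames as single vectors, so tokens with larger feature norms contribute more to the result.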
The temporal embedding weights are initialized to 0. They are updated during model training.
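
To illustrate why a zero initialization does not block learning, here is a toy scalar sketch. It assumes the embedding `p` is simply added to a feature `x` before a fixed weight `w` is applied, which is an assumption for illustration and not the model's actual architecture; all values are made up.

```python
def train_step(p, x, w, target, lr):
    """One gradient-descent step on p for loss = (w * (x + p) - target) ** 2."""
    out = w * (x + p)
    # d(loss)/dp = 2 * (out - target) * w, which does not vanish just because p == 0.
    grad_p = 2.0 * (out - target) * w
    return p - lr * grad_p


p = 0.0  # zero-initialized embedding
for _ in range(3):
    p = train_step(p, x=1.0, w=0.5, target=2.0, lr=0.1)
```

After these steps `p` is no longer zero, because the gradient depends on the prediction error rather than on the current value of `p`. Without any training, however, the embedding stays at 0 and contributes nothing.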

jchsun1 · 2024-09-12T02:17:08Z

Thanks for your reply. I ran the model successfully, but the temporal embedding weights have no effect. Could you tell me how I can get the pre-trained temporal embedding weights?
