Mismatch between the .json and .tar files in MovieChat-1K_train dataset #55

Open
LZHgrla opened this issue Apr 29, 2024 · 4 comments

Comments

@LZHgrla

LZHgrla commented Apr 29, 2024

Hi @Espere-1119-Song

I found some pairing issues between the JSON and TAR files in the MovieChat-1K_train dataset.

There are 830 JSON files (json.txt) but only 769 TAR files (tar.txt), so the two sets do not match.
Comparing them, I found 74 missing TAR files (tar_missing.txt) and 13 extra TAR files (tar_extra.txt).
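
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. It rests on two assumptions that are not confirmed in this thread: that json.txt and tar.txt list one file name per line, and that each JSON file is expected to pair with a TAR file of the same stem. The function names `stems` and `compare_listings` are hypothetical helpers, not part of the MovieChat code. The counts above are at least self-consistent: 830 - 74 + 13 = 769.

```python
from pathlib import PurePosixPath


def stems(names, suffix):
    """Collect file stems (name without directory or suffix) for entries ending in suffix."""
    result = set()
    for raw in names:
        name = raw.strip()
        if name.endswith(suffix):
            # Drop any directory prefix such as "movies/" and the suffix itself.
            result.add(PurePosixPath(name).name[: -len(suffix)])
    return result


def compare_listings(json_names, tar_names):
    """Return (missing, extra): stems with no TAR file, and TAR stems with no JSON file.

    Assumes a JSON file and its TAR file share the same stem (an assumption,
    not something the dataset documentation states here).
    """
    json_stems = stems(json_names, ".json")
    tar_stems = stems(tar_names, ".tar")
    missing = sorted(json_stems - tar_stems)
    extra = sorted(tar_stems - json_stems)
    return missing, extra


# Hypothetical usage with the attached listings:
# with open("json.txt") as j, open("tar.txt") as t:
#     missing, extra = compare_listings(j, t)
```

Set difference is used so that the order of entries in the two listings does not matter.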

Additionally, there seem to be problems with the AWB-8.tar and earth9-2.tar files on the HuggingFace hub, possibly due to a compression or upload failure. (AWB-8.tar is an extra TAR file and can simply be deleted, while earth9-2.tar should be re-uploaded.)

@Espere-1119-Song
Collaborator

Thanks for the reminder, I will resolve this issue as soon as possible.

@LZHgrla
Author

LZHgrla commented May 7, 2024

Hi, @Espere-1119-Song

We found two more invalid TAR files: movies/s01e08-1.tar (10.6 GB) and movies/S01E2-4.tar (6.24 GB).
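
One possible way to flag such files after download is to read every member of each archive to the end with Python's standard `tarfile` module. This is only a sketch: the thread does not say exactly how these files are invalid, so treating "invalid" as "cannot be fully read" is an assumption, and `is_readable_tar` is a hypothetical helper name.

```python
import tarfile


def is_readable_tar(path):
    """Return True if the archive opens and every regular file in it reads to the end."""
    try:
        # "r:*" lets tarfile detect plain or compressed archives automatically.
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if member.isfile():
                    handle = archive.extractfile(member)
                    # Read in 1 MiB chunks so multi-gigabyte members are not held in memory.
                    while handle.read(1 << 20):
                        pass
        return True
    except (tarfile.TarError, OSError, EOFError):
        # Covers unrecognized formats, truncated data, and missing files.
        return False
```

A check like this only shows that the archive is structurally readable; it does not confirm that the videos inside are playable.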

@Espere-1119-Song
Collaborator

Thanks, we are working to upload them as soon as possible.

@Espere-1119-Song
Collaborator

We have uploaded the raw videos of the training set :)
