TokenBench is a comprehensive benchmark that standardizes the evaluation of Cosmos-Tokenizer. It covers a wide variety of domains, including robotic manipulation, driving, egocentric, and web videos, and consists of high-resolution, long-duration videos chosen to evaluate the performance of video tokenizers. The videos are drawn from existing datasets that are commonly used for various tasks: BDD100K, EgoExo-4D, BridgeData V2, and Panda-70M. This repo provides instructions on how to download and preprocess the videos for TokenBench.
- Download the datasets from the official websites:
  - EgoExo4D: https://docs.ego-exo4d-data.org/
  - BridgeData V2: https://rail-berkeley.github.io/bridgedata/
  - Panda70M: https://snap-research.github.io/Panda-70M/
  - BDD100K: http://bdd-data.berkeley.edu/
- Pick the videos as specified in the `video/list.txt` file.
- Preprocess the videos using the script `video/preprocessing_script.py`.
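The "pick the videos" step can be sketched as a check of which listed videos are already on disk. This is a minimal sketch, not part of the repo: it assumes `video/list.txt` holds one relative video path per line, which the README does not confirm, and the helper names `read_video_list` and `split_found_missing` as well as the `datasets` root folder are hypothetical.

```python
from pathlib import Path


def read_video_list(list_file):
    """Return the non-empty lines of a list file, stripped of whitespace.

    Assumption: the file holds one relative video path per line.
    """
    text = Path(list_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_found_missing(entries, dataset_root):
    """Partition entries into files present under dataset_root and missing ones."""
    root = Path(dataset_root)
    found, missing = [], []
    for entry in entries:
        if (root / entry).is_file():
            found.append(entry)
        else:
            missing.append(entry)
    return found, missing


if __name__ == "__main__":
    list_path = Path("video/list.txt")
    # "datasets" is a hypothetical download location; adjust to your setup.
    if list_path.is_file():
        found, missing = split_found_missing(read_video_list(list_path), "datasets")
        print(f"{len(found)} listed videos found, {len(missing)} missing")
```

Running a check like this before preprocessing makes it easier to spot an incomplete download, since the four datasets come from different sources.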
Tokenizer | Compression Ratio (T × H × W) | Formulation | PSNR | SSIM | rFVD |
---|---|---|---|---|---|
CogVideoX | 4 × 8 × 8 | VAE | 33.149 | 0.908 | 6.970 |
OmniTokenizer | 4 × 8 × 8 | VAE | 29.705 | 0.830 | 35.867 |
Cosmos-CV | 4 × 8 × 8 | AE | 37.270 | 0.928 | 6.849 |
Cosmos-CV | 8 × 8 × 8 | AE | 36.856 | 0.917 | 11.624 |
Cosmos-CV | 8 × 16 × 16 | AE | 35.158 | 0.875 | 43.085 |

Tokenizer | Compression Ratio (T × H × W) | Quantization | PSNR | SSIM | rFVD |
---|---|---|---|---|---|
VideoGPT | 4 × 4 × 4 | VQ | 35.119 | 0.914 | 13.855 |
OmniTokenizer | 4 × 8 × 8 | VQ | 30.152 | 0.827 | 53.553 |
Cosmos-DV | 4 × 8 × 8 | FSQ | 35.137 | 0.887 | 19.672 |
Cosmos-DV | 8 × 8 × 8 | FSQ | 34.746 | 0.872 | 43.865 |
Cosmos-DV | 8 × 16 × 16 | FSQ | 33.718 | 0.828 | 113.481 |
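
To compare rows with different compression ratios, it can help to reduce each T × H × W entry to a single number. The sketch below multiplies the per-axis factors listed in the tables; it is illustrative arithmetic only, it ignores any special handling of boundary frames, and the function name `total_compression` is hypothetical.

```python
from math import prod

# (tokenizer, (T, H, W)) compression factors copied from the tables above.
CONFIGS = [
    ("VideoGPT", (4, 4, 4)),
    ("CogVideoX", (4, 8, 8)),
    ("Cosmos-CV / Cosmos-DV", (4, 8, 8)),
    ("Cosmos-CV / Cosmos-DV", (8, 8, 8)),
    ("Cosmos-CV / Cosmos-DV", (8, 16, 16)),
]


def total_compression(factors):
    """Return the product of the temporal and spatial compression factors."""
    return prod(factors)


if __name__ == "__main__":
    for name, factors in CONFIGS:
        t, h, w = factors
        print(f"{name}: {t} x {h} x {w} -> {total_compression(factors)}x")
```

By this measure, 4 × 8 × 8 corresponds to a 256× reduction in spatio-temporal positions and 8 × 16 × 16 to 2048×, so the highest-compression Cosmos rows are compressing 8 times more aggressively than the 4 × 8 × 8 baselines they are listed next to.
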
Fitsum Reda, Jinwei Gu, Xian Liu, Songwei Ge, Ting-Chun Wang, Haoxiang Wang, Ming-Yu Liu