```BibTeX
@article{Diba2019LargeSH,
  title={Large Scale Holistic Video Understanding},
  author={Ali Diba and M. Fayyaz and Vivek Sharma and Manohar Paluri and Jurgen Gall and R. Stiefelhagen and L. Gool},
  journal={arXiv: Computer Vision and Pattern Recognition},
  year={2019}
}
```
For basic dataset information, please refer to the official project and the paper.
Before we start, please make sure that the directory is located at `$MMACTION2/tools/data/hvu/`.
First of all, you can run the following script to prepare annotations.
```shell
bash download_annotations.sh
```
In addition, you need to run the following command to parse the tag list of HVU.

```shell
python parse_tag_list.py
```
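The tag list maps each tag to one of HVU's tag categories. The exact schema is defined inside `parse_tag_list.py`, but the general idea can be sketched as grouping `(tag, category)` pairs into per-category lists; the pairs below are made up for illustration only:

```python
from collections import defaultdict

# Hypothetical (tag, category) pairs; the real HVU tag list is parsed
# from the downloaded annotations by parse_tag_list.py.
pairs = [
    ('running', 'action'),
    ('red', 'attribute'),
    ('dog', 'object'),
    ('jumping', 'action'),
]

# Group tag names by their category.
tags_by_category = defaultdict(list)
for tag, category in pairs:
    tags_by_category[category].append(tag)

print(dict(tags_by_category))
```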
Then, you can run the following script to prepare videos. The code is adapted from the official crawler. Note that this might take a long time.

```shell
bash download_videos.sh
```
This part is optional if you only want to use the video loader.
Before extracting, please refer to install.md for installing denseflow.
You can use the following script to extract both RGB and Flow frames.
```shell
bash extract_frames.sh
```
By default, we generate frames with the short edge resized to 256. More details can be found in data_preparation.
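The short-edge resizing can be reasoned about with a few lines of arithmetic (a minimal sketch of the scaling rule, not the actual frame-extraction implementation):

```python
def resize_short_edge(width, height, short_edge=256):
    """Scale (width, height) so the shorter side equals short_edge,
    keeping the aspect ratio."""
    scale = short_edge / min(width, height)
    return round(width * scale), round(height * scale)

# A 1920x1080 video is scaled so its height (the short edge) becomes 256.
print(resize_short_edge(1920, 1080))  # (455, 256)
```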
You can run the following scripts to generate file lists in the format of videos and rawframes, respectively.

```shell
bash generate_videos_filelist.sh
# execute the command below when rawframes are ready
bash generate_rawframes_filelist.sh
```
This part is optional if you don't want to train models on HVU for a specific tag category.
The file list generated in step 4 contains labels of different categories. These file lists can only be handled with `HVUDataset` and used for multi-task learning of different tag categories. The component `LoadHVULabel` is needed to load the multi-category tags, and `HVULoss` should be used to train the model.
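A multi-task setup along these lines might look like the config fragment below. This is a sketch only: the field names follow the usual MMAction2 config style, but the exact keys accepted by `HVUDataset` and `HVULoss` should be taken from the official HVU configs shipped with the repo.

```python
# Hypothetical config fragment for multi-task training on HVU.
# Check the official HVU configs for the exact fields and values.
dataset_type = 'HVUDataset'
train_pipeline = [
    # ... decoding/augmentation steps ...
    dict(type='LoadHVULabel'),  # loads the multi-category tag labels
]
data = dict(
    train=dict(
        type=dataset_type,
        ann_file='data/hvu/hvu_train.json',
        pipeline=train_pipeline,
    ))
loss = dict(type='HVULoss')
```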
If you only want to train video recognition models for a specific tag category, i.e. you want to train a recognition model on HVU which only handles tags in the category `action`, we recommend you use the following command to generate file lists for the specific tag category. The new list, which only contains tags of a specific category, can be handled with `VideoDataset` or `RawframeDataset`. The recognition models can be trained with `BCELossWithLogits`.
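For intuition, a BCE-with-logits criterion computes, per tag, the numerically stable form of `-t*log(sigmoid(x)) - (1-t)*log(1-sigmoid(x))`. A plain-Python sketch of that math (not the MMAction2 implementation):

```python
import math

def bce_with_logits(logit, target):
    # Numerically stable binary cross-entropy on a raw logit:
    # max(x, 0) - x*t + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# A logit of 0 (sigmoid = 0.5) gives loss log(2) for either target value.
print(bce_with_logits(0.0, 1.0))  # ~0.6931
```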
The following command generates a file list for the tag category `${category}`. Note that the tag category you specify should be one of the 6 tag categories available in HVU: `['action', 'attribute', 'concept', 'event', 'object', 'scene']`.

```shell
python generate_sub_file_list.py path/to/filelist.json ${category}
```
The filename of the generated file list for `${category}` is generated by replacing `hvu` in the original filename with `hvu_${category}`. For example, if the original filename is `hvu_train.json`, the filename of the file list for `action` is `hvu_action_train.json`.
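The renaming rule above can be sketched in a couple of lines (a simple string replacement; the actual script may implement it differently):

```python
def sub_list_name(filename, category):
    # Replace the leading 'hvu' with 'hvu_<category>'.
    return filename.replace('hvu', f'hvu_{category}', 1)

print(sub_list_name('hvu_train.json', 'action'))  # hvu_action_train.json
```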
After finishing the whole data pipeline for HVU preparation, you will get the rawframes (RGB + Flow), videos and annotation files for HVU.
In the context of the whole project (for HVU only), the full folder structure will look like:
```
mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── hvu
│   │   ├── hvu_train_video.json
│   │   ├── hvu_val_video.json
│   │   ├── hvu_train.json
│   │   ├── hvu_val.json
│   │   ├── annotations
│   │   ├── videos_train
│   │   │   ├── OLpWTpTC4P8_000570_000670.mp4
│   │   │   ├── xsPKW4tZZBc_002330_002430.mp4
│   │   │   ├── ...
│   │   ├── videos_val
│   │   ├── rawframes_train
│   │   ├── rawframes_val
```
For training and evaluating on HVU, please refer to getting_started.