
# Preparing HVU

## Introduction

```BibTeX
@article{Diba2019LargeSH,
  title={Large Scale Holistic Video Understanding},
  author={Ali Diba and M. Fayyaz and Vivek Sharma and Manohar Paluri and Jurgen Gall and R. Stiefelhagen and L. Gool},
  journal={arXiv: Computer Vision and Pattern Recognition},
  year={2019}
}
```

For basic dataset information, please refer to the official project and the paper. Before we start, please make sure that your current working directory is `$MMACTION2/tools/data/hvu/`.

## Step 1. Prepare Annotations

First of all, you can run the following script to prepare the annotations.

```shell
bash download_annotations.sh
```

In addition, you need to run the following command to parse the tag list of HVU.

```shell
python parse_tag_list.py
```

## Step 2. Prepare Videos

Then, you can run the following script to prepare the videos. The code is adapted from the official crawler. Note that this might take a long time.

```shell
bash download_videos.sh
```

## Step 3. Extract RGB and Flow

This part is optional if you only want to use the video loader.

Before extracting, please refer to install.md for installing `denseflow`.

You can use the following script to extract both RGB and Flow frames.

```shell
bash extract_frames.sh
```

By default, the frames are generated with the short edge resized to 256 pixels. More details can be found in data_preparation.
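As an illustration of this default, the sketch below (not part of the MMAction2 scripts; `resize_short_edge` is a hypothetical helper) computes the output resolution when a frame's short edge is scaled to 256 while preserving the aspect ratio:

```python
def resize_short_edge(width: int, height: int, short_edge: int = 256) -> tuple:
    """Return (new_width, new_height) with the short edge scaled to `short_edge`,
    keeping the aspect ratio."""
    scale = short_edge / min(width, height)
    return round(width * scale), round(height * scale)

# A 1920x1080 frame: the short edge (1080) becomes 256, the long edge scales with it.
print(resize_short_edge(1920, 1080))  # (455, 256)
```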

## Step 4. Generate File List

You can run the following scripts to generate file lists in the video and rawframe formats, respectively.

```shell
bash generate_videos_filelist.sh
# execute the command below when rawframes are ready
bash generate_rawframes_filelist.sh
```

## Step 5. Generate File Lists for Individual Tag Categories

This part is optional if you don't want to train models on HVU for a specific tag category.

The file lists generated in Step 4 contain labels from different categories. They can only be handled by `HVUDataset` and are used for multi-task learning across tag categories. The component `LoadHVULabel` is needed to load the multi-category tags, and `HVULoss` should be used to train the model.
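To make the multi-task setup concrete, here is a hypothetical config fragment. The dict layout follows common MMAction2 config conventions, but the exact keys and pipeline contents are assumptions; consult the actual HVU configs in the repo before use.

```python
# Hypothetical MMAction2-style config sketch for multi-task training on HVU.
# Field names other than HVUDataset / LoadHVULabel / HVULoss are assumptions.
data = dict(
    train=dict(
        type='HVUDataset',                       # handles multi-category file lists
        ann_file='data/hvu/hvu_train.json',
        data_prefix='data/hvu/videos_train',
        pipeline=[
            dict(type='LoadHVULabel'),           # loads the multi-category tags
            # ... decoding / augmentation transforms go here ...
        ]))

model = dict(
    cls_head=dict(
        loss_cls=dict(type='HVULoss')))          # multi-task loss over tag categories
```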

If you only want to train a video recognition model for a specific tag category, e.g., a recognition model on HVU that only handles tags in the category `action`, we recommend using the following command to generate file lists for that tag category. The new list, which only contains tags of the specified category, can be handled by `VideoDataset` or `RawframeDataset`. The recognition model can then be trained with `BCELossWithLogits`.
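For comparison with the multi-task case, a hypothetical config fragment for single-category training might look as follows. Again, the dict layout follows common MMAction2 config conventions, but the exact keys are assumptions; check the repo's configs before use.

```python
# Hypothetical MMAction2-style config sketch for training on a single tag
# category (here `action`). Field names other than VideoDataset /
# BCELossWithLogits are assumptions.
data = dict(
    train=dict(
        type='VideoDataset',                            # plain single-category list
        ann_file='data/hvu/hvu_action_train.json',      # produced in this step
        data_prefix='data/hvu/videos_train',
        pipeline=[
            # ... decoding / augmentation transforms go here ...
        ]))

model = dict(
    cls_head=dict(
        loss_cls=dict(type='BCELossWithLogits')))       # multi-label binary loss
```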

The following command generates the file list for the tag category `${category}`. Note that the specified tag category should be one of the 6 categories available in HVU: `['action', 'attribute', 'concept', 'event', 'object', 'scene']`.

```shell
python generate_sub_file_list.py path/to/filelist.json ${category}
```

The filename of the generated file list for `${category}` is obtained by replacing `hvu` in the original filename with `hvu_${category}`. For example, if the original filename is `hvu_train.json`, the file list for `action` is named `hvu_action_train.json`.
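The naming rule can be sketched as below. `sub_list_filename` is an illustrative helper that only mirrors the rule, not the actual code of `generate_sub_file_list.py`:

```python
def sub_list_filename(filename: str, category: str) -> str:
    """Derive the sub-list filename by replacing 'hvu' with 'hvu_<category>'."""
    return filename.replace('hvu', f'hvu_{category}', 1)

print(sub_list_filename('hvu_train.json', 'action'))  # hvu_action_train.json
```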

## Step 6. Folder Structure

After completing the whole data pipeline for HVU preparation, you will have the rawframes (RGB + Flow), videos, and annotation files for HVU.

In the context of the whole project (for HVU only), the full folder structure will look like:

```
mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── hvu
│   │   ├── hvu_train_video.json
│   │   ├── hvu_val_video.json
│   │   ├── hvu_train.json
│   │   ├── hvu_val.json
│   │   ├── annotations
│   │   ├── videos_train
│   │   │   ├── OLpWTpTC4P8_000570_000670.mp4
│   │   │   ├── xsPKW4tZZBc_002330_002430.mp4
│   │   │   ├── ...
│   │   ├── videos_val
│   │   ├── rawframes_train
│   │   ├── rawframes_val
```

For training and evaluating on HVU, please refer to getting_started.