The codebase is mainly built with the following libraries:
- Python 3.6 or higher
- PyTorch and torchvision.
  We can successfully reproduce the main results in the following two settings:
  - Tesla A100 (40G): CUDA 11.1 + PyTorch 1.8.0 + torchvision 0.9.0
  - Tesla V100 (32G): CUDA 10.1 + PyTorch 1.6.0 + torchvision 0.7.0

  The PyTorch version has a significant impact on the results, so we recommend configuring the environment with one of these settings or a newer version (a quick version check is sketched after this list).
- timm==0.4.8/0.4.12
- deepspeed==0.5.8
- TensorboardX
- decord
- einops
- av
- tqdm
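
To verify that the environment matches one of the reproduced settings, here is a minimal sanity check (assuming the packages above are already installed):

```bash
# Print the versions that matter for reproducibility; compare against one of
# the verified settings above (e.g. CUDA 11.1 + PyTorch 1.8.0 + torchvision 0.9.0).
python -c "import torch, torchvision, timm; \
print('torch:', torch.__version__); \
print('torchvision:', torchvision.__version__); \
print('cuda:', torch.version.cuda); \
print('timm:', timm.__version__); \
print('gpu available:', torch.cuda.is_available())"
```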
We recommend setting up the environment with Anaconda; a step-by-step installation script is shown below.
```bash
conda create -n VideoMAE_ava python=3.7
conda activate VideoMAE_ava

# install PyTorch with the same CUDA version as in your environment
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113

conda install av -c conda-forge
conda install cython
```
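
The script above installs only PyTorch, `av`, and `cython`; the remaining dependencies from the requirements list can be installed with pip. A sketch, pinning only the versions the list pins:

```bash
# timm and deepspeed are pinned to the versions listed above; the rest are
# unpinned in the requirements list, so the latest releases are assumed.
pip install timm==0.4.12 deepspeed==0.5.8 tensorboardX decord einops tqdm
```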
The code combines VideoMAE and AlphAction, and the AVA data preparation follows AlphAction's data preparation. If you only need to train and test on the AVA dataset, you do not need to prepare the Kinetics dataset.
- `video_map.npy`: mapping from video id to the corresponding video path
- `ak_val_gt.csv`: ground truth for the validation set of AVA-Kinetics
For convenience, we have organized the annotation files we used; they are available for download from OneDrive. Note that these files may differ slightly from the officially provided ones. In particular, for Kinetics, the annotation version we use may be older, and some of the videos may differ from those you download now.
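
To sanity-check the downloaded annotation files, here is a minimal sketch (the internal structure of `video_map.npy` is an assumption; AVA-style CSVs carry no header row):

```bash
# Peek at the video id -> path mapping (assumed to be a pickled numpy object).
python -c "import numpy as np; m = np.load('video_map.npy', allow_pickle=True); print(type(m))"
# Inspect the first ground-truth rows; AVA-style columns are typically
# video_id, timestamp, x1, y1, x2, y2, action_id, person_id.
head -n 3 ak_val_gt.csv
```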
Here is a script that trains on the AVA-Kinetics dataset and evaluates on the AVA dataset:
```bash
MODEL_PATH='YOUR_PATH/PRETRAIN_MODEL.pth'
OUTPUT_DIR='YOUR_PATH/OUTPUT_DIR'

python -m torch.distributed.launch --nproc_per_node=8 \
    --master_port 12320 --nnodes=8 \
    --node_rank=0 --master_addr=$ip_node_0 \
    run_class_finetuning.py \
    --model vit_large_patch16_224 \
    --finetune ${MODEL_PATH} \
    --log_dir ${OUTPUT_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --batch_size 8 \
    --update_freq 1 \
    --num_sample 1 \
    --input_size 224 \
    --save_ckpt_freq 1 \
    --num_frames 16 \
    --sampling_rate 4 \
    --opt adamw \
    --lr 0.00025 \
    --opt_betas 0.9 0.999 \
    --weight_decay 0.05 \
    --epochs 30 \
    --data_set "ava-kinetics" \
    --enable_deepspeed \
    --val_freq 30 \
    --drop_path 0.2
```
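
This command launches the rank-0 node of an 8-node job (64 GPUs in total); the same command must be run on every node with its own `--node_rank` (0–7) and with `$ip_node_0` set to the IP address of node 0. A hypothetical usage sketch:

```bash
# Hypothetical usage: save the command above as finetune_ak.sh, run it on each
# of the 8 nodes with --node_rank changed per node (0..7), and export the
# rank-0 node's IP address first.
export ip_node_0=10.0.0.1   # replace with the real IP of node 0
bash finetune_ak.sh
```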
- SLURM environment (fine-tuning):
```bash
MODEL_PATH='YOUR_PATH/PRETRAIN_MODEL.pth'
OUTPUT_DIR='YOUR_PATH/OUTPUT_DIR'

PARTITION=${PARTITION:-"video"}
GPUS=${GPUS:-32}
GPUS_PER_NODE=${GPUS_PER_NODE:-8}
CPUS_PER_TASK=${CPUS_PER_TASK:-12}
SRUN_ARGS=${SRUN_ARGS:-""}
PY_ARGS=${@:2}

srun -p ${PARTITION} \
    --gres=gpu:${GPUS_PER_NODE} \
    --ntasks=${GPUS} \
    --ntasks-per-node=${GPUS_PER_NODE} \
    --cpus-per-task=${CPUS_PER_TASK} \
    ${SRUN_ARGS} \
    python -u run_class_finetuning.py \
    --model vit_large_patch16_224 \
    --finetune ${MODEL_PATH} \
    --log_dir ${OUTPUT_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --batch_size 8 \
    --update_freq 1 \
    --num_sample 1 \
    --input_size 224 \
    --save_ckpt_freq 1 \
    --num_frames 16 \
    --sampling_rate 4 \
    --opt adamw \
    --lr 0.00025 \
    --opt_betas 0.9 0.999 \
    --weight_decay 0.05 \
    --epochs 30 \
    --data_set "ava" \
    --enable_deepspeed \
    --val_freq 30 \
    --drop_path 0.2 \
    ${PY_ARGS}
```
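
A hypothetical invocation, assuming the script above is saved as `finetune_ava.sh`. Because of `PY_ARGS=${@:2}`, the first positional argument is skipped (conventionally a job name) and everything from the second onward is forwarded to `run_class_finetuning.py`:

```bash
# Override the SLURM defaults via environment variables and forward one extra
# flag (--batch_size 4) to run_class_finetuning.py.
GPUS=16 GPUS_PER_NODE=8 PARTITION=video bash finetune_ava.sh my_job --batch_size 4
```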
Here is a script for evaluation on the AVA dataset.
- SLURM environment (evaluation):
```bash
DATA_PATH='YOUR_PATH/list_kinetics-400'   # it can be any string for this task
MODEL_PATH='YOUR_PATH/PRETRAIN_MODEL.pth'
OUTPUT_DIR='YOUR_PATH/OUTPUT_DIR'

PARTITION=${PARTITION:-"video"}
GPUS=${GPUS:-32}
GPUS_PER_NODE=${GPUS_PER_NODE:-8}
CPUS_PER_TASK=${CPUS_PER_TASK:-12}
SRUN_ARGS=${SRUN_ARGS:-""}
PY_ARGS=${@:2}

srun -p ${PARTITION} \
    --gres=gpu:${GPUS_PER_NODE} \
    --ntasks=${GPUS} \
    --ntasks-per-node=${GPUS_PER_NODE} \
    --cpus-per-task=${CPUS_PER_TASK} \
    ${SRUN_ARGS} \
    python -u run_class_finetuning.py \
    --model vit_large_patch16_224 \
    --data_path ${DATA_PATH} \
    --finetune ${MODEL_PATH} \
    --log_dir ${OUTPUT_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --batch_size 4 \
    --update_freq 1 \
    --num_sample 1 \
    --input_size 224 \
    --save_ckpt_freq 1 \
    --num_frames 16 \
    --sampling_rate 4 \
    --opt adamw \
    --lr 0.00025 \
    --opt_betas 0.9 0.999 \
    --weight_decay 0.05 \
    --epochs 30 \
    --data_set "ava" \
    --enable_deepspeed \
    --val_freq 30 \
    --drop_path 0.2 \
    --eval \
    ${PY_ARGS}
```
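
A hypothetical invocation, assuming the evaluation script is saved as `eval_ava.sh`. `DATA_PATH` can be any string here, `MODEL_PATH` should point at the fine-tuned checkpoint, and `--eval` skips training and runs evaluation only:

```bash
# Run evaluation only; edit DATA_PATH/MODEL_PATH/OUTPUT_DIR in the script first.
GPUS=8 GPUS_PER_NODE=8 bash eval_ava.sh eval_job
```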