- python 3.7
- torch 1.11.0
- torchvision 0.12.0
- We uniformly sample 4/8/16 frames for `num_segments_L`, `num_segments_M` and `num_segments_H` during training, and use `num_segments_H` to specify the number of frames during inference (a sampling sketch follows this list).
- We enable Any-Frame-Inference for the 2D networks so that the model can be evaluated at frame counts that are not used in training.
- We use 1-clip 1-crop evaluation for the 2D networks at a resolution of 224x224 (see the preprocessing sketch after this list).
- `lambda_act` denotes the coefficient $\lambda$ in the loss function; we set it to 1 without further tuning the hyperparameter (a loss sketch follows this list).
- We train the 2D networks TSM and TEA on 2 NVIDIA Tesla V100 (32GB) cards, starting from ImageNet-pretrained models.
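For reference, a minimal sketch of uniform temporal sampling under the TSN-style center-of-segment convention (the convention and the helper name are assumptions, not taken from this repository); because the helper accepts any `num_segments`, the same routine also serves Any-Frame-Inference:

```python
import numpy as np

def uniform_sample_indices(num_frames, num_segments):
    # Split the video into `num_segments` equal-length segments and
    # take the center frame of each one (hypothetical helper).
    tick = num_frames / float(num_segments)
    return np.array([int(tick / 2.0 + tick * i) for i in range(num_segments)])

# Training uses three frame counts; inference uses num_segments_H.
num_segments_L, num_segments_M, num_segments_H = 4, 8, 16
print(uniform_sample_indices(60, num_segments_H))  # 16 indices in [0, 60)
```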
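A minimal preprocessing sketch for the 1-clip 1-crop protocol at 224x224; the shorter-side-to-256 rescale and the ImageNet normalization statistics are common defaults assumed here, not read from the code:

```python
import torchvision.transforms as T

eval_transform = T.Compose([
    T.Resize(256),      # scale the shorter side to 256 (assumed default)
    T.CenterCrop(224),  # single center crop -> 1-clip 1-crop evaluation
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])
```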
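And a hedged sketch of where `lambda_act` enters the objective, assuming the total loss sums per-frame-count cross-entropy terms with a $\lambda$-weighted auxiliary term; the function and argument names are hypothetical, and the auxiliary term itself is defined by the paper, not this sketch:

```python
import torch.nn.functional as F

lambda_act = 1.0  # coefficient lambda; set to 1 without tuning

def total_loss(logits_L, logits_M, logits_H, aux_loss, target):
    # Cross-entropy for each frame-count branch plus the weighted
    # auxiliary term (hypothetical composition).
    ce = (F.cross_entropy(logits_L, target)
          + F.cross_entropy(logits_M, target)
          + F.cross_entropy(logits_H, target))
    return ce + lambda_act * aux_loss
```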
- Specify the directory of datasets with `ROOT_DATASET` in `ops/dataset_config.py` (a config sketch follows the commands below).
- Simply run the training scripts in `exp` as follows:
```bash
bash exp/tsm_sthv1/run.sh      ## baseline training
bash exp/tsm_sthv1_FFN/run.sh  ## FFN training
```
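For orientation, a sketch of the `ROOT_DATASET` edit, assuming `ops/dataset_config.py` follows the common TSM-style layout where per-dataset helpers join paths against a module-level root (the helper, file names, and paths below are placeholders):

```python
# ops/dataset_config.py (sketch; all paths are placeholders)
import os

ROOT_DATASET = '/path/to/your/datasets/'  # point this at your dataset root

def return_sthv1(modality):
    # Hypothetical per-dataset helper: builds annotation and frame
    # paths relative to ROOT_DATASET.
    filename_categories = os.path.join(ROOT_DATASET, 'sthv1/category.txt')
    root_data = os.path.join(ROOT_DATASET, 'sthv1/frames')
    return filename_categories, root_data
```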
- Specify the directory of datasets with `ROOT_DATASET` in `ops/dataset_config.py`.
- Please download the pretrained models from Google Drive.
- Specify the directory of the pretrained model with `resume` in `test.sh` (a checkpoint-loading sketch follows the commands below).
- Run the inference scripts in `exp` as follows:
```bash
bash exp/tsm_sthv1/test.sh      ## baseline inference
bash exp/tsm_sthv1_FFN/test.sh  ## FFN inference
```
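To illustrate what `resume` points at, a minimal checkpoint-loading sketch; the `state_dict` key, the checkpoint path, and the placeholder backbone are assumptions about a standard PyTorch checkpoint, not this repository's exact format:

```python
import torch
import torchvision

model = torchvision.models.resnet50()  # placeholder backbone (assumption)
# `resume` in test.sh would point at a file like this (placeholder path):
checkpoint = torch.load('/path/to/pretrained.pth.tar', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])  # key name is an assumption
model.eval()  # evaluation mode for inference
```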