This file documents a collection of models reported in our paper. Our experiments were run on a DGX machine with eight 32G V100 GPUs. Most of our models are trained on 4 GPUs.
The "Name" column contains a link to the config file. To train a model, run
python train_net.py --num-gpus 4 --config-file /path/to/config/name.yaml
To evaluate a trained or pretrained model, run
python train_net.py --config-file /path/to/config/name.yaml --eval-only MODEL.WEIGHTS /path/to/weight.pth
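For scripted runs, the two commands above can be assembled as argument lists and passed to `subprocess.run`. This is a minimal sketch and not part of the repository: the helper names `train_command`, `eval_command`, and `launch` are hypothetical, and only the flags shown above are used.

```python
import shlex
import subprocess


def train_command(config_file, num_gpus=4):
    """Argument list for the training command shown above."""
    return [
        "python", "train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
    ]


def eval_command(config_file, weights):
    """Argument list for the evaluation command shown above."""
    return [
        "python", "train_net.py",
        "--config-file", config_file,
        "--eval-only",
        "MODEL.WEIGHTS", weights,
    ]


def launch(cmd):
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths, as in the commands above; replace before launching.
    cmd = eval_command("/path/to/config/name.yaml", "/path/to/weight.pth")
    print(shlex.join(cmd))
    # launch(cmd)  # uncomment to actually run the evaluation
```

Keeping the command as a list avoids shell-quoting problems when a path contains spaces.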
Name | MOTA | IDF1 | HOTA | DetA | AssA | Download |
---|---|---|---|---|---|---|
GTR_MOT_FPN | 71.3 | 75.9 | 63.0 | 60.4 | 66.2 | model |
GTR_MOT_FPN (local) | 71.1 | 74.2 | 62.1 | 60.2 | 64.4 | same as above |
Name | MOTA | IDF1 | HOTA | DetA | AssA | Download |
---|---|---|---|---|---|---|
GTR_MOTFull_FPN | 75.3 | 71.5 | 59.1 | 61.6 | 57.0 | model |
- The validation set follows the half-half training set split from CenterTrack.
- All models are finetuned from a detection-only model trained on CrowdHuman (config, model). Download or train the model and place it as GTR_ROOT/models/CH_FPN_1x.pth before training. Training the detection-only models takes ~12 hours on 4 GPUs.
- Training GTR takes ~3 hours on 4 V100 GPUs (32G memory).
- GTR_MOT_FPN is our model with a temporal-window size of 32. It needs more than 12G of GPU memory at test time. To change the temporal-window size, append INPUT.VIDEO.TEST_LEN 16 to the command.
- GTR_MOT_FPN (local) is our local tracker baseline, which applies FairMOT to our detections and features. To run it, append VIDEO_TEST.LOCAL_TRACK True to the command.
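The two options above are appended to the command as space-separated KEY VALUE pairs. The following is a minimal sketch of how such a command line could be assembled; the helper `with_overrides` is hypothetical, and the config and weight paths are the placeholders used earlier in this file.

```python
import shlex

# Evaluation command from the top of this file, with placeholder paths.
BASE_EVAL = [
    "python", "train_net.py",
    "--config-file", "/path/to/config/name.yaml",
    "--eval-only",
    "MODEL.WEIGHTS", "/path/to/weight.pth",
]


def with_overrides(base, overrides):
    """Return a copy of the command with KEY VALUE pairs appended."""
    cmd = list(base)
    for key, value in overrides.items():
        cmd += [key, str(value)]
    return cmd


# Shorter temporal window (16 instead of the default 32).
short_window = with_overrides(BASE_EVAL, {"INPUT.VIDEO.TEST_LEN": 16})

# Local tracker baseline.
local_tracker = with_overrides(BASE_EVAL, {"VIDEO_TEST.LOCAL_TRACK": True})

print(shlex.join(short_window))
print(shlex.join(local_tracker))
```

Both variants reuse the same weights, which matches the "same as above" entry in the Download column.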
Name | validation mAP | Test mAP | Download |
---|---|---|---|
GTR_TAO_DR2101 | 22.5 | 20.1 | model |
- The model is evaluated on TAO keyframes only, which are sampled at ~1 frame per second.
- Our model is trained on LVIS+COCO only. The TAO training set is not used anywhere.
- Our model is finetuned from a detection-only CenterNet2 model trained on LVIS+COCO (config, model). Download or train the model and place it as GTR_ROOT/models/C2_LVISCOCO_DR2101_4x.pth before training. Training the detection-only models takes ~3 days on 8 GPUs.
- Training GTR takes ~13 hours on 4 V100 GPUs (32G memory).
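Because training expects the detection-only weights at fixed locations, a quick check before launching a run can catch a missing file early. This is a minimal sketch: the helper `missing_weights` and the use of a `GTR_ROOT` environment variable are assumptions made for illustration (this file only uses GTR_ROOT as a placeholder for the repository root).

```python
import os
from pathlib import Path

# Weight paths named in this file, relative to the repository root.
PRETRAINED = {
    "MOT": "models/CH_FPN_1x.pth",
    "TAO": "models/C2_LVISCOCO_DR2101_4x.pth",
}


def missing_weights(root):
    """Return the expected weight files that do not exist under root."""
    root = Path(root)
    return [rel for rel in PRETRAINED.values() if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumption: GTR_ROOT is exported; fall back to the current directory.
    root = os.environ.get("GTR_ROOT", ".")
    for rel in missing_weights(root):
        print(f"missing: {Path(root) / rel}")
```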