[CVPR2025] FASTer: Focal Token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection

MSunDYY/FASTer


FASTer: Focal Token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection (CVPR2025) paper

Authors: Chenxu Dang, Zaipeng Duan, Pei An, Xinmin Zhang, Xuzhong Hu, Jie Ma.

Introduction

Recent top-performing temporal 3D detectors based on LiDAR have increasingly adopted region-based paradigms. They first generate coarse proposals, then encode and fuse regional features. However, indiscriminate sampling and fusion often overlook the varying contributions of individual points and lead to exponentially increased complexity as the number of input frames grows. Moreover, arbitrary result-level concatenation limits the extraction of global information. In this paper, we propose a Focal Token Acquiring-and-Scaling Transformer (FASTer), which dynamically selects focal tokens and condenses token sequences in an adaptive and lightweight manner. Emphasizing the contribution of individual tokens, we propose a simple but effective Adaptive Scaling mechanism to capture geometric contexts while sifting out focal points. Adaptively storing and processing only focal points in historical frames dramatically reduces the overall complexity. Furthermore, we propose a novel Grouped Hierarchical Fusion strategy that progressively performs sequence scaling and Intra-Group Fusion operations to facilitate the exchange of global spatial and temporal information. Experiments on the Waymo Open Dataset demonstrate that FASTer significantly outperforms other state-of-the-art detectors in both performance and efficiency while also exhibiting improved flexibility and robustness.
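
To make the idea of keeping only focal tokens more concrete, here is a minimal sketch in plain Python. It is not the FASTer implementation: it assumes each token already has a scalar importance score, uses a simple top-k rule in place of the Adaptive Scaling mechanism, and assumes the current frame keeps all of its tokens. The function names and the sizes (16 frames, 128 tokens, 32 focal tokens) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: NOT the FASTer implementation.
# Assumption: every token already has a scalar importance score.

def select_focal_tokens(tokens, scores, k):
    """Keep the k highest-scoring tokens, preserving their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = min(k, len(tokens))
    # Indices of the k largest scores, re-sorted to keep sequence order.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


def stored_tokens(num_frames, tokens_per_frame, focal_per_frame):
    """Tokens held per proposal when past frames keep focal tokens only.

    Assumption: the current frame keeps all of its tokens.
    """
    return tokens_per_frame + (num_frames - 1) * focal_per_frame


tokens = ["p0", "p1", "p2", "p3", "p4"]
scores = [0.1, 0.9, 0.3, 0.8, 0.2]
print(select_focal_tokens(tokens, scores, k=2))  # ['p1', 'p3']

# Hypothetical sizes: 16 frames, 128 tokens per frame, 32 focal tokens kept.
print(16 * 128)                    # 2048 tokens if every frame is stored in full
print(stored_tokens(16, 128, 32))  # 608 tokens if past frames keep focal tokens only
```

With these hypothetical sizes, the stored sequence grows by 32 tokens per additional historical frame instead of 128, which is the kind of saving the paragraph above refers to.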

Setup

This project is built on OpenPCDet. Please follow their instructions to set up the conda environment and prepare the Waymo dataset.

Training

Training FASTer follows the same procedure as MSF. First, train the RPN model:

bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/waymo_models/centerpoint_4frames.yaml

The checkpoints will be saved in ../output/waymo_models/centerpoint_4frames/default/ckpt. Next, save the RPN model's predictions on the training and validation splits:

# training
bash scripts/dist_test.sh ${NUM_GPUS}  --cfg_file cfgs/waymo_models/centerpoint_4frames.yaml \
--ckpt ../output/waymo_models/centerpoint_4frames/default/ckpt/checkpoint_epoch_36.pth \
--set DATA_CONFIG.DATA_SPLIT.test train
# val
bash scripts/dist_test.sh ${NUM_GPUS}  --cfg_file cfgs/waymo_models/centerpoint_4frames.yaml \
--ckpt ../output/waymo_models/centerpoint_4frames/default/ckpt/checkpoint_epoch_36.pth \
--set DATA_CONFIG.DATA_SPLIT.test val
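
The two commands above differ only in the split name, so they can be generated in a loop. The sketch below builds the argument lists and prints them rather than running anything. The helper name `rpn_test_command` and the GPU count of 8 are hypothetical; the script path, config, checkpoint, and `--set` key are copied from the commands above.

```python
import shlex

CFG = "cfgs/waymo_models/centerpoint_4frames.yaml"
CKPT = "../output/waymo_models/centerpoint_4frames/default/ckpt/checkpoint_epoch_36.pth"


def rpn_test_command(num_gpus, split):
    """Return the argv list that dumps RPN predictions for one split."""
    if split not in ("train", "val"):
        raise ValueError(f"unexpected split: {split!r}")
    return [
        "bash", "scripts/dist_test.sh", str(num_gpus),
        "--cfg_file", CFG,
        "--ckpt", CKPT,
        "--set", "DATA_CONFIG.DATA_SPLIT.test", split,
    ]


for split in ("train", "val"):
    # Printed, not executed. To run a command, pass the list to
    # subprocess.run(cmd, check=True) from the tools directory.
    print(shlex.join(rpn_test_command(8, split)))
```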

The predictions for the train and val splits will be saved to:

../output/waymo_models/centerpoint_4frames/default/eval/epoch_36/train/default/result.pkl
../output/waymo_models/centerpoint_4frames/default/eval/epoch_36/val/default/result.pkl

After that, train FASTer on multiple GPUs:

bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/waymo_models/faster_16frames.yaml --batch_size 2 

or on a single GPU (recommended):

python train.py --cfg_file cfgs/waymo_models/faster_16frames.yaml --batch_size 5

To simplify training and reproduction, we have released only the version without EPA, which lowers performance by about 0.25 compared with the results reported in our paper. This is still sufficient to demonstrate the effectiveness of FASTer.

Evaluation

# Single GPU, for online inference
python test.py --cfg_file cfgs/waymo_models/faster_16frames.yaml --batch_size 1 \
--ckpt ../output/waymo_models/faster_4frames/default/ckpt/checkpoint_epoch_6.pth
# Multi-GPU inference is not supported for now.

Acknowledgment

Our code and ideas draw mainly on MPPNet and MSF. We sincerely appreciate their contributions!

Citation

If you find our work useful or inspiring, please cite our paper.

@article{dang2025faster,
  title={FASTer: Focal Token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection},
  author={Dang, Chenxu and Duan, Zaipeng and An, Pei and Zhang, Xinmin and Hu, Xuzhong and Ma, Jie},
  journal={arXiv preprint arXiv:2503.01899},
  year={2025}
}
