This repository hosts our PyTorch implementation of 3D-Jointsformer, a novel approach for real-time hand gesture recognition in video sequences. Traditional methods struggle to model temporal dependencies while maintaining real-time performance. To address this, we propose a hybrid approach combining 3D-CNNs and Transformers. Our method uses a 3D-CNN to compute high-level semantic skeleton embeddings that capture local spatial and temporal characteristics; a Transformer network with self-attention then efficiently captures long-range temporal dependencies. Evaluation on the Briareo and Multi-Modal Hand Gesture datasets yielded accuracy scores of 95.49% and 97.25%, respectively. Importantly, our approach achieves real-time performance on standard CPUs, distinguishing it from GPU-dependent methods. The hybrid 3D-CNN and Transformer approach outperforms existing methods in both accuracy and speed, effectively addressing real-time recognition challenges.
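For intuition, below is a minimal PyTorch sketch of the hybrid idea: a 3D-CNN builds local spatio-temporal skeleton embeddings, and a Transformer encoder attends over the resulting frame tokens. All layer sizes, kernel shapes, and the class count are illustrative assumptions, not the exact 3D-Jointsformer configuration (see configs/transformer/ for the real model).

import torch
import torch.nn as nn

class Hybrid3DCNNTransformer(nn.Module):
    """Sketch only: sizes are illustrative, not the published model."""

    def __init__(self, in_channels=3, embed_dim=64, num_heads=4,
                 num_layers=2, num_classes=12):
        super().__init__()
        # 3D convolution over (frames, joints) to build local
        # spatio-temporal skeleton embeddings.
        self.cnn3d = nn.Sequential(
            nn.Conv3d(in_channels, embed_dim,
                      kernel_size=(3, 3, 1), padding=(1, 1, 0)),
            nn.BatchNorm3d(embed_dim),
            nn.ReLU(inplace=True),
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        # x: (batch, channels, frames, joints, 1) skeleton clip
        feat = self.cnn3d(x)                 # (B, C, T, V, 1)
        feat = feat.mean(dim=3).squeeze(-1)  # pool joints -> (B, C, T)
        feat = feat.transpose(1, 2)          # frame tokens: (B, T, C)
        feat = self.transformer(feat)        # long-range temporal attention
        return self.head(feat.mean(dim=1))   # clip-level logits

# Example: batch of 2 clips, 3 coords, 32 frames, 21 hand joints
logits = Hybrid3DCNNTransformer()(torch.randn(2, 3, 32, 21, 1))
print(logits.shape)  # torch.Size([2, 12])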
conda create -n 3DJointsformer python=3.9 -y
conda activate 3DJointsformer
conda install pytorch=1.11.0 torchvision=0.12.0 cudatoolkit=11.3 -c pytorch -y
pip install 'mmcv-full==1.5.0' -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html
pip install mmaction2 # tested mmaction2 v0.24.0
In this work, we have tested the proposed model on two datasets: Briareo and the Multi-Modal Hand Gesture Dataset. The hand keypoints are obtained with MediaPipe; we have also included code to generate these hand keypoints (see data_preprocessing and the sketch below).
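As a simplified stand-in for that preprocessing step, the following sketch uses the MediaPipe Hands solution to turn a video into a (frames, 21, 3) array of normalized keypoints. The function name, frame cap, and padding strategy for missed detections are illustrative assumptions; the repository's actual pipeline lives in data_preprocessing.

import cv2
import mediapipe as mp
import numpy as np

def extract_hand_keypoints(video_path, max_frames=32):
    """Sketch: returns a (frames, 21, 3) array of (x, y, z) landmarks."""
    hands = mp.solutions.hands.Hands(
        static_image_mode=False, max_num_hands=1,
        min_detection_confidence=0.5)
    cap = cv2.VideoCapture(video_path)
    keypoints = []
    while cap.isOpened() and len(keypoints) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0].landmark
            keypoints.append([(p.x, p.y, p.z) for p in lm])
        else:
            # No detection: repeat the last pose (zeros for the first frame).
            keypoints.append(keypoints[-1] if keypoints
                             else [(0.0, 0.0, 0.0)] * 21)
    cap.release()
    hands.close()
    return np.asarray(keypoints, dtype=np.float32)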
You can use the following command to train a model.
./tools/run.sh ${CONFIG_FILE} ${GPU_IDS} ${SEED}
Example: train the model on the joint data of the Briareo dataset using 2 GPUs with seed 0.
./tools/run.sh configs/transformer/jointsformer3d_briareo.py 0,1 0
You can use the following command to test a model.
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
Example: inference on the joint data of the Briareo dataset.
python tools/test.py configs/transformer/jointsformer3d_briareo.py \
work_dirs/jointsformer3d/best_top1_acc_epoch_475.pth \
--eval top_k_accuracy --cfg-options "gpu_ids=[0]"
If you find this project useful, please consider citing our paper.
@Article{s23167066,
AUTHOR = {Zhong, Enmin and del-Blanco, Carlos R. and Berjón, Daniel and Jaureguizar, Fernando and García, Narciso},
TITLE = {Real-Time Monocular Skeleton-Based Hand Gesture Recognition Using 3D-Jointsformer},
JOURNAL = {Sensors},
VOLUME = {23},
YEAR = {2023},
NUMBER = {16},
ARTICLE-NUMBER = {7066},
URL = {https://www.mdpi.com/1424-8220/23/16/7066},
PubMedID = {37631602},
ISSN = {1424-8220},
DOI = {10.3390/s23167066}
}
Our code is based on SkelAct, MMAction2, and SlowFast. Sincere thanks to the authors for their wonderful work.
This project is released under the Apache 2.0 license.