Efficient Vision Transformers for Segmentation

Proposal Abstract

Vision transformers can provide demonstrable improvements over CNN-based models; however, the resulting models are often still complex, and existing vision-transformer applications to neuroimaging typically rely on a U-Net-style architecture. Our previous work with MeshNet suggests that vision transformers can be made more efficient, for example by using dilations or other techniques employed by MeshNet.
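
As a concrete illustration of this direction, here is a minimal PyTorch sketch (not the project's implementation): a MeshNet-style stack of dilated 3D convolutions grows the receptive field without any downsampling, and a small transformer encoder then attends over the resulting voxel tokens. The class names (`DilatedTokenizer`, `DilatedViTSeg`) and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only: MeshNet-style dilated convolutions in place of
# U-Net downsampling, followed by transformer self-attention over voxels.
import torch
import torch.nn as nn


class DilatedTokenizer(nn.Module):
    """Stack of dilated 3D convs (as in MeshNet); resolution is preserved."""

    def __init__(self, in_ch=1, ch=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers = []
        for i, d in enumerate(dilations):
            layers += [
                nn.Conv3d(in_ch if i == 0 else ch, ch,
                          kernel_size=3, padding=d, dilation=d),
                nn.BatchNorm3d(ch),
                nn.ReLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x):           # x: (B, in_ch, D, H, W)
        return self.body(x)         # -> (B, ch, D, H, W)


class DilatedViTSeg(nn.Module):
    """Dilated-conv front end + transformer encoder + per-voxel head."""

    def __init__(self, n_classes=3, ch=32, n_heads=4, n_layers=2):
        super().__init__()
        self.tokenizer = DilatedTokenizer(ch=ch)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=ch, nhead=n_heads, dim_feedforward=4 * ch,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.head = nn.Conv3d(ch, n_classes, kernel_size=1)

    def forward(self, x):
        feats = self.tokenizer(x)                  # (B, C, D, H, W)
        B, C, D, H, W = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (B, D*H*W, C)
        tokens = self.encoder(tokens)              # attention over voxels
        feats = tokens.transpose(1, 2).reshape(B, C, D, H, W)
        return self.head(feats)                    # per-voxel class logits


if __name__ == "__main__":
    model = DilatedViTSeg()
    vol = torch.randn(1, 1, 16, 16, 16)  # tiny subvolume for a smoke test
    print(model(vol).shape)              # torch.Size([1, 3, 16, 16, 16])
```

Because the dilated front end never reduces spatial resolution, no decoder or skip connections are needed; full-resolution attention is feasible here only on small subvolumes, which is exactly the efficiency trade-off this project would investigate.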

About the Project

Umbrella Project

  • NeuroNeural

Emphasis:

  • ML Theory/Data Science

Expected Background

  • Graduate student or exceptional undergraduate. Applied experience with transformers and CNNs is highly recommended.

Primary Point of Contact

  • Supervisor

References and External Resources

  • Catalyst: https://github.com/catalyst-team/catalyst
  • Catalyst Neuro: https://github.com/catalyst-team/neuro
  • Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
  • Hatamizadeh, Ali, et al. "Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images." arXiv preprint arXiv:2201.01266 (2022).
  • Wang, Dayang, Zhan Wu, and Hengyong Yu. "TED-net: Convolution-free T2T Vision Transformer-based Encoder-decoder Dilation network for Low-dose CT Denoising." International Workshop on Machine Learning in Medical Imaging. Springer, Cham, 2021.
  • Wang, Dayang, et al. "CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT Denoising." arXiv preprint arXiv:2202.13517 (2022).

Estimated Timelines

  • Semester or longer

Possible Deliverables

  • Set of experiments with plots demonstrating the effectiveness of the model on the HCP dataset and others
  • 2–4 page report summarizing primary methodology and results
  • Submission to a machine learning conference (e.g., MLSP)