This is an official PyTorch implementation of "ViTexNet: A Multi-Modal Vision-Text Fusion Network for Chest X-Ray Image Segmentation"
Segmentation of lung regions in chest X-ray (CXR) images plays a crucial role in assisting doctors with the diagnosis of various lung diseases. However, challenges such as extreme lung shape variations and blurred lung boundaries caused by certain conditions often lead to inaccuracies in segmentation models. Existing deep learning methods predominantly rely on image-only information, and their effectiveness is often constrained by the limited availability of high-quality annotated data. To address this limitation, ViTexNet, a Vision-Text Fusion Network, is proposed. This model integrates visual features from chest X-rays with medical text annotations, such as lesion counts and specific locations, to enhance segmentation performance. Extensive experiments conducted on the QaTa-COV19 dataset indicate that the proposed approach surpasses other state-of-the-art segmentation models, achieving a Dice score of 87.73% and an mIoU of 78.14%.
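For reference, both reported metrics can be computed with the `torchmetrics` dependency listed below. This is a minimal sketch with random tensors standing in for real predictions and masks, not the paper's exact evaluation protocol; note that binary F1 is mathematically identical to the Dice score:

```python
import torch
from torchmetrics import F1Score, JaccardIndex

preds = torch.randint(0, 2, (4, 224, 224))   # placeholder binary predictions
target = torch.randint(0, 2, (4, 224, 224))  # placeholder ground-truth masks

dice = F1Score(task="binary")       # binary F1 == Dice score
miou = JaccardIndex(task="binary")  # IoU; averaged over the test set this gives mIoU
print(dice(preds, target), miou(preds, target))
```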
The main dependencies are as follows:
```
einops
linformer
monai
pandas
pytorch_lightning
timm
torch
torchmetrics
torchvision
transformers
thop
```
or install everything at once:

```
pip install -r requirements.txt
```
- The images and segmentation masks of the QaTa-COV19 dataset can be fetched from this link. We used QaTa-COV19-v2 in this experiment.
- The text annotations of the QaTa-COV19 dataset are taken from this GitHub repo. Thanks to Li et al. for their contributions. If you use these text annotations, please cite their work.
We have used Swin-Tiny-Patch4-Window7-224 as the vision encoder and BiomedVLP-CXR-BERT-Specialized as the text encoder in this experiment.
The models can be loaded as follows:
url = "microsoft/swin-tiny-patch4-window7-224"
tokenizer = AutoTokenizer.from_pretrained(url,trust_remote_code=True)
model = AutoModel.from_pretrained(url, trust_remote_code=True)
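As a quick sanity check, the two encoders can be exercised end to end. The sketch below continues from the snippet above; the report text and printed shapes are illustrative, and `get_projected_text_embeddings` is the helper exposed by the CXR-BERT checkpoint:

```python
import torch

report = "two infected areas; lower left lung and lower right lung"  # illustrative annotation
tokens = tokenizer(report, return_tensors="pt")
with torch.no_grad():
    text_emb = text_model.get_projected_text_embeddings(
        input_ids=tokens.input_ids, attention_mask=tokens.attention_mask
    )
    vision_feats = vision_model(pixel_values=torch.randn(1, 3, 224, 224)).last_hidden_state
print(text_emb.shape, vision_feats.shape)  # e.g. torch.Size([1, 128]), torch.Size([1, 49, 768])
```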
```
ViTexNet
├── data
│   ├── test
│   │   ├── images
│   │   │   ├── image_1.png
│   │   │   ├── image_2.png
│   │   │   └── ...
│   │   └── masks
│   │       ├── mask_image_1.png
│   │       ├── mask_image_2.png
│   │       └── ...
│   ├── train
│   │   ├── images
│   │   │   ├── image_3.png
│   │   │   ├── image_4.png
│   │   │   └── ...
│   │   └── masks
│   │       ├── mask_image_3.png
│   │       ├── mask_image_4.png
│   │       └── ...
│   ├── test_annotations.csv
│   └── train_annotations.csv
├── train.py
├── test.py
└── ...
```
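For orientation, a dataset class matching this layout might look like the sketch below. The CSV column names (`image`, `text`) and the `mask_` file-name prefix are assumptions inferred from the tree above, not the repo's actual schema:

```python
import os
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class QaTaDataset(Dataset):
    """Pairs each image with its mask and text annotation (schema assumed)."""

    def __init__(self, root, split="train", transform=None):
        self.img_dir = os.path.join(root, split, "images")
        self.mask_dir = os.path.join(root, split, "masks")
        # Assumed columns: "image" (file name) and "text" (annotation)
        self.df = pd.read_csv(os.path.join(root, f"{split}_annotations.csv"))
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(os.path.join(self.img_dir, row["image"])).convert("RGB")
        mask = Image.open(os.path.join(self.mask_dir, "mask_" + row["image"])).convert("L")
        if self.transform is not None:
            image, mask = self.transform(image, mask)
        return image, mask, row["text"]
```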
To train the model, execute `python train.py`.
To test the model, execute `python test.py` after training, or after downloading the learned weights provided below.
The learned model weights are available below:
| Dataset | Model | Download link |
| --- | --- | --- |
| QaTa-COV19 | ViTexNet | |
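After downloading, the checkpoint can be restored with plain PyTorch, as sketched below; the import path, constructor, and file name are placeholders (a PyTorch Lightning checkpoint would instead be loaded via `ViTexNet.load_from_checkpoint(...)`):

```python
import torch
from model import ViTexNet  # hypothetical import path

net = ViTexNet()  # hypothetical constructor; arguments omitted
state_dict = torch.load("vitexnet_qata_cov19.pth", map_location="cpu")  # placeholder file name
net.load_state_dict(state_dict)
net.eval()
```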
- Release entire code
- Release model weights
This work is inspired by LViT and Ariadne's Thread. Thanks for the open-source contributions!
If you find our work useful, please cite our paper: