MCL: Multi-view Enhanced Contrastive Learning for Chest X-ray Report Generation
Radiology reports are crucial for planning treatment strategies and enhancing doctor-patient communication, yet manually writing these reports is burdensome for radiologists. While automatic report generation offers a solution, existing methods often rely on single-view radiographs, limiting diagnostic accuracy. To address this problem, we propose MCL, a Multi-view enhanced Contrastive Learning method for chest X-ray report generation. Specifically, we first introduce multi-view enhanced contrastive learning for visual representation by maximizing the agreement between multi-view radiographs and their corresponding report. Subsequently, to fully exploit patient-specific indications (e.g., patient's symptoms) for report generation, we add a transitional "bridge" for missing indications to reduce embedding space discrepancies caused by their presence or absence. Additionally, we construct Multi-view CXR and Two-view CXR datasets from public sources to support research on multi-view report generation. Our proposed MCL surpasses recent state-of-the-art methods across multiple datasets, achieving a 5.0%
- The code, checkpoints, and generated radiology reports are coming soon.
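To illustrate the multi-view contrastive objective described in the abstract, here is a minimal pure-Python sketch. It is not the released implementation: it assumes the view embeddings of a study are fused by mean pooling and scores the fused embedding against report embeddings with a symmetric InfoNCE loss. The function names, the fusion choice, and the temperature value are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def mean_pool(views):
    """Fuse several view embeddings of one study by averaging (assumed fusion)."""
    dim = len(views[0])
    return [sum(v[i] for v in views) / len(views) for i in range(dim)]


def info_nce(image_embs, report_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch where pair i is (study i, report i)."""
    imgs = [l2_normalize(e) for e in image_embs]
    reps = [l2_normalize(e) for e in report_embs]
    n = len(imgs)
    # Cosine similarity matrix scaled by the temperature
    sims = [
        [sum(a * b for a, b in zip(imgs[i], reps[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def row_loss(matrix):
        # Cross-entropy with the diagonal entry as the positive of each row
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)  # subtract the max for numerical stability
            log_denom = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_denom - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*sims)]
    # Average the image-to-report and report-to-image directions
    return 0.5 * (row_loss(sims) + row_loss(transposed))
```

Minimizing this loss pulls each fused study embedding toward its own report and pushes it away from the other reports in the batch, which is one common way to "maximize agreement" between paired modalities.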
Multi-view CXR aggregates studies with multiple views from both MIMIC-CXR [1] and IU X-ray [2].
- Radiographs can be obtained from PhysioNet and NIH. The images should be stored with the following file structure:
files/
├── p10
├── p11
├── p12
├── p13
├── p14
├── p15
├── p16
├── p17
├── p18
├── p19
└── NLMCXR_png
- Radiology reports can be downloaded from Hugging Face 🤗.
Two-view CXR is a variant of Multi-view CXR that includes only two views per study. The dataset can be downloaded from Hugging Face 🤗.
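Before using the dataset, it can help to confirm that the image folders match the layout listed above. The helper below is a minimal sketch: the folder names come from the tree, while the function name `missing_image_dirs` and the root path passed to it are illustrative assumptions.

```python
from pathlib import Path

# Folder names taken from the file structure shown above
EXPECTED_DIRS = [f"p{i}" for i in range(10, 20)] + ["NLMCXR_png"]


def missing_image_dirs(root):
    """Return the expected sub-folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_image_dirs("files")` returns an empty list when every folder is in place, and otherwise lists the folders that still need to be downloaded.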
```python
# Load all studies of Multi-view CXR (a dict whose values are lists of studies)
import json

path = 'multiview_cxr_annotation.json'
with open(path) as f:
    multi_view_cxr_data = json.load(f)

# Derive Two-view CXR: keep only the studies that contain exactly two images
with open(path) as f:
    ann_data = json.load(f)
two_view_cxr_data = {}
for key, value in ann_data.items():
    two_view_cxr_data[key] = []
    for item in value:  # iterate over the studies stored under this key
        # number of images in the current study (anchor scan + auxiliary references)
        image_num = len(item['anchor_scan']['image_path']) + len(item['auxiliary_references']['image_path'])
        if image_num == 2:
            two_view_cxr_data[key].append(item)
```
Statistics for the training, validation, and test sets across MIMIC-CXR, MIMIC-ABN, Multi-view CXR, and Two-view CXR.
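Per-split counts of this kind can also be recomputed from the annotation file. The sketch below assumes the annotation is a dict that maps each split name to a list of studies, with image paths stored under `anchor_scan` and `auxiliary_references` as in the loading code above. The function name `split_statistics` is hypothetical, and the resulting counts may be defined differently from the published statistics.

```python
def split_statistics(ann_data):
    """Count studies and images for every split in an annotation dict."""
    stats = {}
    for split, studies in ann_data.items():
        n_images = sum(
            len(study['anchor_scan']['image_path'])
            + len(study['auxiliary_references']['image_path'])
            for study in studies
        )
        stats[split] = {'studies': len(studies), 'images': n_images}
    return stats
```

Passing `multi_view_cxr_data` or `two_view_cxr_data` to this function gives one summary entry per split, which is a quick sanity check after filtering.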
If you use or extend our work, please cite our paper at arXiv.
@misc{liu2024mclmultiviewenhancedcontrastive,
title={MCL: Multi-view Enhanced Contrastive Learning for Chest X-ray Report Generation},
author={Kang Liu and Zhuoqi Ma and Kun Xie and Zhicheng Jiao and Qiguang Miao},
year={2024},
eprint={2411.10224},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.10224},
}
- R2Gen [4]: Some code is adapted from R2Gen.
- R2GenCMN [5]: Some code is adapted from R2GenCMN.
- MGCA [6]: Some code is adapted from MGCA.
[1] Johnson, Alistair EW, et al. "MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs." arXiv preprint arXiv:1901.07042 (2019).
[2] Demner-Fushman, Dina, et al. "Preparing a collection of radiology examinations for distribution and retrieval." Journal of the American Medical Informatics Association 23.2 (2016): 304-310.
[3] Ni, Jianmo, et al. "Learning Visual-Semantic Embeddings for Reporting Abnormal Findings on Chest X-rays." Findings of the Association for Computational Linguistics: EMNLP 2020. 2020.
[4] Chen, Zhihong, et al. "Generating Radiology Reports via Memory-driven Transformer." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.
[5] Chen, Zhihong, et al. "Cross-modal Memory Networks for Radiology Report Generation." Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
[6] Wang, Fuying, et al. "Multi-granularity cross-modal alignment for generalized medical visual representation learning." Advances in Neural Information Processing Systems 35 (2022): 33536-33549.