SARChat-Bench-2M is the first large-scale multimodal dialogue dataset focused on Synthetic Aperture Radar (SAR) imagery. It contains approximately 2 million high-quality SAR image-text pairs and supports multiple tasks, including scene classification, image captioning, visual question answering, and object localization. We conducted comprehensive evaluations of 16 state-of-the-art vision-language models (including Qwen2VL, InternVL2.5, and LLaVA), establishing the first multi-task benchmark in the SAR domain.
📑 Read more about SARChat in our paper.
Figure 1: Overview of SARChat's architecture (left) and comprehensive evaluation results showing model capabilities across different tasks (right)
Figure 2: Data processing workflow of SARChat
- 🌟 2M+ high-quality SAR image-text pairs
- 🔍 Covers diverse scenes, including marine, terrestrial, and urban areas
- 📊 6 task-specific benchmarks with fine-grained annotations
- 🤖 Evaluated on 16 SOTA vision-language models
- 🛠️ Ready-to-use format with shape, count, and location labels (see the illustrative record sketch below)
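To make the annotation format concrete, here is a minimal sketch of what a single SARChat-style record might contain. The field names, box format, and values are illustrative assumptions, not the official schema; consult the dataset page for the exact fields.

```python
# Hypothetical SARChat-style record (field names and values are illustrative only)
sample = {
    "image": "ship_000123.png",             # SAR image chip
    "task": "Spatial Grounding",            # one of the six task types listed below
    "question": "Where is the ship located in the image?",
    "answer": "One ship is visible in the upper-left region.",
    "labels": {
        "category": "ship",                 # object class
        "count": 1,                         # number of instances
        "bbox": [12, 34, 88, 120],          # assumed (x1, y1, x2, y2) pixel coordinates
    },
}
```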
Figure 3: Distribution of tasks in training (left) and test (right) sets
| Task | Train Set | Test Set |
|---|---|---|
| Classification | 81,788 | 10,024 |
| Fine-Grained Description | 46,141 | 6,032 |
| Instance Counting | 95,493 | 11,704 |
| Spatial Grounding | 94,456 | 11,608 |
| Cross-Modal Identification | 1,423,548 | 175,565 |
| Referring | 95,486 | 11,703 |
Figure 4: Category distribution in training (left) and test (right) sets
| Metric | Value |
|---|---|
| Total Words | 43,978,559 |
| Total Sentences | 4,222,143 |
| Average Caption Length (words) | 10.66 |
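For reference, statistics like those above can be reproduced with simple text processing; the sketch below uses naive whitespace and period splitting, so the official figures may differ slightly depending on the tokenization used.

```python
# Minimal sketch: corpus statistics from a list of captions (naive splitting, for illustration)
captions = [
    "A ship is moored near the harbor.",
    "Two aircraft are parked on the tarmac.",
]

total_words = sum(len(c.split()) for c in captions)            # word count via whitespace split
total_sentences = sum(max(c.count("."), 1) for c in captions)  # crude sentence count
avg_caption_length = total_words / len(captions)               # average words per caption

print(total_words, total_sentences, round(avg_caption_length, 2))
```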
🤗 Visit our Hugging Face dataset page for more details and examples.
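A typical way to start exploring the data is through the Hugging Face `datasets` library. The dataset ID below is a placeholder; substitute the actual repository name from the page linked above, and note that the split and column names are assumptions.

```python
from datasets import load_dataset

# Placeholder dataset ID -- replace with the actual repo from the Hugging Face page above.
ds = load_dataset("<org>/SARChat-Bench-2M", split="train")

print(ds)     # column names and number of rows
print(ds[0])  # inspect the first image-text pair
```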
Figure 6: Example results from SARChat-InternVL2.5-8B model on various SAR vision-language tasks
The figure above demonstrates the capabilities of our SARChat-InternVL2.5-8B model across different tasks. The model performs well on complex SAR imagery, providing detailed descriptions, accurate instance counts, and precise spatial reasoning. These results highlight its ability to bridge the gap between SAR imagery and natural language understanding.
We have trained and evaluated several models using the SARChat dataset:
| Organization | Model | Size | Link |
|---|---|---|---|
| InternVL | SARChat-InternVL2.5 | 1B | Link |
| InternVL | SARChat-InternVL2.5 | 2B | Link |
| InternVL | SARChat-InternVL2.5 | 4B | Link |
| InternVL | SARChat-InternVL2.5 | 8B | Link |
| QwenVL | SARChat-Qwen2VL | 2B | Link |
| QwenVL | SARChat-Qwen2VL | 7B | Link |
| DeepSeek | SARChat-DeepSeekVL | 1.3B | Link |
| DeepSeek | SARChat-DeepSeekVL | 7B | Link |
| mPLUG-Owl | SARChat-Owl3 | 1B | Link |
| mPLUG-Owl | SARChat-Owl3 | 2B | Link |
| mPLUG-Owl | SARChat-Owl3 | 7B | Link |
| Microsoft | SARChat-Phi3V | 4.3B | Link |
| Zhipu AI | SARChat-GLM-Edge | 2B | Link |
| Zhipu AI | SARChat-GLM-Edge | 5B | Link |
| LLaVA-Team | SARChat-LLaVA-1.5 | 7B | Link |
| 01.AI | SARChat-Yi-VL | 6B | Link |
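As a quick-start sketch, the Qwen2-VL-based checkpoints can in principle be run with the standard Hugging Face `transformers` API for Qwen2-VL, as shown below. The checkpoint ID, image path, and prompt are placeholders (use the actual links from the table above); this is an illustrative sketch under those assumptions, not the official inference script.

```python
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Placeholder checkpoint ID -- replace with the actual SARChat-Qwen2VL link from the table above.
model_id = "<org>/SARChat-Qwen2VL-2B"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Build a single-turn conversation with one SAR image and a question.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe the ships visible in this SAR image."},
]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

image = Image.open("example_sar_chip.png")  # placeholder image path
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(answer)
```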
If you use this dataset or our models in your research, please cite our paper.
@inproceedings{Ma2025SARChatBench2MAM,
  title={SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation},
  author={Zhiming Ma and Xiayang Xiao and Sihao Dong and Peidong Wang and HaiPeng Wang and Qingyun Pan},
  year={2025},
  url={https://api.semanticscholar.org/CorpusID:276287423}
}
For any questions or feedback, please contact:
- 📧 Email: [email protected]
- 💬 GitHub Issues: Feel free to open an issue in this repository
If you find SARChat useful, please consider giving it a star ⭐