The first large-scale multimodal dialogue dataset focusing on Synthetic Aperture Radar (SAR) imagery.

JimmyMa99/SARChat

SARChat

SARChat Logo

License HF Dataset HF Model ModelScope Dataset ModelScope Model arXiv

Introduction

SARChat-Bench-2M is the first large-scale multimodal dialogue dataset focusing on Synthetic Aperture Radar (SAR) imagery. It contains approximately 2 million high-quality SAR image-text pairs, supporting multiple tasks including scene classification, image captioning, visual question answering, and object localization. We conducted comprehensive evaluations on 16 state-of-the-art vision-language models (including Qwen2VL, InternVL2.5, and LLaVA), establishing the first multi-task benchmark in the SAR domain.

📑 Read more about SARChat in our paper.

Overview & Model Performance

SARChat Tasks and Model Performance
Figure 1: Overview of SARChat's architecture (left) and comprehensive evaluation results showing model capabilities across different tasks (right)

Data Processing Workflow

SARChat Data Workflow
Figure 2: Data processing workflow of SARChat

Key Features

  • 🌟 2M+ high-quality SAR image-text pairs
  • 🔍 Covers diverse scenes including marine, terrestrial and urban areas
  • 📊 6 task-specific benchmarks with fine-grained annotations
  • 🤖 Evaluated on 16 SOTA vision-language models
  • 🛠️ Ready-to-use format with shape, count, location labels

Dataset Statistics

Tasks Statistics

Train Task Distribution Test Task Distribution
Figure 3: Distribution of tasks in training (left) and test (right) sets

| Task | Train Set | Test Set |
|---|---|---|
| Classification | 81,788 | 10,024 |
| Fine-Grained Description | 46,141 | 6,032 |
| Instance Counting | 95,493 | 11,704 |
| Spatial Grounding | 94,456 | 11,608 |
| Cross-Modal Identification | 1,423,548 | 175,565 |
| Referring | 95,486 | 11,703 |
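
The per-task counts above can be summed to check the overall dataset size and to see how unevenly the tasks are represented. The following is a minimal sketch, not part of the SARChat tooling: the names `TASK_COUNTS`, `split_totals`, and `task_shares` are hypothetical, and the numbers are copied by hand from the table.

```python
# Per-task sample counts copied from the task table: (train, test).
# The dictionary and function names are illustrative, not part of SARChat.
TASK_COUNTS = {
    "Classification": (81_788, 10_024),
    "Fine-Grained Description": (46_141, 6_032),
    "Instance Counting": (95_493, 11_704),
    "Spatial Grounding": (94_456, 11_608),
    "Cross-Modal Identification": (1_423_548, 175_565),
    "Referring": (95_486, 11_703),
}


def split_totals(counts):
    """Return (train_total, test_total) summed over all tasks."""
    train_total = sum(train for train, _ in counts.values())
    test_total = sum(test for _, test in counts.values())
    return train_total, test_total


def task_shares(counts):
    """Return each task's fraction of the train and test splits."""
    train_total, test_total = split_totals(counts)
    return {
        task: (train / train_total, test / test_total)
        for task, (train, test) in counts.items()
    }


if __name__ == "__main__":
    train_total, test_total = split_totals(TASK_COUNTS)
    print(f"train={train_total:,} test={test_total:,} "
          f"total={train_total + test_total:,}")
    for task, (train_share, test_share) in task_shares(TASK_COUNTS).items():
        print(f"{task:<28} train {train_share:6.2%}  test {test_share:6.2%}")
```

Summing the table gives 1,836,912 training samples and 226,636 test samples, 2,063,548 in total, which matches the "2M+" figure. Cross-Modal Identification accounts for about 77% of each split, so per-task metrics are more informative than a single pooled score.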

Category Analysis

Train Categories Test Categories
Figure 4: Category distribution in training (left) and test (right) sets

Words Statistics

| Metric | Value |
|---|---|
| Total Words | 43,978,559 |
| Total Sentences | 4,222,143 |
| Average Caption Length | 10.66 |

Quick Start

🤗 Visit our Hugging Face dataset page for more details and examples.

Results Showcase

SARChat Results
Figure 6: Example results from SARChat-InternVL2.5-8B model on various SAR vision-language tasks

The figure above shows outputs from our SARChat-InternVL2.5-8B model across different tasks. The model interprets complex SAR imagery, producing detailed descriptions, instance counts, and spatial localization of targets. These examples illustrate how the model connects SAR imagery with natural language understanding.

SARChat Models

We have trained and evaluated several models using the SARChat dataset:

| Organization | Model | Size | Link |
|---|---|---|---|
| InternVL | SARChat-InternVL2.5 | 1B | Link |
| InternVL | SARChat-InternVL2.5 | 2B | Link |
| InternVL | SARChat-InternVL2.5 | 4B | Link |
| InternVL | SARChat-InternVL2.5 | 8B | Link |
| QwenVL | SARChat-Qwen2VL | 2B | Link |
| QwenVL | SARChat-Qwen2VL | 7B | Link |
| DeepSeek | SARChat-DeepSeekVL | 1.3B | Link |
| DeepSeek | SARChat-DeepSeekVL | 7B | Link |
| mPLUG-Owl | SARChat-Owl3 | 1B | Link |
| mPLUG-Owl | SARChat-Owl3 | 2B | Link |
| mPLUG-Owl | SARChat-Owl3 | 7B | Link |
| Microsoft | SARChat-Phi3V | 4.3B | Link |
| Zhipu AI | SARChat-GLM-Edge | 2B | Link |
| Zhipu AI | SARChat-GLM-Edge | 5B | Link |
| LLaVA-Team | SARChat-LLaVA-1.5 | 7B | Link |
| 01.AI | SARChat-Yi-VL | 6B | Link |

Citation

If you use this dataset or our models in your research, please cite our paper.

@inproceedings{Ma2025SARChatBench2MAM,
  title={SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation},
  author={Zhiming Ma and Xiayang Xiao and Sihao Dong and Peidong Wang and HaiPeng Wang and Qingyun Pan},
  year={2025},
  url={https://api.semanticscholar.org/CorpusID:276287423}
}

Contact

For any questions or feedback, please contact:

  • 📧 Email: [email protected]
  • 💬 GitHub Issues: Feel free to open an issue in this repository

If you find SARChat useful, please consider giving it a star ⭐
