Skip to content

Latest commit

 

History

History
12 lines (7 loc) · 2.13 KB

README.md

File metadata and controls

12 lines (7 loc) · 2.13 KB

incomplete_multimodal_fusion

Adaptive Multimodal Learning for Remote Sensing Data Fusion

In the realm of remote sensing data fusion, the efficacy of multimodal Transformer networks hinges on their adeptness at integrating diverse signals via self-attention mechanisms. Yet, prevailing methodologies assume access to all modalities throughout both training and inference phases, leading to notable performance degradation when confronted with modal-incomplete inputs in practical scenarios. To surmount this challenge, we present a pioneering approach to incomplete multimodal learning tailored specifically for remote sensing data fusion and the multimodal Transformer architecture.

Our method, adaptable to both supervised and self-supervised pre-training paradigms, introduces novel techniques to seamlessly incorporate incomplete modalities into the learning process. Central to our strategy is the utilization of learned fusion tokens, coupled with modality attention and masked self-attention mechanisms, to efficiently aggregate multimodal signals within the Transformer framework. Crucially, our approach accommodates random modality combinations during network training, enhancing adaptability and robustness.

Employing a fusion-centric training regimen bolstered by reconstruction and contrastive loss functions, our method not only facilitates effective fusion during pre-training but also excels in handling incomplete inputs during inference. Experimental validation on two diverse multimodal datasets underscores the efficacy of our approach, demonstrating state-of-the-art performance across tasks such as building instance/semantic segmentation and land-cover mapping. Notably, our method showcases remarkable resilience in scenarios where incomplete modalities are prevalent, further consolidating its utility in real-world applications of remote sensing data fusion.

By offering a comprehensive solution to the challenge of incomplete multimodal learning, our approach significantly advances the frontier of remote sensing data fusion, opening avenues for enhanced performance and adaptability in diverse operational contexts.

avatar