Title | Date | Abstract | Comment | CodeRepository |
---|---|---|---|---|
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis | 2025-02-13 | The continuous operation of Earth-orbiting satellites generates vast and ever-growing archives of Remote Sensing (RS) images. Natural language presents an intuitive interface for accessing, querying, and interpreting the data from such archives. However, existing Vision-Language Models (VLMs) are predominantly trained on web-scraped, noisy image-text data, exhibiting limited exposure to the specialized domain of RS. This deficiency results in poor performance on RS-specific tasks, as commonly used datasets often lack detailed, scientifically accurate textual descriptions and instead focus solely on attributes like date and location. To bridge this critical gap, we introduce GAIA, a novel dataset designed for multi-scale, multi-sensor, and multi-modal RS image analysis. GAIA comprises 205,150 meticulously curated RS image-text pairs, representing a diverse range of RS modalities associated with different spatial resolutions. Unlike existing vision-language datasets in RS, GAIA specifically focuses on capturing a diverse range of RS applications, providing unique information about environmental changes, natural disasters, and various other dynamic phenomena. The dataset is spatially and temporally balanced, spanning the globe and covering the last 25 years of observations. GAIA's construction involved a two-stage process: (1) targeted web-scraping of images and accompanying text from reputable RS-related sources, and (2) generation of five high-quality, scientifically grounded synthetic captions for each image using carefully crafted prompts that leverage the advanced vision-language capabilities of GPT-4o. Our extensive experiments, including fine-tuning of CLIP and BLIP2 models, demonstrate that GAIA significantly improves performance on RS image classification, cross-modal retrieval and image captioning tasks. (A minimal CLIP fine-tuning sketch appears below the table.) | 22 pages, 13 figures | None |
Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation | 2025-02-06 | Semantic segmentation is one of the core tasks in computer vision; its goal is to accurately classify each pixel in an image. The traditional Unet model achieves efficient feature extraction and fusion through an encoder-decoder structure, but it still has limitations when dealing with complex backgrounds, long-distance dependencies, and multi-scale targets. To address this, this paper proposes an improved Unet model combined with an attention mechanism: it introduces channel-attention and spatial-attention modules to enhance the model's ability to focus on important features, and optimizes the skip connections through a multi-scale feature fusion strategy, thereby better combining global semantic information with fine-grained features. Experiments on the Cityscapes dataset compare the improved model with classic models such as FCN, SegNet, DeepLabv3+, and PSPNet. The improved model performs well in terms of mIoU and pixel accuracy (PA), reaching 76.5% and 95.3% respectively. The experimental results verify the superiority of this method in handling complex scenes and blurred target boundaries. The paper also discusses the potential of the improved model in practical applications and future extensions, indicating broad application value in fields such as autonomous driving, remote sensing image analysis, and medical image processing. (An illustrative attention-module sketch appears below the table.) | None | None |
reBEN: Refined BigEarthNet Dataset for Remote Sensing Image Analysis | 2025-01-16 | This paper presents refined BigEarthNet (reBEN), a large-scale, multi-modal remote sensing dataset constructed to support deep learning (DL) studies for remote sensing image analysis. The reBEN dataset consists of 549,488 pairs of Sentinel-1 and Sentinel-2 image patches. To construct reBEN, we initially consider the Sentinel-1 and Sentinel-2 tiles used to construct the BigEarthNet dataset and then divide them into patches of size 1200 m x 1200 m. We apply atmospheric correction to the Sentinel-2 patches using the latest version of the sen2cor tool, resulting in higher-quality patches compared to those present in BigEarthNet. Each patch is then associated with a pixel-level reference map and scene-level multi-labels. This makes reBEN suitable for pixel- and scene-based learning tasks. The labels are derived from the most recent CORINE Land Cover (CLC) map of 2018 by utilizing the 19-class nomenclature as in BigEarthNet. The use of the most recent CLC map overcomes the label noise present in BigEarthNet. Furthermore, we introduce a new geographical-based split assignment algorithm that significantly reduces the spatial correlation among the train, validation, and test sets with respect to those present in BigEarthNet. This increases the reliability of the evaluation of DL models. To minimize DL model training time, we introduce software tools that convert the reBEN dataset into a DL-optimized data format. In our experiments, we show the potential of reBEN for multi-modal multi-label image classification problems by considering several state-of-the-art DL models. The pre-trained model weights, associated code, and complete dataset are available at https://bigearth.net. | None | None |
Patch-GAN Transfer Learning with Reconstructive Models for Cloud Removal | 2025-01-09 | Cloud removal plays a crucial role in enhancing remote sensing image analysis, yet accurately reconstructing cloud-obscured regions remains a significant challenge. Recent advancements in generative models have made the generation of realistic images increasingly accessible, offering new opportunities for this task. Given the conceptual alignment between image generation and cloud removal tasks, generative models present a promising approach for addressing cloud removal in remote sensing. In this work, we propose a deep transfer learning approach built on a generative adversarial network (GAN) framework to explore the potential of the novel masked autoencoder (MAE) image reconstruction model in cloud removal. Due to the complexity of remote sensing imagery, we further propose using a patch-wise discriminator to determine whether each patch of the image is real or not. The proposed reconstructive transfer learning approach demonstrates significant improvements in cloud removal performance compared to other GAN-based methods. Additionally, whilst direct comparisons with some of the state-of-the-art cloud removal techniques are limited due to unclear details regarding their train/test data splits, the proposed model achieves competitive results based on available benchmarks. | None | None |
Fusion of Deep Learning and GIS for Advanced Remote Sensing Image Analysis | 2024-12-25 | This paper presents an innovative framework for remote sensing image analysis that fuses deep learning techniques, specifically Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, with Geographic Information Systems (GIS). The primary objective is to enhance the accuracy and efficiency of spatial data analysis by overcoming challenges associated with high dimensionality, complex patterns, and temporal data processing. We implemented optimization algorithms, namely Particle Swarm Optimization (PSO) and Genetic Algorithms (GA), to fine-tune model parameters, resulting in improved performance metrics. Our findings reveal a significant increase in classification accuracy from 78% to 92% and a reduction in prediction error from 12% to 6% after optimization. Additionally, the temporal accuracy of the models improved from 75% to 88%, showcasing the framework's capability to monitor dynamic changes effectively. The integration of GIS not only enriched the spatial analysis but also facilitated a deeper understanding of the relationships between geographical features. This research demonstrates that combining advanced deep learning methods with GIS and optimization strategies can significantly advance remote sensing applications, paving the way for future developments in environmental monitoring, urban planning, and resource management. | None | None |
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis | 2024-12-19 | This paper develops a Versatile and Honest vision language Model (VHM) for remote sensing image analysis. VHM is built on a large-scale remote sensing image-text dataset with rich-content captions (VersaD) and an honest instruction dataset comprising both factual and deceptive questions (HnstD). Unlike prevailing remote sensing image-text datasets, in which image captions focus on a few prominent objects and their relationships, VersaD captions provide detailed information about image properties, object attributes, and the overall scene. This comprehensive captioning enables VHM to thoroughly understand remote sensing images and perform diverse remote sensing tasks. Moreover, unlike existing remote sensing instruction datasets that include only factual questions, HnstD contains additional deceptive questions stemming from the non-existence of objects. This feature prevents VHM from producing affirmative answers to nonsense queries, thereby ensuring its honesty. In our experiments, VHM significantly outperforms various vision language models on common tasks of scene classification, visual question answering, and visual grounding. Additionally, VHM achieves competent performance on several unexplored tasks, such as building vectorization, multi-label classification and honest question answering. We will release the code, data and model weights at https://github.com/opendatalab/VHM. | Equal contribution: Chao Pang, Xingxing Weng, Jiang Wu; Corresponding author: Gui-Song Xia, Conghui He | Code Link |
Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG | 2024-11-12 | Ultra High Resolution (UHR) remote sensing imagery (RSI) (e.g. 100,000 | None | None |
Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection | 2024-07-18 | Synthetic aperture radar (SAR) image change detection is critical in remote sensing image analysis. Recently, the attention mechanism has been widely used in change detection tasks. However, existing attention mechanisms often employ down-sampling operations such as average pooling on the Key and Value components to enhance computational efficiency. These irreversible operations result in the loss of high-frequency components and other important information. To address this limitation, we develop the Wavelet-based Bi-dimensional Aggregation Network (WBANet) for SAR image change detection. We design a wavelet-based self-attention block that includes discrete wavelet transform and inverse discrete wavelet transform operations on the Key and Value components. Hence, the features are downsampled without any loss of information, while local contextual awareness is simultaneously enhanced through an expanded receptive field. Additionally, we incorporate a bi-dimensional aggregation module that boosts non-linear representation capability by merging spatial and channel information via a broadcast mechanism. Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods, achieving a percentage of correct classification (PCC) of 98.33%, 96.65%, and 96.62% on the respective datasets. Source codes are available at https://github.com/summitgao/WBANet. (A minimal Haar DWT/IDWT sketch illustrating the lossless-downsampling claim appears below the table.) | IEEE GRSL 2024 | Code Link |
Unsupervised Few-Shot Continual Learning for Remote Sensing Image Scene Classification | 2024-06-04 | A continual learning (CL) model is desirable for remote sensing image analysis because of varying camera parameters, spectral ranges, resolutions, etc. Some recent initiatives have developed CL techniques in this domain, but they still depend on massive labelled samples, which does not fully fit remote sensing applications, where ground truths are often obtained via field-based surveys. This paper addresses this problem by proposing an unsupervised flat-wide learning approach (UNISA) for few-shot continual learning of remote sensing image scene classification that does not depend on any labelled samples for its model updates. UNISA is developed from the idea of prototype scattering and positive sampling for learning representations, while the catastrophic forgetting problem is tackled with the flat-wide learning approach combined with a ball generator that addresses the data scarcity problem. Our numerical study with remote sensing image scene datasets and a hyperspectral dataset confirms the advantages of our solution. Source codes of UNISA are shared publicly at https://github.com/anwarmaxsum/UNISA to allow convenient future studies and reproduction of our numerical results. | Under Review for Publication in IEEE TGRS | Code Link |
Vision Mamba: A Comprehensive Survey and Taxonomy | 2024-05-07 | The State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamic systems. It has witnessed numerous applications in several fields, including control theory, signal processing, economics and machine learning. In deep learning, state space models are used to process sequence data, such as time series analysis, natural language processing (NLP) and video understanding. By mapping sequence data to state space, long-term dependencies in the data can be better captured. In particular, modern SSMs have shown strong representational capabilities in NLP, especially in long-sequence modeling, while maintaining linear time complexity. Notably, building on the latest state-space models, Mamba merges time-varying parameters into SSMs and formulates a hardware-aware algorithm for efficient training and inference. Given its impressive efficiency and strong long-range dependency modeling capability, Mamba is expected to become a new AI architecture that may outperform the Transformer. Recently, a number of works have attempted to study the potential of Mamba in various fields, such as general vision, multi-modal learning, medical image analysis and remote sensing image analysis, by extending Mamba from the natural language domain to the visual domain. To fully understand Mamba in the visual domain, we conduct a comprehensive survey and present a taxonomy study. This survey focuses on Mamba's application to a variety of visual tasks and data types, and discusses its predecessors, recent advances and far-reaching impact on a wide range of domains. Since Mamba is on an upward trend, please notify us if you have new findings; new progress on Mamba will be included in this survey in a timely manner and updated on the Mamba project at https://github.com/lx6c78/Vision-Mamba-A-Comprehensive-Survey-and-Taxonomy. | None | Code Link |
RFL-CDNet: Towards Accurate Change Detection via Richer Feature Learning | 2024-04-27 | Change detection is a crucial but extremely challenging task in remote sensing image analysis, and much progress has been made with the rapid development of deep learning. However, most existing deep learning-based change detection methods mainly focus on intricate feature extraction and multi-scale feature fusion while ignoring the insufficient utilization of features in the intermediate stages, resulting in sub-optimal results. To this end, we propose a novel framework, named RFL-CDNet, that utilizes richer feature learning to boost change detection performance. Specifically, we first introduce deep multiple supervision to enhance intermediate representations, thus unleashing the potential of the backbone feature extractor at each stage. Furthermore, we design the Coarse-To-Fine Guiding (C2FG) module and the Learnable Fusion (LF) module to further improve feature learning and obtain more discriminative feature representations. The C2FG module seamlessly integrates the side prediction from the previous coarse scale into the current fine-scale prediction in a coarse-to-fine manner, while the LF module assumes that the contribution of each stage and each spatial location is independent, and thus designs a learnable module to fuse multiple predictions. Experiments on several benchmark datasets show that our proposed RFL-CDNet achieves state-of-the-art performance on the WHU cultivated land dataset and the CDD dataset, and the second-best performance on the WHU building dataset. The source code and models are publicly available at https://github.com/Hhaizee/RFL-CDNet. | Accepted by PR, volume 153 | Code Link |
Single-temporal Supervised Remote Change Detection for Domain Generalization | 2024-04-23 | Change detection is widely applied in remote sensing image analysis. Existing methods require training models separately for each dataset, which leads to poor domain generalization. Moreover, these methods rely heavily on large amounts of high-quality pair-labelled data for training, which is expensive and impractical. In this paper, we propose a multimodal contrastive learning approach (ChangeCLIP) based on visual-language pre-training for change detection domain generalization. Additionally, we propose a dynamic context optimization for prompt learning. Meanwhile, to address the data dependency issue of existing methods, we introduce a single-temporal and controllable AI-generated training strategy (SAIN). This allows us to train the model using a large number of single-temporal images without real-world image pairs, achieving excellent generalization. Extensive experiments on a series of real change detection datasets validate the superiority and strong generalization of ChangeCLIP, outperforming state-of-the-art change detection methods. Code will be available. | None | None |
Explainable AI for Earth Observation: Current Methods, Open Challenges, and Opportunities | 2023-11-08 | Deep learning has taken all fields involved in data analysis by storm, including remote sensing for Earth observation. However, despite significant advances in performance, its lack of explainability and interpretability, inherent to neural networks since their inception, remains a major source of criticism. It therefore comes as no surprise that the expansion of deep learning methods in remote sensing is being accompanied by increasingly intensive efforts to address this drawback through the exploration of a wide spectrum of Explainable Artificial Intelligence techniques. This chapter, organized according to prominent Earth observation application fields, presents a panorama of the state of the art in explainable remote sensing image analysis. | None | None |
The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot | 2023-10-31 | Segmentation is an essential step for remote sensing image processing. This study aims to advance the application of the Segment Anything Model (SAM), an innovative image segmentation model by Meta AI, in the field of remote sensing image analysis. SAM is known for its exceptional generalization capabilities and zero-shot learning, making it a promising approach to processing aerial and orbital images from diverse geographical contexts. Our exploration involved testing SAM across multi-scale datasets using various input prompts, such as bounding boxes, individual points, and text descriptors. To enhance the model's performance, we implemented a novel automated technique that combines a text-prompt-derived general example with one-shot training. This adjustment resulted in an improvement in accuracy, underscoring SAM's potential for deployment in remote sensing imagery and reducing the need for manual annotation. Despite the limitations encountered with lower spatial resolution images, SAM exhibits promising adaptability to remote sensing data analysis. We recommend future research to enhance the model's proficiency through integration with supplementary fine-tuning techniques and other networks. Furthermore, we provide the open-source code of our modifications on online repositories, encouraging further and broader adaptations of SAM to the remote sensing domain. | 20 pages, 9 figures | None |
Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process | 2023-09-29 | Understanding the temporal dynamics of Earth's surface is a mission of multi-temporal remote sensing image analysis, significantly promoted by deep vision models and their fuel -- labeled multi-temporal images. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present a scalable multi-temporal remote sensing change data generator based on generative modeling, which is cheap and automatic, alleviating these problems. Our main idea is to simulate a stochastic change process over time. We consider the stochastic change process as a probabilistic semantic state transition, namely the generative probabilistic change model (GPCM), which decouples the complex simulation problem into two more tractable sub-problems, i.e., change event simulation and semantic change synthesis. To solve these two problems, we present the change generator (Changen), a GAN-based GPCM enabling controllable object change data generation, including customizable object properties and change events. Extensive experiments suggest that Changen has superior generation capability, and change detectors pre-trained with Changen exhibit excellent transferability to real-world change datasets. | ICCV 2023 | None |
Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing | 2023-09-12 | Real-time semantic segmentation of remote sensing imagery is a challenging task that requires a trade-off between effectiveness and efficiency. It has many applications, including tracking forest fires, detecting changes in land use and land cover, crop health monitoring, and so on. With the success of efficient deep learning methods (i.e., efficient deep neural networks) for real-time semantic segmentation in computer vision, researchers have adopted these efficient deep neural networks in remote sensing image analysis. This paper begins with a summary of the fundamental compression methods for designing efficient deep neural networks and provides a brief but comprehensive survey, outlining recent developments in real-time semantic segmentation of remote sensing imagery. We examine several seminal efficient deep learning methods, placing them in a taxonomy based on the network architecture design approach. Furthermore, we evaluate the quality and efficiency of some existing efficient deep neural networks on a publicly available remote sensing semantic segmentation benchmark dataset, OpenEarthMap. The experimental results of an extensive comparative study demonstrate that most of the existing efficient deep neural networks have good segmentation quality but suffer from low inference speed (i.e., high latency), which may limit their deployment in real-time applications of remote sensing image segmentation. We provide some insights into the current trends and future research directions for real-time semantic segmentation of remote sensing imagery. | Submitted to IEEE GRSM | None |
STNet: Spatial and Temporal feature fusion network for change detection in remote sensing images | 2023-04-22 | As an important task in remote sensing image analysis, remote sensing change detection (RSCD) aims to identify changes of interest in a region from spatially co-registered multi-temporal remote sensing images, so as to monitor local development. Existing RSCD methods usually formulate RSCD as a binary classification task, representing changes of interest by mere feature concatenation or feature subtraction and recovering spatial details via densely connected change representations, leaving room for performance improvement. In this paper, we propose STNet, an RSCD network based on spatial and temporal feature fusion. Specifically, we design a temporal feature fusion (TFF) module that combines bi-temporal features using a cross-temporal gating mechanism to emphasize changes of interest; a spatial feature fusion module is deployed to capture fine-grained information using a cross-scale attention mechanism to recover the spatial details of change representations. Experimental results on three benchmark datasets for RSCD demonstrate that the proposed method achieves state-of-the-art performance. Code is available at https://github.com/xwmaxwma/rschange. | Accepted by ICME 2023 | Code Link |
Unifying Remote Sensing Image Retrieval and Classification with Robust Fine-tuning | 2023-03-07 | Advances in high-resolution remote sensing image analysis are currently hampered by the difficulty of gathering enough annotated data for training deep learning methods, giving rise to a variety of small datasets and associated dataset-specific methods. Moreover, typical tasks such as classification and retrieval lack a systematic evaluation on standard benchmarks and training datasets, which makes it hard to identify durable and generalizable scientific contributions. We aim to unify remote sensing image retrieval and classification with a new large-scale training and testing dataset, SF300, which includes both vertical and oblique aerial images and is made available to the research community, together with an associated fine-tuning method. We additionally propose a new adversarial fine-tuning method for global descriptors. We show that our framework systematically achieves a boost in retrieval and classification performance on nine different datasets compared to an ImageNet-pretrained baseline, with currently no other method to compare to. | Performance margin with the proposed method is not statistically significant. Please refer to http://alegoria.ign.fr/en/SF300_dataset if you are interested in the dataset | None |
Transformers in Remote Sensing: A Survey | 2022-09-02 | Deep learning-based algorithms have seen massive popularity in different areas of remote sensing image analysis over the past decade. Recently, transformer-based architectures, originally introduced in natural language processing, have pervaded the computer vision field, where the self-attention mechanism has been utilized as a replacement for the popular convolution operator to capture long-range dependencies. Inspired by recent advances in computer vision, the remote sensing community has also witnessed an increased exploration of vision transformers for a diverse set of tasks. Although a number of surveys have focused on transformers in computer vision in general, to the best of our knowledge we are the first to present a systematic review of recent transformer-based advances in remote sensing. Our survey covers more than 60 recent transformer-based methods for different remote sensing problems in sub-areas of remote sensing: very high-resolution (VHR), hyperspectral (HSI) and synthetic aperture radar (SAR) imagery. We conclude the survey by discussing different challenges and open issues of transformers in remote sensing. Additionally, we intend to frequently update and maintain the latest transformers in remote sensing papers with their respective code at: https://github.com/VIROBO-15/Transformer-in-Remote-Sensing | 22 pages, 13 figures | Code Link |
Change Detection from Synthetic Aperture Radar Images via Graph-Based Knowledge Supplement Network | 2022-02-09 | Synthetic aperture radar (SAR) image change detection is a vital yet challenging task in the field of remote sensing image analysis. Most previous works adopt a self-supervised method which uses pseudo-labeled samples to guide subsequent training and testing. However, deep networks commonly require many high-quality samples for parameter optimization, and the noise in pseudo-labels inevitably affects the final change detection performance. To solve this problem, we propose a Graph-based Knowledge Supplement Network (GKSNet). More specifically, we extract discriminative information from an existing labeled dataset as additional knowledge to suppress the adverse effects of noisy samples to some extent. Afterwards, we design a graph transfer module to attentively distill contextual information from the labeled dataset to the target dataset, bridging feature correlation between datasets. To validate the proposed method, we conducted extensive experiments on four SAR datasets, which demonstrate the superiority of the proposed GKSNet compared to several state-of-the-art baselines. Our codes are available at https://github.com/summitgao/SAR_CD_GKSNet. | Accepted by IEEE JSTARS | Code Link |
Scale-aware Neural Network for Semantic Segmentation of Multi-resolution Remote Sensing Images | 2021-11-04 | Assigning geospatial objects to specific categories at the pixel level is a fundamental task in remote sensing image analysis. Along with rapid development in sensor technologies, remotely sensed images can be captured at multiple spatial resolutions (MSR), with information content manifested at different scales. Extracting information from these MSR images represents huge opportunities for enhanced feature representation and characterisation. However, MSR images suffer from two critical issues: 1) increased scale variation of geo-objects and 2) loss of detailed information at coarse spatial resolutions. To bridge these gaps, in this paper, we propose a novel scale-aware neural network (SaNet) for the semantic segmentation of MSR remotely sensed imagery. SaNet deploys a densely connected feature fusion module (DCFFM) to capture high-quality multi-scale context, so that scale variation is handled properly and segmentation quality is increased for both large and small objects. A spatial feature recalibration module (SFRM) is further incorporated into the network to learn intact semantic content with enhanced spatial relationships, where the negative effects of information loss are removed. The combination of DCFFM and SFRM allows SaNet to learn scale-aware feature representations, which outperform existing multi-scale feature representations. Extensive experiments on three semantic segmentation datasets demonstrated the effectiveness of the proposed SaNet in cross-resolution segmentation. | None | None |
Weakly Supervised Few-Shot Segmentation Via Meta-Learning | 2021-09-03 | Semantic segmentation is a classic computer vision task with multiple applications, including medical and remote sensing image analysis. Despite recent advances with deep-based approaches, labeling samples (pixels) for training models is laborious and, in some cases, unfeasible. In this paper, we present two novel meta-learning methods, named WeaSeL and ProtoSeg, for the few-shot semantic segmentation task with sparse annotations. We conducted an extensive evaluation of the proposed methods in different applications (12 datasets) in medical imaging and agricultural remote sensing, which are very distinct fields of knowledge and usually subject to data scarcity. The results demonstrate the potential of our method, achieving suitable results for segmenting both coffee/orange crops and anatomical parts of the human body in comparison with full dense annotation. | None | None |
Cloud and Cloud Shadow Segmentation for Remote Sensing Imagery via Filtered Jaccard Loss Function and Parametric Augmentation | 2021-04-23 | Cloud and cloud shadow segmentation are fundamental processes in optical remote sensing image analysis. Current methods for cloud/shadow identification in geospatial imagery are not as accurate as they should be, especially in the presence of snow and haze. This paper presents a deep learning-based framework for the detection of cloud/shadow in Landsat 8 images. Our method benefits from a convolutional neural network, Cloud-Net+ (a modification of our previously proposed Cloud-Net \cite{myigarss}), that is trained with a novel loss function (Filtered Jaccard Loss). The proposed loss function is more sensitive to the absence of foreground objects in an image and penalizes/rewards the predicted mask more accurately than other common loss functions. In addition, a sunlight direction-aware data augmentation technique is developed for the task of cloud shadow detection to extend the generalization ability of the proposed model by expanding existing training sets. The combination of Cloud-Net+, the Filtered Jaccard Loss function, and the proposed augmentation algorithm delivers superior results on four public cloud/shadow detection datasets. Our experiments on the Pascal VOC dataset exemplify the applicability and quality of our proposed network and loss function in other computer vision applications. | 12 pages. This version is slightly different from the one published in IEEE JSTARS | None |
Deep Learning for Change Detection in Remote Sensing Images: Comprehensive Review and Meta-Analysis | 2020-06-10 | Deep learning (DL) algorithms have been considered a methodology of choice for remote sensing image analysis over the past few years. Owing to its effective applications, deep learning has also been introduced for automatic change detection, where it has achieved great success. The present study attempts to provide a comprehensive review and a meta-analysis of the recent progress in this subfield. Specifically, we first introduce the fundamentals of the deep learning methods that are frequently adopted for change detection. Secondly, we present the details of the meta-analysis conducted to examine the status of change detection DL studies. Then, we focus on deep learning-based change detection methodologies for remote sensing images by giving a general overview of the existing methods. Specifically, these deep learning-based methods are classified into three groups: fully supervised learning-based methods, fully unsupervised learning-based methods and transfer learning-based techniques. As a result of these investigations, promising new directions were identified for future research. This study will contribute in several ways to our understanding of deep learning for change detection and will provide a basis for further research. | None | None |
Breaking the Limits of Remote Sensing by Simulation and Deep Learning for Flood and Debris Flow Mapping | 2020-06-09 | We propose a framework that estimates inundation depth (maximum water level) and debris-flow-induced topographic deformation from remote sensing imagery by integrating deep learning and numerical simulation. A water and debris flow simulator generates training data for various artificial disaster scenarios. We show that regression models based on Attention U-Net and LinkNet architectures trained on such synthetic data can predict the maximum water level and topographic deformation from a remote sensing-derived change detection map and a digital elevation model. The proposed framework has an inpainting capability, thus mitigating the false negatives that are inevitable in remote sensing image analysis. Our framework breaks the limits of remote sensing and enables rapid estimation of inundation depth and topographic deformation, essential information for emergency response, including rescue and relief activities. We conduct experiments with both synthetic and real data for two disaster events that caused simultaneous flooding and debris flows and demonstrate the effectiveness of our approach quantitatively and qualitatively. | None | None |
Contrast-weighted Dictionary Learning Based Saliency Detection for Remote Sensing Images | 2020-05-10 | Object detection is an important task in remote sensing image analysis. To reduce the computational complexity of redundant information and improve the efficiency of image processing, visual saliency models have been widely applied in this field. In this paper, a novel saliency detection model based on Contrast-weighted Dictionary Learning (CDL) is proposed for remote sensing images. Specifically, the proposed CDL learns salient and non-salient atoms from positive and negative samples to construct a discriminant dictionary, in which a contrast-weighted term is proposed to encourage contrast-weighted patterns to be present in the learned salient dictionary while discouraging them from being present in the non-salient dictionary. Then, we measure saliency by combining the coefficients of the sparse representation (SR) with the reconstruction errors. Furthermore, using the proposed joint saliency measure, a variety of saliency maps are generated based on the discriminant dictionary. Finally, a fusion method based on global gradient optimization is proposed to integrate multiple saliency maps. Experimental results on four datasets demonstrate that the proposed model outperforms other state-of-the-art methods. | None | None |
A Remote Sensing Image Dataset for Cloud Removal | 2019-01-03 | Clouds are often present in optical remote sensing images, limiting the application of the acquired data. Removing clouds is an indispensable pre-processing step in remote sensing image analysis. Deep learning has achieved great success in the field of remote sensing in recent years, including scene classification and change detection; however, it is rarely applied to cloud removal in remote sensing images. The reason is the lack of datasets for training neural networks. To solve this problem, this paper proposes the Remote sensing Image Cloud rEmoving dataset (RICE). The proposed dataset consists of two parts: RICE1 contains 500 pairs of images, each pair comprising a cloudy image and a cloud-free image of size 512×512; RICE2 contains 450 sets of images, each set containing three 512×512 images: a cloud-free reference image, a cloudy image, and the mask of its clouds. The dataset is freely available at https://github.com/BUPTLdy/RICE_DATASET. | None | Code Link |
An Entropic Optimal Transport Loss for Learning Deep Neural Networks under Label Noise in Remote Sensing Images | 2018-10-02 | Deep neural networks have become established as a powerful tool for large-scale supervised classification tasks. The state-of-the-art performance of deep neural networks is conditioned on the availability of a large number of accurately labeled samples. In practice, collecting large-scale, accurately labeled datasets is a challenging and tedious task in most scenarios of remote sensing image analysis, so cheap surrogate procedures are employed to label the data. Training deep neural networks on such datasets with inaccurate labels easily overfits the noisy training labels and drastically degrades classification performance. To mitigate this effect, we propose an original solution based on entropic optimal transport. It allows deep neural networks to be learned in an end-to-end fashion while remaining, to some extent, robust to inaccurately labeled samples. We demonstrate this empirically on several remote sensing datasets, where both scene-level and pixel-based hyperspectral image classification are considered. Our method proves to be highly tolerant to significant amounts of label noise and achieves favorable results against state-of-the-art methods. | Under Consideration at Computer Vision and Image Understanding | None |
Learning Spectral-Spatial-Temporal Features via a Recurrent Convolutional Neural Network for Change Detection in Multispectral Imagery | 2018-03-07 | Change detection is one of the central problems in Earth observation and has been extensively investigated over recent decades. In this paper, we propose a novel recurrent convolutional neural network (ReCNN) architecture, which is trained to learn a joint spectral-spatial-temporal feature representation in a unified framework for change detection in multispectral images. To this end, we bring together a convolutional neural network (CNN) and a recurrent neural network (RNN) into one end-to-end network. The former is able to generate rich spectral-spatial feature representations, while the latter effectively analyzes temporal dependency in bi-temporal images. In comparison with previous approaches to change detection, the proposed network architecture possesses three distinctive properties: 1) it is end-to-end trainable, in contrast to most existing methods whose components are separately trained or computed; 2) it naturally harnesses spatial information, which has been proven to be beneficial to the change detection task; 3) it is capable of adaptively learning the temporal dependency between multitemporal images, unlike most algorithms that use fairly simple operations such as image differencing or stacking. As far as we know, this is the first time a recurrent convolutional network architecture has been proposed for multitemporal remote sensing image analysis. The proposed network is validated on real multispectral datasets. Both visual and quantitative analysis of the experimental results demonstrates the competitive performance of the proposed model. | None | None |
Road Extraction by Deep Residual U-Net | 2017-11-29 | Road extraction from aerial images has been a hot research topic in the field of remote sensing image analysis. In this letter, a semantic segmentation neural network that combines the strengths of residual learning and U-Net is proposed for road area extraction. The network is built with residual units and has an architecture similar to that of U-Net. The benefits of this model are two-fold: first, residual units ease the training of deep networks; second, the rich skip connections within the network facilitate information propagation, allowing us to design networks with fewer parameters but better performance. We test our network on a public road dataset and compare it with U-Net and two other state-of-the-art deep learning-based road extraction methods. The proposed approach outperforms all the compared methods, demonstrating its superiority over recently developed state-of-the-art methods. | Submitted to IEEE Geoscience and Remote Sensing Letters | None |
An Effective Image Feature Classification using an improved SOM | 2015-01-08 | Image feature classification is a challenging problem in many computer vision applications, specifically in the fields of remote sensing, image analysis and pattern recognition. In this paper, a novel Self-Organizing Map, termed improved SOM (iSOM), is proposed with the aim of effectively classifying mammographic images based on their texture feature representation. The main contribution of the iSOM is to introduce a new node structure for the map representation and to adopt a learning technique based on the Kohonen SOM accordingly. The main idea is to control, in an unsupervised fashion, the weight-updating procedure depending on the class reliability of the node at weight-update time. Experiments were conducted on real mammographic images. Results showed high accuracy compared to the classical SOM and other state-of-the-art classifiers. | None | None |
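
The GAIA entry above reports fine-tuning CLIP on remote sensing image-text pairs. The papers' own training code is not reproduced here; the following is a minimal sketch of one contrastive fine-tuning step using the Hugging Face `transformers` CLIP implementation, assuming the stock `openai/clip-vit-base-patch32` checkpoint. The captions and random images are placeholders, not GAIA data or the authors' setup.

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder checkpoint; GAIA fine-tunes its own CLIP variants.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Stand-in batch: in practice these would be RS image-caption pairs.
images = [Image.fromarray(np.uint8(np.random.rand(224, 224, 3) * 255)) for _ in range(2)]
captions = ["flooded farmland along a river", "dense urban blocks seen from orbit"]

inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)
outputs = model(**inputs, return_loss=True)  # symmetric contrastive (InfoNCE) loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(outputs.loss))
```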
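The "Optimized Unet with Attention Mechanism" entry describes channel- and spatial-attention modules on U-Net skip connections but links no code. Below is a minimal PyTorch sketch of one common realisation (CBAM-style); the module names, reduction ratio, and 7×7 kernel are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Pool global context per channel, then reweight channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """Highlight informative locations via a 7x7 conv over pooled maps."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class AttentionSkip(nn.Module):
    """Refine a U-Net skip feature (channel then spatial attention)
    before it is concatenated with the decoder feature."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, skip: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ca(skip))

# Example: refine a 64-channel encoder feature map.
feat = torch.randn(2, 64, 128, 128)
print(AttentionSkip(64)(feat).shape)  # torch.Size([2, 64, 128, 128])
```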
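The WBANet entry's key claim is that wavelet-based downsampling of the Key/Value features is lossless, unlike average pooling. The sketch below is not the WBANet code; it only verifies that claim for a single-level 2D Haar transform: four half-resolution sub-bands reconstruct the input exactly.

```python
import torch

def haar_dwt(x: torch.Tensor):
    """Single-level 2D Haar DWT: (B, C, H, W) -> four (B, C, H/2, W/2) bands."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2   # low-frequency approximation
    hl = (a - b + c - d) / 2   # horizontal detail
    lh = (a + b - c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, hl, lh, hh

def haar_idwt(ll, hl, lh, hh):
    """Inverse transform: reassemble the full-resolution tensor exactly."""
    B, C, H, W = ll.shape
    x = ll.new_zeros(B, C, 2 * H, 2 * W)
    x[..., 0::2, 0::2] = (ll + hl + lh + hh) / 2
    x[..., 0::2, 1::2] = (ll - hl + lh - hh) / 2
    x[..., 1::2, 0::2] = (ll + hl - lh - hh) / 2
    x[..., 1::2, 1::2] = (ll - hl - lh + hh) / 2
    return x

feat = torch.randn(1, 8, 64, 64)
bands = haar_dwt(feat)
print(torch.allclose(haar_idwt(*bands), feat, atol=1e-6))  # True: no information lost
```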