Title | Date | Abstract | Comment | CodeRepository |
---|---|---|---|---|
An Adaptive Underwater Image Enhancement Framework via Multi-Domain Fusion and Color Compensation | 2025-03-05 | ShowUnderwater optical imaging is severely degraded by light absorption, scattering, and color distortion, hindering visibility and accurate image analysis. This paper presents an adaptive enhancement framework integrating illumination compensation, multi-domain filtering, and dynamic color correction. A hybrid illumination compensation strategy combining CLAHE, Gamma correction, and Retinex enhances visibility. A two-stage filtering process, including spatial-domain (Gaussian, Bilateral, Guided) and frequency-domain (Fourier, Wavelet) methods, effectively reduces noise while preserving details. To correct color distortion, an adaptive color compensation (ACC) model estimates spectral attenuation and water type to combine RCP, DCP, and MUDCP dynamically. Finally, a perceptually guided color balance mechanism ensures natural color restoration. Experimental results on benchmark datasets demonstrate superior performance over state-of-the-art methods in contrast enhancement, color correction, and structural preservation, making the framework robust for underwater imaging applications. |
None | |
Perceptual Multi-Exposure Fusion | 2025-03-05 | ShowAs an ever-increasing demand for high dynamic range (HDR) scene shooting, multi-exposure image fusion (MEF) technology has abounded. In recent years, multi-scale exposure fusion approaches based on detail-enhancement have led the way for improvement in highlight and shadow details. Most of such methods, however, are too computationally expensive to be deployed on mobile devices. This paper presents a perceptual multi-exposure fusion method that not just ensures fine shadow/highlight details but with lower complexity than detailenhanced methods. We analyze the potential defects of three classical exposure measures in lieu of using detail-enhancement component and improve two of them, namely adaptive Wellexposedness (AWE) and the gradient of color images (3-D gradient). AWE designed in YCbCr color space considers the difference between varying exposure images. 3-D gradient is employed to extract fine details. We build a large-scale multiexposure benchmark dataset suitable for static scenes, which contains 167 image sequences all told. Experiments on the constructed dataset demonstrate that the proposed method exceeds existing eight state-of-the-art approaches in terms of visually and MEF-SSIM value. Moreover, our approach can achieve a better improvement for current image enhancement techniques, ensuring fine detail in bright light. |
None | |
Night-Voyager: Consistent and Efficient Nocturnal Vision-Aided State Estimation in Object Maps | 2025-03-04 | ShowAccurate and robust state estimation at nighttime is essential for autonomous robotic navigation to achieve nocturnal or round-the-clock tasks. An intuitive question arises: Can low-cost standard cameras be exploited for nocturnal state estimation? Regrettably, most existing visual methods may fail under adverse illumination conditions, even with active lighting or image enhancement. A pivotal insight, however, is that streetlights in most urban scenarios act as stable and salient prior visual cues at night, reminiscent of stars in deep space aiding spacecraft voyage in interstellar navigation. Inspired by this, we propose Night-Voyager, an object-level nocturnal vision-aided state estimation framework that leverages prior object maps and keypoints for versatile localization. We also find that the primary limitation of conventional visual methods under poor lighting conditions stems from the reliance on pixel-level metrics. In contrast, metric-agnostic, non-pixel-level object detection serves as a bridge between pixel-level and object-level spaces, enabling effective propagation and utilization of object map information within the system. Night-Voyager begins with a fast initialization to solve the global localization problem. By employing an effective two-stage cross-modal data association, the system delivers globally consistent state updates using map-based observations. To address the challenge of significant uncertainties in visual observations at night, a novel matrix Lie group formulation and a feature-decoupled multi-state invariant filter are introduced, ensuring consistent and efficient estimation. Through comprehensive experiments in both simulation and diverse real-world scenarios (spanning approximately 12.3 km), Night-Voyager showcases its efficacy, robustness, and efficiency, filling a critical gap in nocturnal vision-aided state estimation. |
IEEE ...IEEE Transactions on Robotics (T-RO), 2025 |
None |
ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement | 2025-03-04 | ShowLow-light image enhancement aims to restore the under-exposure image captured in dark scenarios. Under such scenarios, traditional frame-based cameras may fail to capture the structure and color information due to the exposure time limitation. Event cameras are bio-inspired vision sensors that respond to pixel-wise brightness changes asynchronously. Event cameras' high dynamic range is pivotal for visual perception in extreme low-light scenarios, surpassing traditional cameras and enabling applications in challenging dark environments. In this paper, inspired by the success of the retinex theory for traditional frame-based low-light image restoration, we introduce the first methods that combine the retinex theory with event cameras and propose a novel retinex-based low-light image restoration framework named ERetinex. Among our contributions, the first is developing a new approach that leverages the high temporal resolution data from event cameras with traditional image information to estimate scene illumination accurately. This method outperforms traditional image-only techniques, especially in low-light environments, by providing more precise lighting information. Additionally, we propose an effective fusion strategy that combines the high dynamic range data from event cameras with the color information of traditional images to enhance image quality. Through this fusion, we can generate clearer and more detail-rich images, maintaining the integrity of visual information even under extreme lighting conditions. The experimental results indicate that our proposed method outperforms state-of-the-art (SOTA) methods, achieving a gain of 1.0613 dB in PSNR while reducing FLOPS by \textbf{84.28}%. |
Accep...Accepted to ICRA 2025 |
None |
Wavelet-Enhanced Desnowing: A Novel Single Image Restoration Approach for Traffic Surveillance under Adverse Weather Conditions | 2025-03-03 | ShowImage restoration under adverse weather conditions refers to the process of removing degradation caused by weather particles while improving visual quality. Most existing deweathering methods rely on increasing the network scale and data volume to achieve better performance which requires more expensive computing power. Also, many methods lack generalization for specific applications. In the traffic surveillance screener, the main challenges are snow removal and veil effect elimination. In this paper, we propose a wavelet-enhanced snow removal method that use a Dual-Tree Complex Wavelet Transform feature enhancement module and a dynamic convolution acceleration module to address snow degradation in surveillance images. We also use a residual learning restoration module to remove veil effects caused by rain, snow, and fog. The proposed architecture extracts and analyzes information from snow-covered regions, significantly improving snow removal performance. And the residual learning restoration module removes veiling effects in images, enhancing clarity and detail. Experiments show that it performs better than some popular desnowing methods. Our approach also demonstrates effectiveness and accuracy when applied to real traffic surveillance images. |
None | |
Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image Enhancement | 2025-03-01 | ShowReal-world low-light images captured by imaging devices suffer from poor visibility and require a domain-specific enhancement to produce artifact-free outputs that reveal details. In this paper, we propose an unpaired low-light image enhancement network leveraging novel controlled transformation-based self-supervision and unpaired self-conditioning strategies. The model determines the required degrees of enhancement at the input image pixels, which are learned from the unpaired low-lit and well-lit images without any direct supervision. The self-supervision is based on a controlled transformation of the input image and subsequent maintenance of its enhancement in spite of the transformation. The self-conditioning performs training of the model on unpaired images such that it does not enhance an already-enhanced image or a well-lit input image. The inherent noise in the input low-light images is handled by employing low gradient magnitude suppression in a detail-preserving manner. In addition, our noise handling is self-conditioned by preventing the denoising of noise-free well-lit images. The training based on low-light image enhancement-specific attributes allows our model to avoid paired supervision without compromising significantly in performance. While our proposed self-supervision aids consistent enhancement, our novel self-conditioning facilitates adequate enhancement. Extensive experiments on multiple standard datasets demonstrate that our model, in general, outperforms the state-of-the-art both quantitatively and subjectively. Ablation studies show the effectiveness of our self-supervision and self-conditioning strategies, and the related loss functions. |
Copyr...Copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works |
None |
Flow Matching for Medical Image Synthesis: Bridging the Gap Between Speed and Quality | 2025-03-01 | ShowDeep learning models have emerged as a powerful tool for various medical applications. However, their success depends on large, high-quality datasets that are challenging to obtain due to privacy concerns and costly annotation. Generative models, such as diffusion models, offer a potential solution by synthesizing medical images, but their practical adoption is hindered by long inference times. In this paper, we propose the use of an optimal transport flow matching approach to accelerate image generation. By introducing a straighter mapping between the source and target distribution, our method significantly reduces inference time while preserving and further enhancing the quality of the outputs. Furthermore, this approach is highly adaptable, supporting various medical imaging modalities, conditioning mechanisms (such as class labels and masks), and different spatial dimensions, including 2D and 3D. Beyond image generation, it can also be applied to related tasks such as image enhancement. Our results demonstrate the efficiency and versatility of this framework, making it a promising advancement for medical imaging applications. Code with checkpoints and a synthetic dataset (beneficial for classification and segmentation) is now available on: https://github.com/milad1378yz/MOTFM. |
Code Link | |
SEE: See Everything Every Time -- Adaptive Brightness Adjustment for Broad Light Range Images via Events | 2025-02-28 | ShowEvent cameras, with a high dynamic range exceeding |
Code Link | |
HVI: A New Color Space for Low-light Image Enhancement | 2025-02-28 | ShowLow-Light Image Enhancement (LLIE) is a crucial computer vision task that aims to restore detailed visual information from corrupted low-light images. Many existing LLIE methods are based on standard RGB (sRGB) space, which often produce color bias and brightness artifacts due to inherent high color sensitivity in sRGB. While converting the images using Hue, Saturation and Value (HSV) color space helps resolve the brightness issue, it introduces significant red and black noise artifacts. To address this issue, we propose a new color space for LLIE, namely Horizontal/Vertical-Intensity (HVI), defined by polarized HS maps and learnable intensity. The former enforces small distances for red coordinates to remove the red artifacts, while the latter compresses the low-light regions to remove the black artifacts. To fully leverage the chromatic and intensity information, a novel Color and Intensity Decoupling Network (CIDNet) is further introduced to learn accurate photometric mapping function under different lighting conditions in the HVI space. Comprehensive results from benchmark and ablation experiments show that the proposed HVI color space with CIDNet outperforms the state-of-the-art methods on 10 datasets. The code is available at https://github.com/Fediory/HVI-CIDNet. |
Qings...Qingsen Yan, Yixu Feng, and Cheng Zhang contributed equally to this work |
Code Link |
Striving for Faster and Better: A One-Layer Architecture with Auto Re-parameterization for Low-Light Image Enhancement | 2025-02-27 | ShowDeep learning-based low-light image enhancers have made significant progress in recent years, with a trend towards achieving satisfactory visual quality while gradually reducing the number of parameters and improving computational efficiency. In this work, we aim to delving into the limits of image enhancers both from visual quality and computational efficiency, while striving for both better performance and faster processing. To be concrete, by rethinking the task demands, we build an explicit connection, i.e., visual quality and computational efficiency are corresponding to model learning and structure design, respectively. Around this connection, we enlarge parameter space by introducing the re-parameterization for ample model learning of a pre-defined minimalist network (e.g., just one layer), to avoid falling into a local solution. To strengthen the structural representation, we define a hierarchical search scheme for discovering a task-oriented re-parameterized structure, which also provides powerful support for efficiency. Ultimately, this achieves efficient low-light image enhancement using only a single convolutional layer, while maintaining excellent visual quality. Experimental results show our sensible superiority both in quality and efficiency against recently-proposed methods. Especially, our running time on various platforms (e.g., CPU, GPU, NPU, DSP) consistently moves beyond the existing fastest scheme. The source code will be released at https://github.com/vis-opt-group/AR-LLIE. |
Code Link | |
CLIP-Optimized Multimodal Image Enhancement via ISP-CNN Fusion for Coal Mine IoVT under Uneven Illumination | 2025-02-26 | ShowClear monitoring images are crucial for the safe operation of coal mine Internet of Video Things (IoVT) systems. However, low illumination and uneven brightness in underground environments significantly degrade image quality, posing challenges for enhancement methods that often rely on difficult-to-obtain paired reference images. Additionally, there is a trade-off between enhancement performance and computational efficiency on edge devices within IoVT systems.To address these issues, we propose a multimodal image enhancement method tailored for coal mine IoVT, utilizing an ISP-CNN fusion architecture optimized for uneven illumination. This two-stage strategy combines global enhancement with detail optimization, effectively improving image quality, especially in poorly lit areas. A CLIP-based multimodal iterative optimization allows for unsupervised training of the enhancement algorithm. By integrating traditional image signal processing (ISP) with convolutional neural networks (CNN), our approach reduces computational complexity while maintaining high performance, making it suitable for real-time deployment on edge devices.Experimental results demonstrate that our method effectively mitigates uneven brightness and enhances key image quality metrics, with PSNR improvements of 2.9%-4.9%, SSIM by 4.3%-11.4%, and VIF by 4.9%-17.8% compared to seven state-of-the-art algorithms. Simulated coal mine monitoring scenarios validate our method's ability to balance performance and computational demands, facilitating real-time enhancement and supporting safer mining operations. |
None | |
Improved Partial Differential Equation and Fast Approximation Algorithm for Hazy/Underwater/Dust Storm Image Enhancement | 2025-02-21 | ShowThis paper presents an improved and modified partial differential equation (PDE)-based de-hazing algorithm. The proposed method combines logarithmic image processing models in a PDE formulation refined with linear filter-based operators in either spatial or frequency domain. Additionally, a fast, simplified de-hazing function approximation of the hazy image formation model is developed in combination with fuzzy homomorphic refinement. The proposed algorithm solves the problem of image darkening and over-enhancement of edges in addition to enhancement of dark image regions encountered in previous formulations. This is in addition to avoiding enhancement of sky regions in de-hazed images while avoiding halo effect. Furthermore, the proposed algorithm is utilized for underwater and dust storm image enhancement with the incorporation of a modified global contrast enhancement algorithm. Experimental comparisons indicate that the proposed approach surpasses a majority of the algorithms from the literature based on quantitative image quality metrics. |
18 pages, 11 figures | None |
LUMINA-Net: Low-light Upgrade through Multi-stage Illumination and Noise Adaptation Network for Image Enhancement | 2025-02-21 | ShowLow-light image enhancement (LLIE) is a crucial task in computer vision aimed to enhance the visual fidelity of images captured under low-illumination conditions. Conventional methods frequently struggle to mitigate pervasive shortcomings such as noise, over-exposure, and color distortion thereby precipitating a pronounced degradation in image quality. To address these challenges, we propose LUMINA-Net an advanced deep learning framework designed specifically by integrating multi-stage illumination and reflectance modules. First, the illumination module intelligently adjusts brightness and contrast levels while meticulously preserving intricate textural details. Second, the reflectance module incorporates a noise reduction mechanism that leverages spatial attention and channel-wise feature refinement to mitigate noise contamination. Through a comprehensive suite of experiments conducted on LOL and SICE datasets using PSNR, SSIM and LPIPS metrics, surpassing state-of-the-art methodologies and showcasing its efficacy in low-light image enhancement. |
9 pages, 4 figures | None |
Optimized Pap Smear Image Enhancement: Hybrid PMD Filter-CLAHE Using Spider Monkey Optimization | 2025-02-21 | ShowPap smear image quality is crucial for cervical cancer detection. This study introduces an optimized hybrid approach that combines the Perona-Malik Diffusion (PMD) filter with contrast-limited adaptive histogram equalization (CLAHE) to enhance Pap smear image quality. The PMD filter reduces the image noise, whereas CLAHE improves the image contrast. The hybrid method was optimized using spider monkey optimization (SMO PMD-CLAHE). BRISQUE and CEIQ are the new objective functions for the PMD filter and CLAHE optimization, respectively. The simulations were conducted using the SIPaKMeD dataset. The results indicate that SMO outperforms state-of-the-art methods in optimizing the PMD filter and CLAHE. The proposed method achieved an average effective measure of enhancement (EME) of 5.45, root mean square (RMS) contrast of 60.45, Michelson's contrast (MC) of 0.995, and entropy of 6.80. This approach offers a new perspective for improving Pap smear image quality. |
None | |
EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement | 2025-02-20 | ShowOver the past decade, generative models have achieved significant success in enhancement fundus images.However, the evaluation of these models still presents a considerable challenge. A comprehensive evaluation benchmark for fundus image enhancement is indispensable for three main reasons: 1) The existing denoising metrics (e.g., PSNR, SSIM) are hardly to extend to downstream real-world clinical research (e.g., Vessel morphology consistency). 2) There is a lack of comprehensive evaluation for both paired and unpaired enhancement methods, along with the need for expert protocols to accurately assess clinical value. 3) An ideal evaluation system should provide insights to inform future developments of fundus image enhancement. To this end, we propose a novel comprehensive benchmark, EyeBench, to provide insights that align enhancement models with clinical needs, offering a foundation for future work to improve the clinical relevance and applicability of generative models for fundus image enhancement. EyeBench has three appealing properties: 1) multi-dimensional clinical alignment downstream evaluation: In addition to evaluating the enhancement task, we provide several clinically significant downstream tasks for fundus images, including vessel segmentation, DR grading, denoising generalization, and lesion segmentation. 2) Medical expert-guided evaluation design: We introduce a novel dataset that promote comprehensive and fair comparisons between paired and unpaired methods and includes a manual evaluation protocol by medical experts. 3) Valuable insights: Our benchmark study provides a comprehensive and rigorous evaluation of existing methods across different downstream tasks, assisting medical experts in making informed choices. Additionally, we offer further analysis of the challenges faced by existing methods. The code is available at \url{https://github.com/Retinal-Research/EyeBench} |
Code Link | |
IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model | 2025-02-17 | ShowInfrared image super-resolution demands long-range dependency modeling and multi-scale feature extraction to address challenges such as homogeneous backgrounds, weak edges, and sparse textures. While Mamba-based state-space models (SSMs) excel in global dependency modeling with linear complexity, their block-wise processing disrupts spatial consistency, limiting their effectiveness for IR image reconstruction. We propose IRSRMamba, a novel framework integrating wavelet transform feature modulation for multi-scale adaptation and an SSMs-based semantic consistency loss to restore fragmented contextual information. This design enhances global-local feature fusion, structural coherence, and fine-detail preservation while mitigating block-induced artifacts. Experiments on benchmark datasets demonstrate that IRSRMamba outperforms state-of-the-art methods in PSNR, SSIM, and perceptual quality. This work establishes Mamba-based architectures as a promising direction for high-fidelity IR image enhancement. Code are available at https://github.com/yongsongH/IRSRMamba. |
This ...This work has been submitted to the IEEE for possible publication |
Code Link |
Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement | 2025-02-16 | ShowDiffusion model-based low-light image enhancement methods rely heavily on paired training data, leading to limited extensive application. Meanwhile, existing unsupervised methods lack effective bridging capabilities for unknown degradation. To address these limitations, we propose a novel zero-reference lighting estimation diffusion model for low-light image enhancement called Zero-LED. It utilizes the stable convergence ability of diffusion models to bridge the gap between low-light domains and real normal-light domains and successfully alleviates the dependence on pairwise training data via zero-reference learning. Specifically, we first design the initial optimization network to preprocess the input image and implement bidirectional constraints between the diffusion model and the initial optimization network through multiple objective functions. Subsequently, the degradation factors of the real-world scene are optimized iteratively to achieve effective light enhancement. In addition, we explore a frequency-domain based and semantically guided appearance reconstruction module that encourages feature alignment of the recovered image at a fine-grained level and satisfies subjective expectations. Finally, extensive experiments demonstrate the superiority of our approach to other state-of-the-art methods and more significant generalization capabilities. We will open the source code upon acceptance of the paper. |
None | |
VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement | 2025-02-13 | ShowImages captured in severe low-light circumstances often suffer from significant information absence. Existing singular modality image enhancement methods struggle to restore image regions lacking valid information. By leveraging light-impervious infrared images, visible and infrared image fusion methods have the potential to reveal information hidden in darkness. However, they primarily emphasize inter-modal complementation but neglect intra-modal enhancement, limiting the perceptual quality of output images. To address these limitations, we propose a novel task, dubbed visible and infrared information synthesis (VIIS), which aims to achieve both information enhancement and fusion of the two modalities. Given the difficulty in obtaining ground truth in the VIIS task, we design an information synthesis pretext task (ISPT) based on image augmentation. We employ a diffusion model as the framework and design a sparse attention-based dual-modalities residual (SADMR) conditioning mechanism to enhance information interaction between the two modalities. This mechanism enables features with prior knowledge from both modalities to adaptively and iteratively attend to each modality's information during the denoising process. Our extensive experiments demonstrate that our model qualitatively and quantitatively outperforms not only the state-of-the-art methods in relevant fields but also the newly designed baselines capable of both information enhancement and fusion. The code is available at https://github.com/Chenz418/VIIS. |
Accep...Accepted to WACV 2025 |
Code Link |
Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems | 2025-02-11 | ShowSalient object detection (SOD) plays a critical role in vision-driven measurement systems (VMS), facilitating the detection and segmentation of key visual elements in an image. However, adverse imaging conditions such as haze during the day, low light, and haze at night severely degrade image quality, and complicating the SOD process. To address these challenges, we propose a multi-task-oriented nighttime haze imaging enhancer (MToIE), which integrates three tasks: daytime dehazing, low-light enhancement, and nighttime dehazing. The MToIE incorporates two key innovative components: First, the network employs a task-oriented node learning mechanism to handle three specific degradation types: day-time haze, low light, and night-time haze conditions, with an embedded self-attention module enhancing its performance in nighttime imaging. In addition, multi-receptive field enhancement module that efficiently extracts multi-scale features through three parallel depthwise separable convolution branches with different dilation rates, capturing comprehensive spatial information with minimal computational overhead. To ensure optimal image reconstruction quality and visual characteristics, we suggest a hybrid loss function. Extensive experiments on different types of weather/imaging conditions illustrate that MToIE surpasses existing methods, significantly enhancing the accuracy and reliability of vision systems across diverse imaging scenarios. The code is available at https://github.com/Ai-Chen-Lab/MToIE. |
Code Link | |
A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement | 2025-02-09 | ShowThe usage of digital content (photos and videos) in a variety of applications has increased due to the popularity of multimedia devices. These uses include advertising campaigns, educational resources, and social networking platforms. There is an increasing need for high-quality graphic information as people become more visually focused. However, captured images frequently have poor visibility and a high amount of noise due to the limitations of image-capturing devices and lighting conditions. Improving the visual quality of images taken in low illumination is the aim of low-illumination image enhancement. This problem is addressed by traditional image enhancement techniques, which alter noise, brightness, and contrast. Deep learning-based methods, however, have dominated recently made advances in this area. These methods have effectively reduced noise while preserving important information, showing promising results in the improvement of low-illumination images. An extensive summary of image signal processing methods for enhancing low-illumination images is provided in this paper. Three categories are classified in the review for approaches: hybrid techniques, deep learning-based methods, and traditional approaches. Conventional techniques include denoising, automated white balancing, and noise reduction. Convolutional neural networks (CNNs) are used in deep learningbased techniques to recognize and extract characteristics from low-light images. To get better results, hybrid approaches combine deep learning-based methodologies with more conventional methods. The review also discusses the advantages and limitations of each approach and provides insights into future research directions in this field. |
None | |
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization | 2025-02-08 | ShowWe introduce a new task called Defeasible Visual Entailment (DVE), where the goal is to allow the modification of the entailment relationship between an image premise and a text hypothesis based on an additional update. While this concept is well-established in Natural Language Inference, it remains unexplored in visual entailment. At a high level, DVE enables models to refine their initial interpretations, leading to improved accuracy and reliability in various applications such as detecting misleading information in images, enhancing visual question answering, and refining decision-making processes in autonomous systems. Existing metrics do not adequately capture the change in the entailment relationship brought by updates. To address this, we propose a novel inference-aware evaluator designed to capture changes in entailment strength induced by updates, using pairwise contrastive learning and categorical information learning. Additionally, we introduce a reward-driven update optimization method to further enhance the quality of updates generated by multimodal models. Experimental results demonstrate the effectiveness of our proposed evaluator and optimization method. |
Accep...Accepted by AAAI 2025 |
None |
Segmentation-free integration of nuclei morphology and spatial transcriptomics for retinal images | 2025-02-08 | ShowThis study introduces SEFI (SEgmentation-Free Integration), a novel method for integrating morphological features of cell nuclei with spatial transcriptomics data. Cell segmentation poses a significant challenge in the analysis of spatial transcriptomics data, as tissue-specific structural complexities and densely packed cells in certain regions make it difficult to develop a universal approach. SEFI addresses this by utilizing self-supervised learning to extract morphological features from fluorescent nuclear staining images, enhancing the clustering of gene expression data without requiring segmentation. We demonstrate SEFI on spatially resolved gene expression profiles of the developing retina, acquired using multiplexed single molecule Fluorescence In Situ Hybridization (smFISH). SEFI is publicly available at https://github.com/eduardchelebian/sefi. |
Code Link | |
Performance Evaluation of Image Enhancement Techniques on Transfer Learning for Touchless Fingerprint Recognition | 2025-02-07 | ShowFingerprint recognition remains one of the most reliable biometric technologies due to its high accuracy and uniqueness. Traditional systems rely on contact-based scanners, which are prone to issues such as image degradation from surface contamination and inconsistent user interaction. To address these limitations, contactless fingerprint recognition has emerged as a promising alternative, providing non-intrusive and hygienic authentication. This study evaluates the impact of image enhancement tech-niques on the performance of pre-trained deep learning models using transfer learning for touchless fingerprint recognition. The IIT-Bombay Touchless and Touch-Based Fingerprint Database, containing data from 200 subjects, was employed to test the per-formance of deep learning architectures such as VGG-16, VGG-19, Inception-V3, and ResNet-50. Experimental results reveal that transfer learning methods with fingerprint image enhance-ment (indirect method) significantly outperform those without enhancement (direct method). Specifically, VGG-16 achieved an accuracy of 98% in training and 93% in testing when using the enhanced images, demonstrating superior performance compared to the direct method. This paper provides a detailed comparison of the effectiveness of image enhancement in improving the accuracy of transfer learning models for touchless fingerprint recognition, offering key insights for developing more efficient biometric systems. |
6 pages | None |
KAN See In the Dark | 2025-02-06 | ShowExisting low-light image enhancement methods are difficult to fit the complex nonlinear relationship between normal and low-light images due to uneven illumination and noise effects. The recently proposed Kolmogorov-Arnold networks (KANs) feature spline-based convolutional layers and learnable activation functions, which can effectively capture nonlinear dependencies. In this paper, we design a KAN-Block based on KANs and innovatively apply it to low-light image enhancement. This method effectively alleviates the limitations of current methods constrained by linear network structures and lack of interpretability, further demonstrating the potential of KANs in low-level vision tasks. Given the poor perception of current low-light image enhancement methods and the stochastic nature of the inverse diffusion process, we further introduce frequency-domain perception for visually oriented enhancement. Extensive experiments demonstrate the competitive performance of our method on benchmark datasets. The code will be available at: https://github.com/AXNing/KSID}{https://github.com/AXNing/KSID. |
Code Link | |
A framework for river connectivity classification using temporal image processing and attention based neural networks | 2025-02-01 | ShowMeasuring the connectivity of water in rivers and streams is essential for effective water resource management. Increased extreme weather events associated with climate change can result in alterations to river and stream connectivity. While traditional stream flow gauges are costly to deploy and limited to large river bodies, trail camera methods are a low-cost and easily deployed alternative to collect hourly data. Image capturing, however requires stream ecologists to manually curate (select and label) tens of thousands of images per year. To improve this workflow, we developed an automated instream trail camera image classification system consisting of three parts: (1) image processing, (2) image augmentation and (3) machine learning. The image preprocessing consists of seven image quality filters, foliage-based luma variance reduction, resizing and bottom-center cropping. Images are balanced using variable amount of generative augmentation using diffusion models and then passed to a machine learning classification model in labeled form. By using the vision transformer architecture and temporal image enhancement in our framework, we are able to increase the 75% base accuracy to 90% for a new unseen site image. We make use of a dataset captured and labeled by staff from the Connecticut Department of Energy and Environmental Protection between 2018-2020. Our results indicate that a combination of temporal image processing and attention-based models are effective at classifying unseen river connectivity images. |
15 pages, 8 figures | None |
DarkIR: Robust Low-Light Image Restoration | 2025-01-31 | ShowPhotography during night or in dark conditions typically suffers from noise, low light and blurring issues due to the dim environment and the common use of long exposure. Although Deblurring and Low-light Image Enhancement (LLIE) are related under these conditions, most approaches in image restoration solve these tasks separately. In this paper, we present an efficient and robust neural network for multi-task low-light image restoration. Instead of following the current tendency of Transformer-based models, we propose new attention mechanisms to enhance the receptive field of efficient CNNs. Our method reduces the computational costs in terms of parameters and MAC operations compared to previous methods. Our model, DarkIR, achieves new state-of-the-art results on the popular LOLBlur, LOLv2 and Real-LOLBlur datasets, being able to generalize on real-world night and dark images. Code and models at https://github.com/cidautai/DarkIR |
Technical Report | Code Link |
Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement | 2025-01-30 | ShowIn image enhancement tasks, such as low-light and underwater image enhancement, a degraded image can correspond to multiple plausible target images due to dynamic photography conditions, such as variations in illumination. This naturally results in a one-to-many mapping challenge. To address this, we propose a Bayesian Enhancement Model (BEM) that incorporates Bayesian Neural Networks (BNNs) to capture data uncertainty and produce diverse outputs. To achieve real-time inference, we introduce a two-stage approach: Stage I employs a BNN to model the one-to-many mappings in the low-dimensional space, while Stage II refines fine-grained image details using a Deterministic Neural Network (DNN). To accelerate BNN training and convergence, we introduce a dynamic Momentum Prior. Extensive experiments on multiple low-light and underwater image enhancement benchmarks demonstrate the superiority of our method over deterministic models. |
None | |
Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment | 2025-01-29 | ShowWe introduce a novel segmentation-aware joint training framework called generative reinforcement network (GRN) that integrates segmentation loss feedback to optimize both image generation and segmentation performance in a single stage. An image enhancement technique called segmentation-guided enhancement (SGE) is also developed, where the generator produces images tailored specifically for the segmentation model. Two variants of GRN were also developed, including GRN for sample-efficient learning (GRN-SEL) and GRN for semi-supervised learning (GRN-SSL). GRN's performance was evaluated using a dataset of 69 fully annotated 3D ultrasound scans from 29 subjects. The annotations included six anatomical structures: dermis, superficial fat, superficial fascial membrane (SFM), deep fat, deep fascial membrane (DFM), and muscle. Our results show that GRN-SEL with SGE reduces labeling efforts by up to 70% while achieving a 1.98% improvement in the Dice Similarity Coefficient (DSC) compared to models trained on fully labeled datasets. GRN-SEL alone reduces labeling efforts by 60%, GRN-SSL with SGE decreases labeling requirements by 70%, and GRN-SSL alone by 60%, all while maintaining performance comparable to fully supervised models. These findings suggest the effectiveness of the GRN framework in optimizing segmentation performance with significantly less labeled data, offering a scalable and efficient solution for ultrasound image analysis and reducing the burdens associated with data annotation. |
None | |
Text-to-Image Generation for Vocabulary Learning Using the Keyword Method | 2025-01-28 | ShowThe 'keyword method' is an effective technique for learning vocabulary of a foreign language. It involves creating a memorable visual link between what a word means and what its pronunciation in a foreign language sounds like in the learner's native language. However, these memorable visual links remain implicit in the people's mind and are not easy to remember for a large set of words. To enhance the memorisation and recall of the vocabulary, we developed an application that combines the keyword method with text-to-image generators to externalise the memorable visual links into visuals. These visuals represent additional stimuli during the memorisation process. To explore the effectiveness of this approach we first run a pilot study to investigate how difficult it is to externalise the descriptions of mental visualisations of memorable links, by asking participants to write them down. We used these descriptions as prompts for text-to-image generator (DALL-E2) to convert them into images and asked participants to select their favourites. Next, we compared different text-to-image generators (DALL-E2, Midjourney, Stable and Latent Diffusion) to evaluate the perceived quality of the generated images by each. Despite heterogeneous results, participants mostly preferred images generated by DALL-E2, which was used also for the final study. In this study, we investigated whether providing such images enhances the retention of vocabulary being learned, compared to the keyword method only. Our results indicate that people did not encounter difficulties describing their visualisations of memorable links and that providing corresponding images significantly improves memory retention. |
None | |
Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration | 2025-01-27 | ShowImage restoration aims to recover details and enhance contrast in degraded images. With the growing demand for high-quality imaging (\textit{e.g.}, 4K and 8K), achieving a balance between restoration quality and computational efficiency has become increasingly critical. Existing methods, primarily based on CNNs, Transformers, or their hybrid approaches, apply uniform deep representation extraction across the image. However, these methods often struggle to effectively model long-range dependencies and largely overlook the spatial characteristics of image degradation (regions with richer textures tend to suffer more severe damage), making it hard to achieve the best trade-off between restoration quality and efficiency. To address these issues, we propose a novel texture-aware image restoration method, TAMambaIR, which simultaneously perceives image textures and achieves a trade-off between performance and efficiency. Specifically, we introduce a novel Texture-Aware State Space Model, which enhances texture awareness and improves efficiency by modulating the transition matrix of the state-space equation and focusing on regions with complex textures. Additionally, we design a {Multi-Directional Perception Block} to improve multi-directional receptive fields while maintaining low computational overhead. Extensive experiments on benchmarks for image super-resolution, deraining, and low-light image enhancement demonstrate that TAMambaIR achieves state-of-the-art performance with significantly improved efficiency, establishing it as a robust and efficient framework for image restoration. |
Technical Report | None |
PEP-GS: Perceptually-Enhanced Precise Structured 3D Gaussians for View-Adaptive Rendering | 2025-01-27 | ShowRecently, 3D Gaussian Splatting (3D-GS) has achieved significant success in real-time, high-quality 3D scene rendering. However, it faces several challenges, including Gaussian redundancy, limited ability to capture view-dependent effects, and difficulties in handling complex lighting and specular reflections. Additionally, methods that use spherical harmonics for color representation often struggle to effectively capture specular highlights and anisotropic components, especially when modeling view-dependent colors under complex lighting conditions, leading to insufficient contrast and unnatural color saturation. To address these limitations, we introduce PEP-GS, a perceptually-enhanced framework that dynamically predicts Gaussian attributes, including opacity, color, and covariance. We replace traditional spherical harmonics with a Hierarchical Granular-Structural Attention mechanism, which enables more accurate modeling of complex view-dependent color effects and specular highlights. By employing a stable and interpretable framework for opacity and covariance estimation, PEP-GS avoids the removal of essential Gaussians prematurely, ensuring a more accurate scene representation. Furthermore, perceptual optimization is applied to the final rendered images, enhancing perceptual consistency across different views and ensuring high-quality renderings with improved texture fidelity and fine-scale detail preservation. Experimental results demonstrate that PEP-GS outperforms state-of-the-art methods, particularly in challenging scenarios involving view-dependent effects, specular reflections, and fine-scale details. |
None | |
UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images | 2025-01-27 | ShowActivities in underwater environments are paramount in several scenarios, which drives the continuous development of underwater image enhancement techniques. A major challenge in this domain is the depth at which images are captured, with increasing depth resulting in a darker environment. Most existing methods for underwater image enhancement focus on noise removal and color adjustment, with few works dedicated to brightness enhancement. This work introduces a novel unsupervised learning approach to underwater image enhancement using a diffusion model. Our method, called UDBE, is based on conditional diffusion to maintain the brightness details of the unpaired input images. The input image is combined with a color map and a Signal-Noise Relation map (SNR) to ensure stable training and prevent color distortion in the output images. The results demonstrate that our approach achieves an impressive accuracy rate in the datasets UIEB, SUIM and RUIE, well-established underwater image benchmarks. Additionally, the experiments validate the robustness of our approach, regarding the image quality metrics PSNR, SSIM, UIQM, and UISM, indicating the good performance of the brightness enhancement process. The source code is available here: https://github.com/gusanagy/UDBE. |
Paper...Paper presented at ICMLA 2024 |
Code Link |
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | 2025-01-25 | ShowRemote sensing (RS) visual grounding aims to use natural language expression to locate specific objects (in the form of the bounding box or segmentation mask) in RS images, enhancing human interaction with intelligent RS interpretation systems. Early research in this area was primarily based on horizontal bounding boxes (HBBs), but as more diverse RS datasets have become available, tasks involving oriented bounding boxes (OBBs) and segmentation masks have emerged. In practical applications, different targets require different grounding types: HBB can localize an object's position, OBB provides its orientation, and mask depicts its shape. However, existing specialized methods are typically tailored to a single type of RS visual grounding task and are hard to generalize across tasks. In contrast, large vision-language models (VLMs) exhibit powerful multi-task learning capabilities but struggle to handle dense prediction tasks like segmentation. This paper proposes GeoGround, a novel framework that unifies support for HBB, OBB, and mask RS visual grounding tasks, allowing flexible output selection. Rather than customizing the architecture of VLM, our work aims to elegantly support pixel-level visual grounding output through the Text-Mask technique. We define prompt-assisted and geometry-guided learning to enhance consistency across different signals. To support model training, we present refGeo, a large-scale RS visual instruction-following dataset containing 161k image-text pairs. Experimental results show that GeoGround demonstrates strong performance across four RS visual grounding tasks, matching or surpassing the performance of specialized methods on multiple benchmarks. Code available at https://github.com/zytx121/GeoGround |
25 pages, 19 figures | Code Link |
Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders | 2025-01-24 | ShowWe present a physics-informed deep learning framework to address common limitations in Confocal Laser Scanning Microscopy (CLSM), such as diffraction limited resolution, noise, and undersampling due to low laser power conditions. The optical system's point spread function (PSF) and common CLSM image degradation mechanisms namely photon shot noise, dark current noise, motion blur, speckle noise, and undersampling were modeled and were directly included into model architecture. The model reconstructs high fidelity images from heavily noisy inputs by using convolutional and transposed convolutional layers. Following the advances in compressed sensing, our approach significantly reduces data acquisition requirements without compromising image resolution. The proposed method was extensively evaluated on simulated CLSM images of diverse structures, including lipid droplets, neuronal networks, and fibrillar systems. Comparisons with traditional deconvolution algorithms such as Richardson-Lucy (RL), non-negative least squares (NNLS), and other methods like Total Variation (TV) regularization, Wiener filtering, and Wavelet denoising demonstrate the superiority of the network in restoring fine structural details with high fidelity. Assessment metrics like Structural Similarity Index (SSIM) and Peak Signal to Noise Ratio (PSNR), underlines that the AdaptivePhysicsAutoencoder achieved robust image enhancement across diverse CLSM conditions, helping faster acquisition, reduced photodamage, and reliable performance in low light and sparse sampling scenarios holding promise for applications in live cell imaging, dynamic biological studies, and high throughput material characterization. |
None | |
Where Do You Go? Pedestrian Trajectory Prediction using Scene Features | 2025-01-23 | ShowAccurate prediction of pedestrian trajectories is crucial for enhancing the safety of autonomous vehicles and reducing traffic fatalities involving pedestrians. While numerous studies have focused on modeling interactions among pedestrians to forecast their movements, the influence of environmental factors and scene-object placements has been comparatively underexplored. In this paper, we present a novel trajectory prediction model that integrates both pedestrian interactions and environmental context to improve prediction accuracy. Our approach captures spatial and temporal interactions among pedestrians within a sparse graph framework. To account for pedestrian-scene interactions, we employ advanced image enhancement and semantic segmentation techniques to extract detailed scene features. These scene and interaction features are then fused through a cross-attention mechanism, enabling the model to prioritize relevant environmental factors that influence pedestrian movements. Finally, a temporal convolutional network processes the fused features to predict future pedestrian trajectories. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art approaches, achieving ADE and FDE values of 0.252 and 0.372 meters, respectively, underscoring the importance of incorporating both social interactions and environmental context in pedestrian trajectory prediction. |
Accep...Accepted by 2024 International Conference on Intelligent Computing and its Emerging Applications |
None |
Quality Enhancement of Radiographic X-ray Images by Interpretable Mapping | 2025-01-21 | ShowX-ray imaging is the most widely used medical imaging modality. However, in the common practice, inconsistency in the initial presentation of X-ray images is a common complaint by radiologists. Different patient positions, patient habitus and scanning protocols can lead to differences in image presentations, e.g., differences in brightness and contrast globally or regionally. To compensate for this, additional work will be executed by clinical experts to adjust the images to the desired presentation, which can be time-consuming. Existing deep-learning-based end-to-end solutions can automatically correct images with promising performances. Nevertheless, these methods are hard to be interpreted and difficult to be understood by clinical experts. In this manuscript, a novel interpretable mapping method by deep learning is proposed, which automatically enhances the image brightness and contrast globally and locally. Meanwhile, because the model is inspired by the workflow of the brightness and contrast manipulation, it can provide interpretable pixel maps for explaining the motivation of image enhancement. The experiment on the clinical datasets show the proposed method can provide consistent brightness and contrast correction on X-ray images with accuracy of 24.75 dB PSNR and 0.8431 SSIM. |
SPIE ...SPIE Medical Imaging 2025 |
None |
DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains | 2025-01-21 | ShowLow-light image enhancement (LLE) aims to improve the visual quality of images captured in poorly lit conditions, which often suffer from low brightness, low contrast, noise, and color distortions. These issues hinder the performance of computer vision tasks such as object detection, facial recognition, and autonomous driving.Traditional enhancement techniques, such as multi-scale fusion and histogram equalization, fail to preserve fine details and often struggle with maintaining the natural appearance of enhanced images under complex lighting conditions. Although the Retinex theory provides a foundation for image decomposition, it often amplifies noise, leading to suboptimal image quality. In this paper, we propose the Dual Light Enhance Network (DLEN), a novel architecture that incorporates two distinct attention mechanisms, considering both spatial and frequency domains. Our model introduces a learnable wavelet transform module in the illumination estimation phase, preserving high- and low-frequency components to enhance edge and texture details. Additionally, we design a dual-branch structure that leverages the power of the Transformer architecture to enhance both the illumination and structural components of the image.Through extensive experiments, our model outperforms state-of-the-art methods on standard benchmarks.Code is available here: https://github.com/LaLaLoXX/DLEN |
10pages,6figures | Code Link |
FLOL: Fast Baselines for Real-World Low-Light Enhancement | 2025-01-16 | ShowLow-Light Image Enhancement (LLIE) is a key task in computational photography and imaging. The problem of enhancing images captured during night or in dark environments has been well-studied in the image signal processing literature. However, current deep learning-based solutions struggle with efficiency and robustness in real-world scenarios (e.g. scenes with noise, saturated pixels, bad illumination). We propose a lightweight neural network that combines image processing in the frequency and spatial domains. Our method, FLOL+, is one of the fastest models for this task, achieving state-of-the-art results on popular real scenes datasets such as LOL and LSRW. Moreover, we are able to process 1080p images under 12ms. Code and models at https://github.com/cidautai/FLOL |
Technical Report | Code Link |
When No-Reference Image Quality Models Meet MAP Estimation in Diffusion Latents | 2025-01-15 | ShowContemporary no-reference image quality assessment (NR-IQA) models can effectively quantify perceived image quality, often achieving strong correlations with human perceptual scores on standard IQA benchmarks. Yet, limited efforts have been devoted to treating NR-IQA models as natural image priors for real-world image enhancement, and consequently comparing them from a perceptual optimization standpoint. In this work, we show -- for the first time -- that NR-IQA models can be plugged into the maximum a posteriori (MAP) estimation framework for image enhancement. This is achieved by performing gradient ascent in the diffusion latent space rather than in the raw pixel domain, leveraging a pretrained differentiable and bijective diffusion process. Likely, different NR-IQA models lead to different enhanced outputs, which in turn provides a new computational means of comparing them. Unlike conventional correlation-based measures, our comparison method offers complementary insights into the respective strengths and weaknesses of the competing NR-IQA models in perceptual optimization scenarios. Additionally, we aim to improve the best-performing NR-IQA model in diffusion latent MAP estimation by incorporating the advantages of other top-performing methods. The resulting model delivers noticeably better results in enhancing real-world images afflicted by unknown and complex distortions, all preserving a high degree of image fidelity. |
None | |
AI Driven Water Segmentation with deep learning models for Enhanced Flood Monitoring | 2025-01-14 | ShowFlooding is a major natural hazard causing significant fatalities and economic losses annually, with increasing frequency due to climate change. Rapid and accurate flood detection and monitoring are crucial for mitigating these impacts. This study compares the performance of three deep learning models UNet, ResNet, and DeepLabv3 for pixelwise water segmentation to aid in flood detection, utilizing images from drones, in field observations, and social media. This study involves creating a new dataset that augments wellknown benchmark datasets with flood-specific images, enhancing the robustness of the models. The UNet, ResNet, and DeepLab v3 architectures are tested to determine their effectiveness in various environmental conditions and geographical locations, and the strengths and limitations of each model are also discussed here, providing insights into their applicability in different scenarios by predicting image segmentation masks. This fully automated approach allows these models to isolate flooded areas in images, significantly reducing processing time compared to traditional semi-automated methods. The outcome of this study is to predict segmented masks for each image effected by a flood disaster and the validation accuracy of these models. This methodology facilitates timely and continuous flood monitoring, providing vital data for emergency response teams to reduce loss of life and economic damages. It offers a significant reduction in the time required to generate flood maps, cutting down the manual processing time. Additionally, we present avenues for future research, including the integration of multimodal data sources and the development of robust deep learning architectures tailored specifically for flood detection tasks. Overall, our work contributes to the advancement of flood management strategies through innovative use of deep learning technologies. |
8 pages, 6 figures | None |
MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking | 2025-01-14 | ShowNight unmanned aerial vehicle (UAV) tracking is impeded by the challenges of poor illumination, with previous daylight-optimized methods demonstrating suboptimal performance in low-light conditions, limiting the utility of UAV applications. To this end, we propose an efficient mamba-based tracker, leveraging dual enhancement techniques to boost night UAV tracking. The mamba-based low-light enhancer, equipped with an illumination estimator and a damage restorer, achieves global image enhancement while preserving the details and structure of low-light images. Additionally, we advance a cross-modal mamba network to achieve efficient interactive learning between vision and language modalities. Extensive experiments showcase that our method achieves advanced performance and exhibits significantly improved computation and memory efficiency. For instance, our method is 2.8$\times$ faster than CiteTracker and reduces 50.2$%$ GPU memory. Our codes are available at \url{https://github.com/983632847/Awesome-Multimodal-Object-Tracking}. |
Preprint | Code Link |
PSA-VLM: Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment | 2025-01-13 | ShowBenefiting from the powerful capabilities of Large Language Models (LLMs), pre-trained visual encoder models connected to LLMs form Vision Language Models (VLMs). However, recent research shows that the visual modality in VLMs is highly vulnerable, allowing attackers to bypass safety alignment in LLMs through visually transmitted content, launching harmful attacks. To address this challenge, we propose a progressive concept-based alignment strategy, PSA-VLM, which incorporates safety modules as concept bottlenecks to enhance visual modality safety alignment. By aligning model predictions with specific safety concepts, we improve defenses against risky images, enhancing explainability and controllability while minimally impacting general performance. Our method is obtained through two-stage training. The low computational cost of the first stage brings very effective performance improvement, and the fine-tuning of the language model in the second stage further improves the safety performance. Our method achieves state-of-the-art results on popular VLM safety benchmark. |
arXiv...arXiv admin note: substantial text overlap with arXiv:2405.13581 |
None |
Natural Language Supervision for Low-light Image Enhancement | 2025-01-11 | ShowWith the development of deep learning, numerous methods for low-light image enhancement (LLIE) have demonstrated remarkable performance. Mainstream LLIE methods typically learn an end-to-end mapping based on pairs of low-light and normal-light images. However, normal-light images under varying illumination conditions serve as reference images, making it difficult to define a ``perfect'' reference image This leads to the challenge of reconciling metric-oriented and visual-friendly results. Recently, many cross-modal studies have found that side information from other related modalities can guide visual representation learning. Based on this, we introduce a Natural Language Supervision (NLS) strategy, which learns feature maps from text corresponding to images, offering a general and flexible interface for describing an image under different illumination. However, image distributions conditioned on textual descriptions are highly multimodal, which makes training difficult. To address this issue, we design a Textual Guidance Conditioning Mechanism (TCM) that incorporates the connections between image regions and sentence words, enhancing the ability to capture fine-grained cross-modal cues for images and text. This strategy not only utilizes a wider range of supervised sources, but also provides a new paradigm for LLIE based on visual and textual feature alignment. In order to effectively identify and merge features from various levels of image and textual information, we design an Information Fusion Attention (IFA) module to enhance different regions at different levels. We integrate the proposed TCM and IFA into a Natural Language Supervision network for LLIE, named NaLSuper. Finally, extensive experiments demonstrate the robustness and superior effectiveness of our proposed NaLSuper. |
12 pages, 10 figures | None |
Underwater Image Enhancement using Generative Adversarial Networks: A Survey | 2025-01-10 | ShowIn recent years, there has been a surge of research focused on underwater image enhancement using Generative Adversarial Networks (GANs), driven by the need to overcome the challenges posed by underwater environments. Issues such as light attenuation, scattering, and color distortion severely degrade the quality of underwater images, limiting their use in critical applications. Generative Adversarial Networks (GANs) have emerged as a powerful tool for enhancing underwater photos due to their ability to learn complex transformations and generate realistic outputs. These advancements have been applied to real-world applications, including marine biology and ecosystem monitoring, coral reef health assessment, underwater archaeology, and autonomous underwater vehicle (AUV) navigation. This paper explores all major approaches to underwater image enhancement, from physical and physics-free models to Convolutional Neural Network (CNN)-based models and state-of-the-art GAN-based methods. It provides a comprehensive analysis of these methods, evaluation metrics, datasets, and loss functions, offering a holistic view of the field. Furthermore, the paper delves into the limitations and challenges faced by current methods, such as generalization issues, high computational demands, and dataset biases, while suggesting potential directions for future research. |
15 pa...15 pages, 7 figures, 2 tables |
None |
HipyrNet: Hypernet-Guided Feature Pyramid network for mixed-exposure correction | 2025-01-09 | ShowRecent advancements in image translation for enhancing mixed-exposure images have demonstrated the transformative potential of deep learning algorithms. However, addressing extreme exposure variations in images remains a significant challenge due to the inherent complexity and contrast inconsistencies across regions. Current methods often struggle to adapt effectively to these variations, resulting in suboptimal performance. In this work, we propose HipyrNet, a novel approach that integrates a HyperNetwork within a Laplacian Pyramid-based framework to tackle the challenges of mixed-exposure image enhancement. The inclusion of a HyperNetwork allows the model to adapt to these exposure variations. HyperNetworks dynamically generates weights for another network, allowing dynamic changes during deployment. In our model, the HyperNetwork employed is used to predict optimal kernels for Feature Pyramid decomposition, which enables a tailored and adaptive decomposition process for each input image. Our enhanced translational network incorporates multiscale decomposition and reconstruction, leveraging dynamic kernel prediction to capture and manipulate features across varying scales. Extensive experiments demonstrate that HipyrNet outperforms existing methods, particularly in scenarios with extreme exposure variations, achieving superior results in both qualitative and quantitative evaluations. Our approach sets a new benchmark for mixed-exposure image enhancement, paving the way for future research in adaptive image translation. |
None | |
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation | 2025-01-09 | Show3D Referring Expression Segmentation (3D-RES) aims to segment point cloud scenes based on a given expression. However, existing 3D-RES approaches face two major challenges: feature ambiguity and intent ambiguity. Feature ambiguity arises from information loss or distortion during point cloud acquisition due to limitations such as lighting and viewpoint. Intent ambiguity refers to the model's equal treatment of all queries during the decoding process, lacking top-down task-specific guidance. In this paper, we introduce an Image enhanced Prompt Decoding Network (IPDN), which leverages multi-view images and task-driven information to enhance the model's reasoning capabilities. To address feature ambiguity, we propose the Multi-view Semantic Embedding (MSE) module, which injects multi-view 2D image information into the 3D scene and compensates for potential spatial information loss. To tackle intent ambiguity, we designed a Prompt-Aware Decoder (PAD) that guides the decoding process by deriving task-driven signals from the interaction between the expression and visual features. Comprehensive experiments demonstrate that IPDN outperforms the state-ofthe-art by 1.9 and 4.2 points in mIoU metrics on the 3D-RES and 3D-GRES tasks, respectively. |
AAAI 2025 | None |
FrontierNet: Learning Visual Cues to Explore | 2025-01-08 | ShowExploration of unknown environments is crucial for autonomous robots; it allows them to actively reason and decide on what new data to acquire for tasks such as mapping, object discovery, and environmental assessment. Existing methods, such as frontier-based methods, rely heavily on 3D map operations, which are limited by map quality and often overlook valuable context from visual cues. This work aims at leveraging 2D visual cues for efficient autonomous exploration, addressing the limitations of extracting goal poses from a 3D map. We propose a image-only frontier-based exploration system, with FrontierNet as a core component developed in this work. FrontierNet is a learning-based model that (i) detects frontiers, and (ii) predicts their information gain, from posed RGB images enhanced by monocular depth priors. Our approach provides an alternative to existing 3D-dependent exploration systems, achieving a 16% improvement in early-stage exploration efficiency, as validated through extensive simulations and real-world experiments. |
None | |
DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision | 2025-01-08 | ShowLow-light image enhancement restores the colors and details of a single image and improves high-level visual tasks. However, restoring the lost details in the dark area is still a challenge relying only on the RGB domain. In this paper, we delve into frequency as a new clue into the model and propose a DCT-driven enhancement transformer (DEFormer) framework. First, we propose a learnable frequency branch (LFB) for frequency enhancement contains DCT processing and curvature-based frequency enhancement (CFE) to represent frequency features. Additionally, we propose a cross domain fusion (CDF) to reduce the differences between the RGB domain and the frequency domain. Our DEFormer has achieved superior results on the LOL and MIT-Adobe FiveK datasets, improving the dark detection performance. |
Accepted by ICASSP | None |
Recognition-Oriented Low-Light Image Enhancement based on Global and Pixelwise Optimization | 2025-01-08 | ShowIn this paper, we propose a novel low-light image enhancement method aimed at improving the performance of recognition models. Despite recent advances in deep learning, the recognition of images under low-light conditions remains a challenge. Although existing low-light image enhancement methods have been developed to improve image visibility for human vision, they do not specifically focus on enhancing recognition model performance. Our proposed low-light image enhancement method consists of two key modules: the Global Enhance Module, which adjusts the overall brightness and color balance of the input image, and the Pixelwise Adjustment Module, which refines image features at the pixel level. These modules are trained to enhance input images to improve downstream recognition model performance effectively. Notably, the proposed method can be applied as a frontend filter to improve low-light recognition performance without requiring retraining of downstream recognition models. Experimental results demonstrate that our method improves the performance of pretrained recognition models under low-light conditions and its effectiveness. |
accep...accepted to VISAPP2025 |
None |
Adaptive deep learning framework for robust unsupervised underwater image enhancement | 2025-01-07 | ShowOne of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. Underwater images are difficult to capture and are often of poor quality due to the distortion and loss of colour and contrast in water. This makes it difficult to train supervised deep learning models on large and diverse datasets, which can limit the model's performance. In this paper, we explore an alternative approach to supervised underwater image enhancement. Specifically, we propose a novel unsupervised underwater image enhancement framework that employs a conditional variational autoencoder (cVAE) to train a deep learning model with probabilistic adaptive instance normalization (PAdaIN) and statistically guided multi-colour space stretch that produces realistic underwater images. The resulting framework is composed of a U-Net as a feature extractor and a PAdaIN to encode the uncertainty, which we call UDnet. To improve the visual quality of the images generated by UDnet, we use a statistically guided multi-colour space stretch module that ensures visual consistency with the input image and provides an alternative to training using a ground truth image. The proposed model does not need manual human annotation and can learn with a limited amount of data and achieves state-of-the-art results on underwater images. We evaluated our proposed framework on eight publicly-available datasets. The results show that our proposed framework yields competitive performance compared to other state-of-the-art approaches in quantitative as well as qualitative metrics. Code available at https://github.com/alzayats/UDnet . |
25 pa...25 pages, 7 figures, 6 tables, accepted for publication in Expert Systems with Applications |
Code Link |
Conditional Consistency Guided Image Translation and Enhancement | 2025-01-03 | ShowConsistency models have emerged as a promising alternative to diffusion models, offering high-quality generative capabilities through single-step sample generation. However, their application to multi-domain image translation tasks, such as cross-modal translation and low-light image enhancement remains largely unexplored. In this paper, we introduce Conditional Consistency Models (CCMs) for multi-domain image translation by incorporating additional conditional inputs. We implement these modifications by introducing task-specific conditional inputs that guide the denoising process, ensuring that the generated outputs retain structural and contextual information from the corresponding input domain. We evaluate CCMs on 10 different datasets demonstrating their effectiveness in producing high-quality translated images across multiple domains. Code is available at https://github.com/amilbhagat/Conditional-Consistency-Models. |
6 pag...6 pages, 5 figures, 4 tables, The first two authors contributed equally |
Code Link |
Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion | 2025-01-02 | ShowThanks to the recent achievements in task-driven image quality enhancement (IQE) models like ESTR, the image enhancement model and the visual recognition model can mutually enhance each other's quantitation while producing high-quality processed images that are perceivable by our human vision systems. However, existing task-driven IQE models tend to overlook an underlying fact -- different levels of vision tasks have varying and sometimes conflicting requirements of image features. To address this problem, this paper proposes a generalized gradient promotion (GradProm) training strategy for task-driven IQE of medical images. Specifically, we partition a task-driven IQE system into two sub-models, i.e., a mainstream model for image enhancement and an auxiliary model for visual recognition. During training, GradProm updates only parameters of the image enhancement model using gradients of the visual recognition model and the image enhancement model, but only when gradients of these two sub-models are aligned in the same direction, which is measured by their cosine similarity. In case gradients of these two sub-models are not in the same direction, GradProm only uses the gradient of the image enhancement model to update its parameters. Theoretically, we have proved that the optimization direction of the image enhancement model will not be biased by the auxiliary visual recognition model under the implementation of GradProm. Empirically, extensive experimental results on four public yet challenging medical image datasets demonstrated the superior performance of GradProm over existing state-of-the-art methods. |
This ...This paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence |
None |
ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement | 2024-12-31 | ShowLow-light image enhancement is an important task in computer vision, essential for improving the visibility and quality of images captured in non-optimal lighting conditions. Inadequate illumination can lead to significant information loss and poor image quality, impacting various applications such as surveillance. photography, or even autonomous driving. In this regard, automated methods have been developed to automatically adjust illumination in the image for a better visual perception. Current enhancement techniques often use specific datasets to enhance low-light images, but still present challenges when adapting to diverse real-world conditions, where illumination degradation may be localized to specific regions. To address this challenge, the Adaptive Light Enhancement Network (ALEN) is introduced, whose main approach is the use of a classification mechanism to determine whether local or global illumination enhancement is required. Subsequently, estimator networks adjust illumination based on this classification and simultaneously enhance color fidelity. ALEN integrates the Light Classification Network (LCNet) for illuminance categorization, complemented by the Single-Channel Network (SCNet), and Multi-Channel Network (MCNet) for precise estimation of illumination and color, respectively. Extensive experiments on publicly available datasets for low-light conditions were carried out to underscore ALEN's robust generalization capabilities, demonstrating superior performance in both quantitative metrics and qualitative assessments when compared to recent state-of-the-art methods. The ALEN not only enhances image quality in terms of visual perception but also represents an advancement in high-level vision tasks, such as semantic segmentation, as presented in this work. The code of this method is available at https://github.com/xingyumex/ALEN |
Minor...Minor updates and corrections |
Code Link |
Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement | 2024-12-31 | ShowLow-light image enhancement, particularly in cross-domain tasks such as mapping from the raw domain to the sRGB domain, remains a significant challenge. Many deep learning-based methods have been developed to address this issue and have shown promising results in recent years. However, single-stage methods, which attempt to unify the complex mapping across both domains, leading to limited denoising performance. In contrast, two-stage approaches typically decompose a raw image with color filter arrays (CFA) into a four-channel RGGB format before feeding it into a neural network. However, this strategy overlooks the critical role of demosaicing within the Image Signal Processing (ISP) pipeline, leading to color distortions under varying lighting conditions, especially in low-light scenarios. To address these issues, we design a novel Mamba scanning mechanism, called RAWMamba, to effectively handle raw images with different CFAs. Furthermore, we present a Retinex Decomposition Module (RDM) grounded in Retinex prior, which decouples illumination from reflectance to facilitate more effective denoising and automatic non-linear exposure correction. By bridging demosaicing and denoising, better raw image enhancement is achieved. Experimental evaluations conducted on public datasets SID and MCR demonstrate that our proposed RAWMamba achieves state-of-the-art performance on cross-domain mapping. |
None | |
Dual Degradation-Inspired Deep Unfolding Network for Low-Light Image Enhancement | 2024-12-31 | ShowAlthough low-light image enhancement has achieved great stride based on deep enhancement models, most of them mainly stress on enhancement performance via an elaborated black-box network and rarely explore the physical significance of enhancement models. Towards this issue, we propose a Dual degrAdation-inSpired deep Unfolding network, termed DASUNet, for low-light image enhancement. Specifically, we construct a dual degradation model (DDM) to explicitly simulate the deterioration mechanism of low-light images. It learns two distinct image priors via considering degradation specificity between luminance and chrominance spaces. To make the proposed scheme tractable, we design an alternating optimization solution to solve the proposed DDM. Further, the designed solution is unfolded into a specified deep network, imitating the iteration updating rules, to form DASUNet. Based on different specificity in two spaces, we design two customized Transformer block to model different priors. Additionally, a space aggregation module (SAM) is presented to boost the interaction of two degradation models. Extensive experiments on multiple popular low-light image datasets validate the effectiveness of DASUNet compared to canonical state-of-the-art low-light image enhancement methods. Our source code and pretrained model will be publicly available. |
None | |
Low-Light Image Enhancement via Generative Perceptual Priors | 2024-12-30 | ShowAlthough significant progress has been made in enhancing visibility, retrieving texture details, and mitigating noise in Low-Light (LL) images, the challenge persists in applying current Low-Light Image Enhancement (LLIE) methods to real-world scenarios, primarily due to the diverse illumination conditions encountered. Furthermore, the quest for generating enhancements that are visually realistic and attractive remains an underexplored realm. In response to these challenges, we introduce a novel \textbf{LLIE} framework with the guidance of \textbf{G}enerative \textbf{P}erceptual \textbf{P}riors (\textbf{GPP-LLIE}) derived from vision-language models (VLMs). Specifically, we first propose a pipeline that guides VLMs to assess multiple visual attributes of the LL image and quantify the assessment to output the global and local perceptual priors. Subsequently, to incorporate these generative perceptual priors to benefit LLIE, we introduce a transformer-based backbone in the diffusion process, and develop a new layer normalization (\textit{\textbf{GPP-LN}}) and an attention mechanism (\textit{\textbf{LPP-Attn}}) guided by global and local perceptual priors. Extensive experiments demonstrate that our model outperforms current SOTA methods on paired LL datasets and exhibits superior generalization on real-world data. The code is released at \url{https://github.com/LowLevelAI/GPP-LLIE}. |
Accep...Accepted by AAAI 2025 |
Code Link |
Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems | 2024-12-28 | ShowWeakly Supervised Monitoring Anomaly Detection (WSMAD) utilizes weak supervision learning to identify anomalies, a critical task for smart city monitoring. However, existing multimodal approaches often fail to meet the real-time and interpretability requirements of edge devices due to their complexity. This paper presents TCVADS (Two-stage Cross-modal Video Anomaly Detection System), which leverages knowledge distillation and cross-modal contrastive learning to enable efficient, accurate, and interpretable anomaly detection on edge devices.TCVADS operates in two stages: coarse-grained rapid classification and fine-grained detailed analysis. In the first stage, TCVADS extracts features from video frames and inputs them into a time series analysis module, which acts as the teacher model. Insights are then transferred via knowledge distillation to a simplified convolutional network (student model) for binary classification. Upon detecting an anomaly, the second stage is triggered, employing a fine-grained multi-class classification model. This stage uses CLIP for cross-modal contrastive learning with text and images, enhancing interpretability and achieving refined classification through specially designed triplet textual relationships. Experimental results demonstrate that TCVADS significantly outperforms existing methods in model performance, detection efficiency, and interpretability, offering valuable contributions to smart city monitoring applications. |
IEEE ...IEEE TETC-CS (Under review) |
None |
ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing | 2024-12-28 | ShowWe propose an image-adaptive object detection method for adverse weather conditions such as fog and low-light. Our framework employs differentiable preprocessing filters to perform image enhancement suitable for later-stage object detections. Our framework introduces two differentiable filters: a B'ezier curve-based pixel-wise (BPW) filter and a kernel-based local (KBL) filter. These filters unify the functions of classical image processing filters and improve performance of object detection. We also propose a domain-agnostic data augmentation strategy using the BPW filter. Our method does not require data-specific customization of the filter combinations, parameter ranges, and data augmentation. We evaluate our proposed approach, called Enhanced Robustness by Unified Image Processing (ERUP)-YOLO, by applying it to the YOLOv3 detector. Experiments on adverse weather datasets demonstrate that our proposed filters match or exceed the expressiveness of conventional methods and our ERUP-YOLO achieved superior performance in a wide range of adverse weather conditions, including fog and low-light conditions. |
Accep...Accepted to WACV 2025 |
None |
SDM-Car: A Dataset for Small and Dim Moving Vehicles Detection in Satellite Videos | 2024-12-24 | ShowVehicle detection and tracking in satellite video is essential in remote sensing (RS) applications. However, upon the statistical analysis of existing datasets, we find that the dim vehicles with low radiation intensity and limited contrast against the background are rarely annotated, which leads to the poor effect of existing approaches in detecting moving vehicles under low radiation conditions. In this paper, we address the challenge by building a \textbf{S}mall and \textbf{D}im \textbf{M}oving Cars (SDM-Car) dataset with a multitude of annotations for dim vehicles in satellite videos, which is collected by the Luojia 3-01 satellite and comprises 99 high-quality videos. Furthermore, we propose a method based on image enhancement and attention mechanisms to improve the detection accuracy of dim vehicles, serving as a benchmark for evaluating the dataset. Finally, we assess the performance of several representative methods on SDM-Car and present insightful findings. The dataset is openly available at https://github.com/TanedaM/SDM-Car. |
5 pag...5 pages, 7 figures, IEEE Geoscience and Remote Sensing Letters |
Code Link |
NightHaze: Nighttime Image Dehazing via Self-Prior Learning | 2024-12-23 | ShowMasked autoencoder (MAE) shows that severe augmentation during training produces robust representations for high-level tasks. This paper brings the MAE-like framework to nighttime image enhancement, demonstrating that severe augmentation during training produces strong network priors that are resilient to real-world night haze degradations. We propose a novel nighttime image dehazing method with self-prior learning. Our main novelty lies in the design of severe augmentation, which allows our model to learn robust priors. Unlike MAE that uses masking, we leverage two key challenging factors of nighttime images as augmentation: light effects and noise. During training, we intentionally degrade clear images by blending them with light effects as well as by adding noise, and subsequently restore the clear images. This enables our model to learn clear background priors. By increasing the noise values to approach as high as the pixel intensity values of the glow and light effect blended images, our augmentation becomes severe, resulting in stronger priors. While our self-prior learning is considerably effective in suppressing glow and revealing details of background scenes, in some cases, there are still some undesired artifacts that remain, particularly in the forms of over-suppression. To address these artifacts, we propose a self-refinement module based on the semi-supervised teacher-student framework. Our NightHaze, especially our MAE-like self-prior learning, shows that models trained with severe augmentation effectively improve the visibility of input haze images, approaching the clarity of clear nighttime images. Extensive experiments demonstrate that our NightHaze achieves state-of-the-art performance, outperforming existing nighttime image dehazing methods by a substantial margin of 15.5% for MUSIQ and 23.5% for ClipIQA. |
Accep...Accepted by AAAI 2025. Project page: https://bb12346.github.io/NightHaze/ |
Code Link |
Zero-Shot Low Light Image Enhancement with Diffusion Prior | 2024-12-22 | ShowBalancing aesthetic quality with fidelity when enhancing images from challenging, degraded sources is a core objective in computational photography. In this paper, we address low light image enhancement (LLIE), a task in which dark images often contain limited visible information. Diffusion models, known for their powerful image enhancement capacities, are a natural choice for this problem. However, their deep generative priors can also lead to hallucinations, introducing non-existent elements or substantially altering the visual semantics of the original scene. In this work, we introduce a novel zero-shot method for controlling and refining the generative behavior of diffusion models for dark-to-light image conversion tasks. Our method demonstrates superior performance over existing state-of-the-art methods in the task of low-light image enhancement, as evidenced by both quantitative metrics and qualitative analysis. |
None | |
ECAFormer: Low-light Image Enhancement using Cross Attention | 2024-12-22 | ShowLow-light image enhancement (LLIE) is critical in computer vision. Existing LLIE methods often fail to discover the underlying relationships between different sub-components, causing the loss of complementary information between multiple modules and network layers, ultimately resulting in the loss of image details. To beat this shortage, we design a hierarchical mutual Enhancement via a Cross Attention transformer (ECAFormer), which introduces an architecture that enables concurrent propagation and interaction of multiple features. The model preserves detailed information by introducing a Dual Multi-head self-attention (DMSA), which leverages visual and semantic features across different scales, allowing them to guide and complement each other. Besides, a Cross-Scale DMSA block is introduced to capture the residual connection, integrating cross-layer information to further enhance image detail. Experimental results show that ECAFormer reaches competitive performance across multiple benchmarks, yielding nearly a 3% improvement in PSNR over the suboptimal method, demonstrating the effectiveness of information interaction in LLIE. |
None | |
Rethinking Model Redundancy for Low-light Image Enhancement | 2024-12-21 | ShowLow-light image enhancement (LLIE) is a fundamental task in computational photography, aiming to improve illumination, reduce noise, and enhance the image quality of low-light images. While recent advancements primarily focus on customizing complex neural network models, we have observed significant redundancy in these models, limiting further performance improvement. In this paper, we investigate and rethink the model redundancy for LLIE, identifying parameter harmfulness and parameter uselessness. Inspired by the rethinking, we propose two innovative techniques to mitigate model redundancy while improving the LLIE performance: Attention Dynamic Reallocation (ADR) and Parameter Orthogonal Generation (POG). ADR dynamically reallocates appropriate attention based on original attention, thereby mitigating parameter harmfulness. POG learns orthogonal basis embeddings of parameters and prevents degradation to static parameters, thereby mitigating parameter uselessness. Experiments validate the effectiveness of our techniques. We will release the code to the public. |
None | |
Rethinking the Atmospheric Scattering-driven Attention via Channel and Gamma Correction Priors for Low-Light Image Enhancement | 2024-12-20 | ShowEnhancing low-light images remains a critical challenge in computer vision, as does designing lightweight models for edge devices that can handle the computational demands of deep learning. In this article, we introduce an extended version of the Channel-Prior and Gamma-Estimation Network (CPGA-Net), termed CPGA-Net+, which incorporates an attention mechanism driven by a reformulated Atmospheric Scattering Model and effectively addresses both global and local image processing through Plug-in Attention with gamma correction. These innovations enable CPGA-Net+ to achieve superior performance on image enhancement tasks for supervised and unsupervised learning, surpassing lightweight state-of-the-art methods with high efficiency. Furthermore, we provide a theoretical analysis showing that our approach inherently decomposes the enhancement process into restoration and lightening stages, aligning with the fundamental image degradation model. To further optimize efficiency, we introduce a block simplification technique that reduces computational costs by more than two-thirds. Experimental results validate the effectiveness of CPGA-Net+ and highlight its potential for applications in resource-constrained environments. |
This ...This work has been submitted to the IEEE for possible publication |
None |
SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild | 2024-12-20 | ShowSeagrass meadows play a crucial role in marine ecosystems, providing important services such as carbon sequestration, water quality improvement, and habitat provision. Monitoring the distribution and abundance of seagrass is essential for environmental impact assessments and conservation efforts. However, the current manual methods of analyzing underwater video transects to assess seagrass coverage are time-consuming and subjective. This work explores the use of deep learning models to automate the process of seagrass detection and coverage estimation from underwater video data. A dataset of over 8,300 annotated underwater images was created, and several deep learning architectures, including ResNet, InceptionNetV3, DenseNet, and Vision Transformer, were evaluated for the task of binary classification of |
None | |
Fed-AugMix: Balancing Privacy and Utility via Data Augmentation | 2024-12-18 | ShowGradient leakage attacks pose a significant threat to the privacy guarantees of federated learning. While distortion-based protection mechanisms are commonly employed to mitigate this issue, they often lead to notable performance degradation. Existing methods struggle to preserve model performance while ensuring privacy. To address this challenge, we propose a novel data augmentation-based framework designed to achieve a favorable privacy-utility trade-off, with the potential to enhance model performance in certain cases. Our framework incorporates the AugMix algorithm at the client level, enabling data augmentation with controllable severity. By integrating the Jensen-Shannon divergence into the loss function, we embed the distortion introduced by AugMix into the model gradients, effectively safeguarding privacy against deep leakage attacks. Moreover, the JS divergence promotes model consistency across different augmentations of the same image, enhancing both robustness and performance. Extensive experiments on benchmark datasets demonstrate the effectiveness and stability of our method in protecting privacy. Furthermore, our approach maintains, and in some cases improves, model performance, showcasing its ability to achieve a robust privacy-utility trade-off. |
None | |
Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD) | 2024-12-16 | ShowIn recent years, the application of machine learning to minimally invasive surgery (MIS) has attracted considerable interest. Datasets are critical to the use of such techniques. This paper presents a unique dataset recorded during ex vivo pseudo-cholecystectomy procedures on pig livers using the da Vinci Research Kit (dVRK). Unlike existing datasets, it addresses a critical gap by providing comprehensive kinematic data, recordings of all pedal inputs, and offers a time-stamped record of the endoscope's movements. This expanded version also includes segmentation and keypoint annotations of images, enhancing its utility for computer vision applications. Contributed by seven surgeons with varied backgrounds and experience levels that are provided as a part of this expanded version, the dataset is an important new resource for surgical robotics research. It enables the development of advanced methods for evaluating surgeon skills, tools for providing better context awareness, and automation of surgical tasks. Our work overcomes the limitations of incomplete recordings and imprecise kinematic data found in other datasets. To demonstrate the potential of the dataset for advancing automation in surgical robotics, we introduce two models that predict clutch usage and camera activation, a 3D scene reconstruction example, and the results from our keypoint and segmentation models. |
Prepr...Preprint of an article accepted in Journal of Medical Robotics Research (2024). The metadata will be updated once it is published |
None |
Deep Joint Unrolling for Deblurring and Low-Light Image Enhancement (JUDE) | 2024-12-16 | ShowLow-light and blurring issues are prevalent when capturing photos at night, often due to the use of long exposure to address dim environments. Addressing these joint problems can be challenging and error-prone if an end-to-end model is trained without incorporating an appropriate physical model. In this paper, we introduce JUDE, a Deep Joint Unrolling for Deblurring and Low-Light Image Enhancement, inspired by the image physical model. Based on Retinex theory and the blurring model, the low-light blurry input is iteratively deblurred and decomposed, producing sharp low-light reflectance and illuminance through an unrolling mechanism. Additionally, we incorporate various modules to estimate the initial blur kernel, enhance brightness, and eliminate noise in the final image. Comprehensive experiments on LOL-Blur and Real-LOL-Blur demonstrate that our method outperforms existing techniques both quantitatively and qualitatively. |
10 pages | None |
Training-and-Prompt-Free General Painterly Harmonization via Zero-Shot Disentenglement on Style and Content References | 2024-12-15 | ShowPainterly image harmonization aims at seamlessly blending disparate visual elements within a single image. However, previous approaches often struggle due to limitations in training data or reliance on additional prompts, leading to inharmonious and content-disrupted output. To surmount these hurdles, we design a Training-and-prompt-Free General Painterly Harmonization method (TF-GPH). TF-GPH incorporates a novel |
Code Link | |
AllWeatherNet:Unified Image Enhancement for Autonomous Driving under Adverse Weather and Lowlight-conditions | 2024-12-14 | ShowAdverse conditions like snow, rain, nighttime, and fog, pose challenges for autonomous driving perception systems. Existing methods have limited effectiveness in improving essential computer vision tasks, such as semantic segmentation, and often focus on only one specific condition, such as removing rain or translating nighttime images into daytime ones. To address these limitations, we propose a method to improve the visual quality and clarity degraded by such adverse conditions. Our method, AllWeather-Net, utilizes a novel hierarchical architecture to enhance images across all adverse conditions. This architecture incorporates information at three semantic levels: scene, object, and texture, by discriminating patches at each level. Furthermore, we introduce a Scaled Illumination-aware Attention Mechanism (SIAM) that guides the learning towards road elements critical for autonomous driving perception. SIAM exhibits robustness, remaining unaffected by changes in weather conditions or environmental scenes. AllWeather-Net effectively transforms images into normal weather and daytime scenes, demonstrating superior image enhancement results and subsequently enhancing the performance of semantic segmentation, with up to a 5.3% improvement in mIoU in the trained domain. We also show our model's generalization ability by applying it to unseen domains without re-training, achieving up to 3.9% mIoU improvement. Code can be accessed at: https://github.com/Jumponthemoon/AllWeatherNet. |
ICPR ...ICPR 2024, Piero Zamperoni Overall Best Student Paper Award |
Code Link |
Enhanced Low-Dose CT Image Reconstruction by Domain and Task Shifting Gaussian Denoisers | 2024-12-13 | ShowComputed tomography from a low radiation dose (LDCT) is challenging due to high noise in the projection data. Popular approaches for LDCT image reconstruction are two-stage methods, typically consisting of the filtered backprojection (FBP) algorithm followed by a neural network for LDCT image enhancement. Two-stage methods are attractive for their simplicity and potential for computational efficiency, typically requiring only a single FBP and a neural network forward pass for inference. However, the best reconstruction quality is currently achieved by unrolled iterative methods (Learned Primal-Dual and ItNet), which are more complex and thus have a higher computational cost for training and inference. We propose a method combining the simplicity and efficiency of two-stage methods with state-of-the-art reconstruction quality. Our strategy utilizes a neural network pretrained for Gaussian noise removal from natural grayscale images, fine-tuned for LDCT image enhancement. We call this method FBP-DTSGD (Domain and Task Shifted Gaussian Denoisers) as the fine-tuning is a task shift from Gaussian denoising to enhancing LDCT images and a domain shift from natural grayscale to LDCT images. An ablation study with three different pretrained Gaussian denoisers indicates that the performance of FBP-DTSGD does not depend on a specific denoising architecture, suggesting future advancements in Gaussian denoising could benefit the method. The study also shows that pretraining on natural images enhances LDCT reconstruction quality, especially with limited training data. Notably, pretraining involves no additional cost, as existing pretrained models are used. The proposed method currently holds the top mean position in the LoDoPaB-CT challenge. |
13 pages, 4 figures | None |
Physics-Inspired Synthesized Underwater Image Dataset | 2024-12-11 | ShowThis paper introduces the physics-inspired synthesized underwater image dataset (PHISWID), a dataset tailored for enhancing underwater image processing through physics-inspired image synthesis. For underwater image enhancement, data-driven approaches (e.g., deep neural networks) typically demand extensive datasets, yet acquiring paired clean and degraded underwater images poses significant challenges. Existing datasets have limited contributions to image enhancement due to lack of physics models, publicity, and ground-truth images. PHISWID addresses these issues by offering a set of paired ground-truth (atmospheric) and underwater images synthetically degraded by color degradation and marine snow artifacts. Generating underwater images from atmospheric RGB-D images based on physical models provides pairs of real-world ground-truth and degraded images. Our synthetic approach generates a large quantity of the pairs, enabling effective training of deep neural networks and objective image quality assessment. Through benchmark experiment with some datasets and image enhance methods, we validate that our dataset can improve the image enhancement performance. Our dataset, which is publicly available, contributes to the development in underwater image processing. |
None | |
Leveraging Content and Context Cues for Low-Light Image Enhancement | 2024-12-10 | ShowLow-light conditions have an adverse impact on machine cognition, limiting the performance of computer vision systems in real life. Since low-light data is limited and difficult to annotate, we focus on image processing to enhance low-light images and improve the performance of any downstream task model, instead of fine-tuning each of the models which can be prohibitively expensive. We propose to improve the existing zero-reference low-light enhancement by leveraging the CLIP model to capture image prior and for semantic guidance. Specifically, we propose a data augmentation strategy to learn an image prior via prompt learning, based on image sampling, to learn the image prior without any need for paired or unpaired normal-light data. Next, we propose a semantic guidance strategy that maximally takes advantage of existing low-light annotation by introducing both content and context cues about the image training patches. We experimentally show, in a qualitative study, that the proposed prior and semantic guidance help to improve the overall image contrast and hue, as well as improve background-foreground discrimination, resulting in reduced over-saturation and noise over-amplification, common in related zero-reference methods. As we target machine cognition, rather than rely on assuming the correlation between human perception and downstream task performance, we conduct and present an ablation study and comparison with related zero-reference methods in terms of task-based performance across many low-light datasets, including image classification, object and face detection, showing the effectiveness of our proposed method. |
Accep...Accepted to the IEEE Transactions on Multimedia |
None |
Analytical-Heuristic Modeling and Optimization for Low-Light Image Enhancement | 2024-12-10 | ShowLow-light image enhancement remains an open problem, and the new wave of artificial intelligence is at the center of this problem. This work describes the use of genetic algorithms for optimizing analytical models that can improve the visualization of images with poor light. Genetic algorithms are part of metaheuristic approaches, which proved helpful in solving challenging optimization tasks. We propose two analytical methods combined with optimization reasoning to approach a solution to the physical and computational aspects of transforming dark images into visible ones. The experiments demonstrate that the proposed approach ranks at the top among 26 state-of-the-art algorithms in the LOL benchmark. The results show evidence that a simple genetic algorithm combined with analytical reasoning can defeat the current mainstream in a challenging computer vision task through controlled experiments and objective comparisons. This work opens interesting new research avenues for the swarm and evolutionary computation community and others interested in analytical and heuristic reasoning. |
26 pa...26 pages, 6 figures, 6 tables, 34 references |
None |
Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring | 2024-12-10 | ShowImage degradation caused by noise and blur remains a persistent challenge in imaging systems, stemming from limitations in both hardware and methodology. Single-image solutions face an inherent tradeoff between noise reduction and motion blur. While short exposures can capture clear motion, they suffer from noise amplification. Long exposures reduce noise but introduce blur. Learning-based single-image enhancers tend to be over-smooth due to the limited information. Multi-image solutions using burst mode avoid this tradeoff by capturing more spatial-temporal information but often struggle with misalignment from camera/scene motion. To address these limitations, we propose a physical-model-based image restoration approach leveraging a novel dual-exposure Quad-Bayer pattern sensor. By capturing pairs of short and long exposures at the same starting point but with varying durations, this method integrates complementary noise-blur information within a single image. We further introduce a Quad-Bayer synthesis method (B2QB) to simulate sensor data from Bayer patterns to facilitate training. Based on this dual-exposure sensor model, we design a hierarchical convolutional neural network called QRNet to recover high-quality RGB images. The network incorporates input enhancement blocks and multi-level feature extraction to improve restoration quality. Experiments demonstrate superior performance over state-of-the-art deblurring and denoising methods on both synthetic and real-world datasets. The code, model, and datasets are publicly available at https://github.com/zhaoyuzhi/QRNet. |
accep...accepted by IEEE Transactions on Image Processing (TIP) |
Code Link |
EchoSim4D: A Proof-of-Concept Gamified XR Echocardiography Training Simulator for Neonates using 4D Ultrasound Volume | 2024-12-09 | ShowNeonatal echocardiography is vital for early detection of heart anomalies in newborns, enabling timely, non-invasive interventions where 4D ultrasound, adds the dimension of time to 3D imaging, enhances diagnostic capabilities by visualizing real-time heart dynamics. However, training for 4D neonatal echocardiography is limited by the lack of simulators that support 4D Ultrasound volume visualization within gamified environments. This paper introduces EchoSim4D, an XR-based simulator leveraging novel pipeline for visualizing 4D volume data in Unity, incorporating real-time volume reconstruction, and a preloaded version optimized for low-end systems. EchoSim4D integrates a sensor-equipped manikin and a custom 3D-printed transducer with a 6-DOF sensor, replicating the precise probe maneuvers necessary for neonatal echocardiography. In a validation study with postgraduate medical students (0-5 years of experience), supervised by a domain expert, EchoSim4D demonstrated high visual fidelity and training efficacy. Findings suggest that 4D visualization techniques hold significant potential for advancing medical training in neonatal echocardiography. |
12 pa...12 pages, 10 figures, work in progress |
None |
PUGAN: Physical Model-Guided Underwater Image Enhancement Using GAN with Dual-Discriminators | 2024-12-07 | ShowDue to the light absorption and scattering induced by the water medium, underwater images usually suffer from some degradation problems, such as low contrast, color distortion, and blurring details, which aggravate the difficulty of downstream underwater understanding tasks. Therefore, how to obtain clear and visually pleasant images has become a common concern of people, and the task of underwater image enhancement (UIE) has also emerged as the times require. Among existing UIE methods, Generative Adversarial Networks (GANs) based methods perform well in visual aesthetics, while the physical model-based methods have better scene adaptability. Inheriting the advantages of the above two types of models, we propose a physical model-guided GAN model for UIE in this paper, referred to as PUGAN. The entire network is under the GAN architecture. On the one hand, we design a Parameters Estimation subnetwork (Par-subnet) to learn the parameters for physical model inversion, and use the generated color enhancement image as auxiliary information for the Two-Stream Interaction Enhancement sub-network (TSIE-subnet). Meanwhile, we design a Degradation Quantization (DQ) module in TSIE-subnet to quantize scene degradation, thereby achieving reinforcing enhancement of key regions. On the other hand, we design the Dual-Discriminators for the style-content adversarial constraint, promoting the authenticity and visual aesthetics of the results. Extensive experiments on three benchmark datasets demonstrate that our PUGAN outperforms state-of-the-art methods in both qualitative and quantitative metrics. |
8 pag...8 pages, 4 figures, Accepted by IEEE Transactions on Image Processing 2023 |
None |
Noise Self-Regression: A New Learning Paradigm to Enhance Low-Light Images Without Task-Related Data | 2024-12-06 | ShowDeep learning-based low-light image enhancement (LLIE) is a task of leveraging deep neural networks to enhance the image illumination while keeping the image content unchanged. From the perspective of training data, existing methods complete the LLIE task driven by one of the following three data types: paired data, unpaired data and zero-reference data. Each type of these data-driven methods has its own advantages, e.g., zero-reference data-based methods have very low requirements on training data and can meet the human needs in many scenarios. In this paper, we leverage pure Gaussian noise to complete the LLIE task, which further reduces the requirements for training data in LLIE tasks and can be used as another alternative in practical use. Specifically, we propose Noise SElf-Regression (NoiSER) without access to any task-related data, simply learns a convolutional neural network equipped with an instance-normalization layer by taking a random noise image, |
None | |
Joint Image De-noising and Enhancement for Satellite-Based SAR | 2024-12-03 | ShowThe reconstructed images from the Synthetic Aperture Radar (SAR) data suffer from multiplicative noise as well as low contrast level. These two factors impact the quality of the SAR images significantly and prevent any attempt to extract valuable information from the processed data. The necessity for mitigating these effects in the field of SAR imaging is of high importance. Therefore, in this paper, we address the aforementioned issues and propose a technique to handle these shortcomings simultaneously. In fact, we combine the de-noising and contrast enhancement processes into a unified algorithm. The image enhancement is performed based on the Contrast Limited Adaptive Histogram Equalization (CLAHE) technique. The verification of the proposed algorithm is performed by experimental results based on the data that has been collected from the European Space Agency's ERS-2 satellite which operates in strip-map mode. |
None | |
Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond | 2024-12-02 | ShowQuality degradation is observed in underwater images due to the effects of light refraction and absorption by water, leading to issues like color cast, haziness, and limited visibility. This degradation negatively affects the performance of autonomous underwater vehicles used in marine applications. To address these challenges, we propose a lightweight phase-based transformer network with 1.77M parameters for underwater image restoration (UIR). Our approach focuses on effectively extracting non-contaminated features using a phase-based self-attention mechanism. We also introduce an optimized phase attention block to restore structural information by propagating prominent attentive features from the input. We evaluate our method on both synthetic (UIEB, UFO-120) and real-world (UIEB, U45, UCCS, SQUID) underwater image datasets. Additionally, we demonstrate its effectiveness for low-light image enhancement using the LOL dataset. Through extensive ablation studies and comparative analysis, it is clear that the proposed approach outperforms existing state-of-the-art (SOTA) methods. |
8 pag...8 pages, 8 figures, conference |
None |
LUIEO: A Lightweight Model for Integrating Underwater Image Enhancement and Object Detection | 2024-12-01 | ShowUnderwater optical images inevitably suffer from various degradation factors such as blurring, low contrast, and color distortion, which hinder the accuracy of object detection tasks. Due to the lack of paired underwater/clean images, most research methods adopt a strategy of first enhancing and then detecting, resulting in a lack of feature communication between the two learning tasks. On the other hand, due to the contradiction between the diverse degradation factors of underwater images and the limited number of samples, existing underwater enhancement methods are difficult to effectively enhance degraded images of unknown water bodies, thereby limiting the improvement of object detection accuracy. Therefore, most underwater target detection results are still displayed on degraded images, making it difficult to visually judge the correctness of the detection results. To address the above issues, this paper proposes a multi-task learning method that simultaneously enhances underwater images and improves detection accuracy. Compared with single-task learning, the integrated model allows for the dynamic adjustment of information communication and sharing between different tasks. Due to the fact that real underwater images can only provide annotated object labels, this paper introduces physical constraints to ensure that object detection tasks do not interfere with image enhancement tasks. Therefore, this article introduces a physical module to decompose underwater images into clean images, background light, and transmission images and uses a physical model to calculate underwater images for self-supervision. Numerical experiments demonstrate that the proposed model achieves satisfactory results in visual performance, object detection accuracy, and detection efficiency compared to state-of-the-art comparative methods. |
None | |
BiCo-Fusion: Bidirectional Complementary LiDAR-Camera Fusion for Semantic- and Spatial-Aware 3D Object Detection | 2024-12-01 | Show3D object detection is an important task that has been widely applied in autonomous driving. To perform this task, a new trend is to fuse multi-modal inputs, i.e., LiDAR and camera. Under such a trend, recent methods fuse these two modalities by unifying them in the same 3D space. However, during direct fusion in a unified space, the drawbacks of both modalities (LiDAR features struggle with detailed semantic information and the camera lacks accurate 3D spatial information) are also preserved, diluting semantic and spatial awareness of the final unified representation. To address the issue, this letter proposes a novel bidirectional complementary LiDAR-camera fusion framework, called BiCo-Fusion that can achieve robust semantic- and spatial-aware 3D object detection. The key insight is to fuse LiDAR and camera features in a bidirectional complementary way to enhance the semantic awareness of the LiDAR and the 3D spatial awareness of the camera. The enhanced features from both modalities are then adaptively fused to build a semantic- and spatial-aware unified representation. Specifically, we introduce Pre-Fusion consisting of a Voxel Enhancement Module (VEM) to enhance the semantic awareness of voxel features from 2D camera features and Image Enhancement Module (IEM) to enhance the 3D spatial awareness of camera features from 3D voxel features. We then introduce Unified Fusion (U-Fusion) to adaptively fuse the enhanced features from the last stage to build a unified representation. Extensive experiments demonstrate the superiority of our BiCo-Fusion against the prior arts. Project page: https://t-ys.github.io/BiCo-Fusion/. |
8 pages, 5 figures | Code Link |
DMFourLLIE: Dual-Stage and Multi-Branch Fourier Network for Low-Light Image Enhancement | 2024-12-01 | ShowIn the Fourier frequency domain, luminance information is primarily encoded in the amplitude component, while spatial structure information is significantly contained within the phase component. Existing low-light image enhancement techniques using Fourier transform have mainly focused on amplifying the amplitude component and simply replicating the phase component, an approach that often leads to color distortions and noise issues. In this paper, we propose a Dual-Stage Multi-Branch Fourier Low-Light Image Enhancement (DMFourLLIE) framework to address these limitations by emphasizing the phase component's role in preserving image structure and detail. The first stage integrates structural information from infrared images to enhance the phase component and employs a luminance-attention mechanism in the luminance-chrominance color space to precisely control amplitude enhancement. The second stage combines multi-scale and Fourier convolutional branches for robust image reconstruction, effectively recovering spatial structures and textures. This dual-branch joint optimization process ensures that complex image information is retained, overcoming the limitations of previous methods that neglected the interplay between amplitude and phase. Extensive experiments across multiple datasets demonstrate that DMFourLLIE outperforms current state-of-the-art methods in low-light image enhancement. Our code is available at https://github.com/bywlzts/DMFourLLIE. |
Accep...Accepted to ACM Multimedia 2024 |
Code Link |
MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning | 2024-12-01 | ShowHarnessing low-light enhancement and domain adaptation, nighttime UAV tracking has made substantial strides. However, over-reliance on image enhancement, scarcity of high-quality nighttime data, and neglecting the relationship between daytime and nighttime trackers, which hinders the development of an end-to-end trainable framework. Moreover, current CNN-based trackers have limited receptive fields, leading to suboptimal performance, while ViT-based trackers demand heavy computational resources due to their reliance on the self-attention mechanism. In this paper, we propose a novel pure Mamba-based tracking framework (\textbf{MambaNUT}) that employs a state space model with linear complexity as its backbone, incorporating a single-stream architecture that integrates feature learning and template-search coupling within Vision Mamba. We introduce an adaptive curriculum learning (ACL) approach that dynamically adjusts sampling strategies and loss weights, thereby improving the model's ability of generalization. Our ACL is composed of two levels of curriculum schedulers: (1) sampling scheduler that transforms the data distribution from imbalanced to balanced, as well as from easier (daytime) to harder (nighttime) samples; (2) loss scheduler that dynamically assigns weights based on data frequency and the IOU. Exhaustive experiments on multiple nighttime UAV tracking benchmarks demonstrate that the proposed MambaNUT achieves state-of-the-art performance while requiring lower computational costs. The code will be available. |
None | |
HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning | 2024-11-29 | ShowUnderwater images are often affected by light refraction and absorption, reducing visibility and interfering with subsequent applications. Existing underwater image enhancement methods primarily focus on improving visual quality while overlooking practical implications. To strike a balance between visual quality and application, we propose a heuristic invertible network for underwater perception enhancement, dubbed HUPE, which enhances visual quality and demonstrates flexibility in handling other downstream tasks. Specifically, we introduced an information-preserving reversible transformation with embedded Fourier transform to establish a bidirectional mapping between underwater images and their clear images. Additionally, a heuristic prior is incorporated into the enhancement process to better capture scene information. To further bridge the feature gap between vision-based enhancement images and application-oriented images, a semantic collaborative learning module is applied in the joint optimization process of the visual enhancement task and the downstream task, which guides the proposed enhancement model to extract more task-oriented semantic features while obtaining visually pleasing images. Extensive experiments, both quantitative and qualitative, demonstrate the superiority of our HUPE over state-of-the-art methods. The source code is available at https://github.com/ZengxiZhang/HUPE. |
22 pages, 21 figures | Code Link |
Self-Supervised Denoiser Framework | 2024-11-29 | ShowReconstructing images using Computed Tomography (CT) in an industrial context leads to specific challenges that differ from those encountered in other areas, such as clinical CT. Indeed, non-destructive testing with industrial CT will often involve scanning multiple similar objects while maintaining high throughput, requiring short scanning times, which is not a relevant concern in clinical CT. Under-sampling the tomographic data (sinograms) is a natural way to reduce the scanning time at the cost of image quality since the latter depends on the number of measurements. In such a scenario, post-processing techniques are required to compensate for the image artifacts induced by the sinogram sparsity. We introduce the Self-supervised Denoiser Framework (SDF), a self-supervised training method that leverages pre-training on highly sampled sinogram data to enhance the quality of images reconstructed from undersampled sinogram data. The main contribution of SDF is that it proposes to train an image denoiser in the sinogram space by setting the learning task as the prediction of one sinogram subset from another. As such, it does not require ground-truth image data, leverages the abundant data modality in CT, the sinogram, and can drastically enhance the quality of images reconstructed from a fraction of the measurements. We demonstrate that SDF produces better image quality, in terms of peak signal-to-noise ratio, than other analytical and self-supervised frameworks in both 2D fan-beam or 3D cone-beam CT settings. Moreover, we show that the enhancement provided by SDF carries over when fine-tuning the image denoiser on a few examples, making it a suitable pre-training technique in a context where there is little high-quality image data. Our results are established on experimental datasets, making SDF a strong candidate for being the building block of foundational image-enhancement models in CT. |
None | |
Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination | 2024-11-28 | ShowThe recently introduced Consistency models pose an efficient alternative to diffusion algorithms, enabling rapid and good quality image synthesis. These methods overcome the slowness of diffusion models by directly mapping noise to data, while maintaining a (relatively) simpler training. Consistency models enable a fast one- or few-step generation, but they typically fall somewhat short in sample quality when compared to their diffusion origins. In this work we propose a novel and highly effective technique for post-processing Consistency-based generated images, enhancing their perceptual quality. Our approach utilizes a joint classifier-discriminator model, in which both portions are trained adversarially. While the classifier aims to grade an image based on its assignment to a designated class, the discriminator portion of the very same network leverages the softmax values to assess the proximity of the input image to the targeted data manifold, thereby serving as an Energy-based Model. By employing example-specific projected gradient iterations under the guidance of this joint machine, we refine synthesized images and achieve an improved FID scores on the ImageNet 64x64 dataset for both Consistency-Training and Consistency-Distillation techniques. |
None | |
Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography | 2024-11-28 | ShowThe integration of miniaturized spectrometers into mobile devices offers new avenues for image quality enhancement and facilitates novel downstream tasks. However, the broader application of spectral sensors in mobile photography is hindered by the inherent complexity of spectral images and the constraints of spectral imaging capabilities. To overcome these challenges, we propose a joint RGB-Spectral decomposition model guided enhancement framework, which consists of two steps: joint decomposition and prior-guided enhancement. Firstly, we leverage the complementarity between RGB and Low-resolution Multi-Spectral Images (Lr-MSI) to predict shading, reflectance, and material semantic priors. Subsequently, these priors are seamlessly integrated into the established HDRNet to promote dynamic range enhancement, color mapping, and grid expert learning, respectively. Additionally, we construct a high-quality Mobile-Spec dataset to support our research, and our experiments validate the effectiveness of Lr-MSI in the tone enhancement task. This work aims to establish a solid foundation for advancing spectral vision in mobile photography. The code is available at \url{https://github.com/CalayZhou/JDM-HDRNet}. |
Code Link | |
Evaluating the Impact of Underwater Image Enhancement on Object Detection Performance: A Comprehensive Study | 2024-11-26 | ShowUnderwater imagery often suffers from severe degradation that results in low visual quality and object detection performance. This work aims to evaluate state-of-the-art image enhancement models, investigate their impact on underwater object detection, and explore their potential to improve detection performance. To this end, we selected representative underwater image enhancement models covering major enhancement categories and applied them separately to two recent datasets: 1) the Real-World Underwater Object Detection Dataset (RUOD), and 2) the Challenging Underwater Plant Detection Dataset (CUPDD). Following this, we conducted qualitative and quantitative analyses on the enhanced images and developed a quality index (Q-index) to compare the quality distribution of the original and enhanced images. Subsequently, we compared the performance of several YOLO-NAS detection models that are separately trained and tested on the original and enhanced image sets. Then, we performed a correlation study to examine the relationship between enhancement metrics and detection performance. We also analyzed the inference results from the trained detectors presenting cases where enhancement increased the detection performance as well as cases where enhancement revealed missed objects by human annotators. This study suggests that although enhancement generally deteriorates the detection performance, it can still be harnessed in some cases for increased detection performance and more accurate human annotation. |
None | |
Semi-supervised Underwater Image Enhancement Using A Physics-Aware Triple-Stream Network | 2024-11-25 | ShowUnderwater images normally suffer from degradation due to the transmission medium of water bodies. Both traditional prior-based approaches and deep learning-based methods have been used to address this problem. However, the inflexible assumption of the former often impairs their effectiveness in handling diverse underwater scenes, while the generalization of the latter to unseen images is usually weakened by insufficient data. In this study, we leverage both the physics-based Image Formation Model (IFM) and deep learning techniques for Underwater Image Enhancement (UIE). To this end, we propose a novel Physics-Aware Triple-Stream Underwater Image Enhancement Network, i.e., PATS-UIENet, which comprises a Direct Signal Transmission Estimation Steam (D-Stream), a Backscatter Signal Transmission Estimation Steam (B-Stream) and an Ambient Light Estimation Stream (A-Stream). This network fulfills the UIE task by explicitly estimating the degradation parameters of a revised IFM. We also adopt an IFM-inspired semi-supervised learning framework, which exploits both the labeled and unlabeled images, to address the issue of insufficient data. To our knowledge, such a physics-aware deep network and the IFM-inspired semi-supervised learning framework have not been used for the UIE task before. Our method performs better than, or at least comparably to, sixteen baselines across six testing sets in the degradation estimation and UIE tasks. These promising results should be due to the fact that the proposed method can not only model the degradation but also learn the characteristics of diverse underwater scenes. |
13 pages, 10 figures | None |
Learn2Synth: Learning Optimal Data Synthesis Using Hypergradients | 2024-11-23 | ShowDomain randomization through synthesis is a powerful strategy to train networks that are unbiased with respect to the domain of the input images. Randomization allows networks to see a virtually infinite range of intensities and artifacts during training, thereby minimizing overfitting to appearance and maximizing generalization to unseen data. While powerful, this approach relies on the accurate tuning of a large set of hyper-parameters governing the probabilistic distribution of the synthesized images. Instead of manually tuning these parameters, we introduce Learn2Synth, a novel procedure in which synthesis parameters are learned using a small set of real labeled data. Unlike methods that impose constraints to align synthetic data with real data (e.g., contrastive or adversarial techniques), which risk misaligning the image and its label map, we tune an augmentation engine such that a segmentation network trained on synthetic data has optimal accuracy when applied to real data. This approach allows the training procedure to benefit from real labeled examples, without ever using these real examples to train the segmentation network, which avoids biasing the network towards the properties of the training set. Specifically, we develop both parametric and nonparametric strategies to augment the synthetic images, enhancing the segmentation network's performance. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of this learning strategy. Code is available at: https://github.com/HuXiaoling/Learn2Synth. |
14 pages, 5 figures | Code Link |
CE-VAE: Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement | 2024-11-22 | ShowUnmanned underwater image analysis for marine monitoring faces two key challenges: (i) degraded image quality due to light attenuation and (ii) hardware storage constraints limiting high-resolution image collection. Existing methods primarily address image enhancement with approaches that hinge on storing the full-size input. In contrast, we introduce the Capsule Enhanced Variational AutoEncoder (CE-VAE), a novel architecture designed to efficiently compress and enhance degraded underwater images. Our attention-aware image encoder can project the input image onto a latent space representation while being able to run online on a remote device. The only information that needs to be stored on the device or sent to a beacon is a compressed representation. There is a dual-decoder module that performs offline, full-size enhanced image generation. One branch reconstructs spatial details from the compressed latent space, while the second branch utilizes a capsule-clustering layer to capture entity-level structures and complex spatial relationships. This parallel decoding strategy enables the model to balance fine-detail preservation with context-aware enhancements. CE-VAE achieves state-of-the-art performance in underwater image enhancement on six benchmark datasets, providing up to 3x higher compression efficiency than existing approaches. Code available at \url{https://github.com/iN1k1/ce-vae-underwater-image-enhancement}. |
Accep...Accepted for publication at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) |
Code Link |
Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion | 2024-11-21 | ShowDue to the singularity of real-world paired datasets and the complexity of low-light environments, this leads to supervised methods lacking a degree of scene generalisation. Meanwhile, limited by poor lighting and content guidance, existing zero-shot methods cannot handle unknown severe degradation well. To address this problem, we will propose a new zero-shot low-light enhancement method to compensate for the lack of light and structural information in the diffusion sampling process by effectively combining the wavelet and Fourier frequency domains to construct rich a priori information. The key to the inspiration comes from the similarity between the wavelet and Fourier frequency domains: both light and structure information are closely related to specific frequency domain regions, respectively. Therefore, by transferring the diffusion process to the wavelet low-frequency domain and combining the wavelet and Fourier frequency domains by continuously decomposing them in the inverse process, the constructed rich illumination prior is utilised to guide the image generation enhancement process. Sufficient experiments show that the framework is robust and effective in various scenarios. The code will be available at: \href{https://github.com/hejh8/Joint-Wavelet-and-Fourier-priors-guided-diffusion}{https://github.com/hejh8/Joint-Wavelet-and-Fourier-priors-guided-diffusion}. |
Code Link | |
Efficient Diffusion as Low Light Enhancer | 2024-11-21 | ShowThe computational burden of the iterative sampling process remains a major challenge in diffusion-based Low-Light Image Enhancement (LLIE). Current acceleration methods, whether training-based or training-free, often lead to significant performance degradation, highlighting the trade-off between performance and efficiency. In this paper, we identify two primary factors contributing to performance degradation: fitting errors and the inference gap. Our key insight is that fitting errors can be mitigated by linearly extrapolating the incorrect score functions, while the inference gap can be reduced by shifting the Gaussian flow to a reflectance-aware residual space. Based on the above insights, we design Reflectance-Aware Trajectory Refinement (RATR) module, a simple yet effective module to refine the teacher trajectory using the reflectance component of images. Following this, we introduce \textbf{Re}flectance-aware \textbf{D}iffusion with \textbf{Di}stilled \textbf{T}rajectory (\textbf{ReDDiT}), an efficient and flexible distillation framework tailored for LLIE. Our framework achieves comparable performance to previous diffusion-based methods with redundant steps in just 2 steps while establishing new state-of-the-art (SOTA) results with 8 or 4 steps. Comprehensive experimental evaluations on 10 benchmark datasets validate the effectiveness of our method, consistently outperforming existing SOTA methods. |
8 pages | None |
Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution | 2024-11-19 | ShowImage super-resolution (SR) is a classical yet still active low-level vision problem that aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts, serving as a key technique for image enhancement. Current approaches to address SR tasks, such as transformer-based and diffusion-based methods, are either dedicated to extracting RGB image features or assuming similar degradation patterns, neglecting the inherent modal disparities between infrared and visible images. When directly applied to infrared image SR tasks, these methods inevitably distort the infrared spectral distribution, compromising the machine perception in downstream tasks. In this work, we emphasize the infrared spectral distribution fidelity and propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity. Our approach captures high-pass subbands from multi-scale and multi-directional infrared spectral decomposition to recover infrared-degraded information through a gate architecture. The proposed Spectral Fidelity Loss regularizes the spectral frequency distribution during reconstruction, which ensures the preservation of both high- and low-frequency components and maintains the fidelity of infrared-specific features. We propose a two-stage prompt-learning optimization to guide the model in learning infrared HR characteristics from LR degradation. Extensive experiments demonstrate that our approach outperforms existing image SR models in both visual and perceptual tasks while notably enhancing machine perception in downstream tasks. Our code is available at https://github.com/hey-it-s-me/CoRPLE. |
13 figures, 6 tables | Code Link |
Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method | 2024-11-17 | ShowWe explore the oscillatory behavior observed in inversion methods applied to large-scale text-to-image diffusion models, with a focus on the "Flux" model. By employing a fixed-point-inspired iterative approach to invert real-world images, we observe that the solution does not achieve convergence, instead oscillating between distinct clusters. Through both toy experiments and real-world diffusion models, we demonstrate that these oscillating clusters exhibit notable semantic coherence. We offer theoretical insights, showing that this behavior arises from oscillatory dynamics in rectified flow models. Building on this understanding, we introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing. Furthermore, we provide quantitative results demonstrating the effectiveness of our method for tasks such as image enhancement, makeup transfer, reconstruction quality, and guided sampling quality. Higher-quality examples of videos and images are available at \href{https://yanyanzheng96.github.io/oscillation_inversion/}{this link}. |
Code Link | |
Underwater Image Enhancement with Cascaded Contrastive Learning | 2024-11-16 | ShowUnderwater image enhancement (UIE) is a highly challenging task due to the complexity of underwater environment and the diversity of underwater image degradation. Due to the application of deep learning, current UIE methods have made significant progress. Most of the existing deep learning-based UIE methods follow a single-stage network which cannot effectively address the diverse degradations simultaneously. In this paper, we propose to address this issue by designing a two-stage deep learning framework and taking advantage of cascaded contrastive learning to guide the network training of each stage. The proposed method is called CCL-Net in short. Specifically, the proposed CCL-Net involves two cascaded stages, i.e., a color correction stage tailored to the color deviation issue and a haze removal stage tailored to improve the visibility and contrast of underwater images. To guarantee the underwater image can be progressively enhanced, we also apply contrastive loss as an additional constraint to guide the training of each stage. In the first stage, the raw underwater images are used as negative samples for building the first contrastive loss, ensuring the enhanced results of the first color correction stage are better than the original inputs. While in the second stage, the enhanced results rather than the raw underwater images of the first color correction stage are used as the negative samples for building the second contrastive loss, thus ensuring the final enhanced results of the second haze removal stage are better than the intermediate color corrected results. Extensive experiments on multiple benchmark datasets demonstrate that our CCL-Net can achieve superior performance compared to many state-of-the-art methods. The source code of CCL-Net will be released at https://github.com/lewis081/CCL-Net. |
Accep...Accepted by IEEE Transacitons on MultiMedia |
Code Link |
Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration | 2024-11-10 | ShowWith the popularization of high-end mobile devices, Ultra-high-definition (UHD) images have become ubiquitous in our lives. The restoration of UHD images is a highly challenging problem due to the exaggerated pixel count, which often leads to memory overflow during processing. Existing methods either downsample UHD images at a high rate before processing or split them into multiple patches for separate processing. However, high-rate downsampling leads to significant information loss, while patch-based approaches inevitably introduce boundary artifacts. In this paper, we propose a novel design paradigm to solve the UHD image restoration problem, called D2Net. D2Net enables direct full-resolution inference on UHD images without the need for high-rate downsampling or dividing the images into several patches. Specifically, we ingeniously utilize the characteristics of the frequency domain to establish long-range dependencies of features. Taking into account the richer local patterns in UHD images, we also design a multi-scale convolutional group to capture local features. Additionally, during the decoding stage, we dynamically incorporate features from the encoding stage to reduce the flow of irrelevant information. Extensive experiments on three UHD image restoration tasks, including low-light image enhancement, image dehazing, and image deblurring, show that our model achieves better quantitative and qualitative results than state-of-the-art methods. |
WACV2025 | None |
Real-world Instance-specific Image Goal Navigation: Bridging Domain Gaps via Contrastive Learning | 2024-11-03 | ShowImproving instance-specific image goal navigation (InstanceImageNav), which locates the identical object in a real-world environment from a query image, is essential for robotic systems to assist users in finding desired objects. The challenge lies in the domain gap between low-quality images observed by the moving robot, characterized by motion blur and low-resolution, and high-quality query images provided by the user. Such domain gaps could significantly reduce the task success rate but have not been the focus of previous work. To address this, we propose a novel method called Few-shot Cross-quality Instance-aware Adaptation (CrossIA), which employs contrastive learning with an instance classifier to align features between massive low- and few high-quality images. This approach effectively reduces the domain gap by bringing the latent representations of cross-quality images closer on an instance basis. Additionally, the system integrates an object image collection with a pre-trained deblurring model to enhance the observed image quality. Our method fine-tunes the SimSiam model, pre-trained on ImageNet, using CrossIA. We evaluated our method's effectiveness through an InstanceImageNav task with 20 different types of instances, where the robot identifies the same instance in a real-world environment as a high-quality query image. Our experiments showed that our method improves the task success rate by up to three times compared to the baseline, a conventional approach based on SuperGlue. These findings highlight the potential of leveraging contrastive learning and image enhancement techniques to bridge the domain gap and improve object localization in robotic applications. The project website is https://emergentsystemlabstudent.github.io/DomainBridgingNav/. |
See w...See website at https://emergentsystemlabstudent.github.io/DomainBridgingNav/. Accepted to IEEE IRC2024 |
Code Link |
TPOT: Topology Preserving Optimal Transport in Retinal Fundus Image Enhancement | 2024-11-03 | ShowRetinal fundus photography enhancement is important for diagnosing and monitoring retinal diseases. However, early approaches to retinal image enhancement, such as those based on Generative Adversarial Networks (GANs), often struggle to preserve the complex topological information of blood vessels, resulting in spurious or missing vessel structures. The persistence diagram, which captures topological features based on the persistence of topological structures under different filtrations, provides a promising way to represent the structure information. In this work, we propose a topology-preserving training paradigm that regularizes blood vessel structures by minimizing the differences of persistence diagrams. We call the resulting framework Topology Preserving Optimal Transport (TPOT). Experimental results on a large-scale dataset demonstrate the superiority of the proposed method compared to several state-of-the-art supervised and unsupervised techniques, both in terms of image quality and performance in the downstream blood vessel segmentation task. The code is available at https://github.com/Retinal-Research/TPOT. |
Code Link | |
Medical X-Ray Image Enhancement Using Global Contrast-Limited Adaptive Histogram Equalization | 2024-11-02 | ShowIn medical imaging, accurate diagnosis heavily relies on effective image enhancement techniques, particularly for X-ray images. Existing methods often suffer from various challenges such as sacrificing global image characteristics over local image characteristics or vice versa. In this paper, we present a novel approach, called G-CLAHE (Global-Contrast Limited Adaptive Histogram Equalization), which perfectly suits medical imaging with a focus on X-rays. This method adapts from Global Histogram Equalization (GHE) and Contrast Limited Adaptive Histogram Equalization (CLAHE) to take both advantages and avoid weakness to preserve local and global characteristics. Experimental results show that it can significantly improve current state-of-the-art algorithms to effectively address their limitations and enhance the contrast and quality of X-ray images for diagnostic accuracy. |
None | |
DPEC: Dual-Path Error Compensation Method for Enhanced Low-Light Image Clarity | 2024-11-01 | ShowFor the task of low-light image enhancement, deep learning-based algorithms have demonstrated superiority and effectiveness compared to traditional methods. However, these methods, primarily based on Retinex theory, tend to overlook the noise and color distortions in input images, leading to significant noise amplification and local color distortions in enhanced results. To address these issues, we propose the Dual-Path Error Compensation (DPEC) method, designed to improve image quality under low-light conditions by preserving local texture details while restoring global image brightness without amplifying noise. DPEC incorporates precise pixel-level error estimation to capture subtle differences and an independent denoising mechanism to prevent noise amplification. We introduce the HIS-Retinex loss to guide DPEC's training, ensuring the brightness distribution of enhanced images closely aligns with real-world conditions. To balance computational speed and resource efficiency while training DPEC for a comprehensive understanding of the global context, we integrated the VMamba architecture into its backbone. Comprehensive quantitative and qualitative experimental results demonstrate that our algorithm significantly outperforms state-of-the-art methods in low-light image enhancement. The code is publicly available online at https://github.com/wangshuang233/DPEC. |
Code Link | |
Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation | 2024-10-31 | ShowSurgical scene segmentation is essential for enhancing surgical precision, yet it is frequently compromised by the scarcity and imbalance of available data. To address these challenges, semantic image synthesis methods based on generative adversarial networks and diffusion models have been developed. However, these models often yield non-diverse images and fail to capture small, critical tissue classes, limiting their effectiveness. In response, we propose the Class-Aware Semantic Diffusion Model (CASDM), a novel approach which utilizes segmentation maps as conditions for image synthesis to tackle data scarcity and imbalance. Novel class-aware mean squared error and class-aware self-perceptual loss functions have been defined to prioritize critical, less visible classes, thereby enhancing image quality and relevance. Furthermore, to our knowledge, we are the first to generate multi-class segmentation maps using text prompts in a novel fashion to specify their contents. These maps are then used by CASDM to generate surgical scene images, enhancing datasets for training and validating segmentation models. Our evaluation, which assesses both image quality and downstream segmentation performance, demonstrates the strong effectiveness and generalisability of CASDM in producing realistic image-map pairs, significantly advancing surgical scene segmentation across diverse and challenging datasets. |
None | |
Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement | 2024-10-30 | ShowNoise, an unwanted component in an image, can be the reason for the degradation of Image at the time of transmission or capturing. Noise reduction from images is still a challenging task. Digital Image Processing is a component of Digital signal processing. A wide variety of algorithms can be used in image processing to apply to an image or an input dataset and obtain important outcomes. In image processing research, removing noise from images before further analysis is essential. Post-noise removal of images improves clarity, enabling better interpretation and analysis across medical imaging, satellite imagery, and radar applications. While numerous algorithms exist, each comes with its own assumptions, strengths, and limitations. The paper aims to evaluate the effectiveness of different filtering techniques on images with eight types of noise. It evaluates methodologies like Wiener, Median, Gaussian, Mean, Low pass, High pass, Laplacian and bilateral filtering, using the performance metric Peak signal to noise ratio. It shows us the impact of different filters on noise models by applying a variety of filters to various kinds of noise. Additionally, it also assists us in determining which filtering strategy is most appropriate for a certain noise model based on the circumstances. |
None | |
Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement | 2024-10-27 | ShowFrequency information (e.g., Discrete Wavelet Transform and Fast Fourier Transform) has been widely applied to solve the issue of Low-Light Image Enhancement (LLIE). However, existing frequency-based models primarily operate in the simple wavelet or Fourier space of images, which lacks utilization of valid global and local information in each space. We found that wavelet frequency information is more sensitive to global brightness due to its low-frequency component while Fourier frequency information is more sensitive to local details due to its phase component. In order to achieve superior preliminary brightness enhancement by optimally integrating spatial channel information with low-frequency components in the wavelet transform, we introduce channel-wise Mamba, which compensates for the long-range dependencies of CNNs and has lower complexity compared to Diffusion and Transformer models. So in this work, we propose a novel Wavelet-based Mamba with Fourier Adjustment model called WalMaFa, consisting of a Wavelet-based Mamba Block (WMB) and a Fast Fourier Adjustment Block (FFAB). We employ an Encoder-Latent-Decoder structure to accomplish the end-to-end transformation. Specifically, WMB is adopted in the Encoder and Decoder to enhance global brightness while FFAB is adopted in the Latent to fine-tune local texture details and alleviate ambiguity. Extensive experiments demonstrate that our proposed WalMaFa achieves state-of-the-art performance with fewer computational resources and faster speed. Code is now available at: https://github.com/mcpaulgeorge/WalMaFa. |
18 pa...18 pages, 8 figures, ACCV2024 |
Code Link |
Deep Learning, Machine Learning -- Digital Signal and Image Processing: From Theory to Application | 2024-10-27 | ShowDigital Signal Processing (DSP) and Digital Image Processing (DIP) with Machine Learning (ML) and Deep Learning (DL) are popular research areas in Computer Vision and related fields. We highlight transformative applications in image enhancement, filtering techniques, and pattern recognition. By integrating frameworks like the Discrete Fourier Transform (DFT), Z-Transform, and Fourier Transform methods, we enable robust data manipulation and feature extraction essential for AI-driven tasks. Using Python, we implement algorithms that optimize real-time data processing, forming a foundation for scalable, high-performance solutions in computer vision. This work illustrates the potential of ML and DL to advance DSP and DIP methodologies, contributing to artificial intelligence, automated feature extraction, and applications across diverse domains. |
293 pages | None |
HUE Dataset: High-Resolution Event and Frame Sequences for Low-Light Vision | 2024-10-24 | ShowLow-light environments pose significant challenges for image enhancement methods. To address these challenges, in this work, we introduce the HUE dataset, a comprehensive collection of high-resolution event and frame sequences captured in diverse and challenging low-light conditions. Our dataset includes 106 sequences, encompassing indoor, cityscape, twilight, night, driving, and controlled scenarios, each carefully recorded to address various illumination levels and dynamic ranges. Utilizing a hybrid RGB and event camera setup. we collect a dataset that combines high-resolution event data with complementary frame data. We employ both qualitative and quantitative evaluations using no-reference metrics to assess state-of-the-art low-light enhancement and event-based image reconstruction methods. Additionally, we evaluate these methods on a downstream object detection task. Our findings reveal that while event-based methods perform well in specific metrics, they may produce false positives in practical applications. This dataset and our comprehensive analysis provide valuable insights for future research in low-light vision and hybrid camera systems. |
18 pa...18 pages, 4 figures. Has been accepted for publication at the European Conference on Computer Vision Workshops (ECCVW), Milano, 2024. The project page can be found at https://ercanburak.github.io/HUE.html |
Code Link |
Capsule Endoscopy Image Enhancement for Small Intestinal Villi Clarity | 2024-10-24 | ShowThis paper presents, for the first time, an image enhancement methodology designed to enhance the clarity of small intestinal villi in Wireless Capsule Endoscopy (WCE) images. This method first separates the low-frequency and high-frequency components of small intestinal villi images using guided filtering. Subsequently, an adaptive light gain factor is generated based on the low-frequency component, and an adaptive gradient gain factor is derived from the convolution results of the Laplacian operator in different regions of small intestinal villi images. The obtained light gain factor and gradient gain factor are then combined to enhance the high-frequency components. Finally, the enhanced high-frequency component is fused with the original image to achieve adaptive sharpening of the edges of WCE small intestinal villi images. The experiments affirm that, compared to established WCE image enhancement methods, our approach not only accentuates the edge details of WCE small intestine villi images but also skillfully suppresses noise amplification, thereby preventing the occurrence of edge overshooting. |
None | |
Dynamic Test-Time Augmentation via Differentiable Functions | 2024-10-22 | ShowDistribution shifts, which often occur in the real world, degrade the accuracy of deep learning systems, and thus improving robustness to distribution shifts is essential for practical applications. To improve robustness, we study an image enhancement method that generates recognition-friendly images without retraining the recognition model. We propose a novel image enhancement method, DynTTA, which is based on differentiable data augmentation techniques and generates a blended image from many augmented images to improve the recognition accuracy under distribution shifts. In addition to standard data augmentations, DynTTA also incorporates deep neural network-based image transformation, further improving the robustness. Because DynTTA is composed of differentiable functions, it can be directly trained with the classification loss of the recognition model. In experiments with widely used image recognition datasets using various classification models, DynTTA improves the robustness with almost no reduction in classification accuracy for clean images, thus outperforming the existing methods. Furthermore, the results show that robustness is significantly improved by estimating the training-time augmentations for distribution-shifted datasets using DynTTA and retraining the recognition model with the estimated augmentations. DynTTA is a promising approach for applications that require both clean accuracy and robustness. Our code is available at \url{https://github.com/s-enmt/DynTTA}. |
IEEE Access | Code Link |
SLLEN: Semantic-aware Low-light Image Enhancement Network | 2024-10-21 | ShowHow to effectively explore semantic feature is vital for low-light image enhancement (LLE). Existing methods usually utilize the semantic feature that is only drawn from the output produced by high-level semantic segmentation (SS) network. However, if the output is not accurately estimated, it would affect the high-level semantic feature (HSF) extraction, which accordingly interferes with LLE. To this end, we develop a simple and effective semantic-aware LLE network (SSLEN) composed of a LLE main-network (LLEmN) and a SS auxiliary-network (SSaN). In SLLEN, LLEmN integrates the random intermediate embedding feature (IEF), i.e., the information extracted from the intermediate layer of SSaN, together with the HSF into a unified framework for better LLE. SSaN is designed to act as a SS role to provide HSF and IEF. Moreover, thanks to a shared encoder between LLEmN and SSaN, we further propose an alternating training mechanism to facilitate the collaboration between them. Unlike currently available approaches, the proposed SLLEN is able to fully lever the semantic information, e.g., IEF, HSF, and SS dataset, to assist LLE, thereby leading to a more promising enhancement performance. Comparisons between the proposed SLLEN and other state-of-the-art techniques demonstrate the superiority of SLLEN with respect to LLE quality over all the comparable alternatives. |
None | |
An Improved Chicken Swarm Optimization Algorithm for Handwritten Document Image Enhancement | 2024-10-20 | ShowChicken swarm optimization is a new meta-heuristic algorithm which mimics the foraging hierarchical behavior of chicken. In this paper, we describe the preprocessing of handwritten document by contrast enhancement while preserving detail with an improved chicken swarm optimization algorithm.The results of the algorithm are compared with other existing meta heuristic algorithms like Cuckoo Search, Firefly Algorithm and the Artificial bee colony. The proposed algorithm considerably outperforms all the above by giving good results. |
6 pag...6 pages, 2 figures, conference |
None |
LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection | 2024-10-14 | ShowDue to the nature of enhancement--the absence of paired ground-truth information, high-level vision tasks have been recently employed to evaluate the performance of low-light image enhancement. A widely-used manner is to see how accurately an object detector trained on enhanced low-light images by different candidates can perform with respect to annotated semantic labels. In this paper, we first demonstrate that the mentioned approach is generally prone to overfitting, and thus diminishes its measurement reliability. In search of a proper evaluation metric, we propose LIME-Bench, the first online benchmark platform designed to collect human preferences for low-light enhancement, providing a valuable dataset for validating the correlation between human perception and automated evaluation metrics. We then customize LIME-Eval, a novel evaluation framework that utilizes detectors pre-trained on standard-lighting datasets without object annotations, to judge the quality of enhanced images. By adopting an energy-based strategy to assess the accuracy of output confidence maps, our LIME-Eval can simultaneously bypass biases associated with retraining detectors and circumvent the reliance on annotations for dim images. Comprehensive experiments are provided to reveal the effectiveness of our LIME-Eval. Our benchmark platform (https://huggingface.co/spaces/lime-j/eval) and code (https://github.com/lime-j/lime-eval) are available online. |
Code Link | |
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | 2024-10-13 | ShowLow-light image enhancement (LLIE) is essential for numerous computer vision tasks, including object detection, tracking, segmentation, and scene understanding. Despite substantial research on improving low-quality images captured in underexposed conditions, clear vision remains critical for autonomous vehicles, which often struggle with low-light scenarios, signifying the need for continuous research. However, paired datasets for LLIE are scarce, particularly for street scenes, limiting the development of robust LLIE methods. Despite using advanced transformers and/or diffusion-based models, current LLIE methods struggle in real-world low-light conditions and lack training on street-scene datasets, limiting their effectiveness for autonomous vehicles. To bridge these gaps, we introduce a new dataset LoLI-Street (Low-Light Images of Streets) with 33k paired low-light and well-exposed images from street scenes in developed cities, covering 19k object classes for object detection. LoLI-Street dataset also features 1,000 real low-light test images for testing LLIE models under real-life conditions. Furthermore, we propose a transformer and diffusion-based LLIE model named "TriFuse". Leveraging the LoLI-Street dataset, we train and evaluate our TriFuse and SOTA models to benchmark on our dataset. Comparing various models, our dataset's generalization feasibility is evident in testing across different mainstream datasets by significantly enhancing images and object detection for practical applications in autonomous driving and surveillance systems. The complete code and dataset is available on https://github.com/tanvirnwu/TriFuse. |
Accep...Accepted by the Asian Conference on Computer Vision (ACCV 2024) |
Code Link |
FogGuard: guarding YOLO against fog using perceptual loss | 2024-10-11 | ShowIn this paper, we present FogGuard, a novel fog-aware object detection network designed to address the challenges posed by foggy weather conditions. Autonomous driving systems heavily rely on accurate object detection algorithms, but adverse weather conditions can significantly impact the reliability of deep neural networks (DNNs). Existing approaches include image enhancement techniques like IA-YOLO and domain adaptation methods. While image enhancement aims to generate clear images from foggy ones, which is more challenging than object detection in foggy images, domain adaptation does not require labeled data in the target domain. Our approach involves fine-tuning on a specific dataset to address these challenges efficiently. FogGuard compensates for foggy conditions in the scene, ensuring robust performance by incorporating YOLOv3 as the baseline algorithm and introducing a unique Teacher-Student Perceptual loss for accurate object detection in foggy environments. Through comprehensive evaluations on standard datasets like PASCAL VOC and RTTS, our network significantly improves performance, achieving a 69.43% mAP compared to YOLOv3's 57.78% on the RTTS dataset. Additionally, we demonstrate that while our training method slightly increases time complexity, it doesn't add overhead during inference compared to the regular YOLO network. |
8 pag...8 pages, 1 figures, submitted on Robotic and Automation Letters |
None |
CDAN: Convolutional dense attention-guided network for low-light image enhancement | 2024-10-11 | ShowLow-light images, characterized by inadequate illumination, pose challenges of diminished clarity, muted colors, and reduced details. Low-light image enhancement, an essential task in computer vision, aims to rectify these issues by improving brightness, contrast, and overall perceptual quality, thereby facilitating accurate analysis and interpretation. This paper introduces the Convolutional Dense Attention-guided Network (CDAN), a novel solution for enhancing low-light images. CDAN integrates an autoencoder-based architecture with convolutional and dense blocks, complemented by an attention mechanism and skip connections. This architecture ensures efficient information propagation and feature learning. Furthermore, a dedicated post-processing phase refines color balance and contrast. Our approach demonstrates notable progress compared to state-of-the-art results in low-light image enhancement, showcasing its robustness across a wide range of challenging scenarios. Our model performs remarkably on benchmark datasets, effectively mitigating under-exposure and proficiently restoring textures and colors in diverse low-light scenarios. This achievement underscores CDAN's potential for diverse computer vision tasks, notably enabling robust object detection and recognition in challenging low-light conditions. |
Publi...Published in the Digital Signal Processing journal, 15 Pages, 13 Figures |
None |
Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression | 2024-10-07 | ShowDeep learning-based image compression algorithms typically focus on designing encoding and decoding networks and improving the accuracy of entropy model estimation to enhance the rate-distortion (RD) performance. However, few algorithms leverage the compression distortion prior from existing compression algorithms to improve RD performance. In this paper, we propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method, which aims to enhance the final decoding quality of RS images by utilizing the generated distortion prior from a LDM. Our approach consists of two stages. In the first stage, a self-encoder learns prior from the high-quality input image. In the second stage, the prior is generated through an LDM, conditioned on the decoded image of an existing learning-based image compression algorithm, to be used as auxiliary information for generating the texture-rich enhanced image. To better utilize the prior, a channel attention and gate-based dynamic feature attention module (DFAM) is embedded into a Transformer-based multi-scale enhancement network (MEN) for image enhancement. Extensive experiments demonstrate the proposed LDM-RSIC significantly outperforms existing state-of-the-art traditional and learning-based image compression algorithms in terms of both subjective perception and objective metrics. Additionally, we use the LDM-based scheme to improve the traditional image compression algorithm JPEG2000 and obtain 32.00% bit savings on the DOTA testing set. The code will be available at https://github.com/mlkk518/LDM-RSIC. |
Code Link | |
Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli | 2024-10-06 | ShowThe purpose of this work is to investigate the soundness and utility of a neural network-based approach as a framework for exploring the impact of image enhancement techniques on visual cortex activation. In a preliminary study, we prepare a set of state-of-the-art brain encoding models, selected among the top 10 methods that participated in The Algonauts Project 2023 Challenge [16]. We analyze their ability to make valid predictions about the effects of various image enhancement techniques on neural responses. Given the impossibility of acquiring the actual data due to the high costs associated with brain imaging procedures, our investigation builds up on a series of experiments. Specifically, we analyze the ability of brain encoders to estimate the cerebral reaction to various augmentations by evaluating the response to augmentations targeting objects (i.e., faces and words) with known impact on specific areas. Moreover, we study the predicted activation in response to objects unseen during training, exploring the impact of semantically out-of-distribution stimuli. We provide relevant evidence for the generalization ability of the models forming the proposed framework, which appears to be promising for the identification of the optimal visual augmentation filter for a given task, model-driven design strategies as well as for AR and VR applications. |
None | |
ATTIQA: Generalizable Image Quality Feature Extractor using Attribute-aware Pretraining | 2024-10-05 | ShowIn no-reference image quality assessment (NR-IQA), the challenge of limited dataset sizes hampers the development of robust and generalizable models. Conventional methods address this issue by utilizing large datasets to extract rich representations for IQA. Also, some approaches propose vision language models (VLM) based IQA, but the domain gap between generic VLM and IQA constrains their scalability. In this work, we propose a novel pretraining framework that constructs a generalizable representation for IQA by selectively extracting quality-related knowledge from VLM and leveraging the scalability of large datasets. Specifically, we select optimal text prompts for five representative image quality attributes and use VLM to generate pseudo-labels. Numerous attribute-aware pseudo-labels can be generated with large image datasets, allowing our IQA model to learn rich representations about image quality. Our approach achieves state-of-the-art performance on multiple IQA datasets and exhibits remarkable generalization capabilities. Leveraging these strengths, we propose several applications, such as evaluating image generation models and training image enhancement models, demonstrating our model's real-world applicability. |
None | |
Can Capacitive Touch Images Enhance Mobile Keyboard Decoding? | 2024-10-03 | ShowCapacitive touch sensors capture the two-dimensional spatial profile (referred to as a touch heatmap) of a finger's contact with a mobile touchscreen. However, the research and design of touchscreen mobile keyboards -- one of the most speed and accuracy demanding touch interfaces -- has focused on the location of the touch centroid derived from the touch image heatmap as the input, discarding the rest of the raw spatial signals. In this paper, we investigate whether touch heatmaps can be leveraged to further improve the tap decoding accuracy for mobile touchscreen keyboards. Specifically, we developed and evaluated machine-learning models that interpret user taps by using the centroids and/or the heatmaps as their input and studied the contribution of the heatmaps to model performance. The results show that adding the heatmap into the input feature set led to 21.4% relative reduction of character error rates on average, compared to using the centroid alone. Furthermore, we conducted a live user study with the centroid-based and heatmap-based decoders built into Pixel 6 Pro devices and observed lower error rate, faster typing speed, and higher self-reported satisfaction score based on the heatmap-based decoder than the centroid-based decoder. These findings underline the promise of utilizing touch heatmaps for improving typing experience in mobile keyboards. |
Accep...Accepted to UIST 2024 |
None |
GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer | 2024-10-01 | ShowNovel view synthesis (NVS) aims to generate images at arbitrary viewpoints using multi-view images, and recent insights from neural radiance fields (NeRF) have contributed to remarkable improvements. Recently, studies on generalizable NeRF (G-NeRF) have addressed the challenge of per-scene optimization in NeRFs. The construction of radiance fields on-the-fly in G-NeRF simplifies the NVS process, making it well-suited for real-world applications. Meanwhile, G-NeRF still struggles in representing fine details for a specific scene due to the absence of per-scene optimization, even with texture-rich multi-view source inputs. As a remedy, we propose a Geometry-driven Multi-reference Texture transfer network (GMT) available as a plug-and-play module designed for G-NeRF. Specifically, we propose ray-imposed deformable convolution (RayDCN), which aligns input and reference features reflecting scene geometry. Additionally, the proposed texture preserving transformer (TP-Former) aggregates multi-view source features while preserving texture information. Consequently, our module enables direct interaction between adjacent pixels during the image enhancement process, which is deficient in G-NeRF models with an independent rendering process per pixel. This addresses constraints that hinder the ability to capture high-frequency details. Experiments show that our plug-and-play module consistently improves G-NeRF models on various benchmark datasets. |
Accep...Accepted at ECCV 2024. Code available at https://github.com/yh-yoon/GMT |
Code Link |
Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation | 2024-09-29 | ShowUnderwater images often suffer from quality degradation due to absorption and scattering effects. Most existing underwater image enhancement algorithms produce a single, fixed-color image, limiting user flexibility and application. To address this limitation, we propose a method called \textit{ColorCode}, which enhances underwater images while offering a range of controllable color outputs. Our approach involves recovering an underwater image to a reference enhanced image through supervised training and decomposing it into color and content codes via self-reconstruction and cross-reconstruction. The color code is explicitly constrained to follow a Gaussian distribution, allowing for efficient sampling and interpolation during inference. ColorCode offers three key features: 1) color enhancement, producing an enhanced image with a fixed color; 2) color adaptation, enabling controllable adjustments of long-wavelength color components using guidance images; and 3) color interpolation, allowing for the smooth generation of multiple colors through continuous sampling of the color code. Quantitative and visual evaluations on popular and challenging benchmark datasets demonstrate the superiority of ColorCode over existing methods in providing diverse, controllable, and color-realistic enhancement results. The source code is available at https://github.com/Xiaofeng-life/ColorCode. |
Code Link | |
MixNet: Efficient Global Modeling for Ultra-High-Definition Image Restoration | 2024-09-29 | ShowRecent advancements in image restoration methods employing global modeling have shown promising results. However, these approaches often incur substantial memory requirements, particularly when processing ultra-high-definition (UHD) images. In this paper, we propose a novel image restoration method called MixNet, which introduces an alternative approach to global modeling approaches and is more effective for UHD image restoration. To capture the longrange dependency of features without introducing excessive computational complexity, we present the Global Feature Modulation Layer (GFML). GFML associates features from different views by permuting the feature maps, enabling efficient modeling of long-range dependency. In addition, we also design the Local Feature Modulation Layer (LFML) and Feed-forward Layer (FFL) to capture local features and transform features into a compact representation. This way, our MixNetachieves effective restoration with low inference time overhead and computational complexity. We conduct extensive experiments on four UHD image restoration tasks, including low-light image enhancement, underwater image enhancement, image deblurring and image demoireing, and the comprehensive results demonstrate that our proposed method surpasses the performance of current state-of-the-art methods. The code will be available at \url{https://github.com/5chen/MixNet}. |
under review | Code Link |
Three-stage binarization of color document images based on discrete wavelet transform and generative adversarial networks | 2024-09-28 | ShowThe efficient extraction of text information from the background in degraded color document images is an important challenge in the preservation of ancient manuscripts. The imperfect preservation of ancient manuscripts has led to different types of degradation over time, such as page yellowing, staining, and ink bleeding, seriously affecting the results of document image binarization. This work proposes an effective three-stage network method to image enhancement and binarization of degraded documents using generative adversarial networks (GANs). Specifically, in Stage-1, we first split the input images into multiple patches, and then split these patches into four single-channel patch images (gray, red, green, and blue). Then, three single-channel patch images (red, green, and blue) are processed by the discrete wavelet transform (DWT) with normalization. In Stage-2, we use four independent generators to separately train GAN models based on the four channels on the processed patch images to extract color foreground information. Finally, in Stage-3, we train two independent GAN models on the outputs of Stage-2 and the resized original input images (512x512) as the local and global predictions to obtain the final outputs. The experimental results show that the Avg-Score metrics of the proposed method are 77.64, 77.95, 79.05, 76.38, 75.34, and 77.00 on the (H)-DIBCO 2011, 2013, 2014, 2016, 2017, and 2018 datasets, which are at the state-of-the-art level. The implementation code for this work is available at https://github.com/abcpp12383/ThreeStageBinarization. |
Accep...Accepted by Knowledge-Based Systems |
Code Link |
PDCFNet: Enhancing Underwater Images through Pixel Difference Convolution | 2024-09-28 | ShowMajority of deep learning methods utilize vanilla convolution for enhancing underwater images. While vanilla convolution excels in capturing local features and learning the spatial hierarchical structure of images, it tends to smooth input images, which can somewhat limit feature expression and modeling. A prominent characteristic of underwater degraded images is blur, and the goal of enhancement is to make the textures and details (high-frequency features) in the images more visible. Therefore, we believe that leveraging high-frequency features can improve enhancement performance. To address this, we introduce Pixel Difference Convolution (PDC), which focuses on gradient information with significant changes in the image, thereby improving the modeling of enhanced images. We propose an underwater image enhancement network, PDCFNet, based on PDC and cross-level feature fusion. Specifically, we design a detail enhancement module based on PDC that employs parallel PDCs to capture high-frequency features, leading to better detail and texture enhancement. The designed cross-level feature fusion module performs operations such as concatenation and multiplication on features from different levels, ensuring sufficient interaction and enhancement between diverse features. Our proposed PDCFNet achieves a PSNR of 27.37 and an SSIM of 92.02 on the UIEB dataset, attaining the best performance to date. Our code is available at https://github.com/zhangsong1213/PDCFNet. |
Code Link | |
Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors | 2024-09-27 | ShowLow-light image enhancement (LIE) aims at precisely and efficiently recovering an image degraded in poor illumination environments. Recent advanced LIE techniques are using deep neural networks, which require lots of low-normal light image pairs, network parameters, and computational resources. As a result, their practicality is limited. In this work, we devise a novel unsupervised LIE framework based on diffusion priors and lookup tables (DPLUT) to achieve efficient low-light image recovery. The proposed approach comprises two critical components: a light adjustment lookup table (LLUT) and a noise suppression lookup table (NLUT). LLUT is optimized with a set of unsupervised losses. It aims at predicting pixel-wise curve parameters for the dynamic range adjustment of a specific image. NLUT is designed to remove the amplified noise after the light brightens. As diffusion models are sensitive to noise, diffusion priors are introduced to achieve high-performance noise suppression. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in terms of visual quality and efficiency. |
13 pages, 10 figures | None |
Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models | 2024-09-27 | ShowUnderwater vision is crucial for autonomous underwater vehicles (AUVs), and enhancing degraded underwater images in real-time on a resource-constrained AUV is a key challenge due to factors like light absorption and scattering, or the sufficient model computational complexity to resolve such factors. Traditional image enhancement techniques lack adaptability to varying underwater conditions, while learning-based methods, particularly those using convolutional neural networks (CNNs) and generative adversarial networks (GANs), offer more robust solutions but face limitations such as inadequate enhancement, unstable training, or mode collapse. Denoising diffusion probabilistic models (DDPMs) have emerged as a state-of-the-art approach in image-to-image tasks but require intensive computational complexity to achieve the desired underwater image enhancement (UIE) using the recent UW-DDPM solution. To address these challenges, this paper introduces UW-DiffPhys, a novel physical-based and diffusion-based UIE approach. UW-DiffPhys combines light-computation physical-based UIE network components with a denoising U-Net to replace the computationally intensive distribution transformation U-Net in the existing UW-DDPM framework, reducing complexity while maintaining performance. Additionally, the Denoising Diffusion Implicit Model (DDIM) is employed to accelerate the inference process through non-Markovian sampling. Experimental results demonstrate that UW-DiffPhys achieved a substantial reduction in computational complexity and inference time compared to UW-DDPM, with competitive performance in key metrics such as PSNR, SSIM, UCIQE, and an improvement in the overall underwater image quality UIQM metric. The implementation code can be found at the following repository: https://github.com/bachzz/UW-DiffPhys |
Code Link | |
SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement | 2024-09-27 | ShowCone Beam Computed Tomography (CBCT) finds diverse applications in medicine. Ensuring high image quality in CBCT scans is essential for accurate diagnosis and treatment delivery. Yet, the susceptibility of CBCT images to noise and artifacts undermines both their usefulness and reliability. Existing methods typically address CBCT artifacts through image-to-image translation approaches. These methods, however, are limited by the artifact types present in the training data, which may not cover the complete spectrum of CBCT degradations stemming from variations in imaging protocols. Gathering additional data to encompass all possible scenarios can often pose a challenge. To address this, we present SinoSynth, a physics-based degradation model that simulates various CBCT-specific artifacts to generate a diverse set of synthetic CBCT images from high-quality CT images without requiring pre-aligned data. Through extensive experiments, we demonstrate that several different generative networks trained on our synthesized data achieve remarkable results on heterogeneous multi-institutional datasets, outperforming even the same networks trained on actual data. We further show that our degradation model conveniently provides an avenue to enforce anatomical constraints in conditional generative models, yielding high-quality and structure-preserving synthetic CT images. |
MICCAI 2024 | None |
InstructIR: High-Quality Image Restoration Following Human Instructions | 2024-09-25 | ShowImage restoration is a fundamental problem that involves recovering a high-quality clean image from its degraded observation. All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model. In this work, we present the first approach that uses human-written instructions to guide the image restoration model. Given natural language prompts, our model can recover high-quality images from their degraded counterparts, considering multiple degradation types. Our method, InstructIR, achieves state-of-the-art results on several restoration tasks including image denoising, deraining, deblurring, dehazing, and (low-light) image enhancement. InstructIR improves +1dB over previous all-in-one restoration methods. Moreover, our dataset and results represent a novel benchmark for new research on text-guided image restoration and enhancement. Our code, datasets and models are available at: https://github.com/mv-lab/InstructIR |
Europ...European Conference on Computer Vision (ECCV) 2024 |
Code Link |
Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement | 2024-09-25 | ShowDespite the impressive advancements made in recent low-light image enhancement techniques, the scarcity of paired data has emerged as a significant obstacle to further advancements. This work proposes a mean-teacher-based semi-supervised low-light enhancement (Semi-LLIE) framework that integrates the unpaired data into model training. The mean-teacher technique is a prominent semi-supervised learning method, successfully adopted for addressing high-level and low-level vision tasks. However, two primary issues hinder the naive mean-teacher method from attaining optimal performance in low-light image enhancement. Firstly, pixel-wise consistency loss is insufficient for transferring realistic illumination distribution from the teacher to the student model, which results in color cast in the enhanced images. Secondly, cutting-edge image enhancement approaches fail to effectively cooperate with the mean-teacher framework to restore detailed information in dark areas due to their tendency to overlook modeling structured information within local regions. To mitigate the above issues, we first introduce a semantic-aware contrastive loss to faithfully transfer the illumination distribution, contributing to enhancing images with natural colors. Then, we design a Mamba-based low-light image enhancement backbone to effectively enhance Mamba's local region pixel relationship representation ability with a multi-scale feature learning scheme, facilitating the generation of images with rich textural details. Further, we propose novel perceptive loss based on the large-scale vision-language Recognize Anything Model (RAM) to help generate enhanced images with richer textual details. The experimental results indicate that our Semi-LLIE surpasses existing methods in both quantitative and qualitative metrics. |
None | |
Proactive Schemes: A Survey of Adversarial Attacks for Social Good | 2024-09-24 | ShowAdversarial attacks in computer vision exploit the vulnerabilities of machine learning models by introducing subtle perturbations to input data, often leading to incorrect predictions or classifications. These attacks have evolved in sophistication with the advent of deep learning, presenting significant challenges in critical applications, which can be harmful for society. However, there is also a rich line of research from a transformative perspective that leverages adversarial techniques for social good. Specifically, we examine the rise of proactive schemes-methods that encrypt input data using additional signals termed templates, to enhance the performance of deep learning models. By embedding these imperceptible templates into digital media, proactive schemes are applied across various applications, from simple image enhancements to complicated deep learning frameworks to aid performance, as compared to the passive schemes, which don't change the input data distribution for their framework. The survey delves into the methodologies behind these proactive schemes, the encryption and learning processes, and their application to modern computer vision and natural language processing applications. Additionally, it discusses the challenges, potential vulnerabilities, and future directions for proactive schemes, ultimately highlighting their potential to foster the responsible and secure advancement of deep learning technologies. |
Submitted for review | None |
Low-Light Enhancement Effect on Classification and Detection: An Empirical Study | 2024-09-22 | ShowLow-light images are commonly encountered in real-world scenarios, and numerous low-light image enhancement (LLIE) methods have been proposed to improve the visibility of these images. The primary goal of LLIE is to generate clearer images that are more visually pleasing to humans. However, the impact of LLIE methods in high-level vision tasks, such as image classification and object detection, which rely on high-quality image datasets, is not well {explored}. To explore the impact, we comprehensively evaluate LLIE methods on these high-level vision tasks by utilizing an empirical investigation comprising image classification and object detection experiments. The evaluation reveals a dichotomy: {\textit{While Low-Light Image Enhancement (LLIE) methods enhance human visual interpretation, their effect on computer vision tasks is inconsistent and can sometimes be harmful. }} Our findings suggest a disconnect between image enhancement for human visual perception and for machine analysis, indicating a need for LLIE methods tailored to support high-level vision tasks effectively. This insight is crucial for the development of LLIE techniques that align with the needs of both human and machine vision. |
8 pages,8 figures | None |
Generative AI for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations | 2024-09-21 | ShowThis review introduces the transformative potential of generative Artificial Intelligence (AI) and foundation models, including large language models (LLMs), for health technology assessment (HTA). We explore their applications in four critical areas, evidence synthesis, evidence generation, clinical trials and economic modeling: (1) Evidence synthesis: Generative AI has the potential to assist in automating literature reviews and meta-analyses by proposing search terms, screening abstracts, and extracting data with notable accuracy; (2) Evidence generation: These models can potentially facilitate automating the process and analyze the increasingly available large collections of real-world data (RWD), including unstructured clinical notes and imaging, enhancing the speed and quality of real-world evidence (RWE) generation; (3) Clinical trials: Generative AI can be used to optimize trial design, improve patient matching, and manage trial data more efficiently; and (4) Economic modeling: Generative AI can also aid in the development of health economic models, from conceptualization to validation, thus streamlining the overall HTA process. Despite their promise, these technologies, while rapidly improving, are still nascent and continued careful evaluation in their applications to HTA is required. To ensure their responsible use and implementation, both developers and users of research incorporating these tools, should familiarize themselves with their current limitations, including the issues related to scientific validity, risk of bias, and consider equity and ethical implications. We also surveyed the current policy landscape and provide suggestions for HTA agencies on responsibly integrating generative AI into their workflows, emphasizing the importance of human oversight and the fast-evolving nature of these tools. |
24 pa...24 pages, 1 figure, 1 table, 2 boxes, 103 references |
None |
Deep Learning-Based Detection of Referable Diabetic Retinopathy and Macular Edema Using Ultra-Widefield Fundus Imaging | 2024-09-19 | ShowDiabetic retinopathy and diabetic macular edema are significant complications of diabetes that can lead to vision loss. Early detection through ultra-widefield fundus imaging enhances patient outcomes but presents challenges in image quality and analysis scale. This paper introduces deep learning solutions for automated UWF image analysis within the framework of the MICCAI 2024 UWF4DR challenge. We detail methods and results across three tasks: image quality assessment, detection of referable DR, and identification of DME. Employing advanced convolutional neural network architectures such as EfficientNet and ResNet, along with preprocessing and augmentation strategies, our models demonstrate robust performance in these tasks. Results indicate that deep learning can significantly aid in the automated analysis of UWF images, potentially improving the efficiency and accuracy of DR and DME detection in clinical settings. |
None | |
Fundus image enhancement through direct diffusion bridges | 2024-09-19 | ShowWe propose FD3, a fundus image enhancement method based on direct diffusion bridges, which can cope with a wide range of complex degradations, including haze, blur, noise, and shadow. We first propose a synthetic forward model through a human feedback loop with board-certified ophthalmologists for maximal quality improvement of low-quality in-vivo images. Using the proposed forward model, we train a robust and flexible diffusion-based image enhancement network that is highly effective as a stand-alone method, unlike previous diffusion model-based approaches which act only as a refiner on top of pre-trained models. Through extensive experiments, we show that FD3 establishes \add{superior quality} not only on synthetic degradations but also on in vivo studies with low-quality fundus photos taken from patients with cataracts or small pupils. To promote further research in this area, we open-source all our code and data used for this research at https://github.com/heeheee888/FD3 |
Publi...Published at IEEE JBHI. 12 pages, 10 figures. Code and Data: https://github.com/heeheee888/FD3 |
Code Link |
Ultrasound Image Enhancement with the Variance of Diffusion Models | 2024-09-17 | ShowUltrasound imaging, despite its widespread use in medicine, often suffers from various sources of noise and artifacts that impact the signal-to-noise ratio and overall image quality. Enhancing ultrasound images requires a delicate balance between contrast, resolution, and speckle preservation. This paper introduces a novel approach that integrates adaptive beamforming with denoising diffusion-based variance imaging to address this challenge. By applying Eigenspace-Based Minimum Variance (EBMV) beamforming and employing a denoising diffusion model fine-tuned on ultrasound data, our method computes the variance across multiple diffusion-denoised samples to produce high-quality despeckled images. This approach leverages both the inherent multiplicative noise of ultrasound and the stochastic nature of diffusion models. Experimental results on a publicly available dataset demonstrate the effectiveness of our method in achieving superior image reconstructions from single plane-wave acquisitions. The code is available at: https://github.com/Yuxin-Zhang-Jasmine/IUS2024_Diffusion. |
Accep...Accepted by the IEEE International Ultrasonics Symposium (IUS) 2024 |
Code Link |
LYT-NET: Lightweight YUV Transformer-based Network for Low-light Image Enhancement | 2024-09-17 | ShowThis letter introduces LYT-Net, a novel lightweight transformer-based model for low-light image enhancement (LLIE). LYT-Net consists of several layers and detachable blocks, including our novel blocks--Channel-Wise Denoiser (CWD) and Multi-Stage Squeeze & Excite Fusion (MSEF)--along with the traditional Transformer block, Multi-Headed Self-Attention (MHSA). In our method we adopt a dual-path approach, treating chrominance channels U and V and luminance channel Y as separate entities to help the model better handle illumination adjustment and corruption restoration. Our comprehensive evaluation on established LLIE datasets demonstrates that, despite its low complexity, our model outperforms recent LLIE methods. The source code and pre-trained models are available at https://github.com/albrateanu/LYT-Net |
5 pages | Code Link |
CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement | 2024-09-17 | ShowRetinal fundus photography is significant in diagnosing and monitoring retinal diseases. However, systemic imperfections and operator/patient-related factors can hinder the acquisition of high-quality retinal images. Previous efforts in retinal image enhancement primarily relied on GANs, which are limited by the trade-off between training stability and output diversity. In contrast, the Schr"odinger Bridge (SB), offers a more stable solution by utilizing Optimal Transport (OT) theory to model a stochastic differential equation (SDE) between two arbitrary distributions. This allows SB to effectively transform low-quality retinal images into their high-quality counterparts. In this work, we leverage the SB framework to propose an image-to-image translation pipeline for retinal image enhancement. Additionally, previous methods often fail to capture fine structural details, such as blood vessels. To address this, we enhance our pipeline by introducing Dynamic Snake Convolution, whose tortuous receptive field can better preserve tubular structures. We name the resulting retinal fundus image enhancement framework the Context-aware Unpaired Neural Schr"{o}dinger Bridge (CUNSB-RFIE). To the best of our knowledge, this is the first endeavor to use the SB approach for retinal image enhancement. Experimental results on a large-scale dataset demonstrate the advantage of the proposed method compared to several state-of-the-art supervised and unsupervised methods in terms of image quality and performance on downstream tasks.The code is available at https://github.com/Retinal-Research/CUNSB-RFIE . |
Code Link | |
Underwater Image Enhancement via Dehazing and Color Restoration | 2024-09-15 | ShowWith the rapid development of marine engineering projects such as marine resource extraction and oceanic surveys, underwater visual imaging and analysis has become a critical technology. Unfortunately, due to the inevitable non-linear attenuation of light in underwater environments, underwater images and videos often suffer from low contrast, blurriness, and color degradation, which significantly complicate the subsequent research. Existing underwater image enhancement methods often treat the haze and color cast as a unified degradation process and disregard their independence and interdependence, which limits the performance improvement. Here, we propose a Vision Transformer (ViT)-based network (referred to as WaterFormer) to improve the underwater image quality. WaterFormer contains three major components: a dehazing block (DehazeFormer Block) to capture the self-correlated haze features and extract deep-level features, a Color Restoration Block (CRB) to capture self-correlated color cast features, and a Channel Fusion Block (CFB) to capture fusion features within the network. To ensure authenticity, a soft reconstruction layer based on the underwater imaging physics model is included. To improve the quality of the enhanced images, we introduce the Chromatic Consistency Loss and Sobel Color Loss to train the network. Comprehensive experimental results demonstrate that WaterFormer outperforms other state-of-the-art methods in enhancing underwater images. |
None | |
Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement | 2024-09-12 | ShowRetinal fundus photography offers a non-invasive way to diagnose and monitor a variety of retinal diseases, but is prone to inherent quality glitches arising from systemic imperfections or operator/patient-related factors. However, high-quality retinal images are crucial for carrying out accurate diagnoses and automated analyses. The fundus image enhancement is typically formulated as a distribution alignment problem, by finding a one-to-one mapping between a low-quality image and its high-quality counterpart. This paper proposes a context-informed optimal transport (OT) learning framework for tackling unpaired fundus image enhancement. In contrast to standard generative image enhancement methods, which struggle with handling contextual information (e.g., over-tampered local structures and unwanted artifacts), the proposed context-aware OT learning paradigm better preserves local structures and minimizes unwanted artifacts. Leveraging deep contextual features, we derive the proposed context-aware OT using the earth mover's distance and show that the proposed context-OT has a solid theoretical guarantee. Experimental results on a large-scale dataset demonstrate the superiority of the proposed method over several state-of-the-art supervised and unsupervised methods in terms of signal-to-noise ratio, structural similarity index, as well as two downstream tasks. The code is available at \url{https://github.com/Retinal-Research/Contextual-OT}. |
Code Link | |
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process | 2024-09-11 | ShowThe emergence of text-to-image generation models has led to the recognition that image enhancement, performed as post-processing, would significantly improve the visual quality of the generated images. Exploring diffusion models to enhance the generated images nevertheless is not trivial and necessitates to delicately enrich plentiful details while preserving the visual appearance of key content in the original image. In this paper, we propose a novel framework, namely FreeEnhance, for content-consistent image enhancement using the off-the-shelf image diffusion models. Technically, FreeEnhance is a two-stage process that firstly adds random noise to the input image and then capitalizes on a pre-trained image diffusion model (i.e., Latent Diffusion Models) to denoise and enhance the image details. In the noising stage, FreeEnhance is devised to add lighter noise to the region with higher frequency to preserve the high-frequent patterns (e.g., edge, corner) in the original image. In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality. Extensive experiments conducted on the HPDv2 dataset demonstrate that our FreeEnhance outperforms the state-of-the-art image enhancement models in terms of quantitative metrics and human preference. More remarkably, FreeEnhance also shows higher human preference compared to the commercial image enhancement solution of Magnific AI. |
ACM Multimedia 2024 | None |
Modeling Image Tone Dichotomy with the Power Function | 2024-09-10 | ShowThe primary purpose of this paper is to present the concept of dichotomy in image illumination modeling based on the power function. In particular, we review several mathematical properties of the power function to identify the limitations and propose a new mathematical model capable of abstracting illumination dichotomy. The simplicity of the equation opens new avenues for classical and modern image analysis and processing. The article provides practical and illustrative image examples to explain how the new model manages dichotomy in image perception. The article shows dichotomy image space as a viable way to extract rich information from images despite poor contrast linked to tone, lightness, and color perception. Moreover, a comparison with state-of-the-art methods in image enhancement provides evidence of the method's value. |
49 pa...49 pages, 11 figures and 36 references |
None |
Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models | 2024-09-10 | ShowLearning-based methods for underwater image enhancement (UWIE) have undergone extensive exploration. However, learning-based models are usually vulnerable to adversarial examples so as the UWIE models. To the best of our knowledge, there is no comprehensive study on the adversarial robustness of UWIE models, which indicates that UWIE models are potentially under the threat of adversarial attacks. In this paper, we propose a general adversarial attack protocol. We make a first attempt to conduct adversarial attacks on five well-designed UWIE models on three common underwater image benchmark datasets. Considering the scattering and absorption of light in the underwater environment, there exists a strong correlation between color correction and underwater image enhancement. On the basis of that, we also design two effective UWIE-oriented adversarial attack methods Pixel Attack and Color Shift Attack targeting different color spaces. The results show that five models exhibit varying degrees of vulnerability to adversarial attacks and well-designed small perturbations on degraded images are capable of preventing UWIE models from generating enhanced results. Further, we conduct adversarial training on these models and successfully mitigated the effectiveness of adversarial attacks. In summary, we reveal the adversarial vulnerability of UWIE models and propose a new evaluation dimension of UWIE models. |
None | |
Inhomogeneous illumination image enhancement under ex-tremely low visibility condition | 2024-09-10 | ShowImaging through dense fog presents unique challenges, with essential visual information crucial for applications like object detection and recognition obscured, thereby hindering conventional image processing methods. Despite improvements through neural network-based approaches, these techniques falter under extremely low visibility conditions exacerbated by inhomogeneous illumination, which degrades deep learning performance due to inconsistent signal intensities. We introduce in this paper a novel method that adaptively filters background illumination based on Structural Differential and Integral Filtering (SDIF) to enhance only vital signal information. The grayscale banding is eliminated by incorporating a visual optimization strategy based on image gradients. Maximum Histogram Equalization (MHE) is used to achieve high contrast while maintaining fidelity to the original content. We evaluated our algorithm using data collected from both a fog chamber and outdoor environments, and performed comparative analyses with existing methods. Our findings demonstrate that our proposed method significantly enhances signal clarity under extremely low visibility conditions and out-performs existing techniques, offering substantial improvements for deep fog imaging applications. |
None | |
Underwater Variable Zoom: Depth-Guided Perception Network for Underwater Image Enhancement | 2024-09-07 | ShowUnderwater scenes intrinsically involve degradation problems owing to heterogeneous ocean elements. Prevailing underwater image enhancement (UIE) methods stick to straightforward feature modeling to learn the mapping function, which leads to limited vision gain as it lacks more explicit physical cues (e.g., depth). In this work, we investigate injecting the depth prior into the deep UIE model for more precise scene enhancement capability. To this end, we present a novel depth-guided perception UIE framework, dubbed underwater variable zoom (UVZ). Specifically, UVZ resorts to a two-stage pipeline. First, a depth estimation network is designed to generate critical depth maps, combined with an auxiliary supervision network introduced to suppress estimation differences during training. Second, UVZ parses near-far scenarios by harnessing the predicted depth maps, enabling local and non-local perceiving in different regions. Extensive experiments on five benchmark datasets demonstrate that UVZ achieves superior visual gain and delivers promising quantitative metrics. Besides, UVZ is confirmed to exhibit good generalization in some visual tasks, especially in unusual lighting conditions. The code, models and results are available at: https://github.com/WindySprint/UVZ. |
Code Link | |
RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement | 2024-09-06 | ShowScene observation from multiple perspectives would bring a more comprehensive visual experience. However, in the context of acquiring multiple views in the dark, the highly correlated views are seriously alienated, making it challenging to improve scene understanding with auxiliary views. Recent single image-based enhancement methods may not be able to provide consistently desirable restoration performance for all views due to the ignorance of potential feature correspondence among different views. To alleviate this issue, we make the first attempt to investigate multi-view low-light image enhancement. First, we construct a new dataset called Multi-View Low-light Triplets (MVLT), including 1,860 pairs of triple images with large illumination ranges and wide noise distribution. Each triplet is equipped with three different viewpoints towards the same scene. Second, we propose a deep multi-view enhancement framework based on the Recurrent Collaborative Network (RCNet). Specifically, in order to benefit from similar texture correspondence across different views, we design the recurrent feature enhancement, alignment and fusion (ReEAF) module, in which intra-view feature enhancement (Intra-view EN) followed by inter-view feature alignment and fusion (Inter-view AF) is performed to model the intra-view and inter-view feature propagation sequentially via multi-view collaboration. In addition, two different modules from enhancement to alignment (E2A) and from alignment to enhancement (A2E) are developed to enable the interactions between Intra-view EN and Inter-view AF, which explicitly utilize attentive feature weighting and sampling for enhancement and alignment, respectively. Experimental results demonstrate that our RCNet significantly outperforms other state-of-the-art methods. All of our dataset, code, and model will be available at https://github.com/hluo29/RCNet. |
14 Pa...14 Pages, 10 Figures, Under Review |
Code Link |
Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement | 2024-09-06 | ShowLow-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range correlations in such images. On the other hand, transformer-based methods that focus on global information face high computational complexities when processing high-resolution remote sensing images. From another perspective, Fourier transform can compute global information without introducing a large number of parameters, enabling the network to more efficiently capture the overall image structure and establish long-range correlations. Therefore, we propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement. Specifically, this challenging task of low-light enhancement is divided into two more manageable sub-tasks: the first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details. To facilitate information exchange between the two phases, we designed an information fusion affine block that combines data from different phases and scales. Additionally, we have constructed two dark light remote sensing datasets to address the current lack of datasets in dark light remote sensing image enhancement. Extensive evaluations show that our method outperforms existing state-of-the-art methods. The code is available at https://github.com/iijjlk/DFFN. |
14 page | Code Link |
Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement | 2024-09-03 | ShowPrevious low-light image enhancement (LLIE) approaches, while employing frequency decomposition techniques to address the intertwined challenges of low frequency (e.g., illumination recovery) and high frequency (e.g., noise reduction), primarily focused on the development of dedicated and complex networks to achieve improved performance. In contrast, we reveal that an advanced disentanglement paradigm is sufficient to consistently enhance state-of-the-art methods with minimal computational overhead. Leveraging the image Laplace decomposition scheme, we propose a novel low-frequency consistency method, facilitating improved frequency disentanglement optimization. Our method, seamlessly integrating with various models such as CNNs, Transformers, and flow-based and diffusion models, demonstrates remarkable adaptability. Noteworthy improvements are showcased across five popular benchmarks, with up to 7.68dB gains on PSNR achieved for six state-of-the-art models. Impressively, our approach maintains efficiency with only 88K extra parameters, setting a new standard in the challenging realm of low-light image enhancement. |
Accep...Accepted to ECCV 2024, Github \url{https://github.com/redrock303/ADF-LLIE} |
Code Link |
A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising | 2024-09-02 | ShowNew multinuclear MRI techniques, such as sodium MRI, generally suffer from low image quality due to an inherently low signal. Postprocessing methods, such as image denoising, have been developed for image enhancement. However, the assessment of these enhanced images is challenging especially considering when there is a lack of high resolution and high signal images as reference, such as in sodium MRI. No-reference Image Quality Assessment (NR-IQA) metrics are approaches to solve this problem. Existing learning-based NR-IQA metrics rely on labels derived from subjective human opinions or metrics like Signal-to-Noise Ratio (SNR), which are either time-consuming or lack accurate ground truths, resulting in unreliable assessment. We note that deep learning (DL) models have a unique characteristic in that they are specialized to a characteristic training set, meaning that deviations between the input testing data from the training data will reduce prediction accuracy. Therefore, we propose a novel DL-based NR-IQA metric, the Model Specialization Metric (MSM), which does not depend on ground-truth images or labels. MSM measures the difference between the input image and the model's prediction for evaluating the quality of the input image. Experiments conducted on both simulated distorted proton T1-weighted MR images and denoised sodium MR images demonstrate that MSM exhibits a superior evaluation performance on various simulated noises and distortions. MSM also has a substantial agreement with the expert evaluations, achieving an averaged Cohen's Kappa coefficient of 0.6528, outperforming the existing NR-IQA metrics. |
13 pages, 3 figures | None |
SINET: Sparsity-driven Interpretable Neural Network for Underwater Image Enhancement | 2024-09-02 | ShowImproving the quality of underwater images is essential for advancing marine research and technology. This work introduces a sparsity-driven interpretable neural network (SINET) for the underwater image enhancement (UIE) task. Unlike pure deep learning methods, our network architecture is based on a novel channel-specific convolutional sparse coding (CCSC) model, ensuring good interpretability of the underlying image enhancement process. The key feature of SINET is that it estimates the salient features from the three color channels using three sparse feature estimation blocks (SFEBs). The architecture of SFEB is designed by unrolling an iterative algorithm for solving the |
None | |
A Generative Adversarial Network-based Method for LiDAR-Assisted Radar Image Enhancement | 2024-08-30 | ShowThis paper presents a generative adversarial network (GAN) based approach for radar image enhancement. Although radar sensors remain robust for operations under adverse weather conditions, their application in autonomous vehicles (AVs) is commonly limited by the low-resolution data they produce. The primary goal of this study is to enhance the radar images to better depict the details and features of the environment, thereby facilitating more accurate object identification in AVs. The proposed method utilizes high-resolution, two-dimensional (2D) projected light detection and ranging (LiDAR) point clouds as ground truth images and low-resolution radar images as inputs to train the GAN. The ground truth images were obtained through two main steps. First, a LiDAR point cloud map was generated by accumulating raw LiDAR scans. Then, a customized LiDAR point cloud cropping and projection method was employed to obtain 2D projected LiDAR point clouds. The inference process of the proposed method relies solely on radar images to generate an enhanced version of them. The effectiveness of the proposed method is demonstrated through both qualitative and quantitative results. These results show that the proposed method can generate enhanced images with clearer object representation compared to the input radar images, even under adverse weather conditions. |
None | |
Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method | 2024-08-30 | ShowIn this paper, we delve into the realm of 4-D light fields (LFs) to enhance underwater imaging plagued by light absorption, scattering, and other challenges. Contrasting with conventional 2-D RGB imaging, 4-D LF imaging excels in capturing scenes from multiple perspectives, thereby indirectly embedding geometric information. This intrinsic property is anticipated to effectively address the challenges associated with underwater imaging. By leveraging both explicit and implicit depth cues present in 4-D LF images, we propose a progressive, mutually reinforcing framework for underwater 4-D LF image enhancement and depth estimation. Specifically, our framework explicitly utilizes estimated depth information alongside implicit depth-related dynamic convolutional kernels to modulate output features. The entire framework decomposes this complex task, iteratively optimizing the enhanced image and depth information to progressively achieve optimal enhancement results. More importantly, we construct the first 4-D LF-based underwater image dataset for quantitative evaluation and supervised training of learning-based methods, comprising 75 underwater scenes and 3675 high-resolution 2K pairs. To craft vibrant and varied underwater scenes, we build underwater environments with various objects and adopt several types of degradation. Through extensive experimentation, we showcase the potential and superiority of 4-D LF-based underwater imaging vis-a-vis traditional 2-D RGB-based approaches. Moreover, our method effectively corrects color bias and achieves state-of-the-art performance. The dataset and code will be publicly available at https://github.com/linlos1234/LFUIE. |
14 pages, 14 figures | Code Link |
LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement | 2024-08-29 | ShowWhile recent low-light image enhancement (LLIE) methods have made significant advancements, they still face challenges in terms of low visual quality and weak generalization ability when applied to complex scenarios. To address these issues, we propose a semi-supervised method based on latent mean-teacher and Gaussian process, named LMT-GP. We first design a latent mean-teacher framework that integrates both labeled and unlabeled data, as well as their latent vectors, into model training. Meanwhile, we use a mean-teacher-assisted Gaussian process learning strategy to establish a connection between the latent and pseudo-latent vectors obtained from the labeled and unlabeled data. To guide the learning process, we utilize an assisted Gaussian process regression (GPR) loss function. Furthermore, we design a pseudo-label adaptation module (PAM) to ensure the reliability of the network learning. To demonstrate our method's generalization ability and effectiveness, we apply it to multiple LLIE datasets and high-level vision tasks. Experiment results demonstrate that our method achieves high generalization performance and image quality. The code is available at https://github.com/HFUT-CV/LMT-GP. |
Code Link | |
O-Mamba: O-shape State-Space Model for Underwater Image Enhancement | 2024-08-23 | ShowUnderwater image enhancement (UIE) face significant challenges due to complex underwater lighting conditions. Recently, mamba-based methods have achieved promising results in image enhancement tasks. However, these methods commonly rely on Vmamba, which focuses only on spatial information modeling and struggles to deal with the cross-color channel dependency problem in underwater images caused by the differential attenuation of light wavelengths, limiting the effective use of deep networks. In this paper, we propose a novel UIE framework called O-mamba. O-mamba employs an O-shaped dual-branch network to separately model spatial and cross-channel information, utilizing the efficient global receptive field of state-space models optimized for underwater images. To enhance information interaction between the two branches and effectively utilize multi-scale information, we design a Multi-scale Bi-mutual Promotion Module. This branch includes MS-MoE for fusing multi-scale information within branches, Mutual Promotion module for interaction between spatial and channel information across branches, and Cyclic Multi-scale optimization strategy to maximize the use of multi-scale information. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) results.The code is available at https://github.com/chenydong/O-Mamba. |
Code Link | |
Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement | 2024-08-22 | ShowObtaining pairs of low/normal-light videos, with motions, is more challenging than still images, which raises technical issues and poses the technical route of unpaired learning as a critical role. This paper makes endeavors in the direction of learning for low-light video enhancement without using paired ground truth. Compared to low-light image enhancement, enhancing low-light videos is more difficult due to the intertwined effects of noise, exposure, and contrast in the spatial domain, jointly with the need for temporal coherence. To address the above challenge, we propose the Unrolled Decomposed Unpaired Network (UDU-Net) for enhancing low-light videos by unrolling the optimization functions into a deep network to decompose the signal into spatial and temporal-related factors, which are updated iteratively. Firstly, we formulate low-light video enhancement as a Maximum A Posteriori estimation (MAP) problem with carefully designed spatial and temporal visual regularization. Then, via unrolling the problem, the optimization of the spatial and temporal constraints can be decomposed into different steps and updated in a stage-wise manner. From the spatial perspective, the designed Intra subnet leverages unpair prior information from expert photography retouched skills to adjust the statistical distribution. Additionally, we introduce a novel mechanism that integrates human perception feedback to guide network optimization, suppressing over/under-exposure conditions. Meanwhile, to address the issue from the temporal perspective, the designed Inter subnet fully exploits temporal cues in progressive optimization, which helps achieve improved temporal consistency in enhancement results. Consequently, the proposed method achieves superior performance to state-of-the-art methods in video illumination, noise suppression, and temporal consistency across outdoor and indoor scenes. |
None | |
Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement | 2024-08-20 | ShowIn this paper, we delve into the concept of interpretable image enhancement, a technique that enhances image quality by adjusting filter parameters with easily understandable names such as "Exposure" and "Contrast". Unlike using predefined image editing filters, our framework utilizes learnable filters that acquire interpretable names through training. Our contribution is two-fold. Firstly, we introduce a novel filter architecture called an image-adaptive neural implicit lookup table, which uses a multilayer perceptron to implicitly define the transformation from input feature space to output color space. By incorporating image-adaptive parameters directly into the input features, we achieve highly expressive filters. Secondly, we introduce a prompt guidance loss to assign interpretable names to each filter. We evaluate visual impressions of enhancement results, such as exposure and contrast, using a vision and language model along with guiding prompts. We define a constraint to ensure that each filter affects only the targeted visual impression without influencing other attributes, which allows us to obtain the desired filter effects. Experimental results show that our method outperforms existing predefined filter-based methods, thanks to the filters optimized to predict target results. Our source code is available at https://github.com/satoshi-kosugi/PG-IA-NILUT. |
Accep...Accepted to ACM Multimedia 2024 |
Code Link |
SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement | 2024-08-20 | ShowCurrently, most low-light image enhancement methods only consider information from a single view, neglecting the correlation between cross-view information. Therefore, the enhancement results produced by these methods are often unsatisfactory. In this context, there have been efforts to develop methods specifically for low-light stereo image enhancement. These methods take into account the cross-view disparities and enable interaction between the left and right views, leading to improved performance. However, these methods still do not fully exploit the interaction between left and right view information. To address this issue, we propose a model called Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement (SDI-Net). The backbone structure of SDI-Net is two encoder-decoder pairs, which are used to learn the mapping function from low-light images to normal-light images. Among the encoders and the decoders, we design a module named Cross-View Sufficient Interaction Module (CSIM), aiming to fully exploit the correlations between the binocular views via the attention mechanism. The quantitative and visual results on public datasets validate the superiority of our method over other related methods. Ablation studies also demonstrate the effectiveness of the key elements in our model. |
None | |
UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement | 2024-08-20 | ShowUnderwater image enhancement (UIE) plays a crucial role in various marine applications, but it remains challenging due to the complex underwater environment. Current learning-based approaches frequently lack explicit incorporation of prior knowledge about the physical processes involved in underwater image formation, resulting in limited optimization despite their impressive enhancement results. This paper proposes a novel deep unfolding network (DUN) for UIE that integrates color priors and inter-stage feature transformation to improve enhancement performance. The proposed DUN model combines the iterative optimization and reliability of model-based methods with the flexibility and representational power of deep learning, offering a more explainable and stable solution compared to existing learning-based UIE approaches. The proposed model consists of three key components: a Color Prior Guidance Block (CPGB) that establishes a mapping between color channels of degraded and original images, a Nonlinear Activation Gradient Descent Module (NAGDM) that simulates the underwater image degradation process, and an Inter Stage Feature Transformer (ISF-Former) that facilitates feature exchange between different network stages. By explicitly incorporating color priors and modeling the physical characteristics of underwater image formation, the proposed DUN model achieves more accurate and reliable enhancement results. Extensive experiments on multiple underwater image datasets demonstrate the superiority of the proposed model over state-of-the-art methods in both quantitative and qualitative evaluations. The proposed DUN-based approach offers a promising solution for UIE, enabling more accurate and reliable scientific analysis in marine research. The code is available at https://github.com/CXH-Research/UIE-UnFold. |
Accep...Accepted by DSAA CIVIL 2024 |
Code Link |
Harnessing Multi-resolution and Multi-scale Attention for Underwater Image Restoration | 2024-08-19 | ShowUnderwater imagery is often compromised by factors such as color distortion and low contrast, posing challenges for high-level vision tasks. Recent underwater image restoration (UIR) methods either analyze the input image at full resolution, resulting in spatial richness but contextual weakness, or progressively from high to low resolution, yielding reliable semantic information but reduced spatial accuracy. Here, we propose a lightweight multi-stage network called Lit-Net that focuses on multi-resolution and multi-scale image analysis for restoring underwater images while retaining original resolution during the first stage, refining features in the second, and focusing on reconstruction in the final stage. Our novel encoder block utilizes parallel |
Code Link | |
ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement | 2024-08-19 | ShowLow-light image enhancement remains a challenging task in computer vision, with existing state-of-the-art models often limited by hardware constraints and computational inefficiencies, particularly in handling high-resolution images. Recent foundation models, such as transformers and diffusion models, despite their efficacy in various domains, are limited in use on edge devices due to their computational complexity and slow inference times. We introduce ExpoMamba, a novel architecture that integrates components of the frequency state space within a modified U-Net, offering a blend of efficiency and effectiveness. This model is specifically optimized to address mixed exposure challenges, a common issue in low-light image enhancement, while ensuring computational efficiency. Our experiments demonstrate that ExpoMamba enhances low-light images up to 2-3x faster than traditional models with an inference time of 36.6 ms and achieves a PSNR improvement of approximately 15-20% over competing models, making it highly suitable for real-time image processing applications. |
None | |
DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer's Case Studies | 2024-08-16 | ShowRecent advancements in deep learning, particularly in medical imaging, have significantly propelled the progress of healthcare systems. However, examining the robustness of medical images against adversarial attacks is crucial due to their real-world applications and profound impact on individuals' health. These attacks can result in misclassifications in disease diagnosis, potentially leading to severe consequences. Numerous studies have explored both the implementation of adversarial attacks on medical images and the development of defense mechanisms against these threats, highlighting the vulnerabilities of deep neural networks to such adversarial activities. In this study, we investigate adversarial attacks on images associated with Alzheimer's disease and propose a defensive method to counteract these attacks. Specifically, we examine adversarial attacks that employ frequency domain transformations on Alzheimer's disease images, along with other well-known adversarial attacks. Our approach utilizes a convolutional neural network (CNN)-based autoencoder architecture in conjunction with the two-dimensional Fourier transform of images for detection purposes. The simulation results demonstrate that our detection and defense mechanism effectively mitigates several adversarial attacks, thereby enhancing the robustness of deep neural networks against such vulnerabilities. |
10 pa...10 pages, 4 figures, conference |
None |
Latent Disentanglement for Low Light Image Enhancement | 2024-08-12 | ShowMany learning-based low-light image enhancement (LLIE) algorithms are based on the Retinex theory. However, the Retinex-based decomposition techniques in such models introduce corruptions which limit their enhancement performance. In this paper, we propose a Latent Disentangle-based Enhancement Network (LDE-Net) for low light vision tasks. The latent disentanglement module disentangles the input image in latent space such that no corruption remains in the disentangled Content and Illumination components. For LLIE task, we design a Content-Aware Embedding (CAE) module that utilizes Content features to direct the enhancement of the Illumination component. For downstream tasks (e.g. nighttime UAV tracking and low-light object detection), we develop an effective light-weight enhancer based on the latent disentanglement framework. Comprehensive quantitative and qualitative experiments demonstrate that our LDE-Net significantly outperforms state-of-the-art methods on various LLIE benchmarks. In addition, the great results obtained by applying our framework on the downstream tasks also demonstrate the usefulness of our latent disentanglement design. |
None | |
Underwater Object Detection Enhancement via Channel Stabilization | 2024-08-02 | ShowThe complex marine environment exacerbates the challenges of object detection manifold. Marine trash endangers the aquatic ecosystem, presenting a persistent challenge. Accurate detection of marine deposits is crucial for mitigating this harm. Our work addresses underwater object detection by enhancing image quality and evaluating detection methods. We use Detectron2's backbone with various base models and configurations for this task. We propose a novel channel stabilization technique alongside a simplified image enhancement model to reduce haze and color cast in training images, improving multi-scale object detection. Following image processing, we test different Detectron2 backbones for optimal detection accuracy. Additionally, we apply a sharpening filter with augmentation techniques to highlight object profiles for easier recognition. Results are demonstrated on the TrashCan Dataset, both instance and material versions. The best-performing backbone method incorporates our channel stabilization and augmentation techniques. We also compare our Detectron2 detection results with the Deformable Transformer. In the instance version of TrashCan 1.0, our method achieves a 9.53% absolute increase in average precision for small objects and a 7% absolute gain in bounding box detection compared to the baseline. The code will be available on Code: https://github.com/aliman80/Underwater- Object-Detection-via-Channel-Stablization |
Code Link | |
Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement | 2024-08-02 | ShowUltra-high-definition (UHD) technology has attracted widespread attention due to its exceptional visual quality, but it also poses new challenges for low-light image enhancement (LLIE) techniques. UHD images inherently possess high computational complexity, leading existing UHD LLIE methods to employ high-magnification downsampling to reduce computational costs, which in turn results in information loss. The wavelet transform not only allows downsampling without loss of information, but also separates the image content from the noise. It enables state space models (SSMs) to avoid being affected by noise when modeling long sequences, thus making full use of the long-sequence modeling capability of SSMs. On this basis, we propose Wave-Mamba, a novel approach based on two pivotal insights derived from the wavelet domain: 1) most of the content information of an image exists in the low-frequency component, less in the high-frequency component. 2) The high-frequency component exerts a minimal influence on the outcomes of low-light enhancement. Specifically, to efficiently model global content information on UHD images, we proposed a low-frequency state space block (LFSSBlock) by improving SSMs to focus on restoring the information of low-frequency sub-bands. Moreover, we propose a high-frequency enhance block (HFEBlock) for high-frequency sub-band information, which uses the enhanced low-frequency information to correct the high-frequency information and effectively restore the correct high-frequency details. Through comprehensive evaluation, our method has demonstrated superior performance, significantly outshining current leading techniques while maintaining a more streamlined architecture. The code is available at https://github.com/AlexZou14/Wave-Mamba. |
10 pa...10 pages, 8 figures, ACMMM2024 accepted |
Code Link |
Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint | 2024-07-31 | ShowIn underwater image enhancement (UIE), convolutional neural networks (CNN) have inherent limitations in modeling long-range dependencies and are less effective in recovering global features. While Transformers excel at modeling long-range dependencies, their quadratic computational complexity with increasing image resolution presents significant efficiency challenges. Additionally, most supervised learning methods lack effective physical model constraint, which can lead to insufficient realism and overfitting in generated images. To address these issues, we propose a physical model constraint-based underwater image enhancement framework, Mamba-UIE. Specifically, we decompose the input image into four components: underwater scene radiance, direct transmission map, backscatter transmission map, and global background light. These components are reassembled according to the revised underwater image formation model, and the reconstruction consistency constraint is applied between the reconstructed image and the original image, thereby achieving effective physical constraint on the underwater image enhancement process. To tackle the quadratic computational complexity of Transformers when handling long sequences, we introduce the Mamba-UIE network based on linear complexity state space models. By incorporating the Mamba in Convolution block, long-range dependencies are modeled at both the channel and spatial levels, while the CNN backbone is retained to recover local features and details. Extensive experiments on three public datasets demonstrate that our proposed Mamba-UIE outperforms existing state-of-the-art methods, achieving a PSNR of 27.13 and an SSIM of 0.93 on the UIEB dataset. Our method is available at https://github.com/zhangsong1213/Mamba-UIE. |
Code Link | |
UniProcessor: A Text-induced Unified Low-level Image Processor | 2024-07-30 | ShowImage processing, including image restoration, image enhancement, etc., involves generating a high-quality clean image from a degraded input. Deep learning-based methods have shown superior performance for various image processing tasks in terms of single-task conditions. However, they require to train separate models for different degradations and levels, which limits the generalization abilities of these models and restricts their applications in real-world. In this paper, we propose a text-induced unified image processor for low-level vision tasks, termed UniProcessor, which can effectively process various degradation types and levels, and support multimodal control. Specifically, our UniProcessor encodes degradation-specific information with the subject prompt and process degradations with the manipulation prompt. These context control features are injected into the UniProcessor backbone via cross-attention to control the processing procedure. For automatic subject-prompt generation, we further build a vision-language model for general-purpose low-level degradation perception via instruction tuning techniques. Our UniProcessor covers 30 degradation types, and extensive experiments demonstrate that our UniProcessor can well process these degradations without additional training or tuning and outperforms other competing methods. Moreover, with the help of degradation-aware context control, our UniProcessor first shows the ability to individually handle a single distortion in an image with multiple degradations. |
None | |
JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement | 2024-07-29 | ShowLow-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models. Despite the success of some conditional methods, previous methods may neglect the importance of a sufficient formulation of task-specific condition strategy, resulting in suboptimal visual outcomes. In this study, we propose JoReS-Diff, a novel approach that incorporates Retinex- and semantic-based priors as the additional pre-processing condition to regulate the generating capabilities of the diffusion model. We first leverage pre-trained decomposition network to generate the Retinex prior, which is updated with better quality by an adjustment network and integrated into a refinement network to implement Retinex-based conditional generation at both feature- and image-levels. Moreover, the semantic prior is extracted from the input image with an off-the-shelf semantic segmentation model and incorporated through semantic attention layers. By treating Retinex- and semantic-based priors as the condition, JoReS-Diff presents a unique perspective for establishing an diffusion model for LLIE and similar image enhancement tasks. Extensive experiments validate the rationality and superiority of our approach. |
Accep...Accepted by ACM MM 2024 |
None |
S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks | 2024-07-23 | ShowVision Transformer (ViT) is becoming widely popular in automating accurate disease diagnosis in medical imaging owing to its robust self-attention mechanism. However, ViTs remain vulnerable to adversarial attacks that may thwart the diagnosis process by leading it to intentional misclassification of critical disease. In this paper, we propose a novel image classification pipeline, namely, S-E Pipeline, that performs multiple pre-processing steps that allow ViT to be trained on critical features so as to reduce the impact of input perturbations by adversaries. Our method uses a combination of segmentation and image enhancement techniques such as Contrast Limited Adaptive Histogram Equalization (CLAHE), Unsharp Masking (UM), and High-Frequency Emphasis filtering (HFE) as preprocessing steps to identify critical features that remain intact even after adversarial perturbations. The experimental study demonstrates that our novel pipeline helps in reducing the effect of adversarial attacks by 72.22% for the ViT-b32 model and 86.58% for the ViT-l32 model. Furthermore, we have shown an end-to-end deployment of our proposed method on the NVIDIA Jetson Orin Nano board to demonstrate its practical use case in modern hand-held devices that are usually resource-constrained. |
None | |
AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement | 2024-07-23 | ShowExisting low-light image enhancement (LIE) methods have achieved noteworthy success in solving synthetic distortions, yet they often fall short in practical applications. The limitations arise from two inherent challenges in real-world LIE: 1) the collection of distorted/clean image pairs is often impractical and sometimes even unavailable, and 2) accurately modeling complex degradations presents a non-trivial problem. To overcome them, we propose the Attribute Guidance Diffusion framework (AGLLDiff), a training-free method for effective real-world LIE. Instead of specifically defining the degradation process, AGLLDiff shifts the paradigm and models the desired attributes, such as image exposure, structure and color of normal-light images. These attributes are readily available and impose no assumptions about the degradation process, which guides the diffusion sampling process to a reliable high-quality solution space. Extensive experiments demonstrate that our approach outperforms the current leading unsupervised LIE methods across benchmarks in terms of distortion-based and perceptual-based metrics, and it performs well even in sophisticated wild degradation. |
21 pages, 9 figures | None |
ERD: Exponential Retinex decomposition based on weak space and hybrid nonconvex regularization and its denoising application | 2024-07-21 | ShowThe Retinex theory models the image as a product of illumination and reflection components, which has received extensive attention and is widely used in image enhancement, segmentation and color restoration. However, it has been rarely used in additive noise removal due to the inclusion of both multiplication and addition operations in the Retinex noisy image modeling. In this paper, we propose an exponential Retinex decomposition model based on hybrid non-convex regularization and weak space oscillation-modeling for image denoising. The proposed model utilizes non-convex first-order total variation (TV) and non-convex second-order TV to regularize the reflection component and the illumination component, respectively, and employs weak |
None | |
RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement | 2024-07-20 | ShowIn this paper we propose a novel modification of Contrastive Language-Image Pre-Training (CLIP) guidance for the task of unsupervised backlit image enhancement. Our work builds on the state-of-the-art CLIP-LIT approach, which learns a prompt pair by constraining the text-image similarity between a prompt (negative/positive sample) and a corresponding image (backlit image/well-lit image) in the CLIP embedding space. Learned prompts then guide an image enhancement network. Based on the CLIP-LIT framework, we propose two novel methods for CLIP guidance. First, we show that instead of tuning prompts in the space of text embeddings, it is possible to directly tune their embeddings in the latent space without any loss in quality. This accelerates training and potentially enables the use of additional encoders that do not have a text encoder. Second, we propose a novel approach that does not require any prompt tuning. Instead, based on CLIP embeddings of backlit and well-lit images from training data, we compute the residual vector in the embedding space as a simple difference between the mean embeddings of the well-lit and backlit images. This vector then guides the enhancement network during training, pushing a backlit image towards the space of well-lit images. This approach further dramatically reduces training time, stabilizes training and produces high quality enhanced images without artifacts, both in supervised and unsupervised training regimes. Additionally, we show that residual vectors can be interpreted, revealing biases in training data, and thereby enabling potential bias correction. |
None | |
Dual High-Order Total Variation Model for Underwater Image Restoration | 2024-07-20 | ShowUnderwater images are typically characterized by color cast, haze, blurring, and uneven illumination due to the selective absorption and scattering when light propagates through the water, which limits their practical applications. Underwater image enhancement and restoration (UIER) is one crucial mode to improve the visual quality of underwater images. However, most existing UIER methods concentrate on enhancing contrast and dehazing, rarely pay attention to the local illumination differences within the image caused by illumination variations, thus introducing some undesirable artifacts and unnatural color. To address this issue, an effective variational framework is proposed based on an extended underwater image formation model (UIFM). Technically, dual high-order regularizations are successfully integrated into the variational model to acquire smoothed local ambient illuminance and structure-revealed reflectance in a unified manner. In our proposed framework, the weight factors-based color compensation is combined with the color balance to compensate for the attenuated color channels and remove the color cast. In particular, the local ambient illuminance with strong robustness is acquired by performing the local patch brightest pixel estimation and an improved gamma correction. Additionally, we design an iterative optimization algorithm relying on the alternating direction method of multipliers (ADMM) to accelerate the solution of the proposed variational model. Considerable experiments on three real-world underwater image datasets demonstrate that the proposed method outperforms several state-of-the-art methods with regard to visual quality and quantitative assessments. Moreover, the proposed method can also be extended to outdoor image dehazing, low-light image enhancement, and some high-level vision tasks. The code is available at https://github.com/Hou-Guojia/UDHTV. |
13 pages, 10 figures | Code Link |
Adaptive Frequency Enhancement Network for Single Image Deraining | 2024-07-19 | ShowImage deraining aims to improve the visibility of images damaged by rainy conditions, targeting the removal of degradation elements such as rain streaks, raindrops, and rain accumulation. While numerous single image deraining methods have shown promising results in image enhancement within the spatial domain, real-world rain degradation often causes uneven damage across an image's entire frequency spectrum, posing challenges for these methods in enhancing different frequency components. In this paper, we introduce a novel end-to-end Adaptive Frequency Enhancement Network (AFENet) specifically for single image deraining that adaptively enhances images across various frequencies. We employ convolutions of different scales to adaptively decompose image frequency bands, introduce a feature enhancement module to boost the features of different frequency components and present a novel interaction module for interchanging and merging information from various frequency branches. Simultaneously, we propose a feature aggregation module that efficiently and adaptively fuses features from different frequency bands, facilitating enhancements across the entire frequency spectrum. This approach empowers the deraining network to eliminate diverse and complex rainy patterns and to reconstruct image details accurately. Extensive experiments on both real and synthetic scenes demonstrate that our method not only achieves visually appealing enhancement results but also surpasses existing methods in performance. |
8pages | None |
Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement | 2024-07-18 | ShowDespite recent strides made by AI in image processing, the issue of mixed exposure, pivotal in many real-world scenarios like surveillance and photography, remains inadequately addressed. Traditional image enhancement techniques and current transformer models are limited with primary focus on either overexposure or underexposure. To bridge this gap, we introduce the Unified-Exposure Guided Transformer (Unified-EGformer). Our proposed solution is built upon advanced transformer architectures, equipped with local pixel-level refinement and global refinement blocks for color correction and image-wide adjustments. We employ a guided attention mechanism to precisely identify exposure-compromised regions, ensuring its adaptability across various real-world conditions. U-EGformer, with a lightweight design featuring a memory footprint (peak memory) of only $\sim$1134 MB (0.1 Million parameters) and an inference time of 95 ms (9.61x faster than the average), is a viable choice for real-time applications such as surveillance and autonomous navigation. Additionally, our model is highly generalizable, requiring minimal fine-tuning to handle multiple tasks and datasets with a single architecture. |
Under submission | None |
Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations | 2024-07-17 | ShowCurrent deep learning-based low-light image enhancement methods often struggle with high-resolution images, and fail to meet the practical demands of visual perception across diverse and unseen scenarios. In this paper, we introduce a novel approach termed CoLIE, which redefines the enhancement process through mapping the 2D coordinates of an underexposed image to its illumination component, conditioned on local context. We propose a reconstruction of enhanced-light images within the HSV space utilizing an implicit neural function combined with an embedded guided filter, thereby significantly reducing computational overhead. Moreover, we introduce a single image-based training loss function to enhance the model's adaptability to various scenes, further enhancing its practical applicability. Through rigorous evaluations, we analyze the properties of our proposed framework, demonstrating its superiority in both image quality and scene adaptability. Furthermore, our evaluation extends to applications in downstream tasks within low-light scenarios, underscoring the practical utility of CoLIE. The source code is available at https://github.com/ctom2/colie. |
Accepted to ECCV24 | Code Link |
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval | 2024-07-17 | ShowMost existing Low-light Image Enhancement (LLIE) methods either directly map Low-Light (LL) to Normal-Light (NL) images or use semantic or illumination maps as guides. However, the ill-posed nature of LLIE and the difficulty of semantic retrieval from impaired inputs limit these methods, especially in extremely low-light conditions. To address this issue, we present a new LLIE network via Generative LAtent feature based codebook REtrieval (GLARE), in which the codebook prior is derived from undegraded NL images using a Vector Quantization (VQ) strategy. More importantly, we develop a generative Invertible Latent Normalizing Flow (I-LNF) module to align the LL feature distribution to NL latent representations, guaranteeing the correct code retrieval in the codebook. In addition, a novel Adaptive Feature Transformation (AFT) module, featuring an adjustable function for users and comprising an Adaptive Mix-up Block (AMB) along with a dual-decoder architecture, is devised to further enhance fidelity while preserving the realistic details provided by codebook prior. Extensive experiments confirm the superior performance of GLARE on various benchmark datasets and real-world data. Its effectiveness as a preprocessing tool in low-light object detection tasks further validates GLARE for high-level vision applications. Code is released at https://github.com/LowLevelAI/GLARE. |
Accep...Accepted by ECCV 2024 |
Code Link |
NamedCurves: Learned Image Enhancement via Color Naming | 2024-07-13 | ShowA popular method for enhancing images involves learning the style of a professional photo editor using pairs of training images comprised of the original input with the editor-enhanced version. When manipulating images, many editing tools offer a feature that allows the user to manipulate a limited selection of familiar colors. Editing by color name allows easy adjustment of elements like the "blue" of the sky or the "green" of trees. Inspired by this approach to color manipulation, we propose NamedCurves, a learning-based image enhancement technique that separates the image into a small set of named colors. Our method learns to globally adjust the image for each specific named color via tone curves and then combines the images using an attention-based fusion mechanism to mimic spatial editing. We demonstrate the effectiveness of our method against several competing methods on the well-known Adobe 5K dataset and the PPR10K dataset, showing notable improvements. |
Europ...European Conference on Computer Vision ECCV 2024 |
None |
Taming Lookup Tables for Efficient Image Retouching | 2024-07-13 | ShowThe widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose Image Color Enhancement Lookup Table (ICELUT) that adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolution to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance and remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, upkeeping the performance even with a heavily downsampled 32x32 input image. These enable ICELUT, the first-ever purely LUT-based image enhancer, to reach an unprecedented speed of 0.4ms on GPU and 7ms on CPU, at least one order faster than any CNN solution. Codes are available at https://github.com/Stephen0808/ICELUT. |
Accepted by ECCV2024 | Code Link |
LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models | 2024-07-12 | ShowIn this paper, we propose a diffusion-based unsupervised framework that incorporates physically explainable Retinex theory with diffusion models for low-light image enhancement, named LightenDiffusion. Specifically, we present a content-transfer decomposition network that performs Retinex decomposition within the latent space instead of image space as in previous approaches, enabling the encoded features of unpaired low-light and normal-light images to be decomposed into content-rich reflectance maps and content-free illumination maps. Subsequently, the reflectance map of the low-light image and the illumination map of the normal-light image are taken as input to the diffusion model for unsupervised restoration with the guidance of the low-light feature, where a self-constrained consistency loss is further proposed to eliminate the interference of normal-light content on the restored results to improve overall visual quality. Extensive experiments on publicly available real-world benchmarks show that the proposed LightenDiffusion outperforms state-of-the-art unsupervised competitors and is comparable to supervised methods while being more generalizable to various scenes. Our code is available at https://github.com/JianghaiSCU/LightenDiffusion. |
Accep...Accepted by ECCV 2024 |
Code Link |
A Lightweight Low-Light Image Enhancement Network via Channel Prior and Gamma Correction | 2024-07-10 | ShowHuman vision relies heavily on available ambient light to perceive objects. Low-light scenes pose two distinct challenges: information loss due to insufficient illumination and undesirable brightness shifts. Low-light image enhancement (LLIE) refers to image enhancement technology tailored to handle this scenario. We introduce CPGA-Net, an innovative LLIE network that combines dark/bright channel priors and gamma correction via deep learning and integrates features inspired by the Atmospheric Scattering Model and the Retinex Theory. This approach combines the use of traditional and deep learning methodologies, designed within a simple yet efficient architectural framework that focuses on essential feature extraction. The resulting CPGA-Net is a lightweight network with only 0.025 million parameters and 0.030 seconds for inference time, yet it achieves superior performance over existing LLIE methods on both objective and subjective evaluation criteria. Furthermore, we utilized knowledge distillation with explainable factors and proposed an efficient version that achieves 0.018 million parameters and 0.006 seconds for inference time. The proposed approaches inject new solution ideas into LLIE, providing practical applications in challenging low-light scenarios. |
Prepr...Preprint of an article submitted for consideration in [International Journal of Pattern Recognition and Artificial Intelligence] \c{opyright} [2024] [copyright World Scientific Publishing Company] [https://www.worldscientific.com/worldscinet/ijprai] |
None |
CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement | 2024-07-10 | ShowLow-Light Image Enhancement (LLIE) has advanced with the surge in phone photography demand, yet many existing methods neglect compression, a crucial concern for resource-constrained phone photography. Most LLIE methods overlook this, hindering their effectiveness. In this study, we investigate the effects of JPEG compression on low-light images and reveal substantial information loss caused by JPEG due to widespread low pixel values in dark areas. Hence, we propose the Compression-Aware Pre-trained Transformer (CAPformer), employing a novel pre-training strategy to learn lossless information from uncompressed low-light images. Additionally, the proposed Brightness-Guided Self-Attention (BGSA) mechanism enhances rational information gathering. Experiments demonstrate the superiority of our approach in mitigating compression effects on LLIE, showcasing its potential for improving LLIE in resource-constrained scenarios. |
None | |
Vision Language Model-Empowered Contract Theory for AIGC Task Allocation in Teleoperation | 2024-07-09 | ShowIntegrating low-light image enhancement techniques, in which diffusion-based AI-generated content (AIGC) models are promising, is necessary to enhance nighttime teleoperation. Remarkably, the AIGC model is computation-intensive, thus necessitating the allocation of AIGC tasks to edge servers with ample computational resources. Given the distinct cost of the AIGC model trained with varying-sized datasets and AIGC tasks possessing disparate demand, it is imperative to formulate a differential pricing strategy to optimize the utility of teleoperators and edge servers concurrently. Nonetheless, the pricing strategy formulation is under information asymmetry, i.e., the demand (e.g., the difficulty level of AIGC tasks and their distribution) of AIGC tasks is hidden information to edge servers. Additionally, manually assessing the difficulty level of AIGC tasks is tedious and unnecessary for teleoperators. To this end, we devise a framework of AIGC task allocation assisted by the Vision Language Model (VLM)-empowered contract theory, which includes two components: VLM-empowered difficulty assessment and contract theory-assisted AIGC task allocation. The first component enables automatic and accurate AIGC task difficulty assessment. The second component is capable of formulating the pricing strategy for edge servers under information asymmetry, thereby optimizing the utility of both edge servers and teleoperators. The simulation results demonstrated that our proposed framework can improve the average utility of teleoperators and edge servers by 10.88 |
11 pages, 10 figures | Code Link |
Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization | 2024-07-09 | ShowDiffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks. In particular, the pre-trained text-to-image stable diffusion models provide a potential solution to the challenging realistic image super-resolution (Real-ISR) and image stylization problems with their strong generative priors. However, the existing methods along this line often fail to keep faithful pixel-wise image structures. If extra skip connections between the encoder and the decoder of a VAE are used to reproduce details, additional training in image space will be required, limiting the application to tasks in latent space such as image stylization. In this work, we propose a pixel-aware stable diffusion (PASD) network to achieve robust Real-ISR and personalized image stylization. Specifically, a pixel-aware cross attention module is introduced to enable diffusion models perceiving image local structures in pixel-wise level, while a degradation removal module is used to extract degradation insensitive features to guide the diffusion process together with image high level information. An adjustable noise schedule is introduced to further improve the image restoration results. By simply replacing the base diffusion model with a stylized one, PASD can generate diverse stylized images without collecting pairwise training data, and by shifting the base model with an aesthetic one, PASD can bring old photos back to life. Extensive experiments in a variety of image enhancement and stylization tasks demonstrate the effectiveness of our proposed PASD approach. Our source codes are available at \url{https://github.com/yangxy/PASD/}. |
Code Link | |
7T MRI Synthesization from 3T Acquisitions | 2024-07-08 | ShowSupervised deep learning techniques can be used to generate synthetic 7T MRIs from 3T MRI inputs. This image enhancement process leverages the advantages of ultra-high-field MRI to improve the signal-to-noise and contrast-to-noise ratios of 3T acquisitions. In this paper, we introduce multiple novel 7T synthesization algorithms based on custom-designed variants of the V-Net convolutional neural network. We demonstrate that the V-Net based model has superior performance in enhancing both single-site and multi-site MRI datasets compared to the existing benchmark model. When trained on 3T-7T MRI pairs from 8 subjects with mild Traumatic Brain Injury (TBI), our model achieves state-of-the-art 7T synthesization performance. Compared to previous works, synthetic 7T images generated from our pipeline also display superior enhancement of pathological tissue. Additionally, we implement and test a data augmentation scheme for training models that are robust to variations in the input distribution. This allows synthetic 7T models to accommodate intra-scanner and inter-scanner variability in multisite datasets. On a harmonized dataset consisting of 18 3T-7T MRI pairs from two institutions, including both healthy subjects and those with mild TBI, our model maintains its performance and can generalize to 3T MRI inputs with lower resolution. Our findings demonstrate the promise of V-Net based models for MRI enhancement and offer a preliminary probe into improving the generalizability of synthetic 7T models with data augmentation. |
13 pa...13 pages including supplemental materials, accepted for presentation at MICCAI 2024 |
None |
EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy | 2024-07-08 | ShowWireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels remains underexplored. To tackle this, we introduce EndoUIC, a WCE unified illumination correction solution using an end-to-end promptable diffusion transformer (DiT) model. In our work, the illumination prompt module shall navigate the model to adapt to different exposure levels and perform targeted image enhancement, in which the Adaptive Prompt Integration (API) and Global Prompt Scanner (GPS) modules shall further boost the concurrent representation learning between the prompt parameters and features. Besides, the U-shaped restoration DiT model shall capture the long-range dependencies and contextual information for unified illumination restoration. Moreover, we present a novel Capsule-endoscopy Exposure Correction (CEC) dataset, including ground-truth and corrupted image pairs annotated by expert photographers. Extensive experiments against a variety of state-of-the-art (SOTA) methods on four datasets showcase the effectiveness of our proposed method and components in WCE illumination restoration, and the additional downstream experiments further demonstrate its utility for clinical diagnosis and surgical assistance. |
To ap...To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC |
Code Link |
Synthetically Enhanced: Unveiling Synthetic Data's Potential in Medical Imaging Research | 2024-07-08 | ShowChest X-rays (CXR) are essential for diagnosing a variety of conditions, but when used on new populations, model generalizability issues limit their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of medical imaging research. The study employed DDPMs to create synthetic CXRs conditioned on demographic and pathological characteristics from the CheXpert dataset. These synthetic images were used to supplement training datasets for pathology classifiers, with the aim of improving their performance. The evaluation involved three datasets (CheXpert, MIMIC-CXR, and Emory Chest X-ray) and various experiments, including supplementing real data with synthetic data, training with purely synthetic data, and mixing synthetic data with external datasets. Performance was assessed using the area under the receiver operating curve (AUROC). Adding synthetic data to real datasets resulted in a notable increase in AUROC values (up to 0.02 in internal and external test sets with 1000% supplementation, p-value less than 0.01 in all instances). When classifiers were trained exclusively on synthetic data, they achieved performance levels comparable to those trained on real data with 200%-300% data supplementation. The combination of real and synthetic data from different sources demonstrated enhanced model generalizability, increasing model AUROC from 0.76 to 0.80 on the internal test set (p-value less than 0.01). In conclusion, synthetic data supplementation significantly improves the performance and generalizability of pathology classifiers in medical imaging. |
None | |
Image-Conditional Diffusion Transformer for Underwater Image Enhancement | 2024-07-07 | ShowUnderwater image enhancement (UIE) has attracted much attention owing to its importance for underwater operation and marine engineering. Motivated by the recent advance in generative models, we propose a novel UIE method based on image-conditional diffusion transformer (ICDT). Our method takes the degraded underwater image as the conditional input and converts it into latent space where ICDT is applied. ICDT replaces the conventional U-Net backbone in a denoising diffusion probabilistic model (DDPM) with a transformer, and thus inherits favorable properties such as scalability from transformers. Furthermore, we train ICDT with a hybrid loss function involving variances to achieve better log-likelihoods, which meanwhile significantly accelerates the sampling process. We experimentally assess the scalability of ICDTs and compare with prior works in UIE on the Underwater ImageNet dataset. Besides good scaling properties, our largest model, ICDT-XL/2, outperforms all comparison methods, achieving state-of-the-art (SOTA) quality of image enhancement. |
None | |
A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation | 2024-07-05 | ShowDue to the selective absorption and scattering of light by diverse aquatic media, underwater images usually suffer from various visual degradations. Existing underwater image enhancement (UIE) approaches that combine underwater physical imaging models with neural networks often fail to accurately estimate imaging model parameters such as depth and veiling light, resulting in poor performance in certain scenarios. To address this issue, we propose a physical model-guided framework for jointly training a Deep Degradation Model (DDM) with any advanced UIE model. DDM includes three well-designed sub-networks to accurately estimate various imaging parameters: a veiling light estimation sub-network, a factors estimation sub-network, and a depth estimation sub-network. Based on the estimated parameters and the underwater physical imaging model, we impose physical constraints on the enhancement process by modeling the relationship between underwater images and desired clean images, i.e., outputs of the UIE model. Moreover, while our framework is compatible with any UIE model, we design a simple yet effective fully convolutional UIE model, termed UIEConv. UIEConv utilizes both global and local features for image enhancement through a dual-branch structure. UIEConv trained within our framework achieves remarkable enhancement results across diverse underwater scenes. Furthermore, as a byproduct of UIE, the trained depth estimation sub-network enables accurate underwater scene depth estimation. Extensive experiments conducted in various real underwater imaging scenarios, including deep-sea environments with artificial light sources, validate the effectiveness of our framework and the UIEConv model. |
This ...This work has been submitted to the IEEE for possible publication |
None |
IDA-UIE: An Iterative Framework for Deep Network-based Degradation Aware Underwater Image Enhancement | 2024-06-26 | ShowUnderwater image quality is affected by fluorescence, low illumination, absorption, and scattering. Recent works in underwater image enhancement have proposed different deep network architectures to handle these problems. Most of these works have proposed a single network to handle all the challenges. We believe that deep networks trained for specific conditions deliver better performance than a single network learned from all degradation cases. Accordingly, the first contribution of this work lies in the proposal of an iterative framework where a single dominant degradation condition is identified and resolved. This proposal considers the following eight degradation conditions -- low illumination, low contrast, haziness, blurred image, presence of noise and color imbalance in three different channels. A deep network is designed to identify the dominant degradation condition. Accordingly, an appropriate deep network is selected for degradation condition-specific enhancement. The second contribution of this work is the construction of degradation condition specific datasets from good quality images of two standard datasets (UIEB and EUVP). This dataset is used to learn the condition specific enhancement networks. The proposed approach is found to outperform nine baseline methods on UIEB and EUVP datasets. |
None | |
A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning | 2024-06-26 | ShowUnderwater image enhancement (UIE) presents a significant challenge within computer vision research. Despite the development of numerous UIE algorithms, a thorough and systematic review is still absent. To foster future advancements, we provide a detailed overview of the UIE task from several perspectives. Firstly, we introduce the physical models, data construction processes, evaluation metrics, and loss functions. Secondly, we categorize and discuss recent algorithms based on their contributions, considering six aspects: network architecture, learning strategy, learning stage, auxiliary tasks, domain perspective, and disentanglement fusion. Thirdly, due to the varying experimental setups in the existing literature, a comprehensive and unbiased comparison is currently unavailable. To address this, we perform both quantitative and qualitative evaluations of state-of-the-art algorithms across multiple benchmark datasets. Lastly, we identify key areas for future research in UIE. A collection of resources for UIE can be found at {https://github.com/YuZhao1999/UIE}. |
A sur...A survey on the underwater image enhancement task |
Code Link |
Quality-guided Skin Tone Enhancement for Portrait Photography | 2024-06-22 | ShowIn recent years, learning-based color and tone enhancement methods for photos have become increasingly popular. However, most learning-based image enhancement methods just learn a mapping from one distribution to another based on one dataset, lacking the ability to adjust images continuously and controllably. It is important to enable the learning-based enhancement models to adjust an image continuously, since in many cases we may want to get a slighter or stronger enhancement effect rather than one fixed adjusted result. In this paper, we propose a quality-guided image enhancement paradigm that enables image enhancement models to learn the distribution of images with various quality ratings. By learning this distribution, image enhancement models can associate image features with their corresponding perceptual qualities, which can be used to adjust images continuously according to different quality scores. To validate the effectiveness of our proposed method, a subjective quality assessment experiment is first conducted, focusing on skin tone adjustment in portrait photography. Guided by the subjective quality ratings obtained from this experiment, our method can adjust the skin tone corresponding to different quality requirements. Furthermore, an experiment conducted on 10 natural raw images corroborates the effectiveness of our model in situations with fewer subjects and fewer shots, and also demonstrates its general applicability to natural images. Our project page is https://github.com/IntMeGroup/quality-guided-enhancement . |
Code Link | |
LU2Net: A Lightweight Network for Real-time Underwater Image Enhancement | 2024-06-21 | ShowComputer vision techniques have empowered underwater robots to effectively undertake a multitude of tasks, including object tracking and path planning. However, underwater optical factors like light refraction and absorption present challenges to underwater vision, which cause degradation of underwater images. A variety of underwater image enhancement methods have been proposed to improve the effectiveness of underwater vision perception. Nevertheless, for real-time vision tasks on underwater robots, it is necessary to overcome the challenges associated with algorithmic efficiency and real-time capabilities. In this paper, we introduce Lightweight Underwater Unet (LU2Net), a novel U-shape network designed specifically for real-time enhancement of underwater images. The proposed model incorporates axial depthwise convolution and the channel attention module, enabling it to significantly reduce computational demands and model parameters, thereby improving processing speed. The extensive experiments conducted on the dataset and real-world underwater robots demonstrate the exceptional performance and speed of proposed model. It is capable of providing well-enhanced underwater images at a speed 8 times faster than the current state-of-the-art underwater image enhancement method. Moreover, LU2Net is able to handle real-time underwater video enhancement. |
None |