📖 Recommendations of Document Image Processing

This repository collects papers, datasets, and benchmark results for document image processing, covering registration, appearance enhancement, deshadowing, dewarping, deblurring, and binarization.

🔥 Contents

1. Registration

Document registration (also known as document alignment) aims to densely map two document images with the same content (such as a scanned and photographed version of the same document). It has important applications in automated data annotation and template-based dewarping tasks.
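As a rough illustration of what a dense mapping means in practice, the sketch below resamples one image through a per-pixel backward map so that it lines up with the other. The function name `warp_with_backward_map`, the `(row, column)` map layout, and nearest-neighbour sampling are assumptions made for this example; they are not taken from any of the listed methods, which typically predict such maps with a network and use bilinear sampling.

```python
def warp_with_backward_map(source, backward_map):
    """Resample `source` so that output[y][x] = source[sy][sx],
    where (sy, sx) = backward_map[y][x].

    Nearest-neighbour sampling only; coordinates outside the image are clamped.
    """
    height, width = len(source), len(source[0])
    output = []
    for map_row in backward_map:
        out_row = []
        for sy, sx in map_row:
            # Round to the nearest source pixel and clamp to the image bounds.
            iy = min(max(int(round(sy)), 0), height - 1)
            ix = min(max(int(round(sx)), 0), width - 1)
            out_row.append(source[iy][ix])
        output.append(out_row)
    return output


# Toy example: the "photo" shows the same content as a 2x3 "scan",
# but shifted one pixel to the right.
photo = [[0, 1, 2, 3],
         [0, 5, 6, 7]]
# Each target pixel (y, x) therefore looks up the photo at (y, x + 1).
backward_map = [[(y, x + 1) for x in range(3)] for y in range(2)]
aligned = warp_with_backward_map(photo, backward_map)
print(aligned)
```

Once such a map is known, any pixel-level annotation on the flat image can be transferred to the photographed one (or vice versa), which is what makes registration useful for automated labeling.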

1.1 Papers

Year Venue Title Repo
2023 IJDAR Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping Code
2023 Arxiv DocAligner: Annotating real-world photographic document images by simply taking pictures Code
2024 ACM MM Document Registration: Towards Automated Labeling of Pixel-Level Alignment Between Warped-Flat Documents
2024 ICDAR Coarse-to-Fine Document Image Registration for Dewarping Code

1.2 Datasets

Dataset Num. (train/test) Type Example Download
DocAlign12K 12K (10K/2K) Synth Example Link

1.3 SOTA

Venue Method DocUNet (130): MS-SSIM↑ AD↓
Arxiv'23 DocAligner 0.8232 0.0445

2. Appearance Enhancement

Appearance enhancement (also known as illumination correction) is not limited to a specific degradation type; it aims to restore a clean appearance similar to that of a scanned document or a born-digital PDF file.

2.1 Papers

Year Venue Title Repo
2019 ACM TOG Document Rectification and Illumination Correction using a Patch-based CNN Code
2020 BMVC Intrinsic Decomposition of Document Images In-the-wild Code
2019 ICCV DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks Code
2021 ACM MM DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction Code
2022 CVPR Fourier Document Restoration for Robust Document Dewarping and Recognition
2022 ACM MM UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior Code
2023 TAI Appearance Enhancement for Camera-captured Document Images in the Wild Code
2023 ICCVW Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction Code
2023 arxiv DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF Versions
2024 ICASSP Efficient Joint Rectification of Photometric and Geometric Distortions in Document Images
2024 CVPR DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks Code

2.2 Datasets

Dataset Num. (train/test) Type Example Download
Doc3DShade 90K Synth Example Link
DocProj 2450 Synth Example Link
DocUNet from DocAligner 130 Real Example Link
RealDAE 600 (450/150) Real Example Link
Inv3D 25K Synth Example Link

2.3 Apps

2.4 SOTA

Venue Method Training data DocUNet from DocAligner (130): SSIM PSNR RealDAE (150): SSIM PSNR
- - - 0.7195 13.09 0.8264 12.26
TOG'19 DocProj DocProj 0.7098 14.71 0.8684 19.35
BMVC'20 Das et al. Doc3DShade 0.7276 16.42 0.8633 19.87
MM'21 DocTr DocProj 0.7067 15.78 0.7925 18.62
MM'22 UDoc-GAN DocProj 0.6833 14.29 0.7558 16.43
TAI'23 GCDRNet RealDAE 0.7658 17.09 0.9423 24.42
CVPR'24 DocRes 0.7598 17.60 0.9219 24.65
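
For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard definition for 8-bit grayscale images stored as nested lists. The function name `psnr` and the toy inputs are illustrative assumptions; the evaluation scripts behind the numbers above may differ in details such as color handling or resizing.

```python
import math


def psnr(restored, reference, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized images."""
    if len(restored) != len(reference) or any(
        len(a) != len(b) for a, b in zip(restored, reference)
    ):
        raise ValueError("images must have the same shape")

    squared_error = 0.0
    count = 0
    for row_a, row_b in zip(restored, reference):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1

    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: two of four pixels are off by 10 gray levels.
reference = [[100, 100], [100, 100]]
restored = [[110, 90], [100, 100]]
score = psnr(restored, reference)
print(f"PSNR: {score:.2f} dB")
```

Higher values mean the restored image is closer to the reference.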

3. Deshadow

Deshadowing aims to eliminate shadows that are mainly caused by occlusion to obtain shadow-free document images.
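
To make the task concrete, here is a minimal sketch of a classical baseline: estimate the local paper brightness with a maximum filter and divide it out. This is not the method of any paper listed below; the function name `normalize_by_local_background`, the window radius, and the toy image are assumptions chosen for illustration.

```python
def normalize_by_local_background(gray, radius=1):
    """Divide each pixel by the brightest value in its local window.

    `gray` is a grayscale image as nested lists with values in 0..255.
    The local maximum serves as a crude estimate of the (possibly shadowed)
    paper brightness at that location.
    """
    height, width = len(gray), len(gray[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            background = max(max(window), 1)  # avoid division by zero
            row.append(min(255, round(255 * gray[y][x] / background)))
        output.append(row)
    return output


# Toy page: the left half is lit (paper = 200, ink = 40),
# the right half lies in shadow (paper = 100, ink = 20).
page = [
    [200, 200, 200, 100, 100, 100],
    [200,  40, 200, 100,  20, 100],
    [200, 200, 200, 100, 100, 100],
]
cleaned = normalize_by_local_background(page, radius=1)
for row in cleaned:
    print(row)
```

In this toy case the ink pixels in the lit and shadowed halves end up at the same value, but pixels right at the shadow edge remain darkened because their window straddles both regions. Artifacts of this kind are one reason learned approaches are attractive.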

3.1 Papers

Year Venue Title Repo
2018 CVPR Document Enhancement Using Visibility Detection Code
2020 CVPR BEDSR-Net: A Deep Shadow Removal Network from a Single Document Image Code*
2022 ICPR Document Shadow Removal with Foreground Detection Learning From Fully Synthetic Images Code
2022 MERCon Shadow Removal for Documents with Reflective Textured Surface
2023 ICASSP ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal Code
2023 ICASSP Shadow Removal of Text Document Images Using Background Estimation and Adaptive Text Enhancement
2023 ICASSP LP-IOANet: Efficient High Resolution Document Shadow Removal
2023 Optical Review Shadow removal from document image based on background estimation employing selective median filter and black-top-hat transform
2023 CVPR Document Image Shadow Removal Guided by Color-Aware Background Code
2023 arxiv ShaDocFormer: A Shadow-attentive Threshold Detector with Cascaded Fusion Refiner for document shadow removal
2023 ICCV High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net Code
2023 Sensors Synthetic Document Images with Diverse Shadows for Deep Shadow Removal Networks Code
2024 AAAI DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations Code
2024 CVPR DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks Code
2024 IJDAR Am I readable? Transfer learning based document image rectification

* indicates that the implementation is unofficial.

3.2 Datasets

Dataset Num. (train/test) Type Example Download
RDD 4916 (4371/545) Real Example Link
Kligler et al. 300 Real Example Link
FSDSRD 14200 Synth Example Link
Jung et al. 87 Real Example Link
OSR 237 Real Example Link
WEZUT OCR 176 Real Example Link
SD7K 7620 (6479/760) Real Example Link
SynDocDS 50K (40K/5K) Synth Link

3.3 SOTA

Venue Method Training data Kligler et al. (300) Jung et al. (87) OSR (237) RDD (545) SD7K (760)
RMSE↓ PSNR↑ SSIM↑ RMSE↓ PSNR↑ SSIM↑ RMSE↓ PSNR↑ SSIM↑ RMSE↓ PSNR↑ SSIM↑ RMSE↓ PSNR↑ SSIM↑
CVPR'23 BGShadowNet RDD 5.377 29.17 0.948 2.219 37.58 0.983
ICCV'23 FSENet SD7K 10.60 28.98 0.93 17.56 23.60 0.85 10.00 28.67 0.96
CVPR'24 DocRes 27.14 0.900 23.02 0.908 21.64 0.937

4. Dewarping

Dewarping, also referred to as geometric rectification, aims to rectify document images that suffer from curves, folds, crumples, perspective/affine deformation and other geometric distortions.

4.1 Papers

Year Venue Title Repo
2018 CVPR DocUNet: Document Image Unwarping via A Stacked U-Net
2019 TOG Document Rectification and Illumination Correction using a Patch-based CNN Code
2019 ICCV DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks Code
2020 PR Geometric Rectification of Document Images using Adversarial Gated Unwarping Network
2020 ECCV Can You Read Me Now? Content Aware Rectification using Angle Supervision
2020 DAS Dewarping Document Image by Displacement Flow Estimation with Fully Convolutional Network Code
2021 ACM MM DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction Code
2021 ICCV End-to-end Piece-wise Unwarping of Document Images Code
2021 ICDAR Document Dewarping with Control Points Code
2022 CVPR Fourier Document Restoration for Robust Document Dewarping and Recognition
2022 CVPR Revisiting Document Image Dewarping by Grid Regularization
2022 ACM MM Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild
2022 SIGGRAPH Learning From Documents in the Wild to Improve Document Unwarping Code
2022 ECCV Geometric Representation Learning for Document Image Rectification Code
2022 ECCV Learning an Isometric Surface Parameterization for Texture Unwrapping Code
2022 Arxiv DocScanner: Robust Document Image Rectification with Progressive Learning Code
2022 ICPR Document Image Rectification in Complex Scene Using Stacked Siamese Networks
2023 Arxiv Geometric Rectification of Creased Document Images based on Isometric Mapping
2023 IJDAR Adaptive Dewarping of Severely Warped Camera-captured Document Images Based on Document Map Generation
2023 TMM Deep Unrestricted Document Image Rectification Code
2023 Arxiv Neural Document Unwarping using Coupled Grids
2023 IJDAR Inv3D: A High-resolution 3D Invoice Dataset for Template-guided Single-image Document Unwarping Code
2023 Arxiv MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary
2023 ICCVW Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction Code
2023 ICCV Foreground and Text-lines Aware Document Image Rectification Code
2023 ACM TOG Layout-Aware Single-Image Document Flattening Code
2023 WACV DocReal: Robust Document Dewarping of Real-Life Images via Attention-Enhanced Control Point Prediction Code
2023 TCSVT Rethinking Supervision in Document Unwarping: A Self-consistent Flow-free Approach
2023 SIGGRAPH UVDoc: Neural Grid-based Document Unwarping Code
2023 Arxiv Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation
2024 ICASSP Efficient Joint Rectification of Photometric and Geometric Distortions in Document Images
2024 ICDAR Coarse-to-Fine Document Image Registration for Dewarping Code
2024 CVPR DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks Code
2024 IJDAR Am I readable? Transfer learning based document image rectification
2024 ACM MM Document Registration: Towards Automated Labeling of Pixel-Level Alignment Between Warped-Flat Documents

4.2 Datasets

Dataset Num. Type Example Download/Codes
DocUNet 130 Real Example Link
Doc3D 100K Synth - Link
DIW 5K Real Example Link
WarpDoc 1020 Real Example Link
DIR300 300 Real Example Link
Inv3D 25K Synth Example Link
Inv3DReal 360 Real Example Link
DICP - Synth - Link
DIF - Synth - Link
Simulated Paper 90K Synth - Link
DocReal 200 Real Example Link
UVDoc 20K Synth Example Link
WarpDoc-R 840 Real

4.3 SOTA

Venue Method DocUNet (130) DIR300 (300) DocReal (200) UVDoc (50)
MS-SSIM↑ LD↓ AD↓ MS-SSIM↑ LD↓ AD↓ MS-SSIM↑ LD↓ MS-SSIM↑ AD↓
ICCV'19 DewarpNet 0.474 8.39 0.426 0.492 13.94 0.331 0.589 0.193
DAS'20 FCN-based 0.448 7.84 0.434 0.503 9.75 0.331
ICCV'21 Piece-Wise 0.492 8.64 0.468
ICDAR'21 DDCP 0.473 8.99 0.453 0.552 10.95 0.357 0.46 16.04 0.585 0.290
MM'21 DocTr 0.511 7.76 0.396 0.616 7.21 0.254 0.55 12.66 0.697 0.160
CVPR'22 RDGR 0.497 8.51 0.461 0.610 0.280
MM'22 Marior 0.478 7.27 0.403
ECCV'22 DocGeoNet 0.504 7.71 0.380 0.638 6.40 0.242 0.55 12.22 0.706 0.168
SIGGRAPH'22 PaperEdge 0.473 7.81 0.392 0.583 8.00 0.255 0.52 11.46
Arxiv'22 DocScanner-L 0.518 7.45 0.334
ICCV'23 Li et al. 0.497 8.43 0.376 0.607 7.68 0.244
WACV'23 DocReal 0.50 7.03 0.56 9.83
TCSVT'23 DRNet 0.51 7.42
TMM'23 DocTr++ 0.51 7.54 0.45 19.88
Arxiv'23 Polar-Doc 0.605 7.17 0.206
Arxiv'23 MataDoc 0.502 7.42 0.315 0.638 5.75 0.178
SIGGRAPH'23 UVDoc 0.544 6.83 0.315 0.785 0.119
ACM TOG'23 LA-DocFlatten 0.526 6.72 0.300 0.651 5.70 0.195
CVPR'24 DocRes 0.626 6.83 0.241
IJDAR'24 DocTLNet 0.51 6.70 0.658 5.75
  • Note that the 127th and 128th distorted images in the DocUNet benchmark are rotated by 180 degrees and therefore do not match their ground-truth documents. The results reported here are computed on the corrected data.
  • Note that the UVDoc results reported here are based on the full UVDoc benchmark dataset (as reported on the official GitHub page). The results in the paper used only half of the UVDoc benchmark.
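
The DocUNet correction mentioned in the first note can be sketched as follows. This assumes the 130 distorted images are held in a list in benchmark order and addressed by 1-based position; the helper names `rotate_180` and `correct_rotated_inputs` are hypothetical, and how the files are named or loaded is left out.

```python
def rotate_180(image):
    """Rotate an image (nested lists of pixels) by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def correct_rotated_inputs(images, rotated_positions=(127, 128)):
    """Return a new list in which the images at the given 1-based
    positions are rotated by 180 degrees; all others are kept as-is."""
    fixed = []
    for position, image in enumerate(images, start=1):
        if position in rotated_positions:
            fixed.append(rotate_180(image))
        else:
            fixed.append(image)
    return fixed


# Toy stand-in for the benchmark: 130 one-row "images".
images = [[[i, i + 1]] for i in range(130)]
fixed = correct_rotated_inputs(images)
print(images[126], "->", fixed[126])
```

Applying the rotation before evaluation keeps every distorted input in the same orientation as its ground-truth document.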

5. Deblur

5.1 Papers

Year Venue Title Repo
2019 NIPS SVDocNet: Spatially Variant U-Net for Blind Document Deblurring
2019 MTA DeepDeblur: text image recovery from blur to sharp code
2020 TPAMI DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement code
2021 ICCV End-to-End Unsupervised Document Image Blind Denoising
2023 ACM MM DocDiff: Document Enhancement via Residual Diffusion Models code
2024 AAAI DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations Code
2024 CVPR DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks Code
2024 Arxiv NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement Code

5.2 Datasets

Dataset Num. (train/test) Type Example Download
TDD (text deblur dataset) 67.6K (66K/1.6K) Synth Example Link1, Link2

5.3 SOTA

Coming Soon ...

6. Binarization

6.1 Papers

Year Venue Title Repo
2019 PR DeepOtsu: Document enhancement and binarization using iterative deep learning code
2021 PR Complex image processing with less data—Document image binarization by integrating multiple pre-trained U-Net modules code
2022 PR Two-Stage Generative Adversarial Networks for Binarization of Color Document Images code
2023 PR GDB: Gated Convolutions-based Document Binarization code
2023 ACM MM DocDiff: Document Enhancement via Residual Diffusion Models code
2023 ICDAR ColDBin: Cold Diffusion for Document Image Binarization code
2023 IF A Novel Degraded Document Binarization Model through Vision Transformer Network
2023 Arxiv DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization
2024 AAAI DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations Code
2024 CVPR DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks Code
2024 Arxiv NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement Code

6.2 Datasets

Dataset Num. Type Example Download
DocEng 2019 15 Real Example Link
DocEng 2020 32 Real Example Link
DocEng 2021 222 Real Example Link
DocEng 2022 80 Real Example Link
DIBCO 2009 10 Real Example Link
H-DIBCO 2010 10 Real Example Link
DIBCO 2011 16 Real Example Link
H-DIBCO 2012 14 Real Example Link
DIBCO 2013 16 Real Example Link
H-DIBCO 2014 10 Real Example Link
H-DIBCO 2016 10 Real Example Link
DIBCO 2017 20 Real Example Link
DIBCO 2018 10 Real Example Link
DIBCO 2019 10 Real Example Link
Bickley Diary 7 Real Example Link
Synchromedia Multispectral (MSI) 240 Real Example Link
Persian Heritage Image Binarization (PHIBD) 15 Real Example Link
Palm Leaf 50 Real Example Link
NoiseOffice 216 Synth Example Link
LRDE Document Binarization Dataset 125 Real - Link
Shipping label dataset 1082 Real Example Link

6.3 SOTA

Coming Soon ...
