This repo uses DeepLabV3+ to remove handwriting from papers. Handwriting removal is formulated as a semantic segmentation task, so we solve it with the segmentation model DeepLabV3+.
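For intuition, once the model predicts a per-pixel handwriting mask, the "removal" step can be as simple as painting the masked pixels with the page background. The label convention (1 = handwriting) and the plain white fill in the sketch below are illustrative assumptions, not necessarily how this repo post-processes its predictions.

```python
import numpy as np

def erase_handwriting(image, mask, background=255):
    """Given an H x W x 3 image and an H x W predicted mask where 1 marks
    handwriting pixels (hypothetical label convention), paint those pixels
    with the background colour to produce the cleaned page."""
    cleaned = image.copy()
    cleaned[mask == 1] = background
    return cleaned
```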
Python 3.7
PyTorch 1.7+
pip install -r requirements.txt
We use a dataset provided on Baidu AI Studio: https://aistudio.baidu.com/aistudio/datasetdetail/121039
You can use the following links to download the pretrained model and the datasets.
Pretrained model: https://pan.baidu.com/s/1QmCGhwpKNtdKtYoGU0YK3w?pwd=a2jh
Datasets: https://pan.baidu.com/s/1l2mQ3FLx_NAPJEh30cOeZA?pwd=05bl
To optimize the segmentation result, we use the following tricks:

- Replace the cross-entropy loss with the focal loss, which helps with the class imbalance in this task (a sketch is given after this list).
- Use overlapping cropping to augment the dataset (see the cropping sketch below).
- Do not resize the whole image directly when predicting. Instead, cut the input image into many small patches, predict them separately, and merge the per-patch predictions into the full-size result (see the patch-wise prediction sketch below).
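Below is a minimal sketch of a per-pixel focal loss in PyTorch. The class name and the default `alpha`/`gamma` values are illustrative assumptions and may differ from the implementation selected by `--loss_type focal_loss` in this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Per-pixel focal loss: down-weights easy (well-classified) pixels so the
    rare handwriting pixels dominate the gradient. alpha/gamma defaults are
    illustrative, not necessarily the values used in this repo."""
    def __init__(self, alpha=1.0, gamma=2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits, targets):
        # logits: (N, C, H, W); targets: (N, H, W) integer class labels.
        ce = F.cross_entropy(logits, targets, reduction='none')   # per-pixel CE
        pt = torch.exp(-ce)                                       # prob. of the true class
        return (self.alpha * (1.0 - pt) ** self.gamma * ce).mean()
```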
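A rough sketch of overlapping cropping for data augmentation. The crop size and stride (512 and 384 here) are hypothetical values; a stride smaller than the crop size is what makes neighbouring crops overlap.

```python
def overlapping_crops(image, mask, crop_size=512, stride=384):
    """Yield aligned (image, mask) crops that overlap because stride < crop_size.
    Assumes both inputs are NumPy arrays at least crop_size x crop_size.
    Border strips the grid misses can be covered by an extra crop flush with
    each edge if needed."""
    h, w = image.shape[:2]
    for top in range(0, h - crop_size + 1, stride):
        for left in range(0, w - crop_size + 1, stride):
            yield (image[top:top + crop_size, left:left + crop_size],
                   mask[top:top + crop_size, left:left + crop_size])
```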
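And a sketch of patch-wise prediction at inference time: slide a window over the input, run the model on each patch, and average the overlapping logits before taking the argmax. The patch size, stride, preprocessing, and the assumption that the model returns logits of shape `(N, num_classes, H, W)` are placeholders; the repo's actual prediction code may differ.

```python
import numpy as np
import torch

@torch.no_grad()
def predict_by_patches(model, image, patch=512, stride=384, num_classes=2, device='cuda'):
    """Slide a window over an H x W x 3 uint8 image, predict each patch, and
    average the overlapping logits back into a full-size label map.
    Assumes H and W are both >= patch."""
    h, w = image.shape[:2]
    logits_sum = np.zeros((num_classes, h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)

    tops = list(range(0, h - patch + 1, stride))
    lefts = list(range(0, w - patch + 1, stride))
    # Make sure the last window in each direction reaches the image border.
    if tops[-1] + patch < h:
        tops.append(h - patch)
    if lefts[-1] + patch < w:
        lefts.append(w - patch)

    model.eval()
    for top in tops:
        for left in lefts:
            crop = np.ascontiguousarray(image[top:top + patch, left:left + patch])
            # Placeholder preprocessing: scale to [0, 1]; the real pipeline may
            # also normalize with dataset mean/std.
            x = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0).to(device) / 255.0
            out = model(x)[0].cpu().numpy()              # (num_classes, patch, patch)
            logits_sum[:, top:top + patch, left:left + patch] += out
            counts[top:top + patch, left:left + patch] += 1

    return (logits_sum / counts).argmax(axis=0)          # (h, w) predicted label map
```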
| Overall Acc | Mean Acc | FreqW Acc | Mean IoU |
|---|---|---|---|
| 0.990526 | 0.920829 | 0.982231 | 0.867805 |
After downloading the checkpoint and the datasets, you can use:
python3 main.py --data_root /home/disk2/ray/datasets/HandWriting --loss_type focal_loss --gpu_id 2 --batch_size 4
to train your own model.
You can also use:
python3 main.py --data_root /home/disk2/ray/datasets/HandWriting --loss_type focal_loss --gpu_id 2 --batch_size 4 --ckpt checkpoints/best_deeplabv3plus_resnet50_os16.pth --test_only --save_val_results
to test your model and generate the predicted results under ./result.
[1] Rethinking Atrous Convolution for Semantic Image Segmentation
[2] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation