The official PyTorch implementation of the paper: Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution.
Environment preparation (Python 3.8 + PyTorch 1.7.0 + Torchvision 0.8.1 + pytorch_lightning 1.5.10 + CUDA 11.0):
conda create -n RGDiffSR python=3.8
conda activate RGDiffSR
git clone git@github.com:shercoo/RGDiffSR.git
cd RGDiffSR
pip install -r requirements.txt
You can also refer to the taming-transformers repository for instructions on installing the taming-transformers library (needed if VQGAN is used).
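If you prefer installing it without a manual clone, the upstream latent-diffusion codebase (which this repository builds on) installs taming-transformers as an editable pip package; a sketch of that approach, assuming the upstream CompVis repository:

pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers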
Download the TextZoom dataset at TextZoom.
Download the pre-trained recognizers ASTER, MORAN, and CRNN.
Download the checkpoints of the pre-trained VQGAN and RGDiffSR models at Baidu Netdisk (password: yws3).
First, train the latent encoder (VQGAN) model:
CUDA_VISIBLE_DEVICES=<GPU_IDs> python main.py -b configs/autoencoder/vqgan_2x.yaml -t --gpus <GPU_IDs>
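For example, to train on GPUs 0 and 1 (following the latent-diffusion convention this repo builds on, where a trailing comma selects a single GPU, e.g. --gpus 0,):

CUDA_VISIBLE_DEVICES=0,1 python main.py -b configs/autoencoder/vqgan_2x.yaml -t --gpus 0,1

The same GPU specification applies to the training and testing commands below.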
Put the pre-trained VQGAN model in checkpoints/.
Then train the RGDiffSR model:
CUDA_VISIBLE_DEVICES=<GPU_IDs> python main.py -b configs/latent-diffusion/sr_best.yaml -t --gpus <GPU_IDs>
Put the pre-trained RGDiffSR model in checkpoints/.
Then run the test script:
CUDA_VISIBLE_DEVICES=<GPU_IDs> python test.py -b configs/latent-diffusion/sr_test.yaml --gpus <GPU_IDs>
You can manually modify the test dataset directory in sr_test.yaml to evaluate on the different difficulty splits of the TextZoom dataset; see the sketch below.
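TextZoom's test set comes in three difficulty splits (easy, medium, hard). A minimal sketch of the kind of change involved, assuming hypothetical key names (data, params, dataset_dir) that may differ from the actual sr_test.yaml:

data:
  params:
    test:
      params:
        dataset_dir: /path/to/TextZoom/test/hard   # point at easy/, medium/, or hard/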
The model is licensed under the MIT license.
Our code is built on the latent-diffusion and TATT repositories. Thanks to their authors for the great work!