Vision-Language Pre-Training for Boosting Scene Text Detectors

The official PyTorch implementation of VLPT-STD (CVPR 2022).

VLPT-STD is a new pre-training paradigm for scene text detection that requires only text annotations. We propose three vision-language pre-training pretext tasks: image-text contrastive learning (ITC), masked language modeling (MLM), and word-in-image prediction (WIP). Together they learn contextualized, joint representations that enhance the performance of scene text detectors. Extensive experiments on standard benchmarks demonstrate that the proposed paradigm can significantly improve the performance of various representative text detectors (EAST, DB, and PSENet; see Benchmarks).
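The README does not spell out the loss functions. As a rough illustration of what an image-text contrastive (ITC) objective computes, here is a generic CLIP-style symmetric InfoNCE loss in pure Python: matching image/text pairs share a row index, and every other item in the batch serves as a negative. This is a sketch, not this repository's implementation; the function names and the temperature of 0.07 are assumptions.

```python
import math
from typing import List


def log_softmax(row: List[float]) -> List[float]:
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def normalize(v: List[float]) -> List[float]:
    # L2-normalize so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def itc_loss(image_emb: List[List[float]], text_emb: List[List[float]],
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; row i of each input is assumed to be a matching pair."""
    if len(image_emb) != len(text_emb):
        raise ValueError("image and text batches must have the same size")
    imgs = [normalize(v) for v in image_emb]
    txts = [normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: similarity of image i and text j, scaled by 1 / temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: the correct text for image i is on the diagonal.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: same diagonal targets, taken over the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (i2t + t2i) / 2
```

Correctly paired embeddings give a loss close to zero, while shuffled pairs are penalized; averaging both directions makes the loss symmetric in images and texts.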

Paper

Install requirements

PyTorch version >= 1.8.0
Python version >= 3.6
NVIDIA apex

pip3 install -r requirements.txt
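After installing, a quick check that the environment meets the minimum versions above may save a failed run. The helper below is hypothetical (not part of the repository); it compares numeric version components so that, for example, 1.10.0 is correctly treated as newer than 1.8.0, which a plain string comparison would get wrong.

```python
import re
import sys
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. '1.8.1+cu111' -> (1, 8, 1)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def at_least(version: str, minimum: str) -> bool:
    # Pad both tuples to the same length so that '1.8' compares equal to '1.8.0'.
    a, b = parse_version(version), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


if __name__ == "__main__":
    print("Python >= 3.6:", sys.version_info >= (3, 6))
    try:
        import torch
        print("PyTorch >= 1.8.0:", at_least(torch.__version__, "1.8.0"))
    except ImportError:
        print("PyTorch is not installed")
    try:
        import apex  # noqa: F401
        print("apex: found")
    except ImportError:
        print("apex: not found")
```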

Dataset

Download the SynthText dataset.

The data folder should be structured as follows:

data
└── SynthText
    ├── 1
    ├── 2
    ├── 3
    ├── ...
    └── gt.mat
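Before running the conversion script, it can help to verify that the folder matches the layout above. The sketch below uses a hypothetical helper, check_synthtext_layout (not part of the repository), that only checks for gt.mat and at least one numerically named image folder; it does not validate the images or annotations themselves.

```python
from pathlib import Path
from typing import List


def check_synthtext_layout(root: str) -> List[str]:
    """Return the problems found in <root>/SynthText (an empty list means none)."""
    base = Path(root) / "SynthText"
    if not base.is_dir():
        return [f"missing directory: {base}"]
    problems = []
    if not (base / "gt.mat").is_file():
        problems.append(f"missing annotation file: {base / 'gt.mat'}")
    # Image folders are named with integers (1, 2, 3, ...).
    numbered = [p for p in base.iterdir() if p.is_dir() and p.name.isdigit()]
    if not numbered:
        problems.append(f"no numbered image folders found in {base}")
    return problems


if __name__ == "__main__":
    for problem in check_synthtext_layout("data") or ["layout looks OK"]:
        print(problem)
```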

Use write_synthtext_pyarrow.py to convert the data into the Arrow format used for pre-training.
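The conversion step can be pictured roughly as follows. This is an illustration only: build_rows, write_arrow, and the column names image_path and words are hypothetical, and the real schema is whatever write_synthtext_pyarrow.py defines. The sketch assumes imnames and txt have already been read from gt.mat (for example with scipy.io.loadmat) into plain Python lists. A single SynthText text instance can contain several words separated by spaces or newlines, so each instance is split into words.

```python
from typing import Dict, List


def build_rows(image_root: str, imnames: List[str],
               txt: List[List[str]]) -> Dict[str, list]:
    """Turn SynthText-style annotations into column-oriented data."""
    columns: Dict[str, list] = {"image_path": [], "words": []}
    for name, instances in zip(imnames, txt):
        # Split every text instance on whitespace to get individual words.
        words = [w for instance in instances for w in instance.split()]
        columns["image_path"].append(f"{image_root.rstrip('/')}/{name}")
        columns["words"].append(words)
    return columns


def write_arrow(columns: Dict[str, list], path: str) -> None:
    # Requires pyarrow; imported here so build_rows works without it.
    import pyarrow as pa

    table = pa.table(columns)
    with pa.OSFile(path, "wb") as sink:
        with pa.ipc.new_file(sink, table.schema) as writer:
            writer.write_table(table)
```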

Pretrained Models

A pretrained ResNet-50 is available at this url.

Training

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 main.py --exp_name base
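With fewer than eight GPUs, the command has to be adjusted so that the process count matches the number of devices listed in CUDA_VISIBLE_DEVICES (one process per GPU). The hypothetical helper below only formats such a command string; whether the batch size or learning rate should change with the GPU count depends on the repository's configuration, which this sketch does not model.

```python
import shlex
from typing import List, Sequence


def build_launch_command(gpu_ids: Sequence[int], exp_name: str = "base") -> str:
    """Build a torch.distributed.launch command with one process per visible GPU."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    devices = ",".join(str(g) for g in gpu_ids)
    args: List[str] = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",  # must equal the number of visible GPUs
        "main.py", "--exp_name", exp_name,
    ]
    return f"CUDA_VISIBLE_DEVICES={devices} " + " ".join(shlex.quote(a) for a in args)
```

For example, build_launch_command([0, 1, 2, 3]) yields a four-process variant of the command above.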

Benchmarks

Results with EAST, DB, and PSENet are summarized below (P: precision, R: recall, F: F-measure, all in %):

                    ICDAR2015           ICDAR2017           MSRA-TD500
Method              P     R     F       P     R     F       P     R     F
EAST + SynthText    89.6  81.5  85.3    75.1  61.9  67.9    86.9  77.6  82.0
EAST + VLPT-STD     91.5  85.4  88.3    77.7  64.6  70.5    88.5  76.7  82.2

                    ICDAR2015           Total-Text          MSRA-TD500
Method              P     R     F       P     R     F       P     R     F
DB + SynthText      88.2  82.7  85.4    87.1  82.5  84.7    91.5  79.2  84.9
DB + VLPT-STD       92.0  81.6  86.5    88.7  84.0  86.3    92.3  84.9  88.5

                    ICDAR2015           Total-Text          CTW1500
Method              P     R     F       P     R     F       P     R     F
PSENet + SynthText  84.3  78.4  81.3    89.2  79.2  83.9    83.6  79.7  81.6
PSENet + VLPT-STD   86.0  82.8  84.3    90.8  82.0  86.1    86.3  80.7  83.3
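F is the harmonic mean of P and R, F = 2PR / (P + R). The sketch below recomputes F for the ICDAR2015 rows of the tables and prints the F gain of VLPT-STD over the SynthText baseline for each detector. Because the tables give P and R rounded to one decimal, recomputed values can differ from the reported F by up to about 0.1.

```python
from typing import Dict, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (all values in %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ICDAR2015 (P, R, F) values copied from the benchmark tables.
ICDAR2015: Dict[str, Tuple[float, float, float]] = {
    "EAST + SynthText": (89.6, 81.5, 85.3),
    "EAST + VLPT-STD": (91.5, 85.4, 88.3),
    "DB + SynthText": (88.2, 82.7, 85.4),
    "DB + VLPT-STD": (92.0, 81.6, 86.5),
    "PSENet + SynthText": (84.3, 78.4, 81.3),
    "PSENet + VLPT-STD": (86.0, 82.8, 84.3),
}

if __name__ == "__main__":
    for detector in ("EAST", "DB", "PSENet"):
        base = ICDAR2015[f"{detector} + SynthText"]
        ours = ICDAR2015[f"{detector} + VLPT-STD"]
        recomputed = f_measure(ours[0], ours[1])
        print(f"{detector}: F gain {ours[2] - base[2]:+.1f}, "
              f"recomputed F {recomputed:.1f} (reported {ours[2]})")
```

On ICDAR2015 this gives F gains of +3.0 (EAST), +1.1 (DB), and +3.0 (PSENet).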

Acknowledgements

This implementation is based on ViLT.

Citation

If you find this work useful, please cite:

@inproceedings{song2022vision,
  title={Vision-Language Pre-Training for Boosting Scene Text Detectors},
  author={Song, Sibo and Wan, Jianqiang and Yang, Zhibo and Tang, Jun and Cheng, Wenqing and Bai, Xiang and Yao, Cong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15681--15691},
  year={2022}
}

License

VLPT-STD is released under the terms of the Apache License, Version 2.0.

VLPT-STD is an algorithm for scene text detection pre-training. The code and models herein, created by the authors from Alibaba, can only be used for research purposes.
Copyright (C) 1999-2022 Alibaba Group Holding Ltd. 

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
