Text Recognition with ViTSTR

Introduction

This mini-project implements text recognition using ViTSTR (Vision Transformer for Scene Text Recognition). The method is inspired by the public repository created by roatienza, which was built on a fork of the CLOVA AI Deep Text Recognition Benchmark. The project is also based on the paper Vision Transformer for Fast and Efficient Scene Text Recognition.

ViTSTR is a simple single-stage model that uses a pre-trained Vision Transformer (ViT) to perform Scene Text Recognition (STR). Its accuracy is comparable to state-of-the-art STR models, although it uses significantly fewer parameters and FLOPS. ViTSTR is also fast because of the parallel computation inherent to the ViT architecture.

The main advantages of using ViTSTR for text recognition are simplicity and efficiency. Instead of the usual multi-stage pipeline (three or four stages), ViTSTR uses a single stage (a Transformer encoder) to perform recognition. The figure below shows the comparison.
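To make the single-stage idea concrete, here is a minimal pure-Python sketch of how per-position encoder outputs could be turned into text in one pass. It is an illustration only, not this project's code: the `[GO]` and `[s]` markers, the character set, and the `greedy_decode` and `one_hot` helpers are assumptions chosen for the example.

```python
# Simplified sketch of single-stage decoding. All names and the character
# set are illustrative assumptions, not taken from this repository.
GO, EOS = "[GO]", "[s]"  # assumed start and end markers
CHARSET = [GO, EOS] + list("0123456789abcdefghijklmnopqrstuvwxyz")


def greedy_decode(scores):
    """Decode a list of per-position class scores into a string.

    Each position is decoded independently (no recurrence), which is what
    allows the whole sequence to be predicted in parallel.
    """
    chars = []
    for position_scores in scores:
        # Pick the highest-scoring class for this position.
        best = max(range(len(position_scores)), key=position_scores.__getitem__)
        token = CHARSET[best]
        if token == EOS:
            break  # everything after the end marker is ignored
        if token != GO:
            chars.append(token)
    return "".join(chars)


def one_hot(token):
    """Build a fake score vector that puts all weight on one token."""
    return [1.0 if t == token else 0.0 for t in CHARSET]


# Fake encoder output for the word "cat", followed by a position to ignore.
fake_scores = [one_hot(t) for t in [GO, "c", "a", "t", EOS, "x"]]
print(greedy_decode(fake_scores))  # prints "cat"
```

In a real model the score vectors would come from the Transformer encoder rather than from `one_hot`.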

Tutorial

Clone the project

git clone https://github.com/zogojogo/text-recognition-wii.git

Go to the project directory

cd text-recognition-wii

Install dependencies

pip install -r requirements.txt

Start API service

python3 app.py

API Reference

Service: http://your-ip-address:8080

POST image

  POST /segment_lung

Content-Type: multipart/form-data

| Name    | Type   | Description                                     |
| ------- | ------ | ----------------------------------------------- |
| `image` | `file` | Required. `image/png` or `image/jpg` MIME Type |
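As a rough illustration of the request described above, the following sketch builds a `multipart/form-data` body with the Python standard library and posts it to the service. The `image` field name, the route, and port 8080 come from this README. The `localhost` host, the helper names `build_multipart` and `recognize`, and the assumption that the response body is JSON are illustrative guesses.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, boundary=None):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def recognize(image_path, base_url="http://localhost:8080", route="/segment_lung"):
    """POST an image to the service and return the parsed JSON response.

    base_url is an assumption; replace it with your own service address.
    """
    with open(image_path, "rb") as f:
        body, content_type = build_multipart(
            "image", os.path.basename(image_path), f.read()
        )
    request = urllib.request.Request(
        base_url + route,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With the service running, `recognize("examples/word.png")` would send the file, where `examples/word.png` is a hypothetical path.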

Output Example

Output:

{
  "filename": "<filename>",
  "contentype": "<image type>",
  "output text": "<predicted text>",
  "inference time": "<inference time>"
}
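A small sketch of reading this response on the client side is shown below. The key names are copied from the example above (including the `contentype` spelling). The `Recognition` class, the `parse_response` helper, and the sample values are hypothetical, and the type and unit of the inference time are not documented here.

```python
import json
from dataclasses import dataclass


@dataclass
class Recognition:
    filename: str
    content_type: str
    text: str
    inference_time: object  # type and unit are not documented in this README


def parse_response(payload):
    """Map the service's JSON keys onto attribute-friendly names."""
    data = json.loads(payload)
    return Recognition(
        filename=data["filename"],
        content_type=data["contentype"],  # key spelled as in the example output
        text=data["output text"],
        inference_time=data["inference time"],
    )


# Made-up sample payload, for illustration only.
sample = (
    '{"filename": "word.png", "contentype": "image/png", '
    '"output text": "hello", "inference time": "0.05"}'
)
print(parse_response(sample).text)  # prints "hello"
```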
