This is a mini-project of implementing Text Recognition task using ViTSTR (Vision Transformer for Scene Text Recognition)

zogojogo/text-recognition-wii

Text Recognition with ViTSTR

Introduction

This mini-project implements a text recognition task using ViTSTR (Vision Transformer for Scene Text Recognition). The method is inspired by the public repository created by roatienza, which was built from a fork of the CLOVA AI Deep Text Recognition Benchmark. The project is also based on the paper Vision Transformer for Fast and Efficient Scene Text Recognition.

ViTSTR is a simple single-stage model that uses a pre-trained Vision Transformer (ViT) to perform Scene Text Recognition (STR). Its accuracy is comparable to state-of-the-art STR models, although it uses significantly fewer parameters and FLOPs. ViTSTR is also fast because of the parallel computation inherent to the ViT architecture.

ViTSTR Architecture

The main advantages of using ViTSTR for text recognition are simplicity and efficiency. Instead of the general multi-stage approach (four or three stages), ViTSTR uses a single stage, the Transformer encoder, to perform recognition. The figure below compares these design patterns.
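To make the single-stage idea concrete, here is a minimal sketch of the only post-processing such a model needs. The encoder emits one score vector per character position, and a greedy argmax over each vector yields the text directly. The vocabulary, the `[GO]` and `[s]` marker tokens, and the `greedy_decode` and `one_hot` helpers are illustrative assumptions, not code taken from this repository.

```python
# Illustrative vocabulary: two marker tokens followed by the characters.
# The token names and their ordering are assumptions made for this sketch.
VOCAB = ["[GO]", "[s]"] + list("0123456789abcdefghijklmnopqrstuvwxyz")
END_TOKEN = "[s]"


def greedy_decode(scores):
    """Turn per-position score vectors into a string.

    `scores` is a list of rows: one row per character position,
    one score per vocabulary entry.
    """
    chars = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over the row
        token = VOCAB[best]
        if token == END_TOKEN:
            break  # everything after the end marker is treated as padding
        if token != "[GO]":
            chars.append(token)
    return "".join(chars)


def one_hot(token):
    """Build a score row that puts all of its weight on `token`."""
    return [1.0 if t == token else 0.0 for t in VOCAB]


# Fake encoder output for the word "wii", an end marker, then padding.
fake_scores = [one_hot(t) for t in ["[GO]", "w", "i", "i", "[s]", "a"]]
print(greedy_decode(fake_scores))
```

Because every position is scored in one forward pass, no separate rectification, sequence modelling, or attention-decoding stage is needed before this step.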

STR design patterns

Tutorial

Clone the project

git clone https://github.com/zogojogo/text-recognition-wii.git

Go to the project directory

cd text-recognition-wii

Install dependencies

pip install -r requirements.txt

Start API service

python3 app.py

API Reference

Service: http://your-ip-address:8080

POST image

  POST /segment_lung

Content-Type: multipart/form-data

| Name | Type | Description |
| --- | --- | --- |
| image | file | Required. image/png or image/jpg MIME Type |
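As a rough sketch of how a client might call this endpoint, the example below builds the multipart/form-data body by hand using only the Python standard library. The `/segment_lung` path and the `image` field name come from the reference above. The `build_multipart` and `recognize` helpers, the placeholder host, and the choice to always send the file as image/png are assumptions for illustration.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type):
    """Encode a single file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def recognize(image_path, base_url="http://your-ip-address:8080"):
    """POST an image to the service and return the decoded JSON reply.

    Assumes the file is a PNG; adjust the content type for JPEG input.
    """
    with open(image_path, "rb") as f:
        data = f.read()
    body, header = build_multipart(
        "image", os.path.basename(image_path), data, "image/png"
    )
    request = urllib.request.Request(
        f"{base_url}/segment_lung",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the service to be running and reachable):
# result = recognize("word.png")
```

Replace `your-ip-address` with the host where `app.py` is running before calling `recognize`.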

Output Example

Output:

{
  "filename": "<filename>",
  "contentype": "<image type>",
  "output text": "<predicted text>",
  "inference time": "<inference time>"
}
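
The reply can be consumed as ordinary JSON. The sketch below parses a sample reply shaped like the output above and extracts the prediction. The sample values and the `summarize` helper are made up for illustration, and the exact format of the inference time value is an assumption.

```python
import json

# A sample reply shaped like the documented output; the values are made up.
sample_reply = """
{
  "filename": "word.png",
  "contentype": "image/png",
  "output text": "hello",
  "inference time": "0.05"
}
"""


def summarize(reply_text):
    """Extract the predicted text and the inference time from a JSON reply."""
    reply = json.loads(reply_text)
    # Two of the keys contain spaces, so use dictionary indexing.
    return reply["output text"], reply["inference time"]


text, elapsed = summarize(sample_reply)
print(f"predicted: {text} (inference time: {elapsed})")
```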
