Guided CTC architecture for text recognition

The architecture has been proposed for text recognition and exploits both attention and CTC, the two mainstream approaches to text recognition. Because it is a supervised approach, each input is an (image, label) pair.

Train datasets: SynthText, MJSynth
Test datasets: IC03, IC13, IC15, IIIT5K, SVT
- Input images: text images with varying sizes.
- Labels: text (a concatenation of characters), i.e. the transcription of the text image.
- As noted above, the model combines attention and CTC. Because the two methods treat labels differently, the GT labels need some modification.
- After encoding characters to indices:
  - Attention: SoS (start of sequence) and EoS (end of sequence) characters are added to each text sequence.
  - CTC: labels follow the CTC scheme, which is based on repetition and a blank label.
- In both cases, the CTC and attention code need texts of the same length within a batch. This length is the maximum text length in the batch, and shorter texts are padded up to it.
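The label preparation described above can be sketched as follows. This is a minimal illustration, not the repository's code: the character set, the special indices (`PAD`, `SOS`, `EOS`) and the index offsets are assumptions, and the real mappings stored in the `char2int_*.npy` files may differ.

```python
# Hypothetical vocabulary and special indices (assumptions for illustration).
CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"
PAD, SOS, EOS = 0, 1, 2
ATTN_OFFSET = 3  # attention character indices start after PAD/SOS/EOS
CTC_OFFSET = 1   # index 0 is assumed to be reserved in the CTC mapping


def encode_attention(text):
    """Encode text to indices and wrap it in SoS/EoS for the attention branch."""
    return [SOS] + [CHARS.index(ch) + ATTN_OFFSET for ch in text] + [EOS]


def encode_ctc(text):
    """Encode text to indices for the CTC branch (no SoS/EoS)."""
    return [CHARS.index(ch) + CTC_OFFSET for ch in text]


def pad_batch(sequences, pad_value=PAD):
    """Pad every sequence to the maximum length found in this batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]


texts = ["cat", "house"]
attn_batch = pad_batch([encode_attention(t) for t in texts])
ctc_batch = pad_batch([encode_ctc(t) for t in texts])
print(attn_batch[0])  # [1, 5, 3, 22, 2, 0, 0]
print(ctc_batch[0])   # [3, 1, 20, 0, 0]
```

Padding per batch, rather than to one global length, keeps the arrays as small as the longest text in each batch allows.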

The architecture is composed of four main submodules:

1. STN

If the text in the image is not horizontally aligned, the input image needs to be transformed to obtain a normalized image.

1) localization network

Predicts the transformation parameters.

2) grid generator

In our model, rectification is done manually through a manual transformation.
As another normalization step, text images are resized to a fixed height, which gives images with a fixed height and a variable width. Because the batches fed to the model must have a uniform size, the images in each batch are padded to the maximum image width in that batch.
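The resize-and-pad step can be sketched in plain Python as below. Images are represented as lists of rows for simplicity, the helper names are hypothetical, and the target height of 64 and the zero pad value are assumptions for this illustration.

```python
def scaled_width(height, width, target_height=64):
    """Width of an image after resizing to target_height, keeping aspect ratio."""
    return max(1, round(width * target_height / height))


def pad_to_width(image, target_width, pad_value=0):
    """Right-pad every row of a 2D image (list of rows) to target_width."""
    return [row + [pad_value] * (target_width - len(row)) for row in image]


def pad_batch_images(images, pad_value=0):
    """Pad all images of one batch to the widest image in that batch."""
    max_width = max(len(image[0]) for image in images)
    return [pad_to_width(image, max_width, pad_value) for image in images]


# Two tiny "images" with the same height (2) but different widths (3 and 5).
narrow = [[1, 1, 1], [1, 1, 1]]
wide = [[2, 2, 2, 2, 2], [2, 2, 2, 2, 2]]
padded = pad_batch_images([narrow, wide])
print(padded[0])  # [[1, 1, 1, 0, 0], [1, 1, 1, 0, 0]]
```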

2. Feature Extraction

A CNN, the most common architecture for extracting features from images, is applied. Depending on the available computational power, this is either ResNet or the lighter MobileNet. Size: (Batchsize, w, h, c)

1) ResNet

ResNet50 is applied with some modifications.

1)* Mobilenet

The extracted feature maps should be transformed into feature vectors.
Size: (Batchsize, w, h*c)
So far we have encoded the images; the next step is decoding.
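The reshape from (Batchsize, w, h, c) to (Batchsize, w, h*c) can be illustrated with nested lists. This is only a sketch of the shape change; the function name is hypothetical, and in the actual model this would typically be a single reshape on a tensor.

```python
def maps_to_sequence(feature_maps):
    """Flatten (batch, w, h, c) nested lists into (batch, w, h*c).

    Each of the w positions along the width becomes one feature vector
    that concatenates its h rows of c channels.
    """
    return [
        [[value for row in column for value in row] for column in sample]
        for sample in feature_maps
    ]


batch, w, h, c = 2, 3, 2, 4
feature_maps = [
    [[[0.0] * c for _ in range(h)] for _ in range(w)] for _ in range(batch)
]
sequence = maps_to_sequence(feature_maps)
print(len(sequence), len(sequence[0]), len(sequence[0][0]))  # 2 3 8
```

Keeping the width axis as the sequence axis means each time step of the decoder sees one vertical slice of the text image.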

For training, the decoder parts also need decoder inputs.

The attention decoder needs some weights for the decoder inputs, too. We use a decoder input with a constant length (the maximum text length over all texts + 2, for SOS and EOS).

The CTC decoder input is just the encoded text array.
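A possible way to build the constant-length attention decoder input is sketched below. The special indices and the interpretation of the "weights" as a 1/0 mask over real versus padded positions are assumptions made for this illustration, not something confirmed by the repository.

```python
PAD, SOS, EOS = 0, 1, 2  # hypothetical special indices


def attention_decoder_input(encoded_text, max_text_len):
    """Return (tokens, weights) with constant length max_text_len + 2.

    tokens:  SOS + encoded text + EOS, right-padded with PAD.
    weights: 1.0 for real positions, 0.0 for padding (assumed meaning).
    """
    total_len = max_text_len + 2
    sequence = [SOS] + list(encoded_text) + [EOS]
    n_pad = total_len - len(sequence)
    if n_pad < 0:
        raise ValueError("text is longer than max_text_len")
    tokens = sequence + [PAD] * n_pad
    weights = [1.0] * len(sequence) + [0.0] * n_pad
    return tokens, weights


tokens, weights = attention_decoder_input([5, 3, 22], max_text_len=5)
print(tokens)   # [1, 5, 3, 22, 2, 0, 0]
print(weights)  # [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```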

Note: according to the paper, the first three submodules (STN, ResNet-CNN, and the attentional guidance) are trained solely with cross-entropy loss, while the GCN+CTC decoder is trained with CTC loss.
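For readers unfamiliar with the CTC side, the role of repetition and the blank label can be seen in a greedy CTC decoding sketch. It assumes the blank index is 0 and that the per-frame best indices are already available; it is a generic illustration of CTC collapsing, not the GCN-powered decoder of this repository.

```python
BLANK = 0  # assumed blank index


def ctc_greedy_collapse(frame_indices, blank=BLANK):
    """Collapse per-frame predictions: merge repeats, then drop blanks."""
    decoded = []
    previous = None
    for index in frame_indices:
        # Keep an index only when it starts a new run and is not the blank.
        if index != previous and index != blank:
            decoded.append(index)
        previous = index
    return decoded


print(ctc_greedy_collapse([3, 3, 0, 1, 1, 0, 0, 20, 20]))  # [3, 1, 20]
print(ctc_greedy_collapse([1, 1, 0, 1]))                   # [1, 1]
```

The second call shows why the blank is needed: it separates two genuine occurrences of the same character, which would otherwise be merged.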

3. Attentional guidance

the attention decoder

4. Graph Convolutional Network (GCN) powered CTC decoder

the CTC decoder

##############################

*) The config file is used as the configuration file for the model.

*) A generator is used to create batches of data for model fitting.

*) Images should be resized by scale to: [64, None]
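A batch generator of the kind mentioned in the notes above might look roughly like the sketch below. The function name and parameters are hypothetical and the samples are placeholder (image, label) pairs; the repository's own generator lives in `data_generator.py` and may work differently.

```python
import random


def batch_generator(samples, batch_size, shuffle=True, seed=None):
    """Yield batches of (image, label) pairs forever, one epoch after another."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    while True:
        if shuffle:
            rng.shuffle(indices)  # new sample order for every epoch
        for start in range(0, len(indices), batch_size):
            yield [samples[i] for i in indices[start:start + batch_size]]


# Placeholder samples: image names stand in for real image arrays.
samples = [("img0", "cat"), ("img1", "dog"), ("img2", "house"),
           ("img3", "tree"), ("img4", "road")]
generator = batch_generator(samples, batch_size=2, shuffle=False)
first = next(generator)
second = next(generator)
third = next(generator)   # last, smaller batch of the epoch
fourth = next(generator)  # the generator starts a new epoch
print(len(first), len(second), len(third), len(fourth))  # 2 2 1 2
```

An endless generator like this suits training loops that are given a fixed number of steps per epoch.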

First install the required packages listed in requirement.txt.
train.sh: trains the network
test.sh: runs the tests
tensorboard.sh: starts the TensorBoard visualization

####################

Transformer Section:

data_generator.py: here you can choose the dataset and the mode (transformer for ctc).

train: train_transformer_final.py. Some related files are available in Models (Layer, Model, callbacks, loss, metric, etc.).

