What is RHCR?

RHCR (Russian Handwriting Character Recognition) is a handwriting recognition project powered by TensorFlow.

How did RHCR come about?

The project originator (Ray) was trying to help out his sister with dissertation work; she had a lot of source material in handwritten Russian and needed a faster way to make the words legible.

Aaron joined after mentioning to Ray that he wanted to learn TensorFlow and become a data scientist or machine learning engineer.

How it works

This repo is in development and uses TensorFlow 2 to perform handwriting recognition.

Synthetic data is in /synthetic_data_generation.

Model development, training code, and saved models are in /model_training.

From phone call with Ray on April 6

Run wikidump
Run traindata

Current Status

From Ray: "We left off at the final stages of synthetic data generation, but work could definitely be started on the model structure but the accuracy will be pointless"

Ran the TensorFlow 2 migration script, but the original mnist_saved_model script used tf.contrib, which the migration script cannot convert automatically, so the code will need manual refactoring.

Future Steps

Refactor to proper TensorFlow 2
Refactor from PIL to Pillow (Pillow supports Python 3, PIL does not)
Create requirements.txt for easier reproducibility
Dockerize?
Consider a PyTorch branch and see performance differences
Would more data help?
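
For the PIL to Pillow step above, one mechanical part of the refactor is the import style: Pillow keeps the PIL namespace but no longer supports bare `import Image`, so those lines become `from PIL import Image`. The sketch below shows one way to rewrite such lines in a source string. It assumes the existing scripts use the bare import style, which has not been checked against this repo; `modernize_imports` and `LEGACY_MODULES` are hypothetical names, and the module list is partial.

```python
import re

# Module names that classic PIL allowed as bare top-level imports.
# This is a partial, illustrative list; extend it as needed.
LEGACY_MODULES = ("Image", "ImageDraw", "ImageFont", "ImageFilter", "ImageOps")

# Matches a whole line such as "import Image" or "    import ImageDraw".
_LEGACY_IMPORT = re.compile(
    r"^(?P<indent>\s*)import\s+(?P<module>" + "|".join(LEGACY_MODULES) + r")\s*$"
)


def modernize_imports(source: str) -> str:
    """Rewrite bare `import Image` style lines to `from PIL import Image`.

    Lines that do not match are returned unchanged. A trailing newline in
    `source` is not preserved, because the text is split and rejoined.
    """
    fixed = []
    for line in source.splitlines():
        match = _LEGACY_IMPORT.match(line)
        if match:
            line = f"{match['indent']}from PIL import {match['module']}"
        fixed.append(line)
    return "\n".join(fixed)


if __name__ == "__main__":
    legacy = "import Image\nimport os\n    import ImageDraw"
    print(modernize_imports(legacy))
```

Imports that already use `from PIL import ...` pass through untouched, so the function can be run over a file more than once.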

From phone call with Ray on April 6

Better Bounding Box Drawing in Traindatagen.py
    a. Add random noise to the box dimensions?
A real CNN and spellchecker/RNN to get to text
    a. Possibly double sided if using CNN and RNN
Accuracy Assessment
    a. We are using Image of Text to Text
Image Warping/Colors/Damage (more realistic images)
    a. This part will be tedious due to matrix math
Make it faster
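
The bounding box idea above (adding random noise to the box dimensions) could look something like the following. This is a minimal sketch, not the code in Traindatagen.py: the `(left, top, right, bottom)` box layout, the `jitter_box` name, and the `max_shift` parameter are all assumptions made for illustration.

```python
import random
from typing import Optional, Tuple

# Assumed box layout: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def jitter_box(
    box: Box,
    image_size: Tuple[int, int],
    max_shift: int = 3,
    rng: Optional[random.Random] = None,
) -> Box:
    """Move each edge of `box` by a random offset of up to `max_shift` pixels.

    The result is clamped to the image and always keeps a width and height
    of at least one pixel.
    """
    if rng is None:
        rng = random.Random()
    width, height = image_size
    left, top, right, bottom = box

    # Shift every edge independently so both position and size vary.
    left += rng.randint(-max_shift, max_shift)
    top += rng.randint(-max_shift, max_shift)
    right += rng.randint(-max_shift, max_shift)
    bottom += rng.randint(-max_shift, max_shift)

    # Clamp to the image bounds and keep the box non-empty.
    left = min(max(left, 0), width - 1)
    top = min(max(top, 0), height - 1)
    right = min(max(right, left + 1), width)
    bottom = min(max(bottom, top + 1), height)
    return left, top, right, bottom


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    print(jitter_box((10, 10, 30, 40), image_size=(32, 48), rng=rng))
```

Passing a seeded `random.Random` keeps generated training data reproducible, and `max_shift=0` returns the box unchanged, which makes it easy to switch the noise off when comparing results.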
