
avaibh/Post-OCR-Processing

Optical Character Recognition Post Processing

IITB-intern

Designed an algorithm that corrects words in Sanskrit text produced by OCR, using statistical language models and edit distance against a known dictionary. Improved the algorithm's accuracy by 10% by introducing morphological constraints from Sanskrit grammar, namely Sandhi. Explored several machine learning and deep learning approaches, such as LSTM networks and attention-based RNN models, to improve the features used for error correction in Sanskrit OCR documents. We simulated the Subanta Prakaraṇam (nominal inflection) of the Aṣṭādhyāyī to synthesize an off-the-shelf dictionary. We adapted various auxiliary sources with plug-in classifiers and achieved our best F-score, 93.72, with the LSTM-based approach. Our paper was presented at the 17th World Sanskrit Conference (WSC), Vancouver, in July 2018.
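
The dictionary-based correction step can be illustrated with a minimal sketch. It is not the repository's code. It assumes a plain Levenshtein distance, a lexicon mapping each word to a corpus frequency (a simple unigram language model used to break ties), and NFC normalization for IAST-transliterated text. The names `levenshtein`, `correct_word`, `max_distance`, and the toy lexicon are hypothetical. Sandhi constraints and the LSTM/attention models are not modelled here.

```python
import unicodedata
from typing import Dict, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            )
        prev = curr
    return prev[-1]


def correct_word(token: str, lexicon: Dict[str, int], max_distance: int = 2) -> str:
    """Replace an OCR token with the closest lexicon entry (hypothetical helper).

    Candidates within max_distance are ranked by edit distance first and by
    corpus frequency second. If no candidate is close enough, the token is
    returned unchanged.
    """
    # Normalize to NFC so that visually identical IAST strings (e.g. "ā", "ḥ")
    # use the same code points in the token and in the lexicon.
    token = unicodedata.normalize("NFC", token)
    best: Optional[Tuple[int, int, str]] = None
    for word, freq in lexicon.items():
        word = unicodedata.normalize("NFC", word)
        d = levenshtein(token, word)
        if d > max_distance:
            continue
        key = (d, -freq, word)  # smaller distance wins, then higher frequency
        if best is None or key < best:
            best = key
    return best[2] if best is not None else token


if __name__ == "__main__":
    # Toy lexicon with assumed frequencies, for illustration only.
    lexicon = {"rāmaḥ": 5, "rāmam": 3, "vanam": 4}
    print(correct_word("rāmah", lexicon))  # rāmaḥ
```

Here "rāmah" is one edit away from both "rāmaḥ" and "rāmam", so the frequency term decides the tie. Comparing the token against every lexicon entry is linear in the lexicon size; a larger dictionary would typically call for candidate generation or an index such as a BK-tree.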
