OCR/HTR training data from the OpenArabicPE corpus of Arabic periodicals

OpenArabicPE/ocr_training-data

title: "OpenArabicPE: OCR training data"
author: Till Grallert
date: 2019-05-22
tags: OpenArabicPE, Arabic periodicals

This repository holds training data for OCR/HTR algorithms, along with some tools for converting TEI files into the format this task requires: one .txt file per page.
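To illustrate what "one .txt file per page" means in practice, here is a minimal Python sketch. It is not the repository's XSLT stylesheet and makes several assumptions: pages are delimited by TEI `<pb/>` milestone elements, the `n` attribute carries the page label, and whitespace in the source separates words. The function names `split_pages` and `write_pages`, the output file naming pattern, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample document, used only for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <pb n="1"/>
    <p>first page text</p>
    <pb n="2"/>
    <p>second <hi>page</hi> text</p>
  </body></text>
</TEI>"""


def split_pages(tei_xml):
    """Return a dict mapping page label to plain text.

    Assumption: every <pb/> inside <body> starts a new page.
    Text that precedes the first <pb/> is dropped.
    """
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{TEI_NS}body")
    if body is None:
        raise ValueError("no <body> element found")
    pages = []  # list of (label, list of text chunks), in document order

    def walk(elem):
        if elem.tag == TEI_NS + "pb":
            # Fall back to a running number if @n is missing.
            label = elem.get("n") or str(len(pages) + 1)
            pages.append((label, []))
        elif elem.text and pages:
            pages[-1][1].append(elem.text)
        for child in elem:
            walk(child)
            # Text following a child element belongs to the current page.
            if child.tail and pages:
                pages[-1][1].append(child.tail)

    walk(body)
    # Collapse all runs of whitespace into single spaces.
    return {label: " ".join("".join(chunks).split()) for label, chunks in pages}


def write_pages(pages, out_dir, stem):
    """Write each page to <out_dir>/<stem>-p<label>.txt and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for label, text in pages.items():
        path = out / f"{stem}-p{label}.txt"
        path.write_text(text + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `split_pages(SAMPLE)` yields one entry per `<pb/>`, and `write_pages` turns those entries into UTF-8 text files. A real pipeline would likely also need to preserve line breaks (`<lb/>`), since OCR/HTR training usually aligns text with images line by line; this sketch flattens each page into a single line.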

The oXygen project OpenArabicPE_OCR.xpr is configured with a single transformation scenario, which uses an XSLT stylesheet (convert_tei-to-plain-text-pages.xsl) from another GitHub repository (convert_tei-to-markdown). For this conversion to work, both repositories (ocr_training-data and convert_tei-to-markdown) need to be children of the same folder, and the input TEI should validate against the OpenArabicPE ODD.
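The folder layout requirement can be checked before running the transformation. The following Python sketch is an illustration only: the function name `find_stylesheet` is hypothetical, and because the stylesheet's location inside convert_tei-to-markdown is not stated here, the sketch searches that folder recursively rather than assuming a path.

```python
from pathlib import Path


def find_stylesheet(
    repo_dir,
    sibling="convert_tei-to-markdown",
    name="convert_tei-to-plain-text-pages.xsl",
):
    """Locate the stylesheet in the sibling repository.

    Assumption: repo_dir is the local clone of ocr_training-data and the
    sibling repository sits next to it in the same parent folder.
    """
    parent = Path(repo_dir).resolve().parent
    sibling_dir = parent / sibling
    if not sibling_dir.is_dir():
        raise FileNotFoundError(
            f"expected {sibling!r} next to {Path(repo_dir).name!r} in {parent}"
        )
    # The stylesheet's subfolder is unknown, so search recursively.
    matches = sorted(sibling_dir.rglob(name))
    if not matches:
        raise FileNotFoundError(f"{name!r} not found under {sibling_dir}")
    return matches[0]
```

For example, `find_stylesheet("ocr_training-data")` raises `FileNotFoundError` with a descriptive message if the two clones do not share a parent folder, which is easier to diagnose than a failed transformation scenario in oXygen.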
