OCRforPaperless-ngx

A pre-consume script for paperless-ngx that turns PDFs into searchable documents using Azure Document AI.

The free version of Azure only supports documents up to 2 pages. If you submit more than 2 pages, you will get an error saying the document is too large.
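If you want to catch oversized documents before they are sent to Azure, a rough pre-check could look like the sketch below. It is not part of this repo: the function names and the FREE_TIER_PAGE_LIMIT constant are made up for illustration, and the page count is only a heuristic that scans the raw bytes for page objects. PDFs that store their objects in compressed streams will report 0 pages, so treat 0 as "unknown" rather than "empty".

```python
import re

# Page limit of the free tier as described above; change it if your tier differs.
FREE_TIER_PAGE_LIMIT = 2

# Matches "/Type /Page" but not "/Type /Pages" (the page tree node).
_PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![a-zA-Z])")


def rough_page_count(pdf_bytes: bytes) -> int:
    """Heuristic page count: number of uncompressed page objects found."""
    return len(_PAGE_OBJECT.findall(pdf_bytes))


def within_free_tier(pdf_bytes: bytes, limit: int = FREE_TIER_PAGE_LIMIT) -> bool:
    """True only if at least one page was found and the count fits the limit."""
    count = rough_page_count(pdf_bytes)
    return 0 < count <= limit
```

A dedicated PDF library would give a reliable count; the regex version is only meant to show where such a check would sit.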

I run paperless-ngx in Docker and created a custom Docker build file to install the dependencies.

In paperless-ngx, set PAPERLESS_PRE_CONSUME_SCRIPT=/usr/src/paperless/scripts/azure-ocr.sh
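For context on what a pre-consume script gets to work with: as far as I know, paperless-ngx passes the path of the working copy of the document in the DOCUMENT_WORKING_PATH environment variable, and the script may modify that file in place. The sketch below shows how a hook could pick up that path in Python. The function name working_document is hypothetical, and this is not the code in azure-ocr.sh.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def working_document(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the working copy paperless-ngx hands to the hook, if it is a PDF."""
    raw = env.get("DOCUMENT_WORKING_PATH")
    if not raw:
        # Not running under paperless-ngx, or the variable was not passed on.
        return None
    path = Path(raw)
    # Skip anything that is not a PDF so other file types pass through untouched.
    return path if path.suffix.lower() == ".pdf" else None
```

Returning None for non-PDF input lets the hook exit early and leave the document to the normal paperless-ngx pipeline.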

Edit the fr_generate_searchable_pdf.py file with your Azure key and endpoint.
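If you would rather not keep the key inside the script, one option is to read both values from the container environment. This is only a sketch under assumptions: the variable names AZURE_DOCUMENT_ENDPOINT and AZURE_DOCUMENT_KEY and the function load_azure_settings are made up here, and fr_generate_searchable_pdf.py as shipped expects you to edit the values in the file.

```python
import os
from typing import Mapping, Tuple


def load_azure_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the endpoint and key from the environment (hypothetical variable names)."""
    endpoint = env.get("AZURE_DOCUMENT_ENDPOINT", "").strip()
    key = env.get("AZURE_DOCUMENT_KEY", "").strip()
    if not endpoint or not key:
        # Fail early with a clear message instead of a confusing auth error later.
        raise RuntimeError(
            "Set AZURE_DOCUMENT_ENDPOINT and AZURE_DOCUMENT_KEY before running the OCR script."
        )
    # Drop a trailing slash so the endpoint has one consistent form.
    return endpoint.rstrip("/"), key
```

With Docker, the two variables could then be set alongside PAPERLESS_PRE_CONSUME_SCRIPT in the container environment.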
