A pre_consume_script for paperless-ngx that turns PDFs into searchable documents using Azure AI Document Intelligence (formerly Form Recognizer).

DarkPhyber-hg/OCRforPaperless-ngx

OCRforPaperless-ngx

The free tier of Azure only supports documents of up to 2 pages. If you send a document with more than 2 pages, you will get an error saying the document is too large.
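
Because a longer file is only rejected after it has been uploaded, it can help to check the page count first. The sketch below is a rough, stdlib-only heuristic and not part of this repository: it counts "/Type /Page" objects in the raw PDF bytes, which works for simple uncompressed PDFs but finds nothing when page objects sit inside compressed object streams. The names rough_page_count and fits_free_tier are hypothetical; for a reliable count, a PDF library such as pypdf (len(PdfReader(path).pages)) is the safer choice.

```python
import re

# "/Type /Page" marks a page object; the negative lookahead skips "/Type /Pages" (the page-tree node).
_PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z0-9])")

# Page limit for the free Azure tier, as stated in this README.
FREE_TIER_MAX_PAGES = 2


def rough_page_count(pdf_bytes: bytes) -> int:
    """Count page objects in raw PDF bytes (heuristic: misses pages in compressed object streams)."""
    return len(_PAGE_OBJECT.findall(pdf_bytes))


def fits_free_tier(pdf_bytes: bytes, max_pages: int = FREE_TIER_MAX_PAGES) -> bool:
    """Return True if the PDF appears to have no more than max_pages pages."""
    pages = rough_page_count(pdf_bytes)
    if pages == 0:
        # Zero usually means the heuristic failed, not that the PDF is empty.
        raise ValueError("no page objects found; use a real PDF parser for this file")
    return pages <= max_pages
```

Raising on zero, rather than returning True, avoids silently sending a long compressed PDF to Azure just because the heuristic could not see its pages.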

I run paperless-ngx in Docker and created a custom Docker build file to install the dependencies.

In paperless-ngx, set the following environment variable:

PAPERLESS_PRE_CONSUME_SCRIPT=/usr/src/paperless/scripts/azure-ocr.sh
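
paperless-ngx runs the pre-consume script before it parses the document. Per the paperless-ngx documentation, recent versions pass the document to the script through environment variables, including DOCUMENT_WORKING_PATH, a copy of the file that consumption continues with, so overwriting that file is how a script hands back a modified PDF. The following is a minimal Python sketch of that hand-off, not the repository's azure-ocr.sh; the convert callable stands in for whatever produces the searchable PDF (for example a wrapper around fr_generate_searchable_pdf.py) and is hypothetical.

```python
import os
import tempfile
from typing import Callable, Mapping, Optional

# Hypothetical converter: convert(src_pdf, dst_pdf) writes a searchable copy of src_pdf to dst_pdf.
Converter = Callable[[str, str], None]


def run_pre_consume(convert: Converter, env: Optional[Mapping[str, str]] = None) -> bool:
    """Replace paperless-ngx's working copy with a searchable PDF.

    Returns True if the file was replaced, False if it was left alone (not a PDF).
    """
    env = os.environ if env is None else env
    working = env.get("DOCUMENT_WORKING_PATH")
    if not working:
        raise RuntimeError("DOCUMENT_WORKING_PATH is not set; is this running as a pre-consume script?")

    # Only touch PDFs; well-formed PDF files start with the "%PDF-" header.
    with open(working, "rb") as fh:
        if fh.read(5) != b"%PDF-":
            return False

    # Write into a temp file next to the original so os.replace() is an atomic rename.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=os.path.dirname(os.path.abspath(working)))
    os.close(fd)
    try:
        convert(working, tmp_path)
        os.replace(tmp_path, working)
    finally:
        # After a successful os.replace() the temp file is gone; otherwise clean it up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return True
```

Passing the converter in as a callable keeps the environment handling testable without an Azure account, and writing to a temporary file first means a failed conversion leaves the working copy untouched.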

Edit fr_generate_searchable_pdf.py and fill in your Azure key and endpoint.
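
If you would rather not hard-code secrets in the script, one option is to read them from the environment instead. This is a hypothetical variation, not what the script does: the variable names AZURE_DOCAI_ENDPOINT and AZURE_DOCAI_KEY are made up for this example, and the sketch assumes they are set in the paperless-ngx container's environment, which the pre-consume script should inherit. Azure endpoints normally look like https://<resource-name>.cognitiveservices.azure.com/.

```python
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse


def load_azure_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (endpoint, key) from the hypothetical AZURE_DOCAI_* environment variables."""
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_DOCAI_ENDPOINT", "").strip()
    key = env.get("AZURE_DOCAI_KEY", "").strip()
    if not endpoint or not key:
        raise RuntimeError("set AZURE_DOCAI_ENDPOINT and AZURE_DOCAI_KEY before running the script")
    # Catch the common mistake of pasting a bare resource name or an http:// URL.
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"endpoint should be an https:// URL, got {endpoint!r}")
    return endpoint, key
```

The hard-coded values in fr_generate_searchable_pdf.py could then be replaced by a call such as endpoint, key = load_azure_settings().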

fr_generate_searchable_pdf.py comes from this Microsoft Tech Community post, which also lists the dependencies it needs: https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/generate-searchable-pdfs-with-azure-form-recognizer/bc-p/3930832
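
Searchable-PDF generators of this kind generally run OCR on each page and then draw the recognized text as an invisible layer at the positions Azure reports. One detail that is easy to get wrong: for PDF input, Azure reports polygon coordinates in inches measured from the page's top-left corner, while PDF drawing coordinates are in points (1/72 inch) measured from the bottom-left. The helper below is a hypothetical illustration of that conversion, not code from the blog post. It takes (x, y) pairs, so SDK Point objects would need converting first, e.g. [(p.x, p.y) for p in line.polygon], and it ignores page rotation and non-zero MediaBox origins.

```python
from typing import Sequence, Tuple

POINTS_PER_INCH = 72.0  # PDF user-space units are 1/72 inch by default


def polygon_to_pdf_rect(
    polygon: Sequence[Tuple[float, float]],
    page_height: float,
    unit: str = "inch",
) -> Tuple[float, float, float, float]:
    """Convert an Azure polygon (top-left origin) to a PDF rect (x0, y0, x1, y1) in points.

    page_height is the page height in the polygon's unit (Azure's page.height).
    The returned rect uses PDF's bottom-left origin: y0 is the bottom edge, y1 the top edge.
    """
    if unit != "inch":
        raise ValueError(f"only 'inch' (PDF input) is handled here, got {unit!r}")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    page_h = page_height * POINTS_PER_INCH
    x0 = min(xs) * POINTS_PER_INCH
    x1 = max(xs) * POINTS_PER_INCH
    # Azure's y grows downward and PDF's y grows upward, so flip against the page height.
    y0 = page_h - max(ys) * POINTS_PER_INCH
    y1 = page_h - min(ys) * POINTS_PER_INCH
    return x0, y0, x1, y1
```

For a US Letter page (11 inches tall), a line whose polygon spans x 1 to 3 inches and y 1 to 1.5 inches maps to the rect (72.0, 684.0, 216.0, 720.0) in points.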

is_ocrd_pdf.sh comes from https://github.com/jfilter/pdf-scripts
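
Judging by its name, is_ocrd_pdf.sh is there to detect files that already have a text layer. As a comparable check (not a port of that script, whose exact method isn't described here), the sketch below pulls the embedded text with poppler's pdftotext and treats the PDF as already searchable if enough non-whitespace characters come back. The function names and the min_chars threshold of 20 are arbitrary assumptions.

```python
import shutil
import subprocess


def extract_text(pdf_path: str) -> str:
    """Return the PDF's embedded text via poppler's pdftotext ("-" sends output to stdout)."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext (poppler-utils) is not installed")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def looks_ocrd(text: str, min_chars: int = 20) -> bool:
    """Heuristic: the PDF already has a text layer if it yields at least min_chars non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace()) >= min_chars
```

A pure image scan typically yields only whitespace and form feeds from pdftotext, so looks_ocrd(extract_text(path)) would be False and the file could go on to Azure.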