Reads the text in images using OCR and makes it browsable and searchable over HTTP.
On an Ubuntu system, run the following to install the prerequisites for OCR and processing:
sudo apt-get install sqlite3 tesseract-ocr-osd tesseract-ocr-fin tesseract-ocr imagemagick unpaper
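Before building, it can help to confirm that the tools from those packages are actually on the PATH. The following is a minimal sketch of such a check. The binary names (tesseract, convert, unpaper, sqlite3) are assumptions about what the packages install, and missing_tools is a hypothetical helper that is not part of this repository.

```shell
#!/bin/sh
# missing_tools prints every name from its arguments that cannot be
# found on the PATH, one per line. Hypothetical helper.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed binary names for the packages installed above.
missing=$(missing_tools tesseract convert unpaper sqlite3)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names are joined on one line.
    echo "Missing tools:" $missing
else
    echo "All OCR prerequisites found."
fi
```

If anything is reported as missing, rerunning the apt-get command above is the likely fix.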
Go is also required. Tested with Go 1.8.
Clone this repository and run the following command:
go build
Start the web server with the following:
./paperless
This starts a web server on port 8078. See the ‘--help’ argument for command-line options.
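When the server is started from a script, it may be useful to wait until it answers before continuing. The sketch below polls the address with curl. It assumes that curl is installed and that the root URL answers with a success status once the server is up; wait_for_server is a hypothetical helper, not something the repository provides.

```shell
#!/bin/sh
# wait_for_server polls a URL once per second until it responds with
# a success status or the given number of attempts is used up.
# Returns 0 when the server answered, 1 otherwise. Hypothetical helper.
wait_for_server() {
    url=$1
    tries=${2:-10}   # number of attempts, defaults to 10
    while [ "$tries" -gt 0 ]; do
        # -s: silent, -f: treat HTTP errors as failures,
        # -o /dev/null: discard the response body.
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, after starting ./paperless in the background, calling wait_for_server http://localhost:8078/ 30 would block until the server responds or 30 attempts have failed.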
Files are uploaded through the browser. The ‘+’ button opens a panel where you can drag and drop images to OCR.
If you have Docker properly set up, you can run this inside Docker with the following:
cd docker
./run.sh
This should start an Ubuntu 16.04 Docker container in which the program is running.
In the uploader directory there is a Go application that can be used to send batches of tagged images to the server.
Build it with the following:
cd uploader
go build
Example run:
./uploader -t important,dontremove http://localhost:8078 important-01.jpg important-02.jpg
Usage is printed with the ‘--help’ argument.
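To send a whole directory of scans in one go, the uploader call shown above can be wrapped in a small function. This is only a sketch: upload_dir and the UPLOADER variable are hypothetical names, and the function assumes the images carry a .jpg extension, as in the example run.

```shell
#!/bin/sh
# Path to the uploader binary; override it if the binary lives elsewhere.
UPLOADER=${UPLOADER:-./uploader}

# upload_dir sends every .jpg file directly inside a directory to the
# server in a single uploader call. Hypothetical helper.
upload_dir() {
    tags=$1   # comma-separated tags, e.g. important,dontremove
    url=$2    # server address, e.g. http://localhost:8078
    dir=$3    # directory containing the images
    # Replace the function's arguments with the matching files.
    set -- "$dir"/*.jpg
    # If nothing matched, the pattern is left unexpanded and does not exist.
    if [ ! -e "$1" ]; then
        echo "no .jpg files in $dir" >&2
        return 1
    fi
    "$UPLOADER" -t "$tags" "$url" "$@"
}
```

With this in place, upload_dir important,dontremove http://localhost:8078 ./scans would tag and send every .jpg file in ./scans.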
Frontend development requires Node.js and npm. Set up the environment with:
cd web/paperless-frontend
npm install
To set up a running environment, do the following:
- Start the paperless application to get the backend running on port 8078.
- Start the webpack-dev-server with:
cd web/paperless-frontend
npm run serve
- This starts the frontend on port 8080, which connects to the backend on port 8078.
The frontend is embedded in the final binary. To carry changes from the frontend development files over into the binary, do the following:
- Install the ‘esc’ file embedder so that esc can be found in $PATH.
go get -i github.com/mjibson/esc
- Build the dist-package of the frontend:
cd web/paperless-frontend
npm run build
- Regenerate lib/web-generated.go with:
cd lib
go generate
- Build normally, test and commit the generated files.
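The steps above can be chained into a single function so that a failing step stops the rest. This is a sketch under the assumption that it is run from the repository root with esc already installed; rebuild_embedded_frontend is a hypothetical name, not a script shipped with the project.

```shell
#!/bin/sh
# rebuild_embedded_frontend runs the frontend build, regenerates the
# embedded Go file and rebuilds the binary. Each step runs only if
# the previous one succeeded. Hypothetical helper.
rebuild_embedded_frontend() {
    # Build the dist package of the frontend (subshell keeps the cwd).
    (cd web/paperless-frontend && npm run build) &&
    # Regenerate lib/web-generated.go; needs esc in $PATH.
    (cd lib && go generate) &&
    # Build the binary with the updated frontend embedded.
    go build
}
```

Testing and committing the generated files remain manual steps.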
MIT license