From 21bc3580b58470f7fd5fb8dfc0ab036868a63ddb Mon Sep 17 00:00:00 2001
From: ci-bot Papermerge DMS or simply Papermerge is an open source document management system designed to work with scanned documents (also called digital archives). It extracts text from your scans using OCR, indexes them, and prepares them for full text search. Papermerge provides the look and feel of modern desktop file browsers. It has features like dual panel document browser, drag and drop, tags, hierarchical folders and full text search so that you can efficiently store and organize your documents. It supports PDF, TIFF, JPEG and PNG document file formats. Papermerge is a perfect tool for long term storage of your documents. For Papermerge a document is anything which is a good candidate for archiving - some piece of information which is not editable but which you need to store for future reference. For example receipts are good examples - you don't need to read receipts every day, but eventually you will need them for your tax declaration. In this sense - scanned documents, which are usually in PDF or TIFF format, are a perfect match. PDF (Portable Document Format) is the de facto standard for storing archived documents. In correct technical terms - it is the PDF/A subset. PDF/A differs from PDF by prohibiting features unsuitable for long-term archiving, such as font linking and encryption. Most modern office scanners will output scanned files in PDF/A format. This is why PDF is practically synonymous with document in the context of Papermerge. A picture of an A4 paper document taken with a smartphone is also regarded by Papermerge as a document. The Papermerge docker image is shipped with backup and restore utilities. The shipped utility will back up all your folders, documents with their associated versions and OCR data, tags and users. The search engine index is not included in the backup though. Note User passwords are included in the backup file as well. Passwords are stored as digests. 
Back up your documents with the following command: where Example: In the above example Papermerge has 5 containers: app server (the core or web or http or REST API server, pick the name you like :P), solr search engine, redis, database and finally one paper worker. To create a backup in the root folder of the app container just run: When the above command has finished, check that the backup file was created: The backup file is backup_10_12_2023-06_30_37.tar.gz. Now you can copy the backup file to your local filesystem: You may choose to name the file differently: Then copy it to your local filesystem: Note Backup files are gzipped tar archives, thus you probably want to append \".tar.gz\" to their name. When you plan to restore a previous backup, we suggest starting with a new Papermerge instance, with only one superuser (which is created by default anyway). Make sure there are no documents in the new instance. For the sake of example, let's say the superuser's username is \"admin\". For restoring use For that to work, you first need to copy the backup archive file to the core (server) container. Sticking with the example from the previous section: If the \"admin\" user already existed in the backup file, then admin's password will be set to the one from the backup file. The backup file is a gzipped tar archive with the following content: User folders mentioned in point 4. are provided for convenience, so that you may quickly get an understanding of the folder structure and their content. Each file in a user folder is actually a symbolic link pointing to the last version of the document (from Warning Each user has two special folders: For the complete changelog see the changelog file in the github repository. Command line utility which uses REST API to interact with your Papermerge instance. It can be used to upload documents from the local filesystem to your Papermerge instance. In order to use Install pip is the package installer for python - it usually comes with the python interpreter. 
In order to install pip on Ubuntu use the following command: Papermerge Cli is configured via environment variables: as the name suggests, the first one is the host of the REST API server and the second value is the REST API token. REST API server should be specified with Note The host may or may not contain the To get the REST API token follow these instructions. List the content of your home folder: In order to list the content of a specific folder (including the inbox folder): In order to see current user details use Recursively imports documents and folders from the local filesystem. For example, in order to recursively import all documents from a local folder: You can also import one single document: By default all documents are imported to your user's In order to learn the UUID of the folder you want to import to use If you want the local copy of the uploaded documents to be deleted after successful import - use Danger Be careful with Danger Always make a safe backup of the documents to be uploaded before using this flag! Note In order to get general help about the command use: In order to get help for individual commands, place This section describes a set of command line utilities which can interact (e.g. import documents to, list nodes etc) with your Papermerge instance. What is common to all command line utilities listed here is that they all use the REST API interface. In order to use the REST API you need to know: Host address should be provided with Examples: Note REST API server may or may not end with Currently there is no web UI for getting your user's token. The only way to get the REST API token is by running a docker command. Click here for details. There is a docker image for development mode. The docker image is tagged with All examples described below assume that you got the Papermerge source code and you are in the root folder of the source code repository: This is the simplest local dev scenario, you start a docker compose file only with the web app i.e. REST API server + ui. 
Then go to the folder where the source was cloned and create the following docker compose file: Assuming you are in the root folder of the source code, the above docker compose file will mount the source code to the correct location in the docker image. The application will be accessible on local port 11000. Here is a docker compose file for the case when you want to build the dev docker image yourself: The following docker compose file adds a worker service. Worker and Web App communicate via redis (message broker), thus we need to add a redis service as well: Both worker and web app read their logging configurations from the file pointed to by You may recognize it. It should be the YAML version of a python logging config. Here is an example of docker compose with web app + worker + custom logging configuration: Papermerge is shipped with a default search library - Xapian. However, you may opt in to use a full-fledged search engine like Solr. In order to change the search backend, use Notice that Solr is started with Here is an example of docker compose which uses PostgreSQL as database: Papermerge provides a very powerful REST API. In order to use the REST API, you need the REST API server URL and a user token. The REST server URL is the http address of your running instance. The HTTP address also includes the scheme. Examples of REST API server URLs: Currently there is no web UI to get a token, the only way to get a REST API token is by running a docker command. See next section for details. The Papermerge REST API is exposed via the Open API standard. Papermerge ships with swagger REST API documentation reference. You can access it in your running Papermerge instance from user menu -> REST API: Currently there is no web UI for getting the REST API token. Instead, you get the REST API token by running one docker command: You can list users in Papermerge with the following command: Example: In the above example Papermerge has four containers: app server, redis, database and one worker. For our purpose we need the app container (in the example above - fordoc-web-1). 
Let's list all users first: There is only one user with username \"john\". In order to get the REST API token for user \"john\" run the following command: The long list of characters above is the token for the user with username \"john\". Username for the superuser. Default value is Example: Password for the superuser. No default value. Example: Email for the superuser. Default value is Example: This section is for database configurations. Papermerge supports the following databases: SQLite3, PostgreSQL, MySQL/MariaDB. For PostgreSQL the database URL is given in the following format: Example: For MariaDB and MySQL the URL scheme is Example: For SQLite the format is: Example: Default value is Note Both web_app and worker must have the same Applies only to Tivoli. Sets the database connection pool size. Defaults to 5. Note This configuration option applies only to Tivoli, which is the internal JWT token validator component. Tivoli uses SQLAlchemy as ORM. SQLAlchemy has built-in database connection pooling. The core app though, uses Django ORM - which does not have built-in pooling capabilities. Absolute filesystem path to the directory that will hold user-uploaded documents. Example: Absolute filesystem path to the yaml file that will hold the detailed logging configuration. Content of the logging configuration file is expected to be in yaml format and it is very python specific. Example: Which timezone to use. Example: This section groups all OCR specific configurations. By default Papermerge will use the language specified with this option to perform OCR. Change this value to the language used by the majority of your documents. For a detailed list of three letter codes see the 639-2/T column from ISO 639 2. Example as environment variable: Default value is \"deu\" (German language). Papermerge loads its settings from environment variables. Environment variables have the following format: double underscores are used as delimiter; environment variable names must be all upper case. 
The only required environment variables are: Note In the documentation, for brevity's sake, Papermerge uses redis. For Redis the URL is given in the following format: For example: Note Both web_app and worker must have the same Papermerge supports multiple search engine backends. Currently two backends are available: Search engine backend to use. For Solr the format is: Example: For Xapian the URL format is Default value for Note Both web_app and worker must have the same Required. Unique secret key. The secret key must be a large random value and it must be kept secret. This option does not have a default value, you always need to supply a value for it. The secret key is used to sign JWT tokens. Example as environment variable: By default the Papermerge Docker image includes English, German, French, Italian, Spanish, Romanian and Portuguese OCR languages. You can install extra languages by creating a new docker image from base Create a new docker file with the following content: All languages are specified as three letter codes as per the ISO 639-2/T standard - second column in the table. In order to build your image run: Check that the OCR languages were installed: Ansible playbook is available at papermerge/ansible. The playbook will install the web app, two workers, database, Redis and Solr search engine on the target host. All services will be deployed as docker containers. All services will be placed behind traefik, a reverse proxy which will take care of TLS certificates. Choose one of the following options: The Ansible repository does not include a secrets file. The secrets file contains all sensitive information (passwords, api tokens). You need to create the secrets file in Place the following content: Of course you need to replace the dots with correct passwords, secret_key etc. database_url is in the secrets file because it includes a password. 
Make sure Install Papermerge DMS with PostgreSQL: Application will be accessible via https:// In this setup the application will connect to the database via pgbouncer, this means that Your Install Papermerge DMS with PostgreSQL and PgBouncer: Application will be accessible via https:// For Mysql/MariaDB Install Papermerge DMS with MariaDB: In order to create a backup: In order to restore the backup: The backup file path is the one from inside the docker container. papermerge/ansible assumes a Debian 12/Ubuntu 22.04 host. We are happy to accept your pull requests for other hosts. This section describes how to set up Papermerge using docker compose. The simplest docker compose setup for Papermerge is the following: You can access the Papermerge user interface using any modern web browser (e.g. Firefox, Chrome). Open your web browser and point it to http://localhost:12000. By default Papermerge uses the sqlite3 database. Here is a setup which uses PostgreSQL: By default Papermerge uses the Xapian search engine. However, for production environments, a full-fledged search engine like Solr is recommended. Here is an example of docker compose setup with MariaDB: ... The only two required environment variables are Point your web browser to Credentials are: Note The above Official Papermerge docker image is available on docker hub. The recommended way to get the Papermerge docker image is via the docker pull command: For a complete setup you need to start one or multiple workers. Worker is the component which, among other things, performs OCR. Here is a minimal docker compose file with web UI and one worker: With the above setup, the web app is accessible on To be added soon... Papermerge consists of one web app and multiple workers (at least one). The web app is the one which you see in your browser or interact with via REST API. The worker (one or multiple) is the part which performs background tasks like OCR, updating the search engine index etc. 
In order to function Papermerge needs a database, which can be one of the following: By default Papermerge uses SQLite. Papermerge supports multiple search engine backends: Xapian is used by default. Papermerge uses Tesseract to perform Optical Character Recognition. Papermerge is designed to run on Linux/Unix compatible systems. You need to have docker installed, as Papermerge is shipped as a docker image. All docker images are stored on docker hub. Make sure that you have docker available: Hardware specification for Papermerge depends on the number of documents and users. For one user with 1000-2000 pages a system spec with: will do just fine. For OCR, Papermerge uses Tesseract. OCR is a very CPU intensive operation, thus the more CPUs and RAM your system has, the better. More and more powerful CPU cores mean OCR will be performed faster. Note GPU is not required as Tesseract runs OCR entirely on your CPU.
"},{"location":"#what-it-does","title":"What It Does","text":"
"},{"location":"#what-it-doesnt-do","title":"What It Doesn't Do","text":"
"},{"location":"#what-is-a-document","title":"What is a Document?","text":"$ docker exec <papermerge-server-container> backup.sh <optional-location>\n
<optional-location>
is the path to the file or folder inside the container where the backup file will be saved. If location is not provided, the backup file will be saved in the /core_app/ folder - the papermerge core application's current folder. $ docker ps --format '{\\{.ID\\}} {\\{.Command\\}} {\\{.Names\\}}'\n\n 914dda21dd3d \"/run.bash server\" 091223_30-web-1\n 42095cee91f0 \"docker-entrypoint.s\u2026\" 091223_30-solr-1\n d65b3205d9ec \"/run.bash worker\" 091223_30-worker-1\n ac5cfd76993a \"docker-entrypoint.s\u2026\" 091223_30-redis-1\n 8ad6d0a7eb6c \"/opt/bitnami/script\u2026\" 091223_30-db-1\n
$ docker exec 914dda21dd3d backup.sh /\n
$ docker exec 914dda21dd3d ls /\n\nauth_server_app\nbackup_10_12_2023-06_30_37.tar.gz\nbin\nboot\n...\ncore_app\ncore_ui\ndb\n...\nusr\nvar\n
$ docker cp 914dda21dd3d:/backup_10_12_2023-06_30_37.tar.gz .\n
$ docker exec 914dda21dd3d backup.sh /my-daily-backup.tar.gz\n
$ docker cp 914dda21dd3d:/my-daily-backup.tar.gz .\n
restore.sh
command:$ docker exec <papermerge-server-container> restore.sh <backup-file>\n
$ docker cp my-backup.tar.gz 914dda21dd3d:/my-backup.tar.gz\n$ docker exec 914dda21dd3d restore.sh /my-backup.tar.gz\n
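Since backup files are gzipped tar archives, it can be worth checking that a copied archive is readable before handing it to restore.sh. The following is only a minimal sketch using Python's standard `tarfile` module; the helper names `list_backup_members` and `has_backup_json` are hypothetical and are not part of Papermerge.

```python
import tarfile


def list_backup_members(path):
    """Return all member names of a gzipped tar archive.

    Entries whose names start with a dot (for example .home and .inbox)
    are returned as well; nothing is hidden here.
    """
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()


def has_backup_json(path):
    """Return True if the archive has a top-level member named backup.json."""
    names = list_backup_members(path)
    # Some tar tools prefix members with "./", so strip that before comparing.
    return any(name.removeprefix("./") == "backup.json" for name in names)
```

Calling `has_backup_json("my-backup.tar.gz")` on the local copy only confirms that the archive opens and contains the metadata file; it does not validate the content.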
backup.json
fileocr/
folderdocvers/
folderusername1
/, username2
, ... i.e. one folder per user with folder title being user's usernamebackup.json
file contains all necessary info to restore the database i.e. all users, their nodes, tags etc.docvers/
contains the actual document version files. Your documents are here.ocr/
contains OCR data of each individual page in the document.docvers
)..home
and .inbox
; the titles of special folders start with a dot. If you open the backup archive in a file browser which hides dot files (files starting with a dot character) - the content of the user folder may appear empty! When opening the backup archive make sure you turn the 'show hidden files' flag on.papermerge-cli
you need to have python installed. You need python version >= 3.10.papermerge-cli
with following command:pip install papermerge-cli\n
"},{"location":"cli/cli/#configuration","title":"Configuration","text":"sudo apt install python3-pip\n
PAPERMERGE_CLI__HOST
PAPERMERGE_CLI__TOKEN
http://
or https://
prefix, but without the /api
suffix. Examples of valid values: http://papermerge.local, https://my-dms.papermerge.de./
. E.g. http://papermerge.local and http://papermerge.local/ are both valid values and point to the same host papermerge-cli ls\n
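The host rules described for PAPERMERGE_CLI__HOST (scheme prefix required, no /api suffix, trailing slash optional) can be sketched as a small check. This is an illustrative sketch only: `normalize_host` is a hypothetical helper and does not describe how papermerge-cli validates the value internally.

```python
from urllib.parse import urlsplit


def normalize_host(host):
    """Check a host value against the documented rules and drop a trailing slash."""
    parts = urlsplit(host)
    # The scheme (http:// or https://) must be part of the value.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("host must start with http:// or https://")
    cleaned = host.rstrip("/")
    # The value is expected without the /api suffix.
    if cleaned.endswith("/api"):
        raise ValueError("host must be given without the /api suffix")
    return cleaned
```

With this sketch, `http://papermerge.local` and `http://papermerge.local/` normalize to the same string, which mirrors the statement that both point to the same host.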
"},{"location":"cli/cli/#me","title":"me","text":" papermerge-cli ls --parent-uuid=UUID-of-the-folder\n
me
command:
"},{"location":"cli/cli/#import","title":"import","text":" papermerge-cli me\n
papermerge-cli import /path/to/local/folder/\n
papermerge-cli import /path/to/some/document.pdf\n
.inbox
folder. If you want to import to another folder, use --target-uuid
:papermerge-cli import /path/to/some/document.pdf --target-uuid <uuid>\n
papermerge-cli ls
command. To get UUIDs of .home
and .inbox
folders, use papermerge-cli me
command.--delete
flag:papermerge-cli import --delete /path/to/folder/\n
--delete
flag! When present,papermerge-cli
will irreversibly delete the local copy of all documents and folders in the /path/to/folder/
!--delete
flag deletes the local copy of the imported documents/path after a successful upload - this means that even though your local copy of the documents is gone, the originals are still available in Papermerge! papermerge-cli --help\n
--help
flag after the command:
"},{"location":"cli/overview/","title":"Overview","text":" papermerge-cli import --help\n
"},{"location":"cli/overview/#host-address","title":"Host Address","text":"http://
or https://
prefix.
/
character. Thus, both http://papermerge.local and http://papermerge.local/ are valid.3.0devX
. With the dev image, you can get feedback on your source code changes without installing any dependencies or configuring a development environment.
"},{"location":"contributor/docker/#web-app","title":"Web App","text":"$ git clone git@github.com:papermerge/papermerge-core.git PapermergeSourceCode\n$ cd PapermergeSourceCode\n
version: \"3.9\"\n\nservices:\n backend:\n image: papermerge/papermerge:3.0dev # check the latest dev image number in dockerhub!\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n ports:\n - \"11000:80\"\n
"},{"location":"contributor/docker/#web-app-worker","title":"Web App + Worker","text":"version: \"3.9\"\n\nservices:\n backend:\n build:\n context: .\n dockerfile: docker/dev/Dockerfile\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n ports:\n - \"11000:80\"\n
"},{"location":"contributor/docker/#logging-config","title":"Logging Config","text":"version: \"3.9\"\n\nx-backend: &common # yaml anchor definition\n build:\n context: .\n dockerfile: docker/dev/Dockerfile\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n - data:/db\n - index_db:/core_app/index_db\n - media_root:/core_app/media\n\nservices:\n web:\n <<: *common\n ports:\n - \"11000:80\"\n depends_on:\n - redis\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n\nvolumes:\n data:\n index_db:\n media_root:\n
PAPERMERGE__MAIN__LOGGING_CFG
environment variable. An example of custom logging config would be:version: 1\ndisable_existing_loggers: true\n\nformatters:\n verbose:\n format: '%(asctime)s %(levelname)s %(name)s.%(funcName)s %(message)s'\n\nhandlers:\n console:\n level: DEBUG\n class: logging.StreamHandler\n formatter: verbose\n\nloggers:\n auth_server:\n level: DEBUG\n handlers: [console]\n papermerge.search.tasks:\n level: DEBUG\n handlers: [console]\n propagate: no\n format: verbose\n
"},{"location":"contributor/docker/#solr","title":"Solr","text":"version: \"3.9\"\n\nx-backend: &common\n build:\n context: .\n dockerfile: docker/dev/Dockerfile\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n PAPERMERGE__MAIN__LOGGING_CFG: /logging.yml # <-- absolute path to custom config file\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n - data:/db\n - index_db:/core_app/index_db\n - media_root:/core_app/media\n - ./custom_logging.yml:/logging.yml # mount local logging config file\n\nservices:\n web:\n <<: *common\n ports:\n - \"11000:80\"\n depends_on:\n - redis\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n\nvolumes:\n data:\n index_db:\n media_root:\n
PAPERMERGE__SEARCH__URL
env variable:version: \"3.9\"\n\nx-backend: &common\n build:\n context: .\n dockerfile: docker/dev/Dockerfile\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index # <- use Solr's \"pmg-index\" index\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n - data:/db\n - index_db:/core_app/index_db\n - media_root:/core_app/media\n\nservices:\n web:\n <<: *common\n ports:\n - \"11000:80\"\n depends_on:\n - redis\n - solr\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n solr:\n image: solr:9.3\n ports:\n - \"8983:8983\"\n volumes:\n - solr_data:/var/solr\n command:\n - solr-precreate\n - pmg-index # <- creates index at startup of the Solr service\n\nvolumes:\n data:\n solr_data:\n index_db:\n media_root:\n
solr-precreate pmg-index
command, which means that Solr service will be started with pre-created index named pmg-index
.
"},{"location":"contributor/docker/#oauth-20","title":"OAuth 2.0","text":""},{"location":"rest-api/overview/","title":"Overview","text":"version: \"3.9\"\n\nx-backend: &common\n build:\n context: .\n dockerfile: docker/dev/Dockerfile\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n PAPERMERGE__DATABASE__URL: postgresql://postgres:123@db:5432/postgres\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n - index_db:/core_app/index_db\n - media_root:/core_app/media\n depends_on:\n - redis\n - solr\n - db\n\nservices:\n web:\n <<: *common\n ports:\n - \"11000:80\"\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n solr:\n image: solr:9.3\n ports:\n - \"8983:8983\"\n volumes:\n - solr_data:/var/solr\n command:\n - solr-precreate\n - pmg-index\n db:\n image: bitnami/postgresql:14.4.0\n volumes:\n - postgres_data:/var/lib/postgresql/data/\n environment:\n - POSTGRES_PASSWORD=123\n\nvolumes:\n postgres_data:\n solr_data:\n index_db:\n media_root:\n
docker exec <papermerge-container> create_token.sh <username>\n
docker exec <papermerge-container> list_users.sh\n
$ docker ps --format '{\\{.ID\\}} {\\{.Command\\}} {\\{.Names\\}}'\n\nd8b965388fd9 \"/run.bash server\" fordoc-web-1\n8fb8f6f565a2 \"/run.bash worker\" fordoc-worker-1\n8a42db0bb7f9 \"/opt/bitnami/script\u2026\" fordoc-db-1\n8a6146801936 \"docker-entrypoint.s\u2026\" fordoc-redis-1\n
$ docker exec d8b965388fd9 list_users.sh\n\nusername=john email=admin@example.com\n
$ docker exec d8b965388fd9 create_token.sh john\n\neyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJqb2huIiwidXNlcl9pZCI6IjJiODQwY2RhLThjMmYtNDExYy05NDYwLTY0ZDA3YWY3YTJiZSIsImV4cCI6MTcwMzM1MTUzNn0.KJAL9TjRiV63liwVO5bh9GQ_I_QFXMoviKV9Lww3cDs\n
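The documentation states that the secret key is used to sign JWT tokens, so the token printed above has the usual three dot-separated segments. As a rough sketch, the middle segment can be decoded with the standard library to see which user a token belongs to. `decode_jwt_payload` is a hypothetical helper; it does not verify the signature and must not be used for authentication decisions.

```python
import base64
import json

# The example token shown above for user "john".
EXAMPLE_TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJzdWIiOiJqb2huIiwidXNlcl9pZCI6IjJiODQwY2RhLThjMmYtNDExYy05NDYwLTY0ZDA3YWY3YTJiZSIsImV4cCI6MTcwMzM1MTUzNn0."
    "KJAL9TjRiV63liwVO5bh9GQ_I_QFXMoviKV9Lww3cDs"
)


def decode_jwt_payload(token):
    """Decode the payload (middle) segment of a JWT WITHOUT verifying it."""
    _header, payload_b64, _signature = token.split(".")
    # JWT segments are base64url encoded with the padding stripped; restore it.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


claims = decode_jwt_payload(EXAMPLE_TOKEN)
print(claims["sub"])
```

Because the payload is only encoded and not encrypted, anyone holding the token can read these claims, which is one more reason to treat the token like a password.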
admin
.
"},{"location":"settings/auth/#auth__password","title":"AUTH__PASSWORD","text":"PAPERMERGE__AUTH__USERNAME=john\n
"},{"location":"settings/auth/#auth__email","title":"AUTH__EMAIL","text":"PAPERMERGE__AUTH__PASSWORD=topsecret\n
admin@example.com
.
"},{"location":"settings/database/","title":"Database","text":"PAPERMERGE__AUTH__EMAIL=john@mail.com\n
postgresql://USER:PASSWORD@HOST:PORT/NAME\n
postgresql://scott:tiger@db:5432/mydatabase\n
mysql
.mysql://myuser:mypass@db:3306/paperdb\n
sqlite:///PATH
.sqlite:////db/db.sqlite3\n
sqlite:////db/db.sqlite3
, in other words, if DATABASE__URL
is missing, Papermerge will use SQLite with /db/db.sqlite3
as db file.PAPERMERGE__DATABASE__URL
.
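To make the three URL formats above easier to compare, here is a minimal sketch that splits a database URL into its parts with Python's `urllib.parse`. `describe_database_url` is a hypothetical helper written for illustration; it is not how Papermerge itself parses the setting, and the password is deliberately left out of the result.

```python
from urllib.parse import urlsplit

# Default documented above: used when DATABASE__URL is missing.
DEFAULT_DATABASE_URL = "sqlite:////db/db.sqlite3"


def describe_database_url(url=DEFAULT_DATABASE_URL):
    """Split a database URL into scheme and location parts (password omitted)."""
    parts = urlsplit(url)
    if parts.scheme == "sqlite":
        # sqlite:///PATH: after the empty host, dropping one slash leaves PATH.
        return {"scheme": "sqlite", "path": parts.path[1:]}
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "name": parts.path.lstrip("/"),
    }
```

For the default value this yields the file path `/db/db.sqlite3`, which explains the four slashes in `sqlite:////db/db.sqlite3`: three belong to the URL format and the fourth starts the absolute path.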
"},{"location":"settings/main/#main__logging_cfg","title":"MAIN__LOGGING_CFG","text":"PAPERMERGE__MAIN__MEDIA_ROOT=/var/www/example.com/media/\n
"},{"location":"settings/main/#main__timezone","title":"MAIN__TIMEZONE","text":" PAPERMERGE__MAIN__LOGGING_CFG=/etc/papermerge/logging.yaml\n
"},{"location":"settings/ocr/","title":"OCR","text":"PAPERMERGE__MAIN__TIMEZONE=Europe/Berlin\n
PAPERMERGE__OCR__DEFAULT_LANGUAGE=spa\n
PAPERMERGE__<section>__<name>\n
PAPERMERGE__SECURITY__SECRET_KEY
is the key to securing signed data \u2013 it is vital you keep this secure, or attackers could use it to generate their own signed values.PAPERMERGE__AUTH__PASSWORD
is the password for super user (administrative user or admin user). Super user is created automatically for you when Papermerge starts for the first time.PAPERMERGE__
prefix may be omitted. For example docs may say: default value for DATABASE__URL
is \"sqlite:////db/db.sqlite3\"; what is meant actually is: default value for PAPERMERGE__DATABASE__URL
is \"sqlite:////db/db.sqlite3\".redis://HOST:PORT/NUMBER\n
redis://redis:6379/0\n
PAPERMERGE__REDIS__URL
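The naming convention for environment variables (PAPERMERGE__ prefix, double underscore as delimiter, all upper case) can be illustrated with a short sketch that groups matching variables by section. `collect_settings` is a hypothetical helper for illustration and is an assumption, not Papermerge's actual settings loader.

```python
PREFIX = "PAPERMERGE__"


def collect_settings(environ):
    """Group PAPERMERGE__<section>__<name> variables into {section: {name: value}}."""
    settings = {}
    for key, value in environ.items():
        # Only upper-case names carrying the prefix are considered.
        if not key.startswith(PREFIX) or not key.isupper():
            continue
        section, _, name = key[len(PREFIX):].partition("__")
        if not name:
            # No second double underscore, e.g. PAPERMERGE__DATABASE_URL.
            continue
        settings.setdefault(section, {})[name] = value
    return settings
```

Passing `os.environ` to this function would return, for example, `{"REDIS": {"URL": "redis://redis:6379/0"}}` when only the Redis URL is set. A name with a single underscore between section and option is skipped, which shows why the double underscore matters.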
"},{"location":"settings/search/#search__url","title":"SEARCH__URL","text":"solr://HOST:PORT/INDEX\n
solr://solr:8983/pmg-index
xapian:///PATH
. Example: xapian:////index_db
- in other words, xapian will store all index data in /index_db
folder.PAPERMERGE__SEARCH__URL
is xapian:////index_db
PAPERMERGE__SEARCH__URL
"},{"location":"setup/add-ocr-langs/","title":"Add OCR Languages","text":"PAPERMERGE__SECURITY__SECRET_KEY=asjrijfpHHJH00huge00secret00QMNB344GHOOooaq\n
papermerge/papermerge
.FROM papermerge/papermerge:3.0.1\n\n# add Danish and Polish OCR languages\nRUN apt-get update && apt-get install -y tesseract-ocr-dan tesseract-ocr-pol\n
docker build -t mypaper:3.0 -f Dockerfile .\n
"},{"location":"setup/ansible/","title":"Ansible","text":"docker run -it --rm mypaper:3.0 tesseract --list-langs\n
"},{"location":"setup/ansible/#secrets","title":"Secrets","text":"group_vars
folder:$ touch group_vars/secrets\n
secret_key: ...\nsuperuser_password: ...\ndatabase_url: ...\ndb_pass: ...\ncloudflare_api_key: ...\ntraefik_api_password: ...\n
database_url
in your secrets files matches database related options in group_vars/all
(db_user, db_name). Also the port number in database_url
should match the one in db_postgres/vars/main.yml
.database_url
should have following format:postgresql://<user>:<pass>@db:5432/<dbname>\n
$ ansible-playbook install_1.yml -i inventory --extra-vars \"@group_vars/secrets\"\n
acme_domain
is the variable you set in group_vars/all
e.g. trusel.net"},{"location":"setup/ansible/#option-2-postgresql-pgbouncer","title":"Option 2 / PostgreSQL + PgBouncer","text":"database_url
should point to pgbouncer.database_url
should look like:postgresql://<user>:<pass>@pgbouncer:6432/<dbname>\n
$ ansible-playbook install_2.yml -i inventory --extra-vars \"@group_vars/secrets\"\n
acme_domain
is the variable you set in group_vars/all
e.g. trusel.net"},{"location":"setup/ansible/#option-3-mariadb","title":"Option 3 / MariaDB","text":"database_url
should have following format:mysql://<user>:<pass>@db:3306/<dbname>\n
"},{"location":"setup/ansible/#backup","title":"Backup","text":"$ ansible-playbook install_3.yml -i inventory --extra-vars \"@group_vars/secrets\"\n
"},{"location":"setup/ansible/#restore","title":"Restore","text":"$ ansible-playbook backup.yml\n
ansible-playbook restore.yml --extra-vars \"backup_file=/backup/backup_20_11_2023-07_33_03.tar.gz\"\n
version: \"3.9\"\n\n x-backend: &common\n image: papermerge/papermerge:3.0.1\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 12345\n PAPERMERGE__AUTH__USERNAME: john\n PAPERMERGE__AUTH__PASSWORD: hohoho\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n volumes:\n - data:/db\n - index_db:/core_app/index_db\n - media:/core_app/media\n services:\n web:\n <<: *common\n ports:\n - \"12000:80\"\n depends_on:\n - redis\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n volumes:\n data:\n index_db:\n media:\n
"},{"location":"setup/docker-compose/#solr","title":"Solr","text":" version: \"3.9\"\n\n x-backend: &common\n image: papermerge/papermerge:3.0.1\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 12345\n PAPERMERGE__AUTH__USERNAME: john\n PAPERMERGE__AUTH__PASSWORD: hohoho\n PAPERMERGE__DATABASE__URL: postgresql://scott:tiger@db:5432/mydatabase\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n volumes:\n - index_db:/core_app/index_db\n - media:/core_app/media\n services:\n web:\n <<: *common\n ports:\n - \"12000:80\"\n depends_on:\n - redis\n - db\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n db:\n image: bitnami/postgresql:14.4.0\n volumes:\n - postgres_data:/var/lib/postgresql/data/\n environment:\n POSTGRES_USER: scott\n POSTGRES_PASSWORD: tiger\n POSTGRES_DB: mydatabase\n volumes:\n postgres_data:\n index_db:\n media:\n
"},{"location":"setup/docker-compose/#mysql-mariadb","title":"MySQL / MariaDB","text":"version: \"3.9\"\n\nx-backend: &common\n image: papermerge/papermerge:3.0.1\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 12345\n PAPERMERGE__AUTH__USERNAME: john\n PAPERMERGE__AUTH__PASSWORD: hohoho\n PAPERMERGE__DATABASE__URL: postgresql://scott:tiger@db:5432/mydatabase\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index\n volumes:\n - media:/core_app/media\n\nservices:\n web:\n <<: *common\n ports:\n - \"12000:80\"\n depends_on:\n - redis\n - db\n - solr\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n db:\n image: bitnami/postgresql:14.4.0\n volumes:\n - postgres_data:/var/lib/postgresql/data/\n environment:\n POSTGRES_USER: scott\n POSTGRES_PASSWORD: tiger\n POSTGRES_DB: mydatabase\n solr:\n image: solr:9.3\n ports:\n - \"8983:8983\"\n volumes:\n - solr_data:/var/solr\n command:\n - solr-precreate\n - pmg-index\n\nvolumes:\n postgres_data:\n solr_data:\n media:\n
"},{"location":"setup/docker-compose/#oauth-20","title":"OAuth 2.0","text":"version: \"3.9\"\n\nx-backend: &common\n image: papermerge/papermerge:3.0.1\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: eugen\n PAPERMERGE__AUTH__PASSWORD: 1234\n PAPERMERGE__DATABASE__URL: mysql://myuser:mypass@db:3306/paperdb\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index\n volumes:\n - media_root:/core_app/media\n depends_on:\n - redis\n - solr\n - db\n\nservices:\n web:\n <<: *common\n ports:\n - \"11000:80\"\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n solr:\n image: solr:9.3\n ports:\n - \"8983:8983\"\n volumes:\n - solr_data:/var/solr\n command:\n - solr-precreate\n - pmg-index\n db:\n image: mariadb:11.2\n volumes:\n - maria:/var/lib/mysql\n environment:\n MYSQL_ROOT_PASSWORD: mypass\n MYSQL_DATABASE: paperdb\n MYSQL_USER: myuser\n MYSQL_PASSWORD: mypass\nvolumes:\n maria:\n solr_data:\n media_root:\n
PAPERMERGE__SECURITY__SECRET_KEY
and PAPERMERGE__AUTH__PASSWORD
. To start the web UI part, use the following command: docker run -p 9400:80 \\\n -e PAPERMERGE__SECURITY__SECRET_KEY=abc \\\n -e PAPERMERGE__AUTH__PASSWORD=123 \\\n papermerge/papermerge:3.0.1\n
http://localhost:9400
and you will see login screen:
admin
123
docker run
starts only the web UI part. For a complete setup you also need one or more workers.
"},{"location":"setup/docker/#web-app-worker","title":"Web App + Worker","text":"docker pull papermerge/papermerge:3.0.1\n
version: \"3.9\"\n\n x-backend: &common\n image: papermerge/papermerge:3.0.1\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 12345\n PAPERMERGE__AUTH__USERNAME: john\n PAPERMERGE__AUTH__PASSWORD: hohoho\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n volumes:\n - data:/db\n - index_db:/core_app/index_db\n - media:/core_app/media\n services:\n web:\n <<: *common\n ports:\n - \"12000:80\"\n depends_on:\n - redis\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n volumes:\n data:\n index_db:\n media:\n
http://localhost:12000
.
"},{"location":"setup/requirements/#hardware","title":"Hardware","text":" $ docker --version\n Docker version 24.0.3, build 3713ee1\n
Testing system for Papermerge has following specs:
Papermerge supports PDF, TIFF, JPEG and PNG file formats.
The PDF format is called native because Papermerge internals operate as if all documents are PDF.
TIFF, JPEG or PNG on the other hand are not native (non-native) formats.
Importing a native-format file yields a document with one version: the PDF itself, i.e. the original version.
Importing a non-native file yields a document with two versions:
Note
At its core Papermerge code is written to work with PDF files only. All other files (non-natives) are converted, on import, into PDF format.
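The import rule above can be sketched in a few lines of Python. This is only an illustration, not Papermerge code: the function name and the version descriptions are hypothetical, and the sketch assumes that for non-native files version 1 is the uploaded file and version 2 is its PDF conversion.

```python
from pathlib import PurePath

NATIVE = {'.pdf'}
NON_NATIVE = {'.tif', '.tiff', '.jpg', '.jpeg', '.png'}


def versions_on_import(filename):
    # Returns (version number, description) pairs created on import.
    ext = PurePath(filename).suffix.lower()
    if ext in NATIVE:
        # Native format: the PDF itself is the only version.
        return [(1, 'original PDF')]
    if ext in NON_NATIVE:
        # Assumption: original file first, PDF conversion second.
        return [(1, 'original ' + ext + ' file'), (2, 'PDF conversion')]
    raise ValueError('unsupported file format: ' + ext)


print(versions_on_import('receipt.pdf'))
print(versions_on_import('photo.JPEG'))
```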
"},{"location":"user/getting-started/","title":"Getting Started","text":"In this part of the documentation we define important concepts used Papermerge parlance. We highly recommend you to read and understand this section.
"},{"location":"user/getting-started/#document","title":"Document","text":"For Papermerge a document is anything which is a good candidate for archiving - some piece of information which is not editable but you need to store it for future reference. For example receipts - you don't need to edit receipts or read them everyday, but eventually you will need them for your tax declaration. In this sense - scanned documents, which are usually in PDF, JPEG or TIFF format, are perfect match.
If you take a picture of a paper document with your mobile phone - you'll have a file in jpeg format (or maybe png file format). In context of Papermerge that picture of a document (though just a single jpeg file) is a valid one page document.
On the other hand, if you take a picture of a flower and upload that jpeg image to Papermerge - the 'document' will be processed. However, that jpeg format flower image is not a document in Papermerge sense.
Office formats such as .docx (Microsoft Word), .odt (LibreOffice) and .txt (plain text) are usually not good candidates for archiving, as by their nature they are meant to be edited regularly. However, once converted to PDF format (for instance Contract_C2.docx to Contract_C2.pdf) they are full-fledged documents in the Papermerge sense.
Info
Papermerge works with four file formats: PDF, TIFF, JPEG and PNG.
"},{"location":"user/getting-started/#document-version","title":"Document Version","text":"One document has one or multiple versions. The original document version - is version number 1. For every change applied to the document - a new document version is created with that change applied.
When we say \"change applied to a document\" - we mean things like rotate pages, reorder pages or merge two documents.
The point of document versions is to keep track of changes applied to the document.
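A minimal Python sketch of this versioning idea is shown below. It is an illustration under stated assumptions, not Papermerge's actual data model: the Document class, its upload and apply methods, and the page labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    # versions[0] is version 1 (the original); each entry is a list of pages.
    versions: list = field(default_factory=list)

    @classmethod
    def upload(cls, title, pages):
        return cls(title, [list(pages)])

    @property
    def latest(self):
        return self.versions[-1]

    def apply(self, change):
        # change takes a copy of the latest pages and returns the new pages;
        # earlier versions are never modified.
        self.versions.append(change(list(self.latest)))
        return len(self.versions)


doc = Document.upload('Invoice-X56.pdf', ['p1', 'p2', 'p3', 'p4'])
new_version = doc.apply(lambda pages: pages[:1] + pages[2:])  # delete page 2
print(new_version, doc.versions)
```

Because every change appends a new list instead of editing the previous one, version 1 always remains available.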
"},{"location":"user/getting-started/#folder","title":"Folder","text":"Folder in Papermerge is counterpart of \"folder\" concept used in major computer file manager applications (e.g. Finder in macOS). Folders in Papermerge are, intuitively enough, hierarchical - in other words one folder may contain other folders and/or documents.
"},{"location":"user/getting-started/#node","title":"Node","text":"Node is an abstraction of two concepts: document and folder. Every time you read node, you can mentally replace that term with either document or folder and the statement will still hold.
Below is a graphical example of the Folder, Document, Document Version relationship:
Same hierarchy can be illustrated as nodes:
"},{"location":"user/getting-started/#special-folders","title":"Special Folders","text":"Each user in Papermerge has two special folders: Inbox and Home.
Inbox folder is where all incoming documents land first. Home folder is where all user documents are.
Special folders are top level folders (they don't have parent folder).
Note
Both Inbox
and Home
folders are special only by convention; structurally they are just normal folders. Internally their title is actually \".inbox\" and \".home\". By convention special folders start with dot character.
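The node abstraction and the special folder convention can be sketched as follows. This is a hypothetical model for illustration only; the class names mirror the concepts above, but the fields and the is_special helper are assumptions, not Papermerge internals.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    # A node is either a document or a folder.
    title: str
    parent: Optional['Folder'] = None


@dataclass(eq=False)
class Document(Node):
    pass


@dataclass(eq=False)
class Folder(Node):
    children: list = field(default_factory=list)

    def add(self, node):
        node.parent = self
        self.children.append(node)
        return node


def is_special(node):
    # By convention: a top level folder whose title starts with a dot.
    return (
        isinstance(node, Folder)
        and node.parent is None
        and node.title.startswith('.')
    )


inbox = Folder('.inbox')
home = Folder('.home')
bills = home.add(Folder('Bills'))
receipt = bills.add(Document('receipt.pdf'))
print(is_special(inbox), is_special(home), is_special(bills))
```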
OCR (Optical Character Recognition) is a technique to extract text information from binary image formats. This technique enables users to:
OCR is an essential tool (or technique, if you will) for extracting textual information and thus deriving useful workflows based on a document's actual content. Papermerge relies on specialized external open source tools such as Google's Tesseract OCR.
"},{"location":"user/getting-started/#tags","title":"Tags","text":"Organizing documents in folders is very common. Thus the idea of keeping your documents in folders doesn't need further introduction. The idea of using tags to organize your documents may be new for you though. Tags are kind of labels. You can associate tags to a document or to a folder. Tags have a color and a name.
Once tagged, documents can be searched by their tags. Conversely, it is also possible to show all the documents tagged with one or more particular tags.
Both tags and folders complement each other and provide you with powerful means to stay organized.
"},{"location":"user/getting-started/#page-management","title":"Page Management","text":"Many times scanning documents in bulk yields documents with blank pages; some pages may be out of order, rotated, or part of a totally different document. Even if you notice these flaws immediately, it is time consuming and frustrating to redo the scanning process. Papermerge helps you fix such scans: you can reorder, rotate or even delete pages as needed.
There is a separate chapter about page management where you can learn details about this feature.
"},{"location":"user/merge-documents/","title":"Merge Documents","text":"Let's first clarify what is meant by documents merging. Merging is the process of combining two documents into one: all pages from the source document are transferred into destination document and then source document is deleted.
On the target document, transferred pages can:
The rest of this documentation chapter describes how to use Papermerge in case 1. For how to use Papermerge in case 2, see the Page Moving section.
Figure 1 illustrates this case. Both source (better_scan.pdf) and target (scan_d.pdf) documents have only one version (v1). Both source and target have two pages.
In this case merge result is that in scan_d.pdf document there is a new version created (v2) and new version contains only source pages (BS1 and BS2). Previous pages of scan_d.pdf document (D1, D2) are still available in version 1 (v1 in figure) of the document.
This use case is useful when you scan the same document twice and for some reason want to keep both copies around. Because both copies contain slightly different versions of the same document, it is more practical to keep them as two document versions in one single file. This way you avoid duplicate entries in search results.
"},{"location":"user/merge-documents/#2-source-pages-are-appended-to-the-target-pages","title":"2. Source pages are appended to the target pages","text":"Figure 2 illustrates this case. Both source (better_scan.pdf) and target (scan_d.pdf) documents have only one version (v1). Also, both source and target have two pages.
In this case the result is that a new version (v2) is created in the scan_d.pdf document, and the new version now contains four pages: BS1, BS2, D1, D2. The previous version of the scan_d.pdf document (v1) contains two pages: D1 and D2.
This scenario is special case of 'moving pages' between documents with all pages selected on the source. How to use Papermerge in this scenario is described in detail in Moving Pages section.
Important
When merging two documents, one of them (source) is deleted. That's why, it is very important that when you merge two documents, you correctly choose which one is the source and which one is the target.
Now that you understand exactly what is meant by \"document merging\", let's see how to merge documents with Papermerge.
"},{"location":"user/merge-documents/#dual-panel","title":"Dual Panel","text":"In order to merge two documents in Papermerge you need to open each of them in two panels:
In one of the panels, the one which you want to be the source, right click the mouse button to open the context menu.
Important
Merge Documents context menu item will be displayed only if there are no selected pages.
In Figure 3, notice the direction of the arrow icon just before \"Merge Document\". The arrow icon points from the source to the target. In Figure 3, the context menu was opened in the left panel, which means that the document opened in the left panel (better_scan.pdf) is the source. If instead we opened the context menu in the right panel, the arrow would point from right to left, and the document opened in the right panel would be the source.
Click the \"Merge Document\" context menu item. After you confirm the operation, the source document (better_scan.pdf) will be merged with scan_d.pdf.
"},{"location":"user/ocr/","title":"OCR","text":"OCR is the process which extracts text information from the scanned document and makes it searchable.
By default, the OCR process is triggered automatically on document file upload. The OCR process status is indicated by a little circle next to the document's title. When the OCR process is completed, a new document version is created and the document becomes searchable.
"},{"location":"user/ocr/#automatic-ocr","title":"Automatic OCR","text":"By default OCR is triggered automatically when a document is uploaded. However, you can disable automatic OCR triggering; in that case OCR starts only when you trigger it manually.
Important
Documents for which OCR was skipped - are not searchable!
In order to disable automatic OCR, go to User Menu -> Preferences -> OCR -> Trigger -> Manual
"},{"location":"user/ocr/#default-ocr-language","title":"Default OCR Language","text":"In order to perform OCR on the document you need to indicate beforehand the language of respective document. Choosing ocr language for each and every document uploaded is tedious - instead, in preferences a default OCR Language is set - and that language is applied for each uploaded document.
In order to set the default OCR language, go to User Menu -> Preferences -> OCR -> Language
"},{"location":"user/ocr/#status-indicator","title":"Status Indicator","text":"Papermerge features a real-time OCR status indicator - this means that you can see a document's OCR status updates as they happen (i.e. in real time). The OCR status is displayed by a small circle next to the document's title. The status indicator has the following meanings:
"},{"location":"user/ocr/#ocred-text-layer","title":"OCRed Text Layer","text":"
Once OCR process completed successfully a new document version is created - version with OCRed text layer. This version is available for download from the Download
dropdown in document view.
Note
Under the hood Papermerge uses awesome OCRmyPDF utility to create OCRed text layer. Thus, in respect of OCRed text layer, Papermerge acts like a graphical user interface for OCRmyPDF.
"},{"location":"user/ocr/#document-ocred-text","title":"Document OCRed Text","text":"You can view OCRed text of the entire document either from commander or from viewer, in both cases choose \"OCRed Text\" from context menu:
If you want to see OCRed text of entire document (to be exact - all pages of the last document version) from the viewer - just make sure that no pages are selected:
"},{"location":"user/ocr/#selected-pages-ocred-text","title":"Selected Pages OCRed Text","text":"In case document has many pages and you are interested in OCRed text of one (or multiple) very specific pages, then select pages first and then from context menu choose \"OCRed Text\" item:
Note
In case there are selected pages, OCRed Text menu item will show you OCRed text ONLY of the selected pages.
"},{"location":"user/ocr/#ocr-languages-support","title":"OCR Languages Support","text":"Papermerge uses Tesseract to extract text from scanned documents. Tesseract supports over 130 languages - thus with Papermerge you can have documents in any of those languages.
"},{"location":"user/page-management/","title":"Page Management","text":"Many times scanning documents in bulk results in documents with blank pages; some pages may be out of order or may be part of a totally different document. Even if you notice these problems immediately, it is time consuming to redo the scanning process. Wouldn't it be nice to fix out of order pages without scanning all docs again?
Page management is set of features which helps to fix scanning process errors. In other words you can delete, reorder, rotate, and extract pages within document(s).
Every time one of the operations described in this section is applied - a new document version is created. Because of this, the changes you apply on the document like rotate, delete, extract, reorder, do not destroy the document, in other words page management is non-destructive process.
Note
In order to perform any of the operations described below (delete, reorder, rotate or extract) you need to have Change Permission on the respective document. You are automatically granted Change Permission on the documents you uploaded (because you own them).
"},{"location":"user/page-management/#delete","title":"Delete","text":"You can delete specific pages (for instance blank pages) from the document. Although many scanners have automatic \"remove blank pages\" feature, many times they get confused of what a blank page is. In case your scans end up with undesired blank pages you can easily remove those pages.
In order to delete a page, you need to select desired page by clicking on it, then Right Click--> Delete Page
.
Every time you delete one or several pages, document version is incremented by one. For instance if document Invoice-X56.pdf currently has four pages and the document latest version is version 1, then, after deleting one page - document latest version will be 2. Thus document's version 1 has all four pages and document version 2 has three pages:
"},{"location":"user/page-management/#reorder","title":"Reorder","text":"Out of order pages occur very often during scanning process. Papermerge empowers users to change pages order within the document.
For instance, in figure below you can see that pages 2 and 4 are out of place. To correct pages' order use drag 'n drop. For example grab page 2 and drop it in correct position, and then do same thing with page 3:
For these changes to take effect you need to click 'Apply Changes' button.
Warning
Document pages reorder will only be saved when you click 'Apply Changes'
Similarly to deleting pages, every time you save new pages order, document version will be incremented (i.e. advanced by one).
"},{"location":"user/page-management/#rotate","title":"Rotate","text":"Often scanned pages are upside down or maybe rotated 90\u00b0 (degrees). In order to quickly fix that, select one or multiple pages you want to rotate and then Right Click --> Rotate --> 180\u00b0 CCW
(or 90\u00b0 CW, 90\u00b0 CCW depending on your specific case):
Note
CW stand for clockwise. CCW stands for counter-clockwise.
Similarly to page deletion and page ordering, every time you rotate a page, document version will be incremented (i.e. advanced by one).
Warning
After page rotation you have to re-run OCR for the document. If a page was upside down when ingested, the OCR operation cannot make sense of it and thus cannot extract (and then index) text from that page. After you have manually fixed the page (by correctly rotating it), OCR will be able to extract and index the page's content.
"},{"location":"user/page-management/#move-document-to-document","title":"Move (Document to Document)","text":"You can move one, multiple or even all strayed pages from one document (source) to another (target). If you choose to move all pages from the source - the source will be deleted, because it does not make sense to have a \"document with zero pages\".
When moving pages between documents you will be prompted to choose between two different move strategies:
The outcome between replace vs append strategies is illustrated below:
The difference is in the outcome for B.pdf (target). With the replace strategy, B.pdf ended up having two pages (which replaced the previous ones), while with the append strategy B.pdf ended up with four pages, as the source pages were appended to the existing ones.
Note
What happens if you select all source pages, i.e. when you
select A1, B1, B2, A2? In such case - source document (A.pdf) will be deleted, because it does not make any sense to have a document with zero pages. For the target document (B.pdf) this case does not make any difference, as the outcome is always the same.
Note
The use case where you select all pages and choose the \"replace strategy\" has the same outcome as merging documents.
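The two move strategies can be sketched in Python as below. This is a hedged illustration rather than Papermerge's implementation: move_pages and the target page labels T1 and T2 are hypothetical, and the sketch assumes that appended pages are placed after the existing target pages.

```python
def move_pages(source, target, selected, strategy):
    # source, target: page labels of the latest version of each document.
    # selected: labels of the source pages to move.
    if strategy not in ('replace', 'append'):
        raise ValueError('unknown strategy: ' + strategy)
    moved = [p for p in source if p in selected]
    remaining = [p for p in source if p not in selected]
    if strategy == 'replace':
        new_target = moved
    else:
        # Assumption: moved pages go after the existing target pages.
        new_target = list(target) + moved
    # A source left with zero pages is deleted, modelled here as None.
    return (remaining or None), new_target


a_pdf = ['A1', 'B1', 'B2', 'A2']
b_pdf = ['T1', 'T2']
print(move_pages(a_pdf, b_pdf, {'B1', 'B2'}, 'replace'))
print(move_pages(a_pdf, b_pdf, {'B1', 'B2'}, 'append'))
print(move_pages(a_pdf, b_pdf, set(a_pdf), 'replace'))
```

In the real application each call would also create a new version of both documents; the sketch only tracks the resulting page lists.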
Now, that \"theory\" is clear, let's move on to the practical part and see Papermerge in action. First of all, note that in Papermerge you can move pages between documents either using context menu or by using drag 'n drop.
Tip
You can also move pages between documents with the REST API
"},{"location":"user/page-management/#use-context-menu","title":"Use Context Menu","text":"In order to move pages between documents, using context menu:
thumbnails panel
of the source document viewer. Context menu is dynamic, which means it renders only relevant menu items. If, for example, you have a document viewer open in one panel
while the other panel is in commander mode
, then there will be an \"extract\" menu item instead of \"move\". In other words, the \"move\" menu item will be visible only if:
viewer mode
Important
The arrow next to the \"Move\" menu item changes direction depending in which panel you invoke context menu - it hints direction of the pages transfer. Arrow icon of the \"Move\" item always points from source to target.
"},{"location":"user/page-management/#drag-n-drop","title":"Drag 'n Drop","text":"In example illustrated in pictures below there are two documents:
During scan page B1 wrongly ended up in document A, although it belongs to document B.
Note
A page that during the scan ended up in wrong document is called strayed page. In example above, page B1 is strayed page.
In order to fix this scanning issue, you need to open both documents in two panels and then drag 'n drop page B1 from document A (source) to document B (target):
Note
Pages are moved immediately after 'mouse drop' i.e. there is no need to 'click apply button' as in re-order operation
Note
The versions of both documents (source and target) will be incremented by one
"},{"location":"user/page-management/#extract-document-to-folder","title":"Extract (Document to Folder)","text":"Page extraction moves pages out of a document into a completely new document. It differs from page moving
because the destination is a folder, not a document.
You can extract one or multiple pages at once. Pages can be extracted:
Note that in Papermerge you can extract pages either using context menu or by using drag 'n drop.
Tip
You can also extract pages by using REST API
"},{"location":"user/page-management/#using-context-menu","title":"Using Context Menu","text":"In order to extract pages from the document, using context menu:
Because of the dynamic nature of the context menu, \"Extract\" menu item will be visible only if all of the following conditions are true:
viewer mode
while another is in commander mode
After you've clicked \"Extract\", the \"Extract Pages\" modal dialog will prompt you for additional details, like the title of the newly created document(s) and whether you want to extract all pages as one or multiple documents:
A couple of notes here. First, the newly created document will have the extension \".pdf\"; you cannot change that. Second, if \"Extract each page into separate document\" is checked, each page will be extracted as a separate one-page document; otherwise all extracted pages will be placed into a single document in the target folder.
Note
Papermerge will try to make sure that newly created documents have unique names. Thus if you choose to extract, say, two pages as separate documents, Papermerge will append a UUID to the title. In case you choose to extract two pages into a single document - no UUID will be appended. In case you leave the \"Title format\" field empty, Papermerge will generate a unique title for you.
"},{"location":"user/page-management/#using-drag-n-drop","title":"Using Drag 'n Drop","text":"Let's show how page extraction works by example. Say we have one document - document A - with following pages: A1, A2, B1, B2, A3. What we want to do is to extract pages B1 and B2 into a new document. As mentioned above there are two cases:
In order to extract pages B1 and B2 into one single new document you need to uncheck 'Extract each page into separate document' checkbox in modal dialog:
Similarly to other operations, the version of document A (the source document) is incremented by one.
"},{"location":"user/page-management/#ocr-data","title":"OCR Data","text":"Do you need to re-run OCR after a document's page was moved/rotated/extracted/deleted?
In short - no, you don't need to re-run OCR. The only exception is page rotation. Every time you rotate a page in the document, you need to re-run OCR for that document. It actually makes sense, because if page was upside down when document was ingested, the OCR operation won't make much sense of it and thus won't be able to extract any text data from the page. Once you correct that part manually (rotate page), you re-run OCR so that correct text will be extracted and then indexed.
Note
Generally speaking you don't need to re-run OCR after performing page management operations. The only exception from this rule is page rotation.
For longer answer, let's clarify first what OCR data is. OCR data is: text information extracted from the document by OCR and associated with that document. That text information is stored in both database and on filesystem.
When one page is moved from one document into another (or when a page is deleted), the text associated with the source (or target) document changes as well. For example, say document fruits.pdf has three pages: apples, oranges and bananas, i.e. each page contains only one word: page 1 has the word apples, etc. You can find document fruits.pdf by searching 'apples' (will match the first page), 'oranges' (will match the second page) or 'bananas' (will match the last page).
After you extract first page (apples) from document fruits.pdf into another document, searching by term 'apples' should not reveal document 'fruits.pdf' - because term/page 'apples' is not part of it anymore.
In order to keep text information associated with document fruits.pdf up to date, there are at least two possibilities:
From technical point of view 1. is very easy to implement but very inefficient in terms of computing power. Think that you have 100 pages document and you delete one blank page - what a waste of CPU resources to re-OCR entire document when OCR data is already available!
The second possibility (point 2.) is very challenging to implement, but extremely efficient - you need to run OCR on the document only once (maybe twice, in case you decide to fix couple of pages by rotating them).
Papermerge decided on 2. in other words, Papermerge reuses already extracted OCR data and updates it accordingly every time you re-order/move/extract/delete pages.
The result is that whatever page management operation you perform the search results are always up-to-date without the need to re-OCR the document! As mentioned above the only exception are page rotations.
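The reuse of OCR data can be illustrated with a small Python sketch. It assumes, purely for illustration, that OCR text is kept as one string per page; the helper names are hypothetical and this is not how Papermerge stores or indexes its data.

```python
def delete_pages(page_texts, numbers):
    # numbers are 1-based page numbers; the text of kept pages is reused.
    return [t for i, t in enumerate(page_texts, start=1) if i not in numbers]


def reorder_pages(page_texts, new_order):
    # new_order lists the old 1-based page numbers in their new sequence.
    return [page_texts[i - 1] for i in new_order]


def extract_pages(page_texts, numbers):
    # Returns (text left in the source, text of the new document).
    extracted = [page_texts[i - 1] for i in sorted(numbers)]
    return delete_pages(page_texts, numbers), extracted


def matches(page_texts, term):
    return any(term in text.split() for text in page_texts)


fruits = ['apples', 'oranges', 'bananas']
fruits_v2, new_doc = extract_pages(fruits, {1})
print(matches(fruits_v2, 'apples'), matches(new_doc, 'apples'))
```

No OCR call appears anywhere in the sketch: every operation only rearranges text that was already extracted, which is the point of the second approach.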
Below is illustrated the case of three page fruits.pdf document with apples/oranges/bananas content. Initially search term 'apples' will reveal fruits.pdf document (from Inbox). After 'apples' page was extracted into separate document (found in Home/My Documents folder) search term 'apples' correctly reveals new document! Notice here that search index is updated instantaneously:
"},{"location":"user/search/","title":"Search","text":"Papermerge offers an extensive searching mechanism that is designed to allow you to quickly find a document you're looking for.
When you search Papermerge for a document, it tries to match this query against your documents. Papermerge will look for matching documents by inspecting their content, title, and tags.
Note
Papermerge searches only in content of the last version of the document
By default, Papermerge returns only documents which contain all words typed in the search bar. However, Papermerge also offers additional search syntax if you want to drill down the results further.
Matching inexact words:
*5951\n
Will return document with title: brother_005951.pdf
Matching specific tags:
tags:paid\n
will return documents with tag \"paid\"
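The matching rules described above (all words must match, a wildcard for inexact words, and a tags: prefix) can be mimicked with a toy Python matcher. This is not Papermerge's search engine and makes several assumptions: the tokenization, the case-insensitive comparison, and the function names are all hypothetical.

```python
import re
from fnmatch import fnmatchcase


def tokens(text):
    # Assumption: words are runs of letters, digits and underscores.
    return [t.lower() for t in re.findall('[A-Za-z0-9_]+', text)]


def matches(query, title, content, tags):
    words = tokens(title) + tokens(content)
    tag_set = {t.lower() for t in tags}
    for term in query.lower().split():
        if term.startswith('tags:'):
            if term[len('tags:'):] not in tag_set:
                return False
        elif '*' in term:
            if not any(fnmatchcase(w, term) for w in words):
                return False
        elif term not in words:
            # Every plain word in the query must be present.
            return False
    return True


print(matches('*5951', 'brother_005951.pdf', '', []))
print(matches('tags:paid', 'invoice.pdf', '', ['paid']))
print(matches('invoice march', 'scan.pdf', 'invoice from april', []))
```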
"},{"location":"user/tags/","title":"Tags","text":"Tags are sort of labels. You can associate tags to a document or to a folder. Tags have a color and a name. Once tagged, documents can be searched by their tags.
"},{"location":"user/upload-documents/","title":"Upload Documents","text":"There are multiple ways to upload documents to your Papermerge instance: web UI, command line utilities, REST API.
The obvious way is via web UI. After briefly explaining how to upload documents from web user interface, this page will dive into more interesting parts: command line utilities and REST API.
"},{"location":"user/upload-documents/#web-ui","title":"Web UI","text":"Uploading documents via user interface is the most straightforward method, just click upload
button:
Documents will be uploaded into your current folder. Current folder is considered the one which you currently see as opened in web UI:
Also, instead of using upload button, you can drag'n drop documents from your desktop file manager into Papermerge's web ui.
Warning
Currently the drag 'n drop feature does not work for folders; in other words, you can drag 'n drop only documents. If you want to import a folder with its entire content preserved - use papermerge-cli
described in next paragraph.
You can upload documents and folders from your local filesystem using papermerge-cli command line utility:
papermerge-cli import /path/to/local/folder/\n
Note that papermerge-cli
will import all content of /path/to/local/folder/ directory recursively i.e. it will preserve the structure of local folder in Papermerge as well.
You can upload one document by providing path to the document:
papermerge-cli import /path/to/document.pdf\n
Note
By default all imported documents and folders will end up inside user's Inbox folder.
For more information about papermerge-cli
check papermerge-cli section.
For uploading documents you can use the REST API directly. You can access the REST API swagger schema definition from the user menu (upper right corner of the web UI). Uploading a document takes two steps:
For step 1. use POST /nodes/
REST API endpoint. For step 2 use POST /documents/<doc-uuid>/upload
REST API endpoint, where <doc-uuid>
is the ID of the node created in first step.
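The two requests can be sketched with Python's standard library as shown below. Only the two endpoint paths come from the text above; everything else is an assumption made for illustration: the base URL, the bearer token header, the JSON field names in the first request, and sending the file as the raw request body in the second. Check the swagger schema of your instance for the real payloads. The sketch only builds the requests and does not send them.

```python
import json
import urllib.request


def create_node_request(base_url, token, title, parent_id):
    # Step 1: POST /nodes/ (field names below are hypothetical).
    body = json.dumps(
        {'title': title, 'parent_id': parent_id, 'ctype': 'document'}
    ).encode('utf-8')
    return urllib.request.Request(
        base_url.rstrip('/') + '/nodes/',
        data=body,
        method='POST',
        headers={
            'Authorization': 'Bearer ' + token,
            'Content-Type': 'application/json',
        },
    )


def upload_request(base_url, token, doc_uuid, file_bytes):
    # Step 2: POST /documents/<doc-uuid>/upload (body format is assumed).
    return urllib.request.Request(
        base_url.rstrip('/') + '/documents/' + doc_uuid + '/upload',
        data=file_bytes,
        method='POST',
        headers={'Authorization': 'Bearer ' + token},
    )


req = create_node_request(
    'http://localhost:12000/api', 'my-token', 'invoice.pdf', 'inbox-uuid'
)
print(req.get_method(), req.full_url)
```

To actually send a request you would pass it to urllib.request.urlopen and read the node ID from the first response before building the second request.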
Papermerge comes with a simple and intuitive user interface (UI) layout. The UI is divided into four areas:
Commander (in figure 1. marked with number 4.) is designed to have similar look and feel of modern desktop file browsers. This is the place where you browse your documents and folders.
To help you quickly move documents, folders and pages around, there is a special mode: dual panel mode. In dual panel mode two panels are displayed side by side. Documents (as well as folders and pages) can be moved between the two panels with one simple drag 'n drop. The figure below shows what dual panel mode looks like:
In order to switch to dual panel mode, use Commander's upper right button:
To switch back to single panel mode, use close button - which is in the upper right corner of one of the panels:
Note
The close button will be displayed only on one of the panels. Although both panels look and feel exactly the same, internally the application still distinguishes them as main and secondary. The main panel is the one which is always visible, and the secondary panel is the one which opens and closes, i.e. the one with the \"close button\" in the upper right corner.
"},{"location":"user/user-interface/#commander","title":"Commander","text":"Commander or Commander Panel is one of the two available panels. Commander is the panel which shows documents and folders - modern web based file browser if you will.
"},{"location":"user/user-interface/#viewer","title":"Viewer","text":"Viewer or Viewer Panel or Document Viewer is one of the two available panels. Viewer is the panel in which document is opened.
There can be two Viewers opened side by side. This mode (i.e. dual panel mode with a Viewer in each panel) is very handy when it comes to moving pages between documents.
"},{"location":"user/user-interface/#thumbnails-panel","title":"Thumbnails Panel","text":"Document viewer features a thumbnails panel which can be toggled on and off. Pages can be selected only inside the thumbnails panel; also, pages can be dragged/dropped only from the thumbnails panel.
"},{"location":"user/user-management/","title":"User Management","text":"Papermerge is a multi-user system.
The most privileged user (which has all permissions) is called superuser
. Brand new Papermerge instance ships with one default user - which happens to be superuser
. With default user, you can add as many users as you wish.
Note
You can add multiple superusers
as well
Papermerge is a non-destructive DMS, which means you always have available original document regardless how many transformations (page rotations, deletion, document merges) you apply on the document.
Retention of the original is ensured because of document versioning feature. With each extra transformation you apply - a new document version is created.
Version 1 (one) of the uploaded document is the original file i.e. document without any changes applied. Original document version is always available, regardless what operation(s) you apply to the document (except deletion of the document itself).
Any page management or OCR operation on the document will increment (increase by one) its version.
"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Papermerge DMS","text":"Papermerge DMS or simply Papermerge is a open source document management system designed to work with scanned documents (also called digital archives). It extracts text from your scans using OCR, indexes them, and prepares them for full text search. Papermerge provides look and feel of modern desktop file browsers. It has features like dual panel document browser, drag and drop, tags, hierarchical folders and full text search so that you can efficiently store and organize your documents.
It supports PDF, TIFF, JPEG and PNG document file formats. Papermerge is a perfect tool for long-term storage of your documents.
"},{"location":"#features-highlights","title":"Features Highlights","text":"For Papermerge a document is anything which is a good candidate for archiving: a piece of information which is not editable but which you need to store for future reference. Receipts are a good example: you don't need to read receipts every day, but eventually you will need them for your tax declaration. In this sense, scanned documents, which are usually in PDF or TIFF format, are a perfect match.
PDF (Portable Document Format) is the de facto standard for storing archived documents. Strictly speaking, it is the PDF/A subset. PDF/A differs from PDF by prohibiting features unsuitable for long-term archiving, such as font linking and encryption.
Most modern office scanners output scanned files in PDF/A format. This is why PDF is practically synonymous with document in the context of Papermerge.
A smartphone picture of an A4 paper document is also regarded by Papermerge as a document.
"},{"location":"backup-restore/","title":"Backup/Restore","text":"Papermerge docker image is shipped with backup and restore utilities. Shipped utility will backup all your folders, documents with their associated versions and OCR data, tags and users. Search engine index is not included in backup though.
Note
User passwords are included in the backup file as well. Passwords are stored as digests.
"},{"location":"backup-restore/#backup","title":"Backup","text":"Backup your documents with following command:
$ docker exec <papermerge-server-container> backup.sh <optional-location>\n
where <optional-location>
is the path to a file or folder inside the container where the backup file will be saved. If no location is provided, the backup file is saved in the /core_app/ folder, the Papermerge core application's current folder.
Example:
$ docker ps --format '{\\{.ID\\}} {\\{.Command\\}} {\\{.Names\\}}'\n\n 914dda21dd3d \"/run.bash server\" 091223_30-web-1\n 42095cee91f0 \"docker-entrypoint.s\u2026\" 091223_30-solr-1\n d65b3205d9ec \"/run.bash worker\" 091223_30-worker-1\n ac5cfd76993a \"docker-entrypoint.s\u2026\" 091223_30-redis-1\n 8ad6d0a7eb6c \"/opt/bitnami/script\u2026\" 091223_30-db-1\n
In the above example, Papermerge has 5 containers: the app server (the core, web, http or REST API server; pick the name you like), the Solr search engine, Redis, the database and finally one worker.
To create a backup in the root folder of the app container, just run:
$ docker exec 914dda21dd3d backup.sh /\n
When the above command has finished, check that the backup file was created:
$ docker exec 914dda21dd3d ls /\n\nauth_server_app\nbackup_10_12_2023-11_30_37.tar.gz\nbin\nboot\n...\ncore_app\ncore_ui\ndb\n...\nusr\nvar\n
The backup file is backup_10_12_2023-11_30_37.tar.gz. Now you can copy the backup file to your local filesystem:
$ docker cp 914dda21dd3d:/backup_10_12_2023-11_30_37.tar.gz .\n
You may choose a different name for the backup file:
$ docker exec 914dda21dd3d backup.sh /my-daily-backup.tar.gz\n
Then copy it to your local filesystem:
$ docker cp 914dda21dd3d:/my-daily-backup.tar.gz .\n
Note
Backup files are gzipped tar archives, so you will probably want their names to end with \".tar.gz\".
"},{"location":"backup-restore/#restore","title":"Restore","text":"When you plan to restore previous backup, we suggest to start with new Papermerge instance, with only one superuser (which is created by default anyway). Make sure there are no documents in the new instance.
For the sake of example, let's say the superuser's username is \"admin\". To restore, use the restore.sh
command:
$ docker exec <papermerge-server-container> restore.sh <backup-file>\n
For that to work, you first need to copy the backup archive file to the core (server) container. Sticking with the example from the previous section:
$ docker cp my-backup.tar.gz 914dda21dd3d:/my-backup.tar.gz\n$ docker exec 914dda21dd3d restore.sh /my-backup.tar.gz\n
If the \"admin\" user exists in the backup file, then admin's password will be set to the one from the backup file.
"},{"location":"backup-restore/#backup-file-structure","title":"Backup File Structure","text":"The backup file is a gzipped tar archive with following content:
1. backup.json file
2. ocr/ folder
3. docvers/ folder
4. username1/, username2/, ... i.e. one folder per user, with the folder title being the user's username
backup.json
file contains all necessary info to restore the database i.e. all users, their nodes, tags etc.
docvers/
contains the actual document version files. Your documents are here.
ocr/
contains OCR data of each individual page in the document.
User folders mentioned in point 4. are provided for convenience, so that you may quickly get an understanding of the folder structure and its content. Each file in a user folder is actually a symbolic link pointing to the last version of the document (from docvers
).
Warning
Each user has two special folders: .home
and .inbox
; special folder titles start with a dot. If you open the backup archive in a file browser which hides dot files (files starting with a dot character), the content of the user folder may appear empty! When opening the backup archive, make sure the 'show hidden files' option is on.
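As a quick check of the layout described above, the archive can also be listed with tar, which shows dot folders regardless of any file browser setting. The sketch below first builds a tiny stand-in archive so that it runs on its own; the src folder and the user name admin are illustrative assumptions, not real output of backup.sh.

```shell
# Build a tiny stand-in archive that mimics the documented layout (illustration only).
work=$(mktemp -d)
mkdir -p $work/src/docvers $work/src/ocr $work/src/admin/.home $work/src/admin/.inbox
echo {} > $work/src/backup.json
tar -czf $work/demo-backup.tar.gz -C $work/src .

# List the archive; tar prints every entry, including .home and .inbox.
tar -tzf $work/demo-backup.tar.gz | sort
```

With a real backup, only the last command is needed, pointed at your own backup file.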
For complete changelog see changelog file in github repository.
"},{"location":"cli/cli/","title":"Papermerge CLI","text":"Command line utility which uses REST API to interact with your Papermerge instance. It can be used to upload documents from local filesystem to yout Papermerge instance.
"},{"location":"cli/cli/#requirements","title":"Requirements","text":"In order to use papermerge-cli
you need to have python installed. You need python version >= 3.10.
Install papermerge-cli
with following command:
pip install papermerge-cli\n
pip is the package installer for Python; it usually comes with the Python interpreter. To install pip on Ubuntu, use the following command:
sudo apt install python3-pip\n
"},{"location":"cli/cli/#configuration","title":"Configuration","text":"Papermerge Cli is configured via environment variables:
PAPERMERGE_CLI__HOST
PAPERMERGE_CLI__TOKEN
As the names suggest, the first is the host of the REST API server and the second is the REST API token.
REST API server should be specified with http://
or https://
prefix, but without the /api
suffix. Examples of valid values: http://papermerge.local, https://my-dms.papermerge.de.
Note
The host may or may not end with a trailing /
. E.g. http://papermerge.local and http://papermerge.local/ are both valid values and point to the same host.
To get REST API token follow these instructions.
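A minimal sketch of the configuration described above. The host and token values are placeholders, not real credentials, and the host_ok check is only an illustration of the stated rules (scheme prefix required, no /api suffix); it is not something papermerge-cli itself runs.

```shell
# Placeholder values: replace with your own host and token.
export PAPERMERGE_CLI__HOST=http://papermerge.local
export PAPERMERGE_CLI__TOKEN=replace-with-your-token

# Sanity check: the host must start with http:// or https:// and must not end with /api.
case $PAPERMERGE_CLI__HOST in
  */api|*/api/) host_ok=no ;;
  http://*|https://*) host_ok=yes ;;
  *) host_ok=no ;;
esac
echo host_ok=$host_ok
```

With both variables exported in the current shell, commands such as papermerge-cli me pick them up from the environment.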
"},{"location":"cli/cli/#ls","title":"ls","text":"List the content of you home folder:
papermerge-cli ls\n
To list the content of a specific folder (including the inbox folder):
papermerge-cli ls --parent-uuid=UUID-of-the-folder\n
"},{"location":"cli/cli/#me","title":"me","text":"In order to see current user details use me
command:
papermerge-cli me\n
"},{"location":"cli/cli/#import","title":"import","text":"Recursively imports documents and folders from local filesystem. For example, in order to import recursively all documents from local folder:
papermerge-cli import /path/to/local/folder/\n
You can also import one single document:
papermerge-cli import /path/to/some/document.pdf\n
By default all documents are imported to your user's .inbox
folder. If you want to import to another folder, use --target-uuid
:
papermerge-cli import /path/to/some/document.pdf --target-uuid <uuid>\n
To find the UUID of the folder you want to import to, use the papermerge-cli ls
command. To get UUIDs of .home
and .inbox
folders, use papermerge-cli me
command.
If you want the local copy of the uploaded documents to be deleted after a successful import, use the --delete
flag:
papermerge-cli import --delete /path/to/folder/\n
Danger
Be careful with --delete
flag! When present,
papermerge-cli
will irreversibly delete the local copy of all documents and folders in /path/to/folder/
!
Danger
Before using this flag, always make a safe backup of the documents to be uploaded!
Note: the --delete
flag deletes the local copy of the documents (the path to import) only after a successful upload. This means that even though your local copy of the documents is gone, the originals are still available in Papermerge!
In order to get general help about the command use:
papermerge-cli --help\n
In order to get help for individual commands, place --help
flag after the command:
papermerge-cli import --help\n
"},{"location":"cli/overview/","title":"Overview","text":"This section describes a set of command line utilities which can interact (e.g. import documents to, list nodes etc) with your Papermerge instance.
What is common to all command line utilities listed here is that they all use REST API interface. In order to use REST API you need to know:
Host address should be provided with http://
or https://
prefix.
Examples:
Note
REST API server may or may not end with /
character. Thus, both http://papermerge.local and http://papermerge.local/ are valid.
Currently there is no web UI for getting your user's token. The only way to get REST API token is by running docker command.
Click here for details.
"},{"location":"contributor/docker/","title":"Docker","text":"There is docker image for development mode. Docker image is tagged with 3.0devX
. With the dev image, you can get feedback on your source code changes without needing to install any dependencies or configure a development environment.
All examples described below assume that you have the Papermerge source code and are in the root folder of the source code repository:
$ git clone git@github.com:papermerge/papermerge-core.git PapermergeSourceCode\n$ cd PapermergeSourceCode\n
"},{"location":"contributor/docker/#web-app","title":"Web App","text":"This is the simplest local dev scenario, you start docker compose file only with web app i.e. REST API server + ui.
The go to the folder where source was cloned and create following docker compose file:
version: \"3.9\"\n\nservices:\n backend:\n image: papermerge/papermerge:3.0dev # check the latest dev image number in dockerhub!\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n ports:\n - \"11000:80\"\n
Assuming you are in the root folder of the source code, the above docker compose file will mount the source code to the correct location in the docker image. The application will be accessible on local port 11000.
Here is a docker compose file for the case when you want to build the dev docker image yourself:
version: \"3.9\"\n\nservices:\n backend:\n build:\n context: .\n dockerfile: docker/dev/Dockerfile\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n ports:\n - \"11000:80\"\n
"},{"location":"contributor/docker/#web-app-worker","title":"Web App + Worker","text":"Following docker compose adds worker service. Worker and Web App communicate via redis (message broker), thus we need to add redis service as well:
version: \"3.9\"\n\nx-backend: &common # yaml anchor definition\n build:\n context: .\n dockerfile: docker/dev/Dockerfile\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n - data:/db\n - index_db:/core_app/index_db\n - media_root:/core_app/media\n\nservices:\n web:\n <<: *common\n ports:\n - \"11000:80\"\n depends_on:\n - redis\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n\nvolumes:\n data:\n index_db:\n media_root:\n
"},{"location":"contributor/docker/#logging-config","title":"Logging Config","text":"Both worker and web app read their logging configurations from file pointed by PAPERMERGE__MAIN__LOGGING_CFG
environment variable. An example of custom logging config would be:
version: 1\ndisable_existing_loggers: true\n\nformatters:\n verbose:\n format: '%(asctime)s %(levelname)s %(name)s.%(funcName)s %(message)s'\n\nhandlers:\n console:\n level: DEBUG\n class: logging.StreamHandler\n formatter: verbose\n\nloggers:\n auth_server:\n level: DEBUG\n handlers: [console]\n papermerge.search.tasks:\n level: DEBUG\n handlers: [console]\n propagate: no\n format: verbose\n
You may recognize it: it is the YAML version of a Python logging config.
Here is an example of docker compose with web app + worker + custom logging configuration:
version: \"3.9\"\n\nx-backend: &common\n build:\n context: .\n dockerfile: docker/dev/Dockerfile\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n PAPERMERGE__MAIN__LOGGING_CFG: /logging.yml # <-- absolute path to custom config file\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n - data:/db\n - index_db:/core_app/index_db\n - media_root:/core_app/media\n - ./custom_logging.yml:/logging.yml # mount local logging config file\n\nservices:\n web:\n <<: *common\n ports:\n - \"11000:80\"\n depends_on:\n - redis\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n\nvolumes:\n data:\n index_db:\n media_root:\n
"},{"location":"contributor/docker/#solr","title":"Solr","text":"Papermerge is shipped with a default search library - Xapian.
However, you may opt in to a full-fledged search engine like Solr. To change the search backend, use the PAPERMERGE__SEARCH__URL
env variable:
version: \"3.9\"\n\nx-backend: &common\n build:\n context: .\n dockerfile: docker/dev/Dockerfile\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index # <- use Solr's \"pmg-index\" index\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n - data:/db\n - index_db:/core_app/index_db\n - media_root:/core_app/media\n\nservices:\n web:\n <<: *common\n ports:\n - \"11000:80\"\n depends_on:\n - redis\n - solr\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n solr:\n image: solr:9.3\n ports:\n - \"8983:8983\"\n volumes:\n - solr_data:/var/solr\n command:\n - solr-precreate\n - pmg-index # <- creates index at startup of the Solr service\n\nvolumes:\n data:\n solr_data:\n index_db:\n media_root:\n
Notice that Solr is started with solr-precreate pmg-index
command, which means that Solr service will be started with pre-created index named pmg-index
.
Here is an example of docker compose which uses PostgreSQL as database:
version: \"3.9\"\n\nx-backend: &common\n build:\n context: .\n dockerfile: docker/dev/Dockerfile\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: 1234\n PAPERMERGE__DATABASE__URL: postgresql://postgres:123@db:5432/postgres\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index\n volumes:\n - ./papermerge:/core_app/papermerge/\n - ./ui:/core_ui/\n - index_db:/core_app/index_db\n - media_root:/core_app/media\n depends_on:\n - redis\n - solr\n - db\n\nservices:\n web:\n <<: *common\n ports:\n - \"11000:80\"\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n solr:\n image: solr:9.3\n ports:\n - \"8983:8983\"\n volumes:\n - solr_data:/var/solr\n command:\n - solr-precreate\n - pmg-index\n db:\n image: bitnami/postgresql:14.4.0\n volumes:\n - postgres_data:/var/lib/postgresql/data/\n environment:\n - POSTGRES_PASSWORD=123\n\nvolumes:\n postgres_data:\n solr_data:\n index_db:\n media_root:\n
"},{"location":"contributor/docker/#oauth-20","title":"OAuth 2.0","text":""},{"location":"rest-api/overview/","title":"Overview","text":"Papermerge provides very powerful REST API. In order to user REST API, you need REST API server URL and user token.
REST server URL is the http address of your running instance. HTTP address also include the scheme.
Examples of REST API server URLs:
Currently there is no web UI to get token, the only way to get REST API token is by running docker command. See next section for details.
"},{"location":"rest-api/reference/","title":"Reference","text":"Papermerge REST API is exposed via Open API standard.
Papermerge ships with swagger REST API documentation reference. You can access it in your running Papermerge instance from user menu -> REST API:
"},{"location":"rest-api/token/","title":"REST API Token","text":"Currently there is no web UI for getting the REST API token. Instead, you get the REST API token by running one docker command:
docker exec <papermerge-container> create_token.sh <username>\n
You can list users in Papermerge with the following command:
docker exec <papermerge-container> list_users.sh\n
Example:
$ docker ps --format '{\\{.ID\\}} {\\{.Command\\}} {\\{.Names\\}}'\n\nd8b965388fd9 \"/run.bash server\" fordoc-web-1\n8fb8f6f565a2 \"/run.bash worker\" fordoc-worker-1\n8a42db0bb7f9 \"/opt/bitnami/script\u2026\" fordoc-db-1\n8a6146801936 \"docker-entrypoint.s\u2026\" fordoc-redis-1\n
In the above example, Papermerge has four containers: app server, Redis, database and one worker. For our purpose we need the app container (fordoc-web-1 in the example above). Let's list all users first:
$ docker exec d8b965388fd9 list_users.sh\n\nusername=john email=admin@example.com\n
There is only one user, with username \"john\". To get a REST API token for user \"john\", run the following command:
$ docker exec d8b965388fd9 create_token.sh john\n\neyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJqb2huIiwidXNlcl9pZCI6IjJiODQwY2RhLThjMmYtNDExYy05NDYwLTY0ZDA3YWY3YTJiZSIsImV4cCI6MTcwMzM1MTUzNn0.KJAL9TjRiV63liwVO5bh9GQ_I_QFXMoviKV9Lww3cDs\n
The long string of characters above is the token for the user with username \"john\".
"},{"location":"settings/auth/","title":"Auth","text":""},{"location":"settings/auth/#auth__username","title":"AUTH__USERNAME","text":"Username for the superuser. Default value is admin
.
Example:
PAPERMERGE__AUTH__USERNAME=john\n
"},{"location":"settings/auth/#auth__password","title":"AUTH__PASSWORD","text":"Password for the superuser. No default value.
Example:
PAPERMERGE__AUTH__PASSWORD=topsecret\n
"},{"location":"settings/auth/#auth__email","title":"AUTH__EMAIL","text":"Email for the superuser. Default value is admin@example.com
.
Example:
PAPERMERGE__AUTH__EMAIL=john@mail.com\n
"},{"location":"settings/database/","title":"Database","text":"This sections is for database configurations. Papermerge supports following databases SQLite3, PostgreSQL, MySQL/MariaDB.
"},{"location":"settings/database/#database__url","title":"DATABASE__URL","text":"For PostgreSQL the database URL is given in following format:
postgresql://USER:PASSWORD@HOST:PORT/NAME\n
Example:
postgresql://scott:tiger@db:5432/mydatabase\n
For MariaDB and MySQL the URL scheme is mysql
.
Example:
mysql://myuser:mypass@db:3306/paperdb\n
For SQLite the format is: sqlite:///PATH
.
Example:
sqlite:////db/db.sqlite3\n
Default value is sqlite:////db/db.sqlite3
, in other words, if DATABASE__URL
is missing, Papermerge will use SQLite with /db/db.sqlite3
as db file.
Note
Both web_app and worker must have same PAPERMERGE__DATABASE__URL
.
Applies only for Tivoli. Set database connections pool size. Defaults to 5.
Note
This configuration option applies only to Tivoli, which is internal JWT token validator component. Tivoli uses SQLAlchemy as ORM. SQLAlchemy has built-in database connections pooling. The core app though, uses Django ORM - which does not have built-in pooling capabilities.
"},{"location":"settings/main/","title":"Main","text":""},{"location":"settings/main/#main__media_root","title":"MAIN__MEDIA_ROOT","text":"Absolute filesystem path to the directory that will hold user-uploaded documents.
Example:
PAPERMERGE__MAIN__MEDIA_ROOT=/var/www/example.com/media/\n
"},{"location":"settings/main/#main__logging_cfg","title":"MAIN__LOGGING_CFG","text":"Absolute filesystem path to the yaml file that will hold logging detailed configuration. Content of logging configuration file is expected to be in yaml format and it is very python specific.
Example:
PAPERMERGE__MAIN__LOGGING_CFG=/etc/papermerge/logging.yaml\n
"},{"location":"settings/main/#main__timezone","title":"MAIN__TIMEZONE","text":"Which timezone to use.
Example:
PAPERMERGE__MAIN__TIMEZONE=Europe/Berlin\n
"},{"location":"settings/ocr/","title":"OCR","text":"This section groups all OCR specific configurations.
"},{"location":"settings/ocr/#ocr__default_language","title":"OCR__DEFAULT_LANGUAGE","text":"By default Papermerge will use language specified with this option to perform OCR. Change this value for language used by majority of your documents. For detailed list of three letter codes see 639-2/T column from ISO 639 2.
Example as environment variable:
PAPERMERGE__OCR__DEFAULT_LANGUAGE=spa\n
Default value is \"deu\" (German language).
"},{"location":"settings/overview/","title":"Overview","text":"Papermerge loads its settings from environment variables.
Environment variables have following format:
PAPERMERGE__<section>__<name>\n
Double underscores are used as the delimiter; environment variable names must be all upper case.
The only required environment variables are:
PAPERMERGE__SECURITY__SECRET_KEY
is the key to securing signed data \u2013 it is vital you keep this secure, or attackers could use it to generate their own signed values.
PAPERMERGE__AUTH__PASSWORD
is the password for super user (administrative user or admin user). Super user is created automatically for you when Papermerge starts for the first time.
Note
In the documentation, for brevity's sake, the PAPERMERGE__
prefix may be omitted. For example, docs may say: default value for DATABASE__URL
is \"sqlite:////db/db.sqlite3\"; what is actually meant is: default value for PAPERMERGE__DATABASE__URL
is \"sqlite:////db/db.sqlite3\".
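The naming rule above can be sketched as a small shell helper. pm_env_name is a hypothetical function written only for this illustration; it is not part of Papermerge.

```shell
# Compose the full environment variable name from a section and an option name.
# pm_env_name is a hypothetical helper, not something Papermerge provides.
pm_env_name() {
  echo PAPERMERGE__${1}__${2} | tr '[:lower:]' '[:upper:]'
}

pm_env_name database url    # prints PAPERMERGE__DATABASE__URL
```

The same pattern yields, for example, PAPERMERGE__REDIS__URL from the section redis and the option url.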
Papermerge uses redis.
"},{"location":"settings/redis/#redis__url","title":"REDIS__URL","text":"For Redis the URL is given in following format:
redis://HOST:PORT/NUMBER\n
For example:
redis://redis:6379/0\n
Note
Both web_app and worker must have same PAPERMERGE__REDIS__URL
Papermerge supports multiple search engine backends. Currently two backends are available:
Search engine backend to use. For Solr format is:
solr://HOST:PORT/INDEX\n
Example: solr://solr:8983/pmg-index
For Xapian URL format is xapian:///PATH
. Example: xapian:////index_db
- in other words, xapian will store all index data in /index_db
folder.
Default value for PAPERMERGE__SEARCH__URL
is xapian:////index_db
Note
Both web_app and worker must have same PAPERMERGE__SEARCH__URL
Required. Unique secret key. The secret key must be a large random value and it must be kept secret. This option does not have a default value; you always need to supply a value for it. The secret key is used to sign JWT tokens.
Example as environment variable:
PAPERMERGE__SECURITY__SECRET_KEY=asjrijfpHHJH00huge00secret00QMNB344GHOOooaq\n
"},{"location":"setup/add-ocr-langs/","title":"Add OCR Languages","text":"By default the Papermerge Docker image includes English, German, French, Italian, Spanish, Romanian and Portugues OCR languages.
You can install extra languages by creating a new docker image from base papermerge/papermerge
.
Create a new Dockerfile with the following content:
FROM papermerge/papermerge:3.0.2\n\n# add Danish and Polish OCR languages\nRUN apt-get update && apt-get install -y tesseract-ocr-dan tesseract-ocr-pol\n
All languages are specified as three-letter codes as per the ISO 639-2/T standard (second column in the table).
In order to build your image run:
docker build -t mypaper:3.0 -f Dockerfile .\n
Check that OCR languages were installed:
docker run -it --rm mypaper:3.0 tesseract --list-langs\n
"},{"location":"setup/ansible/","title":"Ansible","text":"Ansible playbook is available at papermerge/ansible.
The playbook will install the web app, two workers, a database, Redis and the Solr search engine on the target host. All services will be deployed as docker containers. All services will be placed behind traefik, a reverse proxy which will take care of TLS certificates.
Choose one of following options:
The Ansible repository does not include a secrets file. The secrets file contains all sensitive information (passwords, API tokens).
You need to create the secrets file in the group_vars
folder:
$ touch group_vars/secrets\n
Place following content:
secret_key: ...\nsuperuser_password: ...\ndatabase_url: ...\ndb_pass: ...\ncloudflare_api_key: ...\ntraefik_api_password: ...\n
Of course you need to replace dots with correct passwords, secret_key etc. database_url is in secrets file because it includes password.
"},{"location":"setup/ansible/#option-1-postgresql","title":"Option 1 / PostgreSQL","text":"Make sure database_url
in your secrets files matches database related options in group_vars/all
(db_user, db_name). Also, the port number in database_url
should match the one in db_postgres/vars/main.yml
.
database_url
should have following format:
postgresql://<user>:<pass>@db:5432/<dbname>\n
Install Papermerge DMS with PostgreSQL:
$ ansible-playbook install_1.yml -i inventory --extra-vars \"@group_vars/secrets\"\n
The application will be accessible via https://<acme_domain>, where acme_domain
is the variable you set in group_vars/all
e.g. trusel.net"},{"location":"setup/ansible/#option-2-postgresql-pgbouncer","title":"Option 2 / PostgreSQL + PgBouncer","text":"
In this setup application will connect to the database via pgbouncer, this means that database_url
should point to pgbouncer.
Your database_url
should look like:
postgresql://<user>:<pass>@pgbouncer:6432/<dbname>\n
Install Papermerge DMS with PostgreSQL and PgBouncer:
$ ansible-playbook install_2.yml -i inventory --extra-vars \"@group_vars/secrets\"\n
The application will be accessible via https://<acme_domain>, where acme_domain
is the variable you set in group_vars/all
e.g. trusel.net"},{"location":"setup/ansible/#option-3-mariadb","title":"Option 3 / MariaDB","text":"
For MySQL/MariaDB, database_url
should have following format:
mysql://<user>:<pass>@db:3306/<dbname>\n
Install Papermerge DMS with MariaDB:
$ ansible-playbook install_3.yml -i inventory --extra-vars \"@group_vars/secrets\"\n
"},{"location":"setup/ansible/#backup","title":"Backup","text":"In order to create a backup:
$ ansible-playbook backup.yml\n
"},{"location":"setup/ansible/#restore","title":"Restore","text":"In order to restore the backup:
ansible-playbook restore.yml --extra-vars \"backup_file=/backup/backup_20_11_2023-07_33_03.tar.gz\"\n
The backup file path is the path inside the docker container.
"},{"location":"setup/ansible/#contribute","title":"Contribute","text":"papermerge/ansible assumes Debian12/Ubuntu 22.04 host.
We are happy to accept your pull requests for other hosts.
"},{"location":"setup/docker-compose/","title":"Docker Compose","text":"This section describes how to setup Papermerge using docker compose.
"},{"location":"setup/docker-compose/#web-app-worker","title":"Web App + Worker","text":"The simpliest docker compose setup for Papermerge is following:
version: \"3.9\"\n\n x-backend: &common\n image: papermerge/papermerge:3.0.2\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 12345\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: admin\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n volumes:\n - data:/db\n - index_db:/core_app/index_db\n - media:/core_app/media\n services:\n web:\n <<: *common\n ports:\n - \"12000:80\"\n depends_on:\n - redis\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n volumes:\n data:\n index_db:\n media:\n\n
You can access Papermerge user interface using any modern web browser (e.g. Firefox, Chrome). Open your web browser and point it to http://localhost:12000.
"},{"location":"setup/docker-compose/#postgresql","title":"PostgreSQL","text":"By default Papermerge uses sqlite3 database. Here is setup to which uses PostgreSQL:
version: \"3.9\"\n\n x-backend: &common\n image: papermerge/papermerge:3.0.2\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 12345\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: admin\n PAPERMERGE__DATABASE__URL: postgresql://coco:kesha@db:5432/cocodb\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n volumes:\n - index_db:/core_app/index_db\n - media:/core_app/media\n services:\n web:\n <<: *common\n ports:\n - \"12000:80\"\n depends_on:\n db:\n condition: service_healthy\n redis:\n condition: service_healthy\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n db:\n image: postgres:16.1\n volumes:\n - postgres_data:/var/lib/postgresql/data/\n environment:\n POSTGRES_PASSWORD: kesha\n POSTGRES_DB: cocodb\n POSTGRES_USER: coco\n healthcheck:\n test: pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB\n interval: 5s\n timeout: 10s\n retries: 5\n start_period: 10s\n volumes:\n postgres_data:\n index_db:\n media:\n
"},{"location":"setup/docker-compose/#solr","title":"Solr","text":"By default Papermerge uses Xapian search engine. However, for production environments, full fledged search engine like Solr is recommanded.
version: \"3.9\"\n\n x-backend: &common\n image: papermerge/papermerge:3.0.2\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: admin\n PAPERMERGE__DATABASE__URL: postgresql://coco:kesha@db:5432/cocodb\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index\n volumes:\n - media_root:/core_app/media\n depends_on:\n db:\n condition: service_healthy\n redis:\n condition: service_healthy\n\n services:\n web:\n <<: *common\n ports:\n - \"12000:80\"\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n healthcheck:\n test: redis-cli --raw incr ping\n interval: 5s\n timeout: 10s\n retries: 5\n start_period: 10s\n solr:\n image: solr:9.3\n ports:\n - \"8983:8983\"\n volumes:\n - solr_data:/var/solr\n command:\n - solr-precreate\n - pmg-index\n db:\n image: postgres:16.1\n volumes:\n - postgres_data:/var/lib/postgresql/data/\n environment:\n POSTGRES_PASSWORD: kesha\n POSTGRES_DB: cocodb\n POSTGRES_USER: coco\n\n healthcheck:\n test: pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB\n interval: 5s\n timeout: 10s\n retries: 5\n start_period: 10s\n\n volumes:\n postgres_data:\n solr_data:\n media_root:\n
"},{"location":"setup/docker-compose/#mysql-mariadb","title":"MySQL / MariaDB","text":"Here is an example of docker compose setup with MariaDB:
version: \"3.9\"\n\n x-backend: &common\n image: papermerge/papermerge:3.0.2\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: admin\n PAPERMERGE__DATABASE__URL: mysql://coco:kesha@db:3306/cocodb\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index\n volumes:\n - media_root:/core_app/media\n depends_on:\n db:\n condition: service_healthy\n redis:\n condition: service_healthy\n\n services:\n web:\n <<: *common\n ports:\n - \"12000:80\"\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n healthcheck:\n test: redis-cli --raw incr ping\n interval: 5s\n timeout: 10s\n retries: 5\n start_period: 10s\n solr:\n image: solr:9.3\n ports:\n - \"8983:8983\"\n volumes:\n - solr_data:/var/solr\n command:\n - solr-precreate\n - pmg-index\n db:\n image: mariadb:11.2\n volumes:\n - maria:/var/lib/mysql\n environment:\n MYSQL_ROOT_PASSWORD: kesha\n MYSQL_DATABASE: cocodb\n MYSQL_USER: coco\n MYSQL_PASSWORD: kesha\n ports:\n - \"3306:3306\"\n healthcheck:\n test: mariadb-admin ping -h 127.0.0.1 -u $$MYSQL_USER --password=$$MYSQL_PASSWORD\n interval: 5s\n timeout: 10s\n retries: 5\n start_period: 10s\n\n volumes:\n postgres_data:\n solr_data:\n media_root:\n
"},{"location":"setup/docker-compose/#oauth-20","title":"OAuth 2.0","text":"...
"},{"location":"setup/docker/","title":"Docker","text":""},{"location":"setup/docker/#web-app","title":"Web App","text":"The only two required environment variables are PAPERMERGE__SECURITY__SECRET_KEY
and PAPERMERGE__AUTH__PASSWORD
. To start the web UI part, use the following command:
docker run -p 12000:80 \\\n -e PAPERMERGE__SECURITY__SECRET_KEY=abc \\\n -e PAPERMERGE__AUTH__PASSWORD=admin \\\n papermerge/papermerge:3.0.2\n
Point your web browser to http://localhost:12000
and you will see the login screen:
Credentials are:
username: admin
password: admin
Note
The above docker run
starts only the web UI part. For a complete setup you also need one or more workers.
The official Papermerge Docker image is available on Docker Hub.
"},{"location":"setup/docker/#get-docker-image","title":"Get Docker Image","text":"The recommended way to get the Papermerge docker image is via docker pull command:
docker pull papermerge/papermerge:3.0.2\n
"},{"location":"setup/docker/#web-app-worker","title":"Web App + Worker","text":"For complete setup you need to start one or multiple workers. Worker is the component which, among other things, performs OCR.
Here is minimal docker compose file with web UI and one worker:
version: \"3.9\"\n\n x-backend: &common\n image: papermerge/papermerge:3.0.2\n environment:\n PAPERMERGE__SECURITY__SECRET_KEY: 12345\n PAPERMERGE__AUTH__USERNAME: admin\n PAPERMERGE__AUTH__PASSWORD: admin\n PAPERMERGE__REDIS__URL: redis://redis:6379/0\n volumes:\n - data:/db\n - index_db:/core_app/index_db\n - media:/core_app/media\n services:\n web:\n <<: *common\n ports:\n - \"12000:80\"\n depends_on:\n - redis\n worker:\n <<: *common\n command: worker\n redis:\n image: redis:6\n volumes:\n data:\n index_db:\n media:\n
With above setup, web app is accessible on http://localhost:12000
.
To be added soon...
"},{"location":"setup/overview/","title":"Overview","text":""},{"location":"setup/overview/#web-app-and-worker","title":"Web App and Worker","text":"Papermerge consists of one web app and one or more workers. The web app is the part you see in your browser or interact with via the REST API. The worker (one or multiple) is the part which performs background tasks such as OCR and updating the search engine index.
"},{"location":"setup/overview/#database","title":"Database","text":"In order to function Papermerge needs a database, which can be one of following:
By default Papermerge uses SQLite.
"},{"location":"setup/overview/#search-engine","title":"Search Engine","text":"Papermerge supports multiple search engine backends:
Xapian is used by default.
"},{"location":"setup/overview/#ocr","title":"OCR","text":"Papermerge uses Tesseract to perform Optical Character Recognition.
"},{"location":"setup/requirements/","title":"Requirements","text":""},{"location":"setup/requirements/#software","title":"Software","text":"Papermerge is designed to run on Linux/Unix compatible system.
You need to have docker installed, as Papermerge is shipped as docker image. All docker images are stored on docker hub.
Make sure that you have docker available:
$ docker --version\n Docker version 24.0.3, build 3713ee1\n
"},{"location":"setup/requirements/#hardware","title":"Hardware","text":"Hardware specification for Papermerge depends on number of documents and users.
For one user with 1000-2000 pages a system spec with:
will do just fine.
For OCR, Papermerge uses Tesseract. OCR is a very CPU-intensive operation, so the more CPU cores and RAM your system has, the better: more cores and more powerful CPUs mean OCR will be performed faster.
Note
GPU is not required as Tesseract runs OCR entirely on your CPU.
Testing system for Papermerge has following specs:
Papermerge supports PDF, TIFF, JPEG and PNG file formats.
PDF format is called native because Papermerge internals operate as if all documents are PDF.
TIFF, JPEG or PNG on the other hand are not native (non-native) formats.
The import of a native format yields a document with one version - the PDF itself, i.e. the original version.
The import of any non-native format yields a document with two versions:
Note
At its core Papermerge code is written to work with PDF files only. All other files (non-natives) are converted, on import, into PDF format.
"},{"location":"user/getting-started/","title":"Getting Started","text":"In this part of the documentation we define important concepts used in Papermerge parlance. We highly recommend that you read and understand this section.
"},{"location":"user/getting-started/#document","title":"Document","text":"For Papermerge a document is anything which is a good candidate for archiving - some piece of information which is not editable but you need to store it for future reference. For example receipts - you don't need to edit receipts or read them everyday, but eventually you will need them for your tax declaration. In this sense - scanned documents, which are usually in PDF, JPEG or TIFF format, are perfect match.
If you take a picture of a paper document with your mobile phone - you'll have a file in jpeg format (or maybe png file format). In context of Papermerge that picture of a document (though just a single jpeg file) is a valid one page document.
On the other hand, if you take a picture of a flower and upload that jpeg image to Papermerge - the 'document' will be processed. However, that jpeg format flower image is not a document in Papermerge sense.
Usually office formats such as .docx (Microsoft Word), .odt (Libre Office) and .txt (plain text) are not good candidates for archiving, as by their nature they are meant to be changed/edited regularly. However, once converted to PDF format (for instance Contract_C2.docx to Contract_C2.pdf) they are full-fledged documents in the Papermerge sense.
Info
Papermerge works with four file formats: PDF, TIFF, JPEG and PNG.
"},{"location":"user/getting-started/#document-version","title":"Document Version","text":"One document has one or multiple versions. The original document version - is version number 1. For every change applied to the document - a new document version is created with that change applied.
When we say \"change applied to a document\" - we mean things like rotate pages, reorder pages or merge two documents.
The point of document versions is to keep track of changes applied to the document.
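The versioning rule described above can be sketched in a few lines of Python. This is only an illustrative model, not Papermerge's actual data model: the Document class, its methods and the page labels are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical model: each version is simply a list of page labels.
    title: str
    versions: list = field(default_factory=list)

    def upload(self, pages):
        # The uploaded file becomes version 1, the original.
        self.versions = [list(pages)]

    def apply(self, change):
        # A change never edits an existing version; it appends a new one
        # and returns the number of the newly created version.
        latest = self.versions[-1]
        self.versions.append(change(list(latest)))
        return len(self.versions)


doc = Document('scan.pdf')
doc.upload(['p1', 'p2', 'p3', 'p4'])
# Reordering pages produces version 2; version 1 stays untouched.
new_version = doc.apply(lambda pages: list(reversed(pages)))
print(new_version, doc.versions[0], doc.versions[-1])
```

Because every change appends instead of overwriting, the original pages remain reachable as version 1 no matter how many changes follow.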
"},{"location":"user/getting-started/#folder","title":"Folder","text":"Folder in Papermerge is counterpart of \"folder\" concept used in major computer file manager applications (e.g. Finder in macOS). Folders in Papermerge are, intuitively enough, hierarchical - in other words one folder may contain other folders and/or documents.
"},{"location":"user/getting-started/#node","title":"Node","text":"Node is an abstraction of two concepts: document and folder. Every time you read node, you can mentally replace that term with either document or folder and the statement will still hold.
Below is a graphical example of the Folder, Document, Document Version relationship:
Same hierarchy can be illustrated as nodes:
"},{"location":"user/getting-started/#special-folders","title":"Special Folders","text":"Each user in Papermerge has two special folders: Inbox and Home.
Inbox folder is where all incoming documents land first. Home folder is where all user documents are.
Special folders are top level folders (they don't have parent folder).
Note
Both Inbox
and Home
folders are special only by convention; structurally they are just normal folders. Internally their title is actually \".inbox\" and \".home\". By convention special folders start with dot character.
OCR (Optical Character Recognition) is a technique to extract text information from binary image formats. This technique enables users to:
OCR is an essential tool (or technique, if you will) for extracting textual information from documents and thus building useful workflows based on a document's actual content. Papermerge relies on specialized external open source tools like Google's Tesseract OCR.
"},{"location":"user/getting-started/#tags","title":"Tags","text":"Organizing documents in folders is very common. Thus the idea of keeping your documents in folders doesn't need further introduction. The idea of using tags to organize your documents may be new for you though. Tags are kind of labels. You can associate tags to a document or to a folder. Tags have a color and a name.
Once tagged, documents can be searched by their tags. Conversely, it is also possible to show all the documents tagged with a particular tag(s).
Both tags and folders complement each other and provide you with powerful means to stay organized.
"},{"location":"user/getting-started/#page-management","title":"Page Management","text":"Many times scanning documents in bulk yields documents with blank pages; some pages may be out of order, rotated, or part of a totally different document. Even if you notice these flaws immediately, it is time consuming and frustrating to redo the scanning process. With Papermerge you can reorder, rotate or even delete pages in case you need to do so.
There is a separate chapter about page management where you can learn details about this feature.
"},{"location":"user/merge-documents/","title":"Merge Documents","text":"Let's first clarify what is meant by documents merging. Merging is the process of combining two documents into one: all pages from the source document are transferred into destination document and then source document is deleted.
On the target document, transferred pages can:
The rest of this documentation chapter describes how to use Papermerge in case 1. For how to use Papermerge in case 2, see the Moving Pages section.
Figure 1 illustrates this case. Both source (better_scan.pdf) and target (scan_d.pdf) documents have only one version (v1). Both source and target have two pages.
In this case merge result is that in scan_d.pdf document there is a new version created (v2) and new version contains only source pages (BS1 and BS2). Previous pages of scan_d.pdf document (D1, D2) are still available in version 1 (v1 in figure) of the document.
This use case is useful when you scan same document twice and for some reason you want to keep both copies around. Because both copies contain slightly different versions of the same document, it is more practical to keep them as two document versions in one single file. In such case you will avoid duplicate results in search results.
"},{"location":"user/merge-documents/#2-source-pages-are-appended-to-the-target-pages","title":"2. Source pages are appended to the target pages","text":"Figure 2 illustrates this case. Both source (better_scan.pdf) and target (scan_d.pdf) documents have only one version (v1). Also, both source and target have two pages.
In this case the result is that a new version (v2) of the scan_d.pdf document is created, and it now contains four pages: BS1, BS2, D1, D2. The previous version of scan_d.pdf (v1) contains two pages: D1 and D2.
This scenario is special case of 'moving pages' between documents with all pages selected on the source. How to use Papermerge in this scenario is described in detail in Moving Pages section.
Important
When merging two documents, one of them (source) is deleted. That's why, it is very important that when you merge two documents, you correctly choose which one is the source and which one is the target.
Now that you understand what exactly is meant by \"document merging\", let's see how you can merge documents with Papermerge.
"},{"location":"user/merge-documents/#dual-panel","title":"Dual Panel","text":"In order to merge two documents in Papermerge you need to open each of them in two panels:
In one of the panels, the one which you want to be the source, right click the mouse button to open the context menu.
Important
Merge Documents context menu item will be displayed only if there are no selected pages.
In Figure 3, notice the direction of the arrow icon just before \"Merge Document\". The arrow icon points from source to the target. In Figure 3, context menu was opened in left panel, this means that document opened in left panel (better_scan.pdf) is the source. On the other hand if we would open context menu in right panel, then the arrow will point from right to left - which also implies that in such case document opened in the right panel would be the source.
Click the \"Merge Document\" context menu item. After you confirm the operation, the source document (better_scan.pdf) will be merged with scan_d.pdf.
"},{"location":"user/ocr/","title":"OCR","text":"OCR is the process which extracts text information from the scanned document and makes it searchable.
By default, the OCR process is triggered automatically on document file upload. The OCR process status is indicated by a little circle next to the document's title. When the OCR process is completed, a new document version is created and the document becomes searchable.
"},{"location":"user/ocr/#automatic-ocr","title":"Automatic OCR","text":"By default OCR is triggered automatically when a document is uploaded. However, you can disable automatic OCR triggering; in that case OCR starts only when you trigger it yourself.
Important
Documents for which OCR was skipped - are not searchable!
In order to disable automatic OCR, go to User Menu -> Preferences -> OCR -> Trigger -> Manual
"},{"location":"user/ocr/#default-ocr-language","title":"Default OCR Language","text":"In order to perform OCR on the document you need to indicate beforehand the language of respective document. Choosing ocr language for each and every document uploaded is tedious - instead, in preferences a default OCR Language is set - and that language is applied for each uploaded document.
In order to set default OCR language, go to User Menu -> Preferences -> OCR -> Language
"},{"location":"user/ocr/#status-indicator","title":"Status Indicator","text":"Papermerge features a real time OCR status indicator - this means that you can see the document's OCR status updates as they happen (i.e. in real time). The OCR status is displayed by a small circle next to the document's title. The status indicator has the following meanings:
"},{"location":"user/ocr/#ocred-text-layer","title":"OCRed Text Layer","text":"
Once OCR process completed successfully a new document version is created - version with OCRed text layer. This version is available for download from the Download
dropdown in document view.
Note
Under the hood Papermerge uses awesome OCRmyPDF utility to create OCRed text layer. Thus, in respect of OCRed text layer, Papermerge acts like a graphical user interface for OCRmyPDF.
"},{"location":"user/ocr/#document-ocred-text","title":"Document OCRed Text","text":"You can view OCRed text of the entire document either from commander or from viewer, in both cases choose \"OCRed Text\" from context menu:
If you want to see OCRed text of entire document (to be exact - all pages of the last document version) from the viewer - just make sure that no pages are selected:
"},{"location":"user/ocr/#selected-pages-ocred-text","title":"Selected Pages OCRed Text","text":"In case document has many pages and you are interested in OCRed text of one (or multiple) very specific pages, then select pages first and then from context menu choose \"OCRed Text\" item:
Note
In case there are selected pages, OCRed Text menu item will show you OCRed text ONLY of the selected pages.
"},{"location":"user/ocr/#ocr-languages-support","title":"OCR Languages Support","text":"Papermerge uses Tesseract to extract text from scanned documents. Tesseract supports over 130 languages - thus with Papermerge you can have documents in any of those languages.
"},{"location":"user/page-management/","title":"Page Management","text":"Many times scanning documents in bulk results in documents with blank pages; some pages may be out of order or may be part of a totally different document. Even if you notice these problems immediately, it is time consuming to redo the scanning process. Wouldn't it be nice to fix out of order pages without scanning all docs again?
Page management is set of features which helps to fix scanning process errors. In other words you can delete, reorder, rotate, and extract pages within document(s).
Every time one of the operations described in this section is applied - a new document version is created. Because of this, the changes you apply on the document like rotate, delete, extract, reorder, do not destroy the document, in other words page management is non-destructive process.
Note
In order to perform any of the operations described below (delete, reorder, rotate or extract) you need to have Change Permission on the respective document. You are automatically granted Change Permission on the documents you uploaded (because you own the documents uploaded by you).
"},{"location":"user/page-management/#delete","title":"Delete","text":"You can delete specific pages (for instance blank pages) from the document. Although many scanners have an automatic \"remove blank pages\" feature, many times they get confused about what a blank page is. In case your scans end up with undesired blank pages you can easily remove those pages.
In order to delete a page, you need to select desired page by clicking on it, then Right Click--> Delete Page
.
Every time you delete one or several pages, document version is incremented by one. For instance if document Invoice-X56.pdf currently has four pages and the document latest version is version 1, then, after deleting one page - document latest version will be 2. Thus document's version 1 has all four pages and document version 2 has three pages:
"},{"location":"user/page-management/#reorder","title":"Reorder","text":"Out of order pages occur very often during scanning process. Papermerge empowers users to change pages order within the document.
For instance, in figure below you can see that pages 2 and 4 are out of place. To correct pages' order use drag 'n drop. For example grab page 2 and drop it in correct position, and then do same thing with page 3:
For these changes to take effect you need to click 'Apply Changes' button.
Warning
Document pages reorder will only be saved when you click 'Apply Changes'
Similarly to deleting pages, every time you save new pages order, document version will be incremented (i.e. advanced by one).
"},{"location":"user/page-management/#rotate","title":"Rotate","text":"Often scanned pages are upside down or maybe rotated 90\u00b0 (degrees). In order to quickly fix that, select one or multiple pages you want to rotate and then Right Click --> Rotate --> 180\u00b0 CCW
(or 90\u00b0 CW, 90\u00b0 CCW depending on your specific case):
Note
CW stands for clockwise. CCW stands for counter-clockwise.
Similarly to page deletion and page ordering, every time you rotate a page, document version will be incremented (i.e. advanced by one).
Warning
After page rotation you have to re-run OCR for the document. If a page was upside down when ingested, OCR cannot make sense of it and thus cannot extract (and then index) text from that page. After you have manually fixed the page by rotating it correctly, OCR will be able to extract and index the page's content.
"},{"location":"user/page-management/#move-document-to-document","title":"Move (Document to Document)","text":"You can move one, multiple or even all strayed pages from one document (source) to another (target). If you choose to move all pages from the source, the source will be deleted, because it does not make sense to have a \"document with zero pages\".
When moving pages between documents you will be prompted to choose between two different move strategies:
The outcome between replace vs append strategies is illustrated below:
The difference is in the outcome for B.pdf (target). With the replace strategy, the document B.pdf ended up having two pages (which replaced the previous ones), while with the append strategy the document B.pdf ended up with four pages, as the source pages were appended to the existing ones.
Note
What happens if you select all source pages, i.e. when you select A1, B1, B2, A2? In such case - source document (A.pdf) will be deleted, because it does not make any sense to have a document with zero pages. For the target document (B.pdf) this case does not make any difference, as the outcome is always the same.
Note
The use case where you select all pages and choose the \"replace\" strategy has the same outcome as merging documents.
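The two strategies described above can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions, not Papermerge's implementation: the move_pages function is hypothetical, documents are modeled as lists of versions, and the target page labels B3 and B4 are invented for the example.

```python
def move_pages(source, target, selected, strategy):
    # source and target are lists of versions; each version is a list of pages.
    # Returns the updated (source, target); source is None when it was deleted.
    if strategy not in ('replace', 'append'):
        raise ValueError('strategy must be replace or append')
    src_latest, dst_latest = source[-1], target[-1]
    remaining = [p for p in src_latest if p not in selected]
    moved = [p for p in src_latest if p in selected]
    if strategy == 'replace':
        # The new target version contains only the moved pages.
        new_target_pages = moved
    else:
        # The moved pages are added after the existing target pages.
        new_target_pages = dst_latest + moved
    new_target = target + [new_target_pages]
    # A document with zero pages makes no sense, so the source is deleted.
    new_source = source + [remaining] if remaining else None
    return new_source, new_target


a_pdf = [['A1', 'B1', 'B2', 'A2']]
b_pdf = [['B3', 'B4']]
src_append, dst_append = move_pages(a_pdf, b_pdf, {'B1', 'B2'}, 'append')
src_replace, dst_replace = move_pages(a_pdf, b_pdf, {'B1', 'B2'}, 'replace')
print(dst_append[-1], dst_replace[-1], src_append[-1])
```

In both cases the earlier versions of source and target are kept, which is what makes the operation non-destructive.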
Now, that \"theory\" is clear, let's move on to the practical part and see Papermerge in action. First of all, note that in Papermerge you can move pages between documents either using context menu or by using drag 'n drop.
Tip
You can also move pages between documents with the REST API
"},{"location":"user/page-management/#use-context-menu","title":"Use Context Menu","text":"In order to move pages between documents, using context menu:
thumbnails panel of the source document viewer
Context menu is dynamic - which means it renders only relevant menu items. If, for example, one panel has the document viewer open while the other panel is in commander mode, then there will be an \"extract\" menu item instead of \"move\". In other words, the \"move\" menu item will be visible only if:
viewer mode
Important
The arrow next to the \"Move\" menu item changes direction depending on the panel in which you invoke the context menu - it hints at the direction of the pages transfer. The arrow icon of the \"Move\" item always points from source to target.
"},{"location":"user/page-management/#drag-n-drop","title":"Drag 'n Drop","text":"In example illustrated in pictures below there are two documents:
During scan page B1 wrongly ended up in document A, although it belongs to document B.
Note
A page that during the scan ended up in wrong document is called strayed page. In example above, page B1 is strayed page.
In order to fix this scanning issue, you need to open both documents in two panels and then drag 'n drop page B1 from document A (source) to document B (target):
Note
Pages are moved immediately after 'mouse drop' i.e. there is no need to 'click apply button' as in re-order operation
Note
Both documents' (source and target) version will be incremented by one
"},{"location":"user/page-management/#extract-document-to-folder","title":"Extract (Document to Folder)","text":"Page extraction is moving a page out of the document into a completely new document. It differs from page moving
because the destination is a folder, not a document.
You can extract one or multiple pages at once. Pages can be extracted:
Note that in Papermerge you can extract pages either using context menu or by using drag 'n drop.
Tip
You can also extract pages by using REST API
"},{"location":"user/page-management/#using-context-menu","title":"Using Context Menu","text":"In order to extract pages from the document, using context menu:
Because of the dynamic nature of the context menu, \"Extract\" menu item will be visible only if all of the following conditions are true:
viewer mode
while another is in commander mode
After you've clicked \"Extract\", the \"Extract Pages\" modal dialog will prompt you for additional details like the title of the newly created document(s) and whether you want to extract all pages as one or multiple documents:
A couple of notes here. First, the newly created document will have the extension \".pdf\"; you cannot change that. Second, if \"Extract each page into separate document\" is checked, each page will be extracted as a separate one-page document; otherwise all extracted pages will be placed into a single document in the target folder.
Note
Papermerge will try to make sure that newly created documents have unique names. Thus if you choose to extract, say, two pages as separate documents, Papermerge will append a UUID to the title. In case you choose to extract two pages into a single document - no UUID will be appended. In case you leave the \"Title format\" field empty, Papermerge will generate a unique title for you.
"},{"location":"user/page-management/#using-drag-n-drop","title":"Using Drag 'n Drop","text":"Let's show how page extraction works by example. Say we have one document - document A - with following pages: A1, A2, B1, B2, A3. What we want to do is to extract pages B1 and B2 into a new document. As mentioned above there are two cases:
In order to extract pages B1 and B2 into one single new document you need to uncheck 'Extract each page into separate document' checkbox in modal dialog:
Similarly to other operations document A's (source document) version is incremented by one.
"},{"location":"user/page-management/#ocr-data","title":"OCR Data","text":"Do you need to re-run OCR after a document's page was moved/rotated/extracted/deleted?
In short - no, you don't need to re-run OCR. The only exception is page rotation. Every time you rotate a page in the document, you need to re-run OCR for that document. It actually makes sense, because if page was upside down when document was ingested, the OCR operation won't make much sense of it and thus won't be able to extract any text data from the page. Once you correct that part manually (rotate page), you re-run OCR so that correct text will be extracted and then indexed.
Note
Generally speaking you don't need to re-run OCR after performing page management operations. The only exception from this rule is page rotation.
For longer answer, let's clarify first what OCR data is. OCR data is: text information extracted from the document by OCR and associated with that document. That text information is stored in both database and on filesystem.
When one page is moved from one document into another (or when a page is deleted), the text associated with the source (or target) document changes as well. For example, say document fruits.pdf has three pages: apples, oranges and bananas, i.e. each page contains only one word: page 1 has the word apples, etc. You can find document fruits.pdf by searching 'apples' (will match first page), 'oranges' (will match second page) or 'bananas' (will match last page).
After you extract first page (apples) from document fruits.pdf into another document, searching by term 'apples' should not reveal document 'fruits.pdf' - because term/page 'apples' is not part of it anymore.
In order to keep text information associated with document fruits.pdf up to date, there are at least two possibilities:
1. re-run OCR on the entire document after each page management operation
2. reuse the already extracted OCR data and update it to reflect the change
From technical point of view 1. is very easy to implement but very inefficient in terms of computing power. Think that you have 100 pages document and you delete one blank page - what a waste of CPU resources to re-OCR entire document when OCR data is already available!
The second possibility (point 2.) is very challenging to implement, but extremely efficient - you need to run OCR on the document only once (maybe twice, in case you decide to fix couple of pages by rotating them).
Papermerge opted for possibility 2: it reuses the already extracted OCR data and updates it accordingly every time you re-order/move/extract/delete pages.
The result is that whatever page management operation you perform, the search results are always up-to-date without the need to re-OCR the document! As mentioned above, the only exception is page rotation.
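The idea of reusing OCR data can be illustrated with a small Python sketch of the fruits.pdf example. It assumes, for illustration only, that OCR text is kept per page in a plain dictionary; the function names and the title apples.pdf are hypothetical and do not reflect how Papermerge stores its data.

```python
# Hypothetical model: OCR text is stored per page, so moving a page
# between documents moves its text too and no OCR run is needed.
def search(documents, term):
    # Return the titles of documents that have a page containing the term.
    return sorted(
        title for title, pages in documents.items()
        if any(term in text.split() for text in pages)
    )


def extract_page(documents, source, index, new_title):
    # Move one page, together with its already extracted text,
    # out of the source document and into a new document.
    text = documents[source].pop(index)
    documents[new_title] = [text]


documents = {'fruits.pdf': ['apples', 'oranges', 'bananas']}
before = search(documents, 'apples')
extract_page(documents, 'fruits.pdf', 0, 'apples.pdf')
after = search(documents, 'apples')
print(before, after)
```

After the extraction the term apples no longer matches fruits.pdf, and at no point was the text recognized a second time.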
Below is illustrated the case of three page fruits.pdf document with apples/oranges/bananas content. Initially search term 'apples' will reveal fruits.pdf document (from Inbox). After 'apples' page was extracted into separate document (found in Home/My Documents folder) search term 'apples' correctly reveals new document! Notice here that search index is updated instantaneously:
"},{"location":"user/search/","title":"Search","text":"Papermerge offers an extensive searching mechanism that is designed to allow you to quickly find a document you're looking for.
When you search Papermerge for a document, it tries to match this query against your documents. Papermerge will look for matching documents by inspecting their content, title, and tags.
Note
Papermerge searches only in content of the last version of the document
By default, Papermerge returns only documents which contain all words typed in the search bar. However, Papermerge also offers additional search syntax if you want to drill down the results further.
Matching inexact words:
*5951\n
Will return document with title: brother_005951.pdf
Matching specific tags:
tags:paid\n
will return documents with tag \"paid\"
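The matching rules above can be approximated by the following Python sketch. It only illustrates the described semantics (all words must match, a star acts as a wildcard, a tags: prefix matches tags); Papermerge delegates the real work to its search engine backend, and the tokenize and matches functions here are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    # Split on anything that is not a letter or digit.
    return [t for t in re.split('[^a-z0-9]+', text.lower()) if t]


def matches(query, title, content, tags):
    # A document matches only if every term of the query matches.
    words = tokenize(title) + tokenize(content)
    tag_names = {t.lower() for t in tags}
    for term in query.lower().split():
        if term.startswith('tags:'):
            ok = term[len('tags:'):] in tag_names
        elif '*' in term:
            # Wildcard term: compare the pattern against each word.
            ok = any(fnmatchcase(word, term) for word in words)
        else:
            ok = term in words
        if not ok:
            return False
    return True


wildcard_hit = matches('*5951', 'brother_005951.pdf', '', [])
tag_hit = matches('tags:paid', 'invoice.pdf', '', ['paid'])
print(wildcard_hit, tag_hit)
```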
"},{"location":"user/tags/","title":"Tags","text":"Tags are sort of labels. You can associate tags to a document or to a folder. Tags have a color and a name. Once tagged, documents can be searched by their tags.
"},{"location":"user/upload-documents/","title":"Upload Documents","text":"There are multiple ways to upload documents to your Papermerge instance: web UI, command line utilities, REST API.
The obvious way is via web UI. After briefly explaining how to upload documents from web user interface, this page will dive into more interesting parts: command line utilities and REST API.
"},{"location":"user/upload-documents/#web-ui","title":"Web UI","text":"Uploading documents via user interface is the most straightforward method, just click upload
button:
Documents will be uploaded into your current folder. Current folder is considered the one which you currently see as opened in web UI:
Also, instead of using upload button, you can drag'n drop documents from your desktop file manager into Papermerge's web ui.
Warning
Currently the drag 'n drop feature does not work for folders; in other words, you can drag 'n drop only documents. If you want to import folders with their entire content preserved - use papermerge-cli
described in next paragraph.
You can upload documents and folders from your local filesystem using papermerge-cli command line utility:
papermerge-cli import /path/to/local/folder/\n
Note that papermerge-cli
will import all content of /path/to/local/folder/ directory recursively i.e. it will preserve the structure of local folder in Papermerge as well.
You can upload one document by providing path to the document:
papermerge-cli import /path/to/document.pdf\n
Note
By default all imported documents and folders will end up inside user's Inbox folder.
For more information about papermerge-cli
check papermerge-cli section.
You can also upload documents directly via the REST API. You can access the REST API swagger schema definition from the user menu (upper right corner of the web UI). In order to upload a document there are two steps:
For step 1. use POST /nodes/
REST API endpoint. For step 2 use POST /documents/<doc-uuid>/upload
REST API endpoint, where <doc-uuid>
is the ID of the node created in first step.
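The two steps above might be prepared as follows in Python. This is a sketch under several assumptions: the base URL, the bearer token header and the payload field names (title, ctype, parent_id) are not taken from this page and should be checked against the swagger schema of your instance. The requests are only built, not sent, and the file body of step 2 is left out.

```python
import json
from urllib.request import Request


def build_upload_requests(base_url, token, title, parent_id, doc_uuid):
    auth = {'Authorization': 'Bearer ' + token}  # assumed auth scheme
    # Step 1: create the document node (payload field names are assumptions).
    payload = {'title': title, 'ctype': 'document', 'parent_id': parent_id}
    create = Request(
        base_url + '/nodes/',
        data=json.dumps(payload).encode(),
        headers=dict(auth, **{'Content-Type': 'application/json'}),
        method='POST',
    )
    # Step 2: upload the file to the node created in step 1.
    # In a real flow doc_uuid comes from the response of step 1.
    upload = Request(
        base_url + '/documents/' + doc_uuid + '/upload',
        headers=auth,
        method='POST',
    )
    return create, upload


create, upload = build_upload_requests(
    'http://localhost:12000/api',  # hypothetical base URL
    'my-token', 'invoice.pdf', 'inbox-folder-id', 'doc-uuid-from-step-1',
)
print(create.get_method(), create.full_url)
print(upload.get_method(), upload.full_url)
```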
Papermerge comes with a simple and intuitive user interface (UI) layout. The UI is divided into four areas:
Commander (in figure 1. marked with number 4.) is designed to have similar look and feel of modern desktop file browsers. This is the place where you browse your documents and folders.
To help you quickly move documents, folders and pages around, there is a special mode - dual panel mode. In dual panel mode there are two panels displayed side by side. Between the two panels documents (as well as folders and pages) can be moved with one simple drag'n drop. The figure below shows what dual panel mode looks like:
In order to switch to dual panel mode, use Commander's upper right button:
To switch back to single panel mode, use close button - which is in the upper right corner of one of the panels:
Note
Close button will be displayed only on one of the panels. Although both panels look and feel exactly the same, internally the application still distinguishes them as main and secondary. The main panel is the one which is always visible and the secondary panel is the one which opens and closes, i.e. the one with the \"close button\" in the upper right corner.
"},{"location":"user/user-interface/#commander","title":"Commander","text":"Commander or Commander Panel is one of the two available panels. Commander is the panel which shows documents and folders: a modern web-based file browser, if you will.
"},{"location":"user/user-interface/#viewer","title":"Viewer","text":"Viewer or Viewer Panel or Document Viewer is one of the two available panels. Viewer is the panel in which a document is opened.
There can be two Viewers opened side by side. This mode (i.e. dual panel mode with a Viewer in each panel) is very handy when it comes to moving pages between documents.
"},{"location":"user/user-interface/#thumbnails-panel","title":"Thumbnails Panel","text":"The document viewer features a thumbnails panel which can be toggled on and off. Pages can be selected only inside the thumbnails panel; likewise, pages can be dragged and dropped only from the thumbnails panel.
"},{"location":"user/user-management/","title":"User Management","text":"Papermerge is a multi-user system.
The most privileged user (which has all permissions) is called superuser. A brand new Papermerge instance ships with one default user, which happens to be a superuser. With the default user, you can add as many users as you wish.
Note
You can add multiple superusers as well.
Papermerge is a non-destructive DMS, which means the original document is always available regardless of how many transformations (page rotations, deletions, document merges) you apply to it.
Retention of the original is ensured by the document versioning feature. Each transformation you apply creates a new document version.
Version 1 (one) of the uploaded document is the original file, i.e. the document without any changes applied. The original document version is always available, regardless of what operation(s) you apply to the document (except deletion of the document itself).
Any page management or OCR operation on the document will increment its version by one.
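The versioning rule can be pictured with a small model. This is only an illustrative sketch, not Papermerge's actual implementation; the class names `Document` and `DocumentVersion` and the operation descriptions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentVersion:
    number: int
    description: str


@dataclass
class Document:
    title: str
    versions: list = field(default_factory=list)

    def __post_init__(self):
        # Version 1 is always the unmodified uploaded file.
        if not self.versions:
            self.versions.append(DocumentVersion(1, "original upload"))

    @property
    def original(self) -> DocumentVersion:
        return self.versions[0]

    @property
    def latest(self) -> DocumentVersion:
        return self.versions[-1]

    def apply(self, operation: str) -> DocumentVersion:
        """Record an operation as a new version; earlier versions are kept."""
        new_version = DocumentVersion(self.latest.number + 1, operation)
        self.versions.append(new_version)
        return new_version
```

After two operations on a freshly uploaded document, `latest.number` is 3 while `original` still refers to version 1.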
"}]}
\ No newline at end of file
diff --git a/3.0/setup/add-ocr-langs/index.html b/3.0/setup/add-ocr-langs/index.html
index 13eeec5b..a948571c 100644
--- a/3.0/setup/add-ocr-langs/index.html
+++ b/3.0/setup/add-ocr-langs/index.html
@@ -1228,7 +1228,7 @@
By default the Papermerge Docker image includes English, German, French, Italian, Spanish, Romanian and Portuguese OCR languages.
You can install extra languages by creating a new Docker image based on papermerge/papermerge.
Create a new Dockerfile with the following content:
-FROM papermerge/papermerge:3.0.1
+FROM papermerge/papermerge:3.0.2
# add Danish and Polish OCR languages
RUN apt update && apt install -y tesseract-ocr-dan tesseract-ocr-pol
diff --git a/3.0/setup/docker-compose/index.html b/3.0/setup/docker-compose/index.html
index 4fc74ddc..a6ab39f5 100644
--- a/3.0/setup/docker-compose/index.html
+++ b/3.0/setup/docker-compose/index.html
@@ -1339,11 +1339,11 @@ Web App + Worker
version: "3.9"
x-backend: &common
- image: papermerge/papermerge:3.0.1
+ image: papermerge/papermerge:3.0.2
environment:
PAPERMERGE__SECURITY__SECRET_KEY: 12345
- PAPERMERGE__AUTH__USERNAME: john
- PAPERMERGE__AUTH__PASSWORD: hohoho
+ PAPERMERGE__AUTH__USERNAME: admin
+ PAPERMERGE__AUTH__PASSWORD: admin
PAPERMERGE__REDIS__URL: redis://redis:6379/0
volumes:
- data:/db
@@ -1365,6 +1365,7 @@ Web App + Worker
data:
index_db:
media:
+
You can access Papermerge user interface using any modern web browser (e.g. Firefox, Chrome).
Open your web browser and point it to http://localhost:12000.
@@ -1374,13 +1375,13 @@ PostgreSQL
version: "3.9"
x-backend: &common
- image: papermerge/papermerge:3.0.1
+ image: papermerge/papermerge:3.0.2
environment:
- PAPERMERGE__SECURITY__SECRET_KEY: 12345
- PAPERMERGE__AUTH__USERNAME: john
- PAPERMERGE__AUTH__PASSWORD: hohoho
- PAPERMERGE__DATABASE__URL: postgresql://scott:tiger@db:5432/mydatabase
- PAPERMERGE__REDIS__URL: redis://redis:6379/0
+ PAPERMERGE__SECURITY__SECRET_KEY: 12345
+ PAPERMERGE__AUTH__USERNAME: admin
+ PAPERMERGE__AUTH__PASSWORD: admin
+ PAPERMERGE__DATABASE__URL: postgresql://coco:kesha@db:5432/cocodb
+ PAPERMERGE__REDIS__URL: redis://redis:6379/0
volumes:
- index_db:/core_app/index_db
- media:/core_app/media
@@ -1390,21 +1391,29 @@ PostgreSQL
ports:
- "12000:80"
depends_on:
- - redis
- - db
+ db:
+ condition: service_healthy
+ redis:
+ condition: service_healthy
worker:
<<: *common
command: worker
redis:
image: redis:6
db:
- image: bitnami/postgresql:14.4.0
+ image: postgres:16.1
volumes:
- postgres_data:/var/lib/postgresql/data/
environment:
- POSTGRES_USER: scott
- POSTGRES_PASSWORD: tiger
- POSTGRES_DB: mydatabase
+ POSTGRES_PASSWORD: kesha
+ POSTGRES_DB: cocodb
+ POSTGRES_USER: coco
+ healthcheck:
+ test: pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB
+ interval: 5s
+ timeout: 10s
+ retries: 5
+ start_period: 10s
volumes:
postgres_data:
index_db:
@@ -1413,109 +1422,139 @@ PostgreSQL
Solr
By default Papermerge uses the Xapian search engine. However, for
production environments, a full-fledged search engine like Solr is recommended.
-version: "3.9"
-
-x-backend: &common
- image: papermerge/papermerge:3.0.1
- environment:
- PAPERMERGE__SECURITY__SECRET_KEY: 12345
- PAPERMERGE__AUTH__USERNAME: john
- PAPERMERGE__AUTH__PASSWORD: hohoho
- PAPERMERGE__DATABASE__URL: postgresql://scott:tiger@db:5432/mydatabase
- PAPERMERGE__REDIS__URL: redis://redis:6379/0
- PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index
- volumes:
- - media:/core_app/media
+ version: "3.9"
-services:
- web:
- <<: *common
- ports:
- - "12000:80"
- depends_on:
- - redis
- - db
- - solr
- worker:
- <<: *common
- command: worker
- redis:
- image: redis:6
- db:
- image: bitnami/postgresql:14.4.0
- volumes:
- - postgres_data:/var/lib/postgresql/data/
+ x-backend: &common
+ image: papermerge/papermerge:3.0.2
environment:
- POSTGRES_USER: scott
- POSTGRES_PASSWORD: tiger
- POSTGRES_DB: mydatabase
- solr:
- image: solr:9.3
- ports:
- - "8983:8983"
+ PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret
+ PAPERMERGE__AUTH__USERNAME: admin
+ PAPERMERGE__AUTH__PASSWORD: admin
+ PAPERMERGE__DATABASE__URL: postgresql://coco:kesha@db:5432/cocodb
+ PAPERMERGE__REDIS__URL: redis://redis:6379/0
+ PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index
volumes:
- - solr_data:/var/solr
- command:
- - solr-precreate
- - pmg-index
-
-volumes:
- postgres_data:
- solr_data:
- media:
+ - media_root:/core_app/media
+ depends_on:
+ db:
+ condition: service_healthy
+ redis:
+ condition: service_healthy
+
+ services:
+ web:
+ <<: *common
+ ports:
+ - "12000:80"
+ worker:
+ <<: *common
+ command: worker
+ redis:
+ image: redis:6
+ healthcheck:
+ test: redis-cli --raw incr ping
+ interval: 5s
+ timeout: 10s
+ retries: 5
+ start_period: 10s
+ solr:
+ image: solr:9.3
+ ports:
+ - "8983:8983"
+ volumes:
+ - solr_data:/var/solr
+ command:
+ - solr-precreate
+ - pmg-index
+ db:
+ image: postgres:16.1
+ volumes:
+ - postgres_data:/var/lib/postgresql/data/
+ environment:
+ POSTGRES_PASSWORD: kesha
+ POSTGRES_DB: cocodb
+ POSTGRES_USER: coco
+
+ healthcheck:
+ test: pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB
+ interval: 5s
+ timeout: 10s
+ retries: 5
+ start_period: 10s
+
+ volumes:
+ postgres_data:
+ solr_data:
+ media_root:
MySQL / MariaDB
Here is an example of a Docker Compose setup with MariaDB:
-version: "3.9"
+ version: "3.9"
-x-backend: &common
- image: papermerge/papermerge:3.0.1
- environment:
+ x-backend: &common
+ image: papermerge/papermerge:3.0.2
+ environment:
PAPERMERGE__SECURITY__SECRET_KEY: 1234 # top secret
- PAPERMERGE__AUTH__USERNAME: eugen
- PAPERMERGE__AUTH__PASSWORD: 1234
- PAPERMERGE__DATABASE__URL: mysql://myuser:mypass@db:3306/paperdb
+ PAPERMERGE__AUTH__USERNAME: admin
+ PAPERMERGE__AUTH__PASSWORD: admin
+ PAPERMERGE__DATABASE__URL: mysql://coco:kesha@db:3306/cocodb
PAPERMERGE__REDIS__URL: redis://redis:6379/0
PAPERMERGE__SEARCH__URL: solr://solr:8983/pmg-index
- volumes:
- - media_root:/core_app/media
- depends_on:
- - redis
- - solr
- - db
-
-services:
- web:
- <<: *common
- ports:
- - "11000:80"
- worker:
- <<: *common
- command: worker
- redis:
- image: redis:6
- solr:
- image: solr:9.3
- ports:
- - "8983:8983"
volumes:
- - solr_data:/var/solr
- command:
- - solr-precreate
- - pmg-index
- db:
- image: mariadb:11.2
- volumes:
- - maria:/var/lib/mysql
- environment:
- MYSQL_ROOT_PASSWORD: mypass
- MYSQL_DATABASE: paperdb
- MYSQL_USER: myuser
- MYSQL_PASSWORD: mypass
-volumes:
- maria:
- solr_data:
- media_root:
+ - media_root:/core_app/media
+ depends_on:
+ db:
+ condition: service_healthy
+ redis:
+ condition: service_healthy
+
+ services:
+ web:
+ <<: *common
+ ports:
+ - "12000:80"
+ worker:
+ <<: *common
+ command: worker
+ redis:
+ image: redis:6
+ healthcheck:
+ test: redis-cli --raw incr ping
+ interval: 5s
+ timeout: 10s
+ retries: 5
+ start_period: 10s
+ solr:
+ image: solr:9.3
+ ports:
+ - "8983:8983"
+ volumes:
+ - solr_data:/var/solr
+ command:
+ - solr-precreate
+ - pmg-index
+ db:
+ image: mariadb:11.2
+ volumes:
+ - maria:/var/lib/mysql
+ environment:
+ MYSQL_ROOT_PASSWORD: kesha
+ MYSQL_DATABASE: cocodb
+ MYSQL_USER: coco
+ MYSQL_PASSWORD: kesha
+ ports:
+ - "3306:3306"
+ healthcheck:
+ test: mariadb-admin ping -h 127.0.0.1 -u $$MYSQL_USER --password=$$MYSQL_PASSWORD
+ interval: 5s
+ timeout: 10s
+ retries: 5
+ start_period: 10s
+
+ volumes:
+ maria:
+ solr_data:
+ media_root:
OAuth 2.0
...
diff --git a/3.0/setup/docker/index.html b/3.0/setup/docker/index.html
index dd7b2e2e..8fce6e1c 100644
--- a/3.0/setup/docker/index.html
+++ b/3.0/setup/docker/index.html
@@ -1323,17 +1323,17 @@ Web App
The only two required environment variables are
PAPERMERGE__SECURITY__SECRET_KEY
and PAPERMERGE__AUTH__PASSWORD.
To start the web UI part, use the following command:
-docker run -p 9400:80 \
+docker run -p 12000:80 \
-e PAPERMERGE__SECURITY__SECRET_KEY=abc \
- -e PAPERMERGE__AUTH__PASSWORD=123 \
- papermerge/papermerge:3.0.1
+ -e PAPERMERGE__AUTH__PASSWORD=admin \
+ papermerge/papermerge:3.0.2
-Point your web browser to http://localhost:9400
and you will see login screen:
+Point your web browser to http://localhost:12000
and you will see login screen:
Credentials are:
- username
admin
-- password
123
+- password
admin
Note
@@ -1345,7 +1345,7 @@ Official Docker Image
Get Docker Image
The recommended way to get the Papermerge docker image is via
docker pull command:
-docker pull papermerge/papermerge:3.0.1
+docker pull papermerge/papermerge:3.0.2
Web App + Worker
For complete setup you need to start one or multiple workers.
@@ -1354,11 +1354,11 @@
Web App + Worker
version: "3.9"
x-backend: &common
- image: papermerge/papermerge:3.0.1
+ image: papermerge/papermerge:3.0.2
environment:
PAPERMERGE__SECURITY__SECRET_KEY: 12345
- PAPERMERGE__AUTH__USERNAME: john
- PAPERMERGE__AUTH__PASSWORD: hohoho
+ PAPERMERGE__AUTH__USERNAME: admin
+ PAPERMERGE__AUTH__PASSWORD: admin
PAPERMERGE__REDIS__URL: redis://redis:6379/0
volumes:
- data:/db
diff --git a/3.0/sitemap.xml b/3.0/sitemap.xml
index 63c2fa0a..5c3a5baa 100644
--- a/3.0/sitemap.xml
+++ b/3.0/sitemap.xml
@@ -2,177 +2,177 @@
https://docs.papermerge.io/3.0/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/backup-restore/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/changelog/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/cli/cli/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/cli/overview/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/contributor/docker/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/rest-api/overview/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/rest-api/reference/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/rest-api/token/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/settings/auth/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/settings/database/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/settings/main/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/settings/ocr/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/settings/overview/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/settings/redis/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/settings/search/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/settings/security/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/setup/add-ocr-langs/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/setup/ansible/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/setup/docker-compose/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/setup/docker/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/setup/kubernetes/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/setup/overview/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/setup/requirements/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/file-formats/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/getting-started/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/merge-documents/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/ocr/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/page-management/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/search/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/tags/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/upload-documents/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/user-interface/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/user-management/
- 2024-01-18
+ 2024-01-21
daily
https://docs.papermerge.io/3.0/user/versions/
- 2024-01-18
+ 2024-01-21
daily
\ No newline at end of file
diff --git a/3.0/sitemap.xml.gz b/3.0/sitemap.xml.gz
index a70cb9201b8f168a3d210ec732b55bf7a8badbcb..0dc0a39e568acf3f1df3472fe9f31a7a3471e10a 100644
GIT binary patch
literal 474
zcmV<00VVz)iwFn?2d!lS|8r?{Wo=<_E_iKh0M(eyZsQ;jfbV^Z$ajpB?V-DJ+}l1u
z`v7AY+b9^=0h;aG7u!*q^Bz`XBpZ7)j6MzXgJ@s>&i3$u6peT1>R30`0iE*Nxba-Q
zzx@_3)ob_EX0j2}p(wj}uBMd2FDb9rYaLC(8A7Z*jWVjK{6u*Z%BKEVbx(&f76aSY
zI$G;hw^)hAjP@$l0RqyDG-7SM{82Zum5b8j8i;B3x;r(eXVDzR=~z^5vIjLafg5ok
z{L^VL_Oq1_k_|@QY5sw$2X)Co5XR&Q*XQ9@1=e@YDrdWguMbOy77UH6K1
z0Li^9-AM_xiTp7m0;y>aoGOz^pPWf^kf60Fi6ks6(9Bs~G`&UUl-q$xC_e`u_W{cs
z@{)TbhY7g{=pKrF!7RqIA)L`Az8=l7#N}Wtiow$i+)2l+?u(d!5^CPCt3(uyyn`6v
ze9T(`0~BY@;L5bPlQ%5$iE{rrQNj&)NV&?&Et8&B2S
z-#_A|dhH(COg3UV6lE7r)s#~BE#>8MsiR3aLx{DfQARbDpD3?F+0?(P?%`0zVqp7P
zM{B+67Avus(O$(mKtP(2My!pOPjw?(xhUPQftY5myT|77Su{s+d?_kd*`1o2z>PQ%
zzIPgo{c7c%WP_16nt$NxL7j6DgfV$SwwRkW>qowX)~pX_K3`Jf!{CK4q<4u-pGzz*
zDrY3Ns7uBO2s?10Rk5pPpl#tWJpHf4D9wPE5ox$v2*S(@0
zKyoijH&Q}vB7e+?Kx*0pr^;l~7iZEOBxo&4A_)r%G;>xLO>dDo<#u2a%Fn^aeZVq@
zJm((CVM6W!x`$$4FpIHl2xoMOuSat%aXA=^V(>HrchYgI`ywWwgqk<(DiK8^?;r*^
zAM;kg0L7UzxH2v73YTeAGUv;
QO84%*0sXX*id+`}043Pr00000