A tool to extract citation data from EPUBs and upload it to a metadata repository.
cit-ex parses EPUB files looking for all the bibliographic references that match the HTML class(es) given on the command line. These references are then parsed for more granular results and uploaded to a metadata repository of your choice.
As of v0.0.10, the unstructured citations are parsed to find DOI data only, and the only metadata repository supported is Thoth.
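To illustrate what finding DOI data in an unstructured citation can look like, here is a minimal sketch. It is not cit-ex's own parser: the pattern and the `find_dois` helper are assumptions made for this example, and real-world DOIs may need a more careful expression.

```python
import re

# Assumed pattern for illustration only: "10.", a 4-9 digit registrant code,
# a slash, then any run of characters that are not whitespace, quotes or angle brackets.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def find_dois(citation):
    """Return every DOI-like string found in an unstructured citation."""
    # Trailing punctuation usually belongs to the sentence, not to the DOI.
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(citation)]


print(find_dois("Doe, J. (2020). A Title. https://doi.org/10.11647/OBP.0288."))
```

A citation without a DOI simply yields an empty list, which a caller could use to skip that reference.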
First, create and activate a virtual environment:
$ cd path/to/the/cit-ex/folder/
$ python3 -m venv .env
$ source .env/bin/activate
Then, install the required dependencies:
(.env) $ python3 -m pip install -r requirements.txt
Assuming your EPUB file is stored at ~/file.epub and its bibliographic references are marked with the HTML class biblio:
(.env) $ python3 cit-ex/main.py ~/file.epub -c biblio --dry-run
If your references are marked as either biblio or biblio2:
(.env) $ python3 cit-ex/main.py ~/file.epub -c biblio biblio2 --dry-run
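The class-matching step can be pictured with the sketch below, which collects the text of every element carrying one of the requested classes. It uses only the Python standard library and is an illustration rather than cit-ex's implementation; `ClassTextCollector` and `references_in_epub` are hypothetical names.

```python
import zipfile
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list includes a wanted class."""

    def __init__(self, classes):
        super().__init__()
        self.classes = set(classes)
        self.depth = 0      # greater than 0 while inside a matching element
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.classes & set((dict(attrs).get("class") or "").split()):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse whitespace so each reference becomes one clean line.
            self.results.append(" ".join("".join(self.buffer).split()))
            self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def references_in_epub(path, classes):
    """Return the matching reference strings from every (X)HTML file in an EPUB."""
    found = []
    with zipfile.ZipFile(path) as epub:
        for name in epub.namelist():
            if name.endswith((".xhtml", ".html", ".htm")):
                parser = ClassTextCollector(classes)
                parser.feed(epub.read(name).decode("utf-8", errors="replace"))
                parser.close()
                found.extend(parser.results)
    return found
```

Calling `references_in_epub("file.epub", ["biblio", "biblio2"])` would mirror the second command above, returning one string per marked reference.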
Make sure your login credentials are stored in the environment variables "THOTH_EMAIL" and "THOTH_PWD". You also need to know the identifier (either the DOI or the UUID) of the work you wish to append the citation data to.
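If you want to confirm that both variables are set before attempting an upload, a small check along these lines may help. The `missing_credentials` helper is hypothetical and not part of cit-ex; only the two variable names come from this document.

```python
import os

REQUIRED_VARS = ("THOTH_EMAIL", "THOTH_PWD")


def missing_credentials(environ=os.environ):
    """Return the names of the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_credentials()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Thoth credentials found in the environment.")
```

The check only looks at whether the variables exist and are non-empty; it does not verify the credentials against Thoth.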
Once these prerequisites are satisfied, and assuming your identifier is 10.11647/OBP.0288, you can run the command:
(.env) $ python3 cit-ex/main.py ~/file.epub -c biblio biblio2 -i 10.11647/OBP.0288 -r thoth
On top of the steps listed in "Installation", install the dev dependencies with:
(.env) $ python3 -m pip install -r requirements-dev.txt
The file cit-ex/obp-loader.py is an OBP-specific wrapper to load chapter-level citations to the repository (Thoth).
It relies on each book chapter reporting the URL of its HTML edition. That HTML file is downloaded, embedded into an EPUB and finally run through cit-ex.
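The embedding step can be sketched as follows, assuming the chapter's HTML has already been downloaded into a string. This is an illustration, not the code in obp-loader.py: `html_to_epub`, the file names inside the archive and the package metadata are all assumptions, and the result is a bare-bones container rather than a validator-clean EPUB.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="id">{identifier}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="chapter" href="chapter.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter"/>
  </spine>
</package>
"""


def html_to_epub(html, identifier, title):
    """Wrap one HTML document in a minimal EPUB container and return its bytes."""
    opf = CONTENT_OPF.format(identifier=escape(identifier), title=escape(title))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epub:
        # The EPUB container format expects "mimetype" first and uncompressed.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        epub.writestr("META-INF/container.xml", CONTAINER_XML,
                      compress_type=zipfile.ZIP_DEFLATED)
        epub.writestr("content.opf", opf, compress_type=zipfile.ZIP_DEFLATED)
        epub.writestr("chapter.xhtml", html, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Writing the returned bytes to a file such as chapter.epub would give an archive that can then be passed to cit-ex like any other EPUB.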
The wrapper runs with:
(.env) $ python3 obp-loader.py 10.11647/obp.0085
where "10.11647/obp.0085" is the DOI of the book to be parsed.
Clone the repository and build the image with:
$ docker build . -f Dockerfile-obp-loader -t openbookpublishers/cit-ex-obp-loader
Deploy a container with:
$ docker run --rm \
    -e THOTH_EMAIL=$THOTH_EMAIL \
    -e THOTH_PWD=$THOTH_PWD \
    openbookpublishers/cit-ex-obp-loader \
    obp-loader.py 10.11647/obp.0337
where $THOTH_EMAIL and $THOTH_PWD are your Thoth credentials and 10.11647/obp.0337 is the book-level DOI you wish to process.