PDFAnno is a browser-based linguistic annotation tool for PDF documents.
It offers functions for annotating PDFs with labels and relations.
For natural language processing, it is suited to developing gold-standard data with named entity spans, dependency relations, and coreference chains.
Online Demo (version 0.2)
We strongly recommend using the latest version of Chrome. (Firefox will also be supported in the future.)
You can also install PDFAnno via npm:
npm install pdfanno
- Simple and easy-to-use interface.
- No installation is required.
- Client-only application, i.e., no communication with a server.
- Visit the online demo with the latest version of Chrome.
- Put PDF files and annotation files (if any) in your directory, then specify the directory via the Browse button. Sample PDFs and annotations are downloadable from here.
- Load the target PDF. If you have an anno file for the PDF, load it as well.
- Annotate the PDF as you like.
- Save your annotations via the download button. To continue annotating later, respecify your directory via the Browse button to reload the PDF and anno file.
For security reasons, PDFAnno does NOT automatically save your annotations.
Don't forget to download your current annotations!
Span highlighting. A span cannot cross page boundaries.
One-way relation. This is used for annotating a dependency relation between spans.
Link relation. Use this to add a non-directional relation between spans. It is also useful for grouping multiple spans.
Rectangle. A rectangle cannot cross page boundaries.
In PDFAnno, the annotation file (.anno) follows the TOML format.
Here is an example of an anno file:
version = 0.2
[1]
type = "span"
page = 1
position = [[95.818, 252.977, 181.761, 10.909], [95.818, 264.806, 107.136, 10.909]]
label = "label-1"
[2]
type = "span"
page = 1
position = [[323.863, 230.715, 213.988, 11.590], [313.125, 244.522, 224.829, 10.795]]
label = "label-2"
[3]
type = "rect"
page = 1
position = [323.863, 230.715, 213.988, 11.590]
label = "label-3"
[4]
type = "relation"
dir = "two-way"
ids = ["1", "2"]
label = "label-4"
where position indicates (x, y, width, height) of the annotation.
To support multi-user annotation, PDFAnno allows loading a reference anno file.
For example, if you create a.anno and another annotator creates b.anno for the same PDF, load a.anno as usual, and load b.anno as a reference file. PDFAnno then renders a.anno and b.anno in different colors. Rendering more than one reference file is also supported.
This is useful for checking inter-annotator agreement and resolving annotation conflicts.
Note that the reference files are rendered as read-only.
PDFAnno uses pdf.js as its PDF viewer. We implement custom layers on top of pdf.js for rendering annotations.
First, install Node.js and npm. Node.js version 6 or later is required.
Then, run the following commands:
npm install
npm run anno:publish
The output is written to docs/, and you can access PDFAnno via docs/index.html.
For development, run:
npm run anno:watch
This command starts Webpack Dev Server, after which you can open http://localhost:8080/dist/index.html in your browser.
MIT