This is a Java Swing-based Fuzzy Annotation Tool for use in the Named Entity annotation process.
The latest version can be downloaded from here.
In contrast to a "normal" annotation tool, this software does not enforce the concept of entity boundaries. Instead, it nudges the annotator into thinking that a rough marking of the concept is enough.
The goal is for annotators to lose less time trying to mark or fix entity spans so that they fall on the exact entity boundaries (which can be hard to define), thus increasing annotation speed.
Created annotations are presented to the user as highlights with smooth transitions.
The tool is provided as a single .jar or .exe file to be directly executed. There is no need for installation or initial configuration.
Java 8 is required to run the tool and can be downloaded from here. However, the latest release also provides a complete zip package that includes the Java runtime for Windows PCs.
For the complete zip version:
- Download the latest zip package (`FuzzyAnnotationTool_full.zip`) of the tool from here.
- Extract the zip file to a folder of your choice.
- Run the `FuzzyAnnotationTool.exe` file.
For the individual .jar file:
- Download Java 8 from here and install it on your computer following the setup instructions.
- Download the latest version of the tool from here.
- Run the `FuzzyAnnotationTool.jar` file.
Windows may show a security warning when you try to run the .exe file, with the message "Windows protected your PC". This is because the .exe file is not signed.
To run the file anyway, click on "More info" and then on "Run anyway".
If it is not possible to run the .exe file, the .jar file can be used instead.
This software supports two different types of annotation process, which generate different annotation outputs when exported.
In point-wise mode, annotators click a point in the text, preferably in the middle of an entity, to create an annotation for that position. In this mode, the entity span is not relevant.
In highlight mode, as in more conventional annotation tools, annotators highlight a text span to add an annotation. The entity boundaries do not need to be marked exactly, as the displayed annotation will have a size similar to that of the highlighted span.
This section describes how to use the software and the meaning of each screen element.
NOTE: User preferences are stored in the OS's registry system. Therefore, they are automatically detected and applied independently of where the executable file is located.
In this screen, the user must select which type of annotation process will be used (point-wise or highlight; see Section Operation modes for an in-depth explanation) and select the file containing the texts to be annotated. The file must comply with the restrictions described in Section Files; otherwise, the software may not be able to parse it.
- Operation mode radio buttons: Select the operation mode in which the Annotation Tool will start.
- `Select File`: Opens the window for XML file selection.
- `Start`: Loads the selected file and starts the Annotation Tool in the selected Operation mode.
- `Auto Save?` checkbox: If checked, the annotations are automatically saved to the selected file every time the user changes the annotation text. NOTE: Please be aware that this option overwrites the original file when saving the annotations.
If a file that already contains annotations is selected as input, the tool can parse it and show the annotations as highlighted text spans in the Annotation Screen, provided that the annotation format matches the selected Operation mode (see Section Output file for an in-depth explanation of the annotation tag schema).
This is the main annotation window, where the user can select the entities in the current text. Marked entities appear as highlighted sections in the text area. The way the selection is done changes depending on the chosen Operation mode.
- `Tag` box: The tag with which the entity is annotated (currently only `C` is supported).
- `Document` box: The number of the text document currently shown. A number can be typed to jump to the corresponding document. (Support for reopening at the last evaluated document is planned.)
- `Options` button: Opens the Options screen.
- `Prev` button: Returns to the previous document.
- `Next` button: Advances to the next document.
- `Undo` button: Removes the last annotation. Can be pressed multiple times until all annotations are removed.
- `Export file` button: Saves all loaded documents with the marked annotations to an XML file. If pressed, this button opens a file selection screen to save the data to a new file. This option is useful in the case the
NOTE: Currently the only way to remove annotations is through the `Undo` button. However, if the user switches to another text or closes the application, the changes are applied permanently and the annotations can no longer be removed.
This screen contains configuration parameters for the annotation highlight shown on the screen.
NOTE: This screen presents advanced options, which normally shouldn't be used by annotators.
NOTE2: These parameters do not affect the actual annotation saved in the output file. They only affect the way the annotation is presented to the user in the Annotation Screen.
- `Highlight color`: Changes the color in which the annotations are marked.
- `Fuzziness` (only for highlight annotation): Modifies the fade-out effect of the annotation. If set all the way to the right, the annotations are shown with hard boundaries.
- `Fuzzy weight`: Modifies the displayed length of the fuzzy annotation span. (Known bug: this only affects existing annotations. Annotations made after the parameter is modified are not affected by the span change.)
- `Minimum Highlight Size` (only for point-wise annotation): Defines the smallest span length that can be highlighted for an annotation.
- `Maximum Highlight Size` (only for point-wise annotation): Defines the largest span length that can be highlighted for an annotation.
- `Font Type`: Changes the font type used in the annotation screen.
- `Font Size`: Changes the font size used in the annotation screen.
XML based files are used as both input and output.
The input file must contain one or more text documents, each wrapped in `<article>` tags. The text within each tag is treated as one document and is read with its original formatting preserved.
The whole set of `<article>` tags must be wrapped in an `<articles>` tag to separate the annotatable text data from the XML file metadata.
A file template is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<articles>
<article>
TEXT HERE.
</article>
<article>
ANOTHER TEXT HERE.
</article>
</articles>
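
As an illustration of the template above, the following is a minimal sketch of how an input file could be generated from a list of plain texts using the XML classes in the Java standard library. It is not part of the tool: the class name `InputFileBuilder` and the method `build` are hypothetical names chosen for this example. Building the file through the DOM API means that characters such as `&` and `<` in the texts are escaped automatically.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not shipped with the tool.
public class InputFileBuilder {

    /** Wraps each text in an <article> element under a single <articles> root. */
    public static String build(List<String> texts) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("articles");
            doc.appendChild(root);
            for (String text : texts) {
                Element article = doc.createElement("article");
                // setTextContent stores raw text; escaping happens on output.
                article.setTextContent(text);
                root.appendChild(article);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the input XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints the XML; redirect the output to a file to use it as tool input.
        System.out.println(build(Arrays.asList("TEXT HERE.", "ANOTHER TEXT HERE.")));
    }
}
```

The resulting string can be written to a `.xml` file and then chosen in the file selection window.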
The file saved by this software has an overall format similar to that of the input file; however, the newly produced annotations are added to the text inside the `<article>` tag.
The produced tags depend on the selected Operation mode:
In highlight mode, the annotated text span is wrapped in `<C>` and `</C>` tags.
Example:
He was diagnosed with <C>transient angina</C> after a chest CT, 12-lead EKG, and lab draw.
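
For downstream processing, spans wrapped in `<C>` tags like the one in this example can be read back with a standard XML parser. The following is a minimal sketch under that assumption; the class name `HighlightSpanReader` and the method `readSpans` are hypothetical and not part of the tool.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not shipped with the tool.
public class HighlightSpanReader {

    /** Returns the text of every <C>...</C> span, in document order. */
    public static List<String> readSpans(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList tags = doc.getElementsByTagName("C");
            List<String> spans = new ArrayList<>();
            for (int i = 0; i < tags.getLength(); i++) {
                spans.add(tags.item(i).getTextContent());
            }
            return spans;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the annotation XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<articles><article>He was diagnosed with "
                + "<C>transient angina</C> after a chest CT, 12-lead EKG, and lab draw."
                + "</article></articles>";
        System.out.println(readSpans(xml));
    }
}
```

Because annotators are not asked to hit exact entity boundaries, the extracted text should be treated as an approximate span rather than an exact entity string.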
In point-wise mode, the exact annotation position is marked by a `<C/>` tag.
Example:
He was diagnosed with transient <C/> angina after a chest CT, 12-lead EKG, and lab draw.
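
When the `<C/>` markers need to be mapped back to positions in the plain text, one possible approach is to walk the children of each `<article>` element and count the characters seen before each marker. The sketch below assumes the output file is well-formed XML; the class name `PointAnnotationReader` and the method `readOffsets` are hypothetical and not part of the tool.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper, not shipped with the tool.
public class PointAnnotationReader {

    /**
     * For each <article>, returns the character offsets (within the
     * tag-free article text) at which a <C/> marker appears.
     */
    public static List<List<Integer>> readOffsets(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList articles = doc.getElementsByTagName("article");
            List<List<Integer>> result = new ArrayList<>();
            for (int i = 0; i < articles.getLength(); i++) {
                List<Integer> offsets = new ArrayList<>();
                int position = 0; // characters of plain text seen so far
                NodeList children = articles.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE
                            && "C".equals(child.getNodeName())) {
                        offsets.add(position);
                    } else if (child instanceof Text) {
                        position += child.getNodeValue().length();
                    }
                }
                result.add(offsets);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the annotation XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<articles><article>He was diagnosed with transient "
                + "<C/> angina after a chest CT, 12-lead EKG, and lab draw."
                + "</article></articles>";
        System.out.println(readOffsets(xml));
    }
}
```

Offsets are counted in characters of the article text exactly as stored in the file, including any leading whitespace or line breaks inside the `<article>` tag.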