Tutorial for annotators

Neil Ashton edited this page Jun 7, 2013 · 9 revisions

Filtering and annotating

Your task as an annotator is to peruse, filter, and annotate the hits that ezra has found for a given target. This tutorial will lead you through the process of finding and processing a hit.

The index of hits for a target is accessed via the Target page. Each particular hit can be inspected and edited by means of the annotation screen. This screen contains tools to listen to audio associated with the hit, edit the audio's transcription, sort the hit into a category, and give it values for features.

Perusing hits: the target and hit indices

Once you log in, you can view active targets by clicking on the Targets menu, which will take you to an index of targets. Each target is listed with its name and total number of hits, as well as counts of hits in each of four categories.

Clicking on any of the category counts will take you to a paginated index of hits of that category for that target. Clicking either the name of a target or the number under Total will take you to that target's full hit index.

The hit index lists the hits (of the chosen category) for the chosen target. It also displays the category counts for that target and lists the features which are associated with the target.

The list entry for each hit includes its ID, target, "confirmed" status, and "flagged" status.

To work with a hit, click on the box next to its ID. This will take you to the annotation screen for that hit.

Using the annotation screen

Working with hits consists of listening to their audio, correcting their transcripts, categorizing them, giving them values for features, and saving your edits. Let's look at each of these steps individually.

Listening to audio

The most interesting thing about a hit to you, the annotator, is the audio associated with it. You can listen to this audio using the audio player immediately below the name of the hit.

To listen to the entire associated audio file, click the "play" button at the far left of the player. To pause playback, click the "pause" button. To skip to a different point in the audio file, click anywhere within the player's progress bar.

The audio player can be controlled by keyboard shortcuts. You can see the full selection of keyboard shortcuts by clicking the Keyboard shortcuts text below the player, which will bring up a help display. Using the mouse to navigate audio files gets tiring very fast, so we recommend using the keyboard shortcuts as much as possible!

Finding and transcribing the selection

Each hit's audio file has a specially marked portion called the selection. This represents the small window around the point in the audio file where ezra thinks the target has been found. The range of the current selection is given in the two rows of edit boxes labeled start and end. To listen to the current selection, click the selection button to the left of the audio player. The player will play the selected portion of the audio, stopping at its end point.

The same edit boxes which indicate the start and end of the selection range can also be used to edit it. Changing the values in those boxes adjusts the selection accordingly. You can also press the start or end button next to either row to set its values to match the current playback point in the audio player.

The text box above the selection ranges is the transcript. It represents an attempt at a transcription of the contents of the selection. It will often be ungrammatical or simply wrong. One of your jobs as an annotator is to make sure that the transcript is correct, so you should edit the transcript to accurately reflect the content of the selection.

Often, you'll just want to make sure that the audio selection lines up with the words in the transcript. To do this, you only really need to listen to the start and the end of the selection. To hear the first or last two seconds of the selection, click the start 2s or last 2s buttons, respectively, to the left of the audio player.
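If it helps to picture what the start 2s and last 2s buttons play, here is a minimal sketch of the arithmetic. It is not ezra's code: the Selection class, its field names, the use of seconds as the unit, and the clamping of short selections are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    """Hypothetical model of a selection: start and end offsets in seconds."""
    start: float
    end: float

    def __post_init__(self):
        if self.start < 0 or self.end < self.start:
            raise ValueError("selection must satisfy 0 <= start <= end")

    def first_window(self, length=2.0):
        # The first `length` seconds, clamped so the window never runs
        # past the end of a short selection (an assumed behavior).
        return (self.start, min(self.start + length, self.end))

    def last_window(self, length=2.0):
        # The last `length` seconds, clamped so the window never begins
        # before the start of the selection (an assumed behavior).
        return (max(self.end - length, self.start), self.end)
```

For a selection running from 10.0 to 15.0 seconds, first_window() gives the range 10.0 to 12.0 and last_window() gives 13.0 to 15.0. A selection shorter than two seconds simply yields the whole selection both times.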

Categorizing the hit

The most interesting thing about a hit, as far as project supervisors are concerned, is whether or not it is actually a hit. Once you have listened to the audio, you are in a position to judge this.

A hit belongs to one of four categories. It begins life as an Unconfirmed hit, one whose status is unknown. A hit whose audio has been verified to contain the target is a Confirmed hit; one whose audio definitely does not contain the target is a Not present hit. If a hit is the same as another hit in the system, it is a Duplicate hit.
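For readers who think in code, the four categories can be pictured as a simple enumeration. The sketch below is only an illustration: the Category enum and the category_counts helper are hypothetical names, not part of ezra. The helper tallies hits per category in the way the target index displays counts.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    """The four hit categories named in this tutorial."""
    UNCONFIRMED = "Unconfirmed"  # status not yet known; every hit starts here
    CONFIRMED = "Confirmed"      # audio verified to contain the target
    NOT_PRESENT = "Not present"  # audio definitely does not contain the target
    DUPLICATE = "Duplicate"      # same as another hit in the system


def category_counts(categories):
    """Count hits per category, reporting zero for categories with no hits."""
    seen = Counter(categories)
    return {category: seen.get(category, 0) for category in Category}
```

Because each hit carries exactly one category at a time, the four counts always add up to the target's total number of hits.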

You can register your judgment of the hit's category by clicking on one of the four colored boxes to the right of the hit's ID. When you move your cursor over a box, a tooltip will pop up, telling you what category the box represents.

A hit can also be flagged. Flagging a hit is a generic way to indicate that there is something wrong with it or that it is otherwise in need of attention. To flag a hit, check the box labeled Flag to the right of the colored category boxes.

More information about why a hit is unusual can be recorded in the edit box labeled Notes, if necessary. You do not have to flag a hit to edit the Notes box.

Valuing features

Each target is associated with features, which are potential properties of hits that are of interest in the research project. A feature can take on any of a number of values for a given hit. Once you have listened to the hit's audio, you are able to assess the correct values for the hit's features.

Each hit can take on values for features that are associated with its target. The features associated with the current hit's target are listed on the hit index for the target, but they're also listed at the bottom of the annotation screen for each hit. Each feature is listed there with its name, the date of its creation, the name of its creator, a question giving the meaning of the feature, and finally an interactive means of choosing the feature's value.

The particular means of choosing the feature's value depends on which of three types the feature belongs to. The first type is the single-value feature: each hit may have at most one value for the feature, drawn from a finite set. Its values are selected by means of a set of radio buttons. The second type is the multi-value feature, for which each hit may have multiple values, selected using check boxes. The last and least common type is the string-valued feature, whose values are arbitrary strings specified with an edit box.

To value a feature, simply use whatever means is provided for choosing its value. For single- or multi-value features, click the button or box next to the appropriate value(s). For string-valued features, click on the edit box and enter the appropriate text.
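The difference between the three feature types can be summed up by what counts as a valid value for each. The following is a rough sketch under stated assumptions: the class names, the validate methods, and the idea that multi-value features draw from a fixed set of options are all illustrative guesses, not ezra's actual data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class SingleValueFeature:
    """At most one value, drawn from a finite set (radio buttons)."""
    name: str
    allowed: FrozenSet[str]

    def validate(self, value: Optional[str]) -> bool:
        # None stands for "no value chosen yet".
        return value is None or value in self.allowed


@dataclass(frozen=True)
class MultiValueFeature:
    """Any number of values (check boxes); a fixed option set is assumed."""
    name: str
    allowed: FrozenSet[str]

    def validate(self, values: Iterable[str]) -> bool:
        return set(values) <= self.allowed


@dataclass(frozen=True)
class StringFeature:
    """An arbitrary string typed into an edit box."""
    name: str

    def validate(self, value: object) -> bool:
        return isinstance(value, str)
```

For example, a hypothetical single-value feature "speaker" with options "host" and "guest" would accept "host" but reject "caller", while a multi-value feature with the same options would accept both "host" and "guest" together.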

Wrapping up

Once you've corrected the transcript, assigned a category, and valued the features, your annotation work on this hit is done. You can finish up by clicking the Save button at the very top of the annotation screen. This will display a message letting you know that your changes have been saved. If you have made a mistake and wish to undo the changes you have made, press the Cancel button at the right of the row of buttons that contains Save.

As an annotator, you will not usually be finished perusing hits after a single hit. To move on to the next hit, press the Next button, located to the right of the Save button. To move on to the next unconfirmed hit, which will usually be the hit you are interested in viewing next, press the Next Unconfirmed button.