
Add django command for extracting annotations #387

Merged: 47 commits into master, Oct 2, 2023

Conversation

twinkarma
Collaborator

No description provided.

ianroberts and others added 30 commits May 5, 2023 11:31
Clear pending annotations when rejecting a user from a project
… page loads on bad status and blank response
Add transition effect when document changes
…, since otherwise concurrent calls can cause a race condition and assign two different documents to the same user at the same time, causing errors later.

Fixes #374
Added an "if" option to all annotation widgets, whose value is an expression that has access to the document data _and_ the current state of annotationOutput. Widgets are only rendered if their "if" expression evaluates to a truthy value. The conditionals are re-evaluated every time the annotation data changes, so widgets can appear or disappear depending on the current state.
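For illustration, a widget list might look like the following sketch; only the "if" key comes from the description above, and the other field names are hypothetical, not the project's documented config format:

```js
// Hypothetical widget configuration: the "reason" text box is only rendered
// while the confidence rating is low.  Field names other than "if" are
// illustrative placeholders.
const widgets = [
  { name: "confidence", type: "radio", options: [1, 2, 3, 4, 5] },
  { name: "reason", type: "text", if: "annotation.confidence < 4" },
];
```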
…t an "if", plus those with an "if" that tests true).
…orm-specific optional dependencies (otherwise tests can only run on x64 linux, not on my ARM Mac)
…ing to play whack-a-mole with the ones we want to forbid as "dangerous", instead forbid _all_ function calls by default and add special syntax for the cases that we do want to allow.

Allowing things like Object.assign is a big hole, as it lets "if" expressions pollute the global Array.prototype, etc.; allowing arbitrary method calls lets an expression invoke mutating methods like someArray.push(), and so on. Determining which functions are "safe" and which aren't is asking for trouble. Given that the point of these expressions is to decide whether or not to display other widgets, they shouldn't really need to use functions much at all. The main classes of expressions we need to support are:

- simple single-value binary operator comparisons - annotation.something === 'value', annotation.confidence < 4, etc.
- universal and existential quantifiers over arrays of checkbox values
  - any(cb in annotation.someCheckbox, cb === 'other')
  - all(cb in annotation.someCheckbox, cb > 3)
- disjunction ("or") over values from the document
  - any(v in document.validChoices, annotation.choice === v)

In addition, regular expression matching of some kind is useful for conditions over free-text strings, and in JS this normally requires a function call (/regex/.test(str)). We borrow syntax from Perl to turn this into a binary operation instead of a call (str =~ /regex/).

With this new syntax in place we can forbid _all_ other function and method calls.  This, along with also forbidding access to the __ob__ property (that Vue adds to all observable objects, and that has a chain of properties leading to the global "window"), should give the expressions sufficient power to be useful but not permit them to modify the document/annotation or read anything outside of the context data itself.
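To make the deny-by-default idea concrete, here is a minimal sketch of such a guard over the jsep parser that jse-eval builds on. This is illustrative only, not the project's actual implementation; the allowlist and error messages are assumptions:

```js
// Sketch of "forbid all calls by default": walk the parsed expression AST and
// reject any CallExpression whose callee is not one of the quantifier forms
// described above, plus any access to Vue's __ob__ property.
const jsep = require("jsep");

const ALLOWED_CALLS = new Set(["any", "all"]);

function assertSafe(node) {
  if (node === null || typeof node !== "object") return;
  if (node.type === "CallExpression") {
    const name = node.callee.type === "Identifier" ? node.callee.name : null;
    if (!ALLOWED_CALLS.has(name)) {
      throw new Error("function and method calls are not allowed");
    }
  }
  if (node.type === "MemberExpression" && !node.computed &&
      node.property.name === "__ob__") {
    throw new Error("access to __ob__ is not allowed");
  }
  // Recurse into every child node (arrays and nested objects alike)
  for (const child of Object.values(node)) {
    Array.isArray(child) ? child.forEach(assertSafe) : assertSafe(child);
  }
}

assertSafe(jsep("annotation.confidence < 4"));  // passes
assertSafe(jsep("someArray.push(1)"));          // throws
```

With a guard like this in place, the str =~ /regex/ form can be handled inside the evaluator as a binary operator, so the expression itself never needs to hold or call a function.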
…tation

- "thing in X", "all(item in X, pred)" and "any(item in X, pred)" are now safe when given a null or undefined X, treating it the same as an empty array rather than throwing an error ("anything in X" is always false, "all(i in X, p)" is true, "any(i in X, p)" is false)
- "any" and "all" can now range over object properties as well as array elements
  - "any(p in someObject, p.key == 'example' || p.value > 4)" will evaluate the predicate for each key/value pair in Object.entries(someObject), each time setting p to an object with key & value properties
Note I had to add the jse-eval dependencies to docs/package.json explicitly, as the docs toolchain sees frontend/src but not transitive dependencies of the frontend module
…ts rather than just a single document, and cycle through the list each time the preview form is submitted or rejected
…rs that occur when parsing or evaluating the "if" expressions. This debug facility is only turned on for the preview mode on the project configuration page, not anywhere that the form is presented to annotators
…s annotation data, and added discussion of error handling and how to deal with unset annotation values
enable version.py update to take version number as argument
User-level DB lock in get_annotator_task to avoid race condition
Better frontend error page when backend is down
twinkarma and others added 17 commits May 25, 2023 23:17
correct error in navbar property setting
Upgrade postgres-backup-local to version 14
…already has a docker-compose.yml and a .env file, then it will detect the existing settings and offer to perform an upgrade rather than a fresh installation.
…en it is run as part of a get-teamware upgrade, since that would overwrite the variables we have just been carefully gathering from the user
The combination of .env and docker-compose.yml could be any compose-based application; if we look for the teamware-specific shell scripts as well, we can be more confident that this is genuinely a previous installation of Teamware that we are upgrading.
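Sketched here in JavaScript for illustration (the real logic lives in the get-teamware.sh shell script, and the helper-script name below is a placeholder, not the real file name):

```js
// Upgrade-detection heuristic: only treat a directory as an existing
// Teamware install if, alongside the generic compose files, a
// teamware-specific helper script is also present.
const fs = require("fs");
const path = require("path");

function looksLikeTeamwareInstall(dir) {
  return ["docker-compose.yml", ".env", "some-teamware-helper.sh"]
    .every((f) => fs.existsSync(path.join(dir, f)));
}
```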
Support upgrades as well as fresh installs for get-teamware.sh script
…d of adding one annotator and having them do all the training, test and annotation docs, then adding the next annotator, etc., we now add all the annotators, then have them all do the training, then all do the test, then all do the annotation documents.

This is because it is possible for just five of the six annotators to triple-annotate the entire corpus between them (each annotator is allowed up to 12 documents, and 5 × 12 = 60 annotations in total would fill the corpus), causing an error when the test attempts to add a sixth annotator to the already-completed project. Adding all the annotators up front, before anyone starts working, is a more realistic test.
…or was given which documents during the TestAnnotationTaskManagerTrainTestMode test cases
Rather than picking the next document for each annotator completely at random, we now prefer documents that have fewer existing annotations. This is achieved by first sorting the list of documents by the number of COMPLETED+PENDING annotations and then randomizing only within each group: we first try (in random order) those documents with no existing annotations, then if none of those are suitable we try (again in random order) those documents with one annotation, then two, etc., until we either find a valid document to assign or run out of documents to try (see the sketch below). The effect should be that at any given time the full set of documents is "evenly" annotated, or as close as possible when the number of completed annotations does not divide evenly into num_docs*annotations_per_doc.

Fixes #372
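A sketch of that strategy, written in JavaScript for illustration (the real implementation is in the Django backend, and the function names here are hypothetical):

```js
// Group documents by their COMPLETED+PENDING annotation count, then try the
// groups in ascending order, shuffling within each group, and return the
// first document that is valid for this annotator.
function shuffle(arr) {                       // Fisher-Yates, in place
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

function pickDocument(docs, annotationCount, isValidFor, annotator) {
  const groups = new Map();
  for (const doc of docs) {
    const n = annotationCount(doc);           // COMPLETED + PENDING
    if (!groups.has(n)) groups.set(n, []);
    groups.get(n).push(doc);
  }
  for (const n of [...groups.keys()].sort((a, b) => a - b)) {
    for (const doc of shuffle(groups.get(n))) {
      if (isValidFor(doc, annotator)) return doc;
    }
  }
  return null;                                // ran out of documents
}
```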
Attempt to spread documents more evenly across annotators
Add django command for extracting annotations
twinkarma merged commit c05f692 into master on Oct 2, 2023
10 checks passed
github-actions bot commented Oct 2, 2023

Jest Coverage

| File | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s |
| --- | --- | --- | --- | --- | --- |
| All files | 83.8 | 83.96 | 64 | 83.8 | |
| jrpc | 94.11 | 91.66 | 83.33 | 94.11 | |
| jrpc/index.js | 94.11 | 91.66 | 83.33 | 94.11 | 29-30,38-40 |
| utils | 81.97 | 82.97 | 57.89 | 81.97 | |
| utils/annotations.js | 97.72 | 73.91 | 100 | 97.72 | 35-36 |
| utils/dict.js | 88.88 | 83.33 | 100 | 88.88 | 3-4 |
| utils/expressions.js | 80.08 | 82.35 | 80 | 80.08 | ...7-98,126-128,139-156,188-190,201-218 |
| utils/index.js | 73.6 | 100 | 14.28 | 73.6 | 9-17,28-29,43-53,64-65,76-82,93-94 |
