DALPHI

DALPHI - Active Learning Platform for Human Interaction

Introduction

DALPHI helps you build and maintain annotated data for machine learning tasks. It is completely agnostic about the content of your documents, which allows for a wide range of labeling problems. Internally, each document is treated as a blob; only the services you provide understand its content. These JSON-based services define the machine learning problem you want to solve.

DALPHI is at a rather early stage, so the communication protocol between DALPHI and the external services is still evolving and may change. The outline below is only meant to give you an idea of how the process works. Currently, your service must provide the following endpoints in order to run DALPHI:

  • iterate

    • Input: The whole corpus including all labeled and unlabeled documents
    • Output: A list of AnnotationDocuments
  • merge

    • Input: One raw CorpusDocument and one or more corresponding AnnotationDocuments
    • Output: The CorpusDocument containing the merged feedback from the AnnotationDocuments

You can think of an AnnotationDocument as a question to the human annotator. This may be a simple closed question like "Is there a cat in the picture?" or "Is this a valid person name?". It may also ask for complex feedback that requires custom rendering, in which case you have to register a custom HTML interface to render the question. We are still working on the concrete API documentation and example services.
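To make the two endpoints more concrete, they can be pictured as plain functions. The following Python sketch is only an illustration under assumptions: the field names (`id`, `document_id`, `content`, `options`, `label`) and the majority-vote merge rule are hypothetical and are not taken from DALPHI's protocol.

```python
from collections import Counter


def iterate(corpus):
    """Turn every unlabeled corpus document into one closed question."""
    return [
        {
            "document_id": doc["id"],   # hypothetical field names throughout
            "content": doc["content"],
            "options": ["Yes", "No"],
            "label": None,              # filled in by the human annotator
        }
        for doc in corpus
        if doc.get("label") is None
    ]


def merge(corpus_document, annotation_documents):
    """Write the most frequent annotator answer back into a copy of the document."""
    answers = [a["label"] for a in annotation_documents if a.get("label") is not None]
    merged = dict(corpus_document)
    if answers:
        merged["label"] = Counter(answers).most_common(1)[0][0]
    return merged


corpus = [
    {"id": 1, "content": "Ada Lovelace", "label": "Yes"},
    {"id": 2, "content": "Lorem ipsum", "label": None},
]
questions = iterate(corpus)          # only document 2 still needs a label
questions[0]["label"] = "No"         # simulated human feedback
merged = merge(corpus[1], questions)
```

A real service would expose these functions over HTTP and exchange the documents as JSON; the sketch leaves out transport entirely.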

Also check out our DALPHI product presentation, or read the paper on DALPHI's pre-annotation assistance system: "The DALPHI annotation framework & how its pre-annotations can improve annotator efficiency" (Robert Greinacher and Franziska Horn, 2018).

Getting started

Kickstart with Docker

Start only the Ruby on Rails web app with

docker build -t dalphi .
docker run -it -p 3000:3000 dalphi

or launch the complete bundle including some example services and a worker with

docker-compose up

Starting for development

DALPHI requires Ruby 2.4.0 to work properly. With rvm, you can install it by running the following commands.

rvm install ruby-2.4.0
rvm use ruby-2.4.0

Get DALPHI by cloning the official repository.

git clone --recursive https://github.com/DALPHI/DALPHI.git

In the cloned repository, run Bundler to install all dependencies.

cd DALPHI
gem install bundler
bundle install

Start the application with foreman, so that every component is started correctly.

foreman start

Creating an interface

TL;DR: DALPHI renders each annotation document's payload data through your mustache.js template. To write your changes back, call saveChanges from a class that extends AnnotationIteration.

Create a template that renders your data so that users can annotate it.

<h1>Paragraph Classification</h1>
<p>{{{content}}}</p>
{{#options}}
    <button
        class="btn btn-secondary"
        onclick="window.text_nominal.annotateWith('{{.}}')">
        {{.}}
    </button>
{{/options}}

You can use any valid HTML in combination with the mustache.js templating language. The example template above can be rendered with the following incoming annotation document.

{
	// ...
	"content": "My <strong>content</strong>!",
	"options": ["Yes", "No"]
	// ...
}

DALPHI will automatically render your template with the correct interface and iterate over the accessible annotation documents.
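To illustrate how the template and the annotation document combine, here is a minimal Python sketch. It is not mustache.js and not DALPHI's rendering code; the `render` function is a hypothetical stand-in that supports only the three tag forms used above (triple braces, double braces, and list sections with `{{.}}`), and the template is shortened for readability.

```python
import html
import re


def render(template, data):
    """Render the tag forms used in the example template (not full mustache)."""

    def render_section(match):
        items = data.get(match.group(1), [])
        body = match.group(2)
        # Inside a list section, {{.}} stands for the current (escaped) item.
        return "".join(body.replace("{{.}}", html.escape(str(item))) for item in items)

    def render_variable(match):
        raw_name, escaped_name = match.group(1), match.group(2)
        if raw_name is not None:
            # Triple braces insert the value without HTML escaping.
            return str(data.get(raw_name, ""))
        # Double braces insert the HTML-escaped value.
        return html.escape(str(data.get(escaped_name, "")))

    # Sections {{#name}} ... {{/name}} repeat their body once per list item.
    out = re.sub(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", render_section, template, flags=re.S)
    return re.sub(r"\{\{\{(\w+)\}\}\}|\{\{(\w+)\}\}", render_variable, out)


template = "<p>{{{content}}}</p>{{#options}}<button>{{.}}</button>{{/options}}"
document = {"content": "My <strong>content</strong>!", "options": ["Yes", "No"]}
rendered = render(template, document)
print(rendered)
# <p>My <strong>content</strong>!</p><button>Yes</button><button>No</button>
```

The triple braces around `content` matter here: with double braces the `<strong>` markup would be escaped and shown as literal text instead of being rendered as HTML.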

Note the JavaScript method window.text_nominal.annotateWith in the button's onclick event. This method could look like the following. It is part of a CoffeeScript class whose name has to match the interface type.

class text_nominal extends AnnotationIteration
    # uncomment to override interface registration at AnnotationLifecycle
    # constructor: ->
    #    # implement your registration here or call `super`

    # uncomment to override standard mustache templating
    # iterate: (template, data) ->
    #    # implement your rendering here or call `super`

    annotateWith: (label) ->
        @currentData.label = label
        this.saveChanges(@currentData)

window.text_nominal = new text_nominal()

The method text_nominal.annotateWith writes the annotated data back to the iteration's @currentData and saves it by calling this.saveChanges. To gain full flexibility when implementing the interface, you can override or hook into the superclass's constructor and iterate methods.

Finally, you can style your interface with all the rich features of SCSS, such as variables, nesting, mixins, and inheritance.

$white: #fff;
$green: #93b449;
$red: #c9302c;

button {
  color: $white;

  &:active,
  &:focus,
  &:hover {
    color: $white !important;
  }

  @mixin button-color-scheme($index, $base-color) {
    &:nth-of-type(#{$index}) {
      background-color: $base-color;
      border-color: darken($base-color, 10);

      &:hover {
        background-color: darken($base-color, 5);
      }

      &:active,
      &:focus {
        background-color: darken($base-color, 10);
      }
    }
  }

  @include button-color-scheme(1, $green);
  @include button-color-scheme(2, $red);
}

API Documentation

DALPHI uses Swagger 2.0 (compatible with OpenAPI) for interactive documentation of its API. For the most straightforward experience, we ship the latest version of Swagger UI to give you everything you need to understand our API and start developing your own services for DALPHI. After starting the application, Swagger UI will be available at http://localhost:3000/api/swagger/. The API specification JSON will be served at http://localhost:3000/api/docs.
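If you prefer to inspect the specification from a script rather than through Swagger UI, something like the following Python sketch may help. It relies only on the Swagger 2.0 convention that operations live under the top-level `paths` object; the helper names and the `/example` path are hypothetical, and `fetch_spec` needs a running DALPHI instance, so it is defined but not called here.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}


def fetch_spec(url="http://localhost:3000/api/docs"):
    """Download and parse the specification; needs a running DALPHI instance."""
    with urlopen(url) as response:
        return json.load(response)


def list_operations(spec):
    """Return sorted (METHOD, path) pairs from a Swagger 2.0 document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            if method in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                operations.append((method.upper(), path))
    return sorted(operations)


# Hypothetical excerpt for illustration; real paths come from fetch_spec().
example_spec = {
    "swagger": "2.0",
    "paths": {
        "/example": {"get": {}, "post": {}, "parameters": []},
    },
}
operations = list_operations(example_spec)
print(operations)
# [('GET', '/example'), ('POST', '/example')]
```

Against a local instance, `list_operations(fetch_spec())` would give a quick overview of the endpoints before you start writing a service.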

Testing & Continuous Integration

DALPHI is developed following the test-driven development paradigm, using RSpec to specify the expected behavior of the software. Migrate the database and run RSpec with the following script:

./bin/test

The continuous integration server (Travis CI) uses the following script, which additionally runs a set of code analyzers (Brakeman, Rails Best Practices, Reek) and linters (Slim-Lint, SCSS-Lint, CoffeeLint, RuboCop).

./bin/ci

Contributing & Citing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

If any of this code was helpful for your research, please consider citing it:

@article{greinacher2018dalphi,
  title     = {The DALPHI annotation framework \& how its pre-annotations can improve annotator efficiency},
  author    = {Greinacher, Robert and Horn, Franziska},
  journal   = {arXiv preprint arXiv:1808.05558},
  year      = {2018}
}

License

Copyright 2018 Implisense GmbH

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

About Implisense

DALPHI is maintained and funded by Implisense.

We love open source software and are hiring!