Wf RO transformation service API

Soumya Brahma edited this page Feb 5, 2016 · 8 revisions

Info
  • The service is running at http://sandbox.wf4ever-project.org/wf-ro/jobs/
  • Reference implementation server: https://github.com/wf4ever/wf-ro
  • Reference implementation client: https://github.com/wf4ever/wf-ro-client

API function overview

The Workflow Transformation API exposes a service that transforms workflows into research objects. Workflows are often complex data structures that embed data such as their sub-resources, annotations and provenance. The service described by this API creates a research object that exposes these data according to the RO model.

The service API allows a client to create a transformation job and then monitor it. The output of the transformation job contains a list of the resources that were created from the workflow and aggregated in the research object.

API usage

Input:

  1. t2flow workflow
  2. RO identifier
  3. ROSRS URI
  4. OAuth access token
Output:
  1. HTTP Status code: 200 OK or other
Service algorithm:
  1. Extract the UUID from the workflow bundle, which is usually the same as the UUID of the main workflow.
  2. Check if the RO exists in ROSRS:
    • if yes, do nothing. In the future, previous workflow versions may be deleted/preserved, together with their annotations.
    • if no, create a new one.
  3. Upload the workflow bundle to ROSRS, with the UUID as the resource identifier.
  4. Generate a wfdesc description of the workflow bundle (RDF graph). This describes all workflows inside the bundle, including relations between them.
  5. Upload the wfdesc description as workflow bundle annotation.
  6. Generate a roevo description of the workflow bundle (RDF graph). This includes the chains of UUIDs of all workflows in the bundle.
  7. Upload the roevo description as workflow bundle annotation.
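
The order of the steps above can be sketched in Java as follows. This is a minimal sketch, not the reference implementation: the RoStore interface, its method names and the two description generators are hypothetical stand-ins for the ROSRS calls and for the wfdesc and roevo generators, and the in-memory store exists only so that the sketch can run. Step 1 (extracting the UUID) is assumed to have happened already.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical abstraction over the ROSRS operations the algorithm needs. */
interface RoStore {
    boolean exists(String roUri);
    void create(String roUri);
    void upload(String roUri, String resourceId, byte[] content);
    void annotate(String roUri, String targetId, String rdfGraph);
}

/** In-memory stand-in that records calls; used only to exercise the sketch. */
final class InMemoryRoStore implements RoStore {
    final Set<String> ros = new HashSet<>();
    final List<String> log = new ArrayList<>();

    public boolean exists(String roUri) { return ros.contains(roUri); }
    public void create(String roUri) { ros.add(roUri); log.add("create " + roUri); }
    public void upload(String roUri, String resourceId, byte[] content) { log.add("upload " + resourceId); }
    public void annotate(String roUri, String targetId, String rdfGraph) { log.add("annotate " + targetId); }
}

final class TransformationSketch {
    /** Runs steps 2 to 7 for a bundle whose UUID has already been extracted (step 1). */
    static void transform(RoStore store, String roUri, String bundleUuid, byte[] bundle,
                          Function<byte[], String> wfdesc, Function<byte[], String> roevo) {
        if (!store.exists(roUri)) {
            store.create(roUri);                                  // step 2: create the RO only if missing
        }
        store.upload(roUri, bundleUuid, bundle);                  // step 3: UUID is the resource identifier
        store.annotate(roUri, bundleUuid, wfdesc.apply(bundle));  // steps 4 and 5
        store.annotate(roUri, bundleUuid, roevo.apply(bundle));   // steps 6 and 7
    }
}
```

Running transform twice against the same store creates the RO only on the first call, which mirrors the "if yes, do nothing" branch of step 2.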

Create a conversion job: POST /jobs/

The client sends the transformation job parameters in a POST request, asking for all resources to be extracted to the specified folders.

C: POST http://example.net/translate/jobs/ HTTP/1.1
C: Content-Type: application/json
C:
C: {
C:   "resource": "http://www.example.com/workflow.t2flow",
C:   "format": "application/vnd.taverna.t2flow+xml",
C:   "ro": "http://www.example.org/rodl/ROs/someRO/",
C:   "extract": {
C:       "main": "http://www.example.org/rodl/ROs/someRO/workflows/main/",
C:       "nested": "http://www.example.org/rodl/ROs/someRO/workflows/nested/",
C:       "scripts": "http://www.example.org/rodl/ROs/someRO/config/scripts/",
C:       "services": "http://www.example.org/rodl/ROs/someRO/config/web%20services/"
C:   },
C:   "token": "47d5423c-b507-4e1c-7"
C: }

S: HTTP/1.1 201 Created
S: Location: http://example.net/translate/jobs/fefe-fefefef-fefefefefefe
The extract key and its subkeys are optional. If no extract is given, then only the main workflow is extracted, and it will be aggregated without being added to an RO folder.
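
For the JSON variant of the request, a client could assemble the body and the POST request with the JDK's java.net.http API (Java 11 or later). The following is a minimal sketch under that assumption: the class and method names are illustrative, and the hand-written JSON rendering only covers flat string values. A real client would more likely use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;

final class CreateJobRequest {
    /** Quotes a string as a JSON string literal. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    /** Renders a map of string values as a JSON object, in iteration order. */
    static String object(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(sep).append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            sep = ",";
        }
        return sb.append('}').toString();
    }

    /** Builds the job description; an empty extract map omits the optional "extract" key. */
    static String body(String resource, String format, String ro, String token,
                       Map<String, String> extract) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"resource\":").append(quote(resource));
        sb.append(",\"format\":").append(quote(format));
        sb.append(",\"ro\":").append(quote(ro));
        if (!extract.isEmpty()) sb.append(",\"extract\":").append(object(extract));
        sb.append(",\"token\":").append(quote(token));
        return sb.append('}').toString();
    }

    /** Builds (but does not send) the POST request to the jobs URI. */
    static HttpRequest post(URI jobsUri, String jsonBody) {
        return HttpRequest.newBuilder(jobsUri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

The request can then be sent with java.net.http.HttpClient; on 201 Created, the job URI is the value of the Location response header.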

The following Java snippet creates the conversion job with org.apache.commons.httpclient.methods.PostMethod and form data instead of JSON. Here only the main workflow and the scripts are extracted:

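// Requires Apache Commons HttpClient 3.x. Constants, t2flowURI, roID, folder_main and
// folder_scripts are defined by the caller; the trailing comments show example values.
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.PostMethod;

HttpClient client = new HttpClient();
BufferedReader br = null;
PostMethod method = new PostMethod(Constants.apiURL);
method.addParameter("resource", t2flowURI); // "https://raw.github.com/wf4ever/provenance-corpus/master/Taverna_repository/workflow_2228_version_1/amiga_conesearch_from_a_file_of_targets_positions_268018.t2flow"
method.addParameter("format", Constants.format); // "application/vnd.taverna.t2flow+xml"
method.addParameter("ro", roID); // "http://sandbox.wf4ever-project.org/rodl/ROs/Sample2/"
method.addParameter("token", Constants.authToken); // "541002e2-9ff9-4cff-b85c-2b4af2c33e98"
method.addParameter("extract_main", folder_main); // "http://sandbox.wf4ever-project.org/rodl/ROs/Sample2/workflows/main/"
method.addParameter("extract_scripts", folder_scripts); // "http://sandbox.wf4ever-project.org/rodl/ROs/Sample2/config/scripts/"
method.addRequestHeader("Content-Type", Constants.contentType_URLencoded); // "application/x-www-form-urlencoded"
try {
    int returnCode = client.executeMethod(method);

    if (returnCode == HttpStatus.SC_NOT_IMPLEMENTED) {
        System.err.println("The POST method is not implemented by this URI");
        // Read the body so that the connection can be released cleanly
        method.getResponseBodyAsString();
    } else {
        // Echo the response body, if the server sent one
        InputStream body = method.getResponseBodyAsStream();
        if (body != null) {
            br = new BufferedReader(new InputStreamReader(body));
            String readLine;
            while ((readLine = br.readLine()) != null) {
                System.err.println(readLine);
            }
        }
        System.out.println(method.getStatusText());
        // The Location header carries the URI of the newly created job
        Header location = method.getResponseHeader("Location");
        if (location != null) {
            System.out.println("Location: " + location.getValue());
        }
    }
} catch (Exception e) {
    System.err.println(e);
} finally {
    method.releaseConnection();
    if (br != null) try { br.close(); } catch (Exception fe) {}
}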

Check job status: GET /jobs/{id}

The job status may be retrieved with a GET request to the job URI.

Job running

C: GET http://example.net/translate/jobs/fefe-fefefef-fefefefefefe HTTP/1.1
C: Accept: application/json

S: HTTP/1.1 200 OK
S: Content-Type: application/json
S:
S: {
S:   "resource": "http://www.example.com/workflow.t2flow",
S:   "format": "application/vnd.taverna.t2flow+xml",
S:   "ro": "http://www.example.org/rodl/ROs/someRO/",
S:   "extract": {
S:       "main": "http://www.example.org/rodl/ROs/someRO/workflows/main/",
S:       "scripts": "http://www.example.org/rodl/ROs/someRO/config/scripts/"
S:   },
S:   "status": "running"
S: }

Job finished

When the job has finished, the response lists the resources that were added or gives the reason for the job's failure:

C: GET http://example.net/translate/jobs/fefe-fefefef-fefefefefefe HTTP/1.1
C: Accept: application/json

S: HTTP/1.1 200 OK
S: Content-Type: application/json
S:
S: {
S:   "resource": "http://www.example.com/workflow.t2flow",
S:   "format": "application/vnd.taverna.t2flow+xml",
S:   "ro": "http://www.example.org/rodl/ROs/someRO/",
S:   "extract": {
S:       "main": "http://www.example.org/rodl/ROs/someRO/workflows/main/",
S:       "scripts": "http://www.example.org/rodl/ROs/someRO/config/scripts/"
S:   },
S:   "status": "done",
S:   "added": [
S:       "http://www.example.org/rodl/ROs/someRO/workflows/main/workflow.wfbundle",
S:       "http://www.example.org/rodl/ROs/someRO/config/scripts/ascript.txt", 
S:       "http://www.example.org/rodl/ROs/someRO/config/scripts/anotherscript.txt"
S:   ]
S: }
When the job has finished, the service may keep its status available for an arbitrary amount of time, long enough to allow clients to check that the job has finished. Retrieving the job status after that time will result in 404 Not Found.
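
A client could monitor a job with a simple polling loop. The sketch below is one possible shape for it, under stated assumptions: the Response class and the fetch supplier are hypothetical placeholders for an actual GET on the job URI, the status is pulled out with a regular expression rather than a JSON parser, and the return values "expired" and "timeout" are labels of this sketch, not values defined by the service.

```java
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JobPoller {
    /** Minimal view of an HTTP response: status code plus body. */
    static final class Response {
        final int code;
        final String body;
        Response(int code, String body) { this.code = code; this.body = body; }
    }

    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*\"([^\"]*)\"");

    /** Extracts the "status" value, or returns null if there is none. */
    static String statusOf(String json) {
        Matcher m = STATUS.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /**
     * Polls until the job leaves the "running" state, the service no longer knows
     * the job (404), or maxAttempts is reached. Returns the final status, "expired"
     * for a 404, or "timeout" if the job was still running after the last attempt.
     */
    static String await(Supplier<Response> fetch, int maxAttempts, long delayMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            Response r = fetch.get();
            if (r.code == 404) return "expired";
            if (r.code != 200) throw new IllegalStateException("Unexpected HTTP status " + r.code);
            String status = statusOf(r.body);
            if (!"running".equals(status)) return status;
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the job", e);
            }
        }
        return "timeout";
    }
}
```

Treating 404 as a separate outcome matters because a job whose status has been discarded is indistinguishable from a job that never existed.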

Invalid resource

If the workflow resource is not valid, e.g. it can't be found or is not a supported workflow definition, the status is invalid_resource:

C: GET http://example.net/translate/jobs/fefe-fefefef-fefefefefefe HTTP/1.1
C: Accept: application/json

S: HTTP/1.1 200 OK
S: Content-Type: application/json
S:
S: {
S:   "resource": "http://www.example.com/non-existing-workflow.t2flow",
S:   "format": "application/vnd.taverna.t2flow+xml",
S:   "ro": "http://www.example.org/rodl/ROs/someRO/",
S:   "status": "invalid_resource",
S:   "reason": "Can't read the workflow: 404 Not Found"
S: }

Job failed

If the job failed, the status is runtime_error and reason shows the error message.

C: GET http://example.net/translate/jobs/fefe-fefefef-fefefefefefe HTTP/1.1
C: Accept: application/json

S: HTTP/1.1 200 OK
S: Content-Type: application/json
S:
S: {
S:   "resource": "http://www.example.com/workflow.t2flow",
S:   "format": "application/vnd.taverna.t2flow+xml",
S:   "ro": "http://www.example.org/rodl/ROs/someRO/",
S:   "status": "runtime_error",
S:   "reason": "It didn't work today"
S: }
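
On the client side, the status and reason attributes shown in the examples above can be mapped to a small result type. This is a minimal sketch, assuming the job description has already been fetched as a string: the JobOutcome class and its Kind values are illustrative names, and the regular expression is a shortcut that returns string values without decoding JSON escapes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JobOutcome {
    enum Kind { RUNNING, DONE, INVALID_RESOURCE, RUNTIME_ERROR, FAILED, UNKNOWN }

    final Kind kind;
    final Optional<String> reason;

    private JobOutcome(Kind kind, Optional<String> reason) {
        this.kind = kind;
        this.reason = reason;
    }

    /** Returns the raw value of a top-level string attribute, or null if it is absent. */
    static String field(String json, String name) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(name) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Maps a status string to a Kind; unrecognised or missing values become UNKNOWN. */
    static Kind kindOf(String status) {
        if (status == null) return Kind.UNKNOWN;
        switch (status) {
            case "running": return Kind.RUNNING;
            case "done": return Kind.DONE;
            case "invalid_resource": return Kind.INVALID_RESOURCE;
            case "runtime_error": return Kind.RUNTIME_ERROR;
            case "failed": return Kind.FAILED;
            default: return Kind.UNKNOWN;
        }
    }

    static JobOutcome parse(String json) {
        return new JobOutcome(kindOf(field(json, "status")),
                Optional.ofNullable(field(json, "reason")));
    }

    /** True for every status that reports an unsuccessful job. */
    boolean isFailure() {
        return kind == Kind.INVALID_RESOURCE || kind == Kind.RUNTIME_ERROR || kind == Kind.FAILED;
    }
}
```

Keeping an UNKNOWN kind lets a client tolerate status values it does not recognise instead of failing on them.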

Cancel a job

The service MAY support canceling a running job by sending a DELETE request.

C: DELETE http://example.net/translate/jobs/fefe-fefefef-fefefefefefe HTTP/1.1

S: HTTP/1.1 204 No Content
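
A cancel request could be built with the JDK's java.net.http API as sketched below. Because support for cancelling is optional, the client should be prepared for responses other than 204. The mapping of 404, 405 and 501 in this sketch follows general HTTP semantics and is an assumption, not something this API defines; the class name and the returned labels are illustrative.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class CancelJob {
    /** Builds (but does not send) the DELETE request for a job URI. */
    static HttpRequest delete(URI jobUri) {
        return HttpRequest.newBuilder(jobUri).DELETE().build();
    }

    /** Turns the response code into a short description for the caller. */
    static String interpret(int code) {
        switch (code) {
            case 204: return "cancelled";
            case 404: return "unknown job";             // assumption: job finished and forgotten, or never existed
            case 405:
            case 501: return "cancel not supported";    // assumption: service does not implement DELETE
            default: return "unexpected response " + code;
        }
    }
}
```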

Link relations

Creating the transformation job is done by a request to the service URI, and all other requests are done using the URI returned by the first one.

HTTP methods

The API uses a POST method to create a transformation job, GET to retrieve the status of a job and DELETE to cancel a running job.

Resources and formats

A job description is a JSON object with the following attributes:

  • resource: URI of the workflow that is transformed
  • format: MIME type of the workflow
  • ro: URI of the research object to which the service saves the resources
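  • status: Job status. Allowed values are "running", "done" and "failed"; the examples above report failures with the more specific values "invalid_resource" and "runtime_error".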
  • token: OAuth 2.0 Bearer token of the research object owner.

Cache considerations

Caching can be used when retrieving the status of a job that has not changed.

C: GET http://example.net/translate/jobs/fefe-fefefef-fefefefefefe HTTP/1.1
C: Accept: application/json

S: HTTP/1.1 200 OK
S: Content-Type: application/json
S: Last-Modified: Fri, 29 Oct 2010 19:43:31 GMT
S: ETag: "152"
S:
S: {
S:    (...)
S: }

C: GET http://example.net/translate/jobs/fefe-fefefef-fefefefefefe HTTP/1.1
C: Accept: application/json
C: If-Modified-Since: Fri, 29 Oct 2010 19:43:31 GMT
C: If-None-Match: "152"

S: HTTP/1.1 304 Not Modified
S: Last-Modified: Fri, 29 Oct 2010 19:43:31 GMT
S: ETag: "152"

C: GET http://example.net/translate/jobs/fefe-fefefef-fefefefefefe HTTP/1.1
C: Accept: application/json
C: If-Modified-Since: Fri, 29 Oct 2010 19:43:31 GMT
C: If-None-Match: "152"

S: HTTP/1.1 200 OK
S: Content-Type: application/json
S: Last-Modified: Fri, 29 Oct 2010 20:54:41 GMT
S: ETag: "155"
S: Expires: Sat, 06 Nov 2010 20:54:41 GMT
S:
S: {
S:    (...)
S: }
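
The exchange above can be mirrored on the client by remembering the validators of the last 200 response and replaying them on the next request. The following is a minimal sketch using the JDK's java.net.http API; the CachedJobStatus class and its field names are illustrative, and the caller is assumed to pass in the status code, headers and body of each response.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class CachedJobStatus {
    String etag;          // value of the last ETag header, or null
    String lastModified;  // value of the last Last-Modified header, or null
    String body;          // job description received with the last 200 response

    /** Builds a GET that carries the stored validators, if any. */
    HttpRequest request(URI jobUri) {
        HttpRequest.Builder b = HttpRequest.newBuilder(jobUri)
                .header("Accept", "application/json")
                .GET();
        if (etag != null) b.header("If-None-Match", etag);
        if (lastModified != null) b.header("If-Modified-Since", lastModified);
        return b.build();
    }

    /** Applies a response: 304 keeps the cached body, 200 replaces it and the validators. */
    String update(int code, String newEtag, String newLastModified, String newBody) {
        if (code == 304) {
            return body;
        }
        if (code == 200) {
            etag = newEtag;
            lastModified = newLastModified;
            body = newBody;
            return body;
        }
        throw new IllegalStateException("Unexpected HTTP status " + code);
    }
}
```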

Discussion

Questions and answers:

  1. Question 1: A t2flow can have many workflows (right?), each with a UUID. Do we assign each UUID to a resource (wf dcterms:identifier uuid), and assign no id to the RO itself?
     Answer: Yes, but one and only one of them will be the 'main' workflow, and its UUID is used for making the WorkflowBundle URI.
  2. Question 2: WorkflowBundle#getMainWorkflow#getWorkflowIdentifier returns a URI. How is it related to the UUID, and why should we use the UUID and not this URI?
     Answer: It's constructed from the t2flow UUID. However, there will be both the workflow bundle ID and the workflow ID: both URIs have the same UUID from the main workflow, but a different prefix. We should use the WorkflowBundle URI (that's WorkflowBundle.getGlobalBaseURI()) as the identifier for the RO, and the individual workflow's identifier for the wfdesc (the Workflow.getWorkflowIdentifier).
  3. Question 3: Is there a javadoc for the scufl2 API? I see that the WorkflowBundleIO can save to file, but I'll need something else.
     Answer: Easiest in Eclipse is to press F3 to get the source code; otherwise see http://mygrid.github.com/scufl2/api/0.9/
The authorization algorithm is rather weak - an access token is shared between the caller and the service. A better solution would be one of the following:
  • The caller sends a single-use authorization code, which is exchanged by the service for an access token. Safer but requires constant reauthorization, especially difficult for offline clients such as ro-manager
  • The service is considered trusted and has its own access token - currently not supported by RODL (actually I'm not sure that this one is better)
Considerations:
  • The translator might behave differently for different formats, like Galaxy, SCUFL2 .wfbundle, WINGS (arguably this could come from the Content-Type of the resource, but that might only work for single-resource workflows!)
  • The given ro might or might not exist. RODL API should support PUT to create.
  • Translation might take some time, so a status is returned
  • Cache headers tell us
Comments from Piotr
  • Requests to the service don't need to be OAuth-authorized. This would make sense if the user had an account with the service and wanted a 3rd party application (i.e. RO Portal) to act on his behalf. What we need is to authorize the service to interact with RODL on user's behalf.
  • A typical flow to achieve the above goal would be:
    • The user makes an unsigned request to the service.
    • The service recognizes that it needs an access token, so it redirects the user to RODL User Management Application
    • The user logs in, accepts and is redirected again to the service (with the access token / authorization code).
    • The service makes a signed request to RODL.
  • Problems with the above:
    • Difficult to handle for offline clients (how should ro-manager handle a 302 response?)
    • Unless the service stores the access token for some time, requires constant user authorization.
  • I suggest using OAuth 2.0 instead of OAuth 1.0, which is used below. OAuth 2.0 is much simpler and, unlike OAuth 1.0, is supported by RODL. However, it's secure only when used over HTTPS.
  • The API does not allow sending a workflow bundle as a request body. Does it always have to be a web resource?
  • To cancel a job, shouldn't the request be a DELETE rather than GET?
  • What should be the service response if job parameters are incorrect? In particular, what if the workflow can't be downloaded?