[Draft] Export Tales that have at least one version and run #435

ThomasThelen · 2021-01-12T22:55:32Z

Purpose:

The purpose of this issue is to discuss and track the progress of exporting Tales that have version/run structures.

Background:

Tales were recently updated to include the notions of versions and runs. Users can create multiple versions of a Tale; each version may contain multiple runs. Each run has a results/ directory where computational outputs are stored. Each run also has workspace and data directories, which are symlinked back to the corresponding version's folders. Note that this is where the mutability of the workspace folder comes into play.

During the Jan 11 2021 dev call we discussed a few different possibilities as to what this might look like when Tales are exported and re-imported.

The main takeaway is that we can export Tales either with or without the run/version structures.

Proposed Approaches:

The following approaches each have strengths, weaknesses, and varying levels of complexity.

Exporting All Runs in a Version

This approach exports all of the runs under a particular version of a Tale. The advantage of this is that users can have a record of all of the runs in the version rather than a limited view of what happened. When importing, a more complete version of the Tale is reconstructed. Note that the original Tale may have many versions. The versions that aren't exported will be lost on an import.

This may be confusing for some published Tales because (presumably) only one of the runs is going to be referenced in a linked paper. This also conflicts with the idea of exporting/publishing individual recorded runs (how do we let users export either ALL runs or only a recorded run?).

Proposed BagIt structure

bagit.txt
bag-info.txt
.
.
.
data/
  |-workspace/
  |-data/
  |-versions/
     |-version_1/
       |-workspace/
       |-data/
  |-runs/
     |-run1/
       |-workspace/
       |-data/
       |-results/
       |-version
       |-.stderr
       |-.stdout
     |-run2/
       |-workspace/
       |-data/
       |-results/
       |-version
       |-.stderr
       |-.stdout
     |-run3/
       |-workspace/
       |-data/
       |-results/
       |-version
       |-.stderr
       |-.stdout

Importing Changes

To import a Tale with multiple versions, we need to know:

The name of the version
The name of each run
A mapping between the run folders on the exported Tale and the name of the run that the user may have specified in Whole Tale.

These constraints can be tackled by either of the following:

Enforcing a naming convention on the folder names (the version folder name is the name of the Tale's version, each run folder is the name of each run). This can easily be parsed during import.
Adding additional structure to the manifest.json to include metadata about each run and version (most likely requires us to come up with new terms for runs & versions).

e.g.

{
    "@id": "run/1",
    "@type": "wt:run",
    "schema:name": "run_1"
}

{
    "@id": "version/1",
    "@type": "wt:version",
    "schema:name": "version_1"
}

We can make this arbitrarily complex by introducing membership predicates (wasPartOf, etc.) to describe relations between versions and runs.
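The two options above can be sketched together in Python: read the version and run names from the folder names, then emit one manifest entry per version and run. This is only an illustration under assumptions. The function names are made up, the wt:wasPartOf predicate is hypothetical (only "wasPartOf" is mentioned above), and linking every run to version/1 assumes that exactly one version is exported.

```python
import json
import tempfile
from pathlib import Path


def discover_layout(bag_root):
    """Read version and run names from the folder names under data/."""
    data_dir = Path(bag_root) / "data"
    layout = {}
    for kind in ("versions", "runs"):
        parent = data_dir / kind
        if parent.is_dir():
            layout[kind] = sorted(p.name for p in parent.iterdir() if p.is_dir())
        else:
            layout[kind] = []
    return layout


def manifest_entries(layout):
    """Build one manifest entry per version and per run.

    "wt:version" and "wt:run" follow the example above; "wt:wasPartOf"
    is a hypothetical predicate used only for illustration.
    """
    entries = []
    for i, name in enumerate(layout["versions"], start=1):
        entries.append(
            {"@id": f"version/{i}", "@type": "wt:version", "schema:name": name}
        )
    single_version = len(layout["versions"]) == 1
    for i, name in enumerate(layout["runs"], start=1):
        entry = {"@id": f"run/{i}", "@type": "wt:run", "schema:name": name}
        if single_version:
            # Assumption: with one exported version, every run belongs to it.
            entry["wt:wasPartOf"] = {"@id": "version/1"}
        entries.append(entry)
    return entries


if __name__ == "__main__":
    # Build a throwaway layout that mirrors the proposed structure.
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("versions/version_1", "runs/run1", "runs/run2"):
            (Path(tmp) / "data" / rel).mkdir(parents=True)
        print(json.dumps(manifest_entries(discover_layout(tmp)), indent=4))
```

If the user-facing run names differ from the folder names, the schema:name values would have to come from Whole Tale rather than from the filesystem; the sketch does not cover that mapping.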

Exporting Individual Runs

This approach exports a particular run of a version, in contrast to exporting all of the runs. The visible difference is that the export looks a little cleaner (personal opinion) and can be useful for users who are interested in a particular result.

This approach is also more streamlined for the use case of exporting reproducible runs: the user interface should look the same for a user exporting a recorded or non-recorded run.

Proposed BagIt structure (1)

bagit.txt
bag-info.txt
.
.
.
data/
  |-workspace/
  |-data/
  |-versions/
     |-version_1/
       |-workspace/
       |-data/
  |-runs/
     |-run1/
       |-workspace/
       |-data/
       |-results/
       |-version
       |-.stderr
       |-.stdout

Importing Changes (1)

The import constraints are the same as in the case of exporting all of the runs. It may be useful to preserve the original naming that was done in the frontend.

Proposed BagIt structure (2)

This BagIt structure is different than the first in that there isn't any indication that the exported Tale is a version/run other than the filesystem artifacts from the run. This is nice because it's conceptually not that confusing (compared to many symlinks that users would be asking about) and much easier to navigate.

bagit.txt
bag-info.txt
.
.
.
data/
  |-workspace/
  |-data/
  |-results/
    |-.stderr
    |-.stdout

Importing Changes (2)

When importing a Tale with this structure there are a few options.

If we want to preserve the version/run names to partially reconstruct the Tale, these can be encoded in the manifest.json file.

We can also ignore the version/run information and place the content in the results/ folder into the workspace/ folder.

The third option is to create a generic Version & Run name and place the results/ artifacts in the appropriate place.
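The three options can be compared with a small Python sketch that only decides which names to use and where the results/ content should land. Everything here is hypothetical: the ImportMode enum, the plan_import function, the "Version 1"/"Run 1" defaults, and the runs/<run>/results target path are assumptions made for illustration, and the entries are assumed to look like the manifest example earlier in this issue.

```python
from enum import Enum


class ImportMode(Enum):
    FROM_MANIFEST = "manifest"  # option 1: names come from manifest.json
    FLATTEN = "flatten"         # option 2: results/ goes into workspace/
    GENERIC = "generic"         # option 3: generic version and run names


def plan_import(mode, entries=()):
    """Return the version name, run name, and results destination.

    `entries` is a list of manifest entries shaped like the wt:version and
    wt:run example; it is only consulted in FROM_MANIFEST mode.
    """
    if mode is ImportMode.FLATTEN:
        # No version/run is recreated; results are merged into the workspace.
        return {"version": None, "run": None, "results_target": "workspace"}

    if mode is ImportMode.FROM_MANIFEST:
        names = {e["@type"]: e["schema:name"] for e in entries}
        version, run = names["wt:version"], names["wt:run"]
    else:
        # Assumed placeholder names for the generic option.
        version, run = "Version 1", "Run 1"

    return {
        "version": version,
        "run": run,
        "results_target": f"runs/{run}/results",
    }


if __name__ == "__main__":
    example = [
        {"@id": "version/1", "@type": "wt:version", "schema:name": "version_1"},
        {"@id": "run/1", "@type": "wt:run", "schema:name": "run_1"},
    ]
    for mode in ImportMode:
        print(mode.name, plan_import(mode, example))
```

Keeping the decision separate from the file operations would let the import code treat all three options the same way once the plan is known.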

ThomasThelen · 2021-01-13T19:53:41Z

We also need to consider users who want to export Tales without Recorded Runs or versions. I think that this is still a legitimate use case that we should support. On the girder_wholetale side this should be mostly trivial since it's already implemented; the trick is getting a flag from the export endpoint indicating whether a run or a plain Tale is being exported.

ThomasThelen changed the title ~~Export Tales that have at least one version and run~~ [Draft] Export Tales that have at least one version and run Jan 22, 2021
