Standardize _creator field #48

Open
joshmoore opened this issue Nov 13, 2020 · 3 comments

@joshmoore

Currently,

(z) /opt/omero-ms-zarr $ cat 101.zarr/.zattrs
{
    "_creator": {
        "name": "omero-zarr",
        "version": "0.0.2.dev79+gb361c09"
    },

is added on export. We may want to update this slightly to align with an existing vocabulary such as Dublin Core or W3C PROV.

@sbesson

sbesson commented Aug 19, 2021

Another candidate vocabulary would be the schema.org SoftwareApplication type. This is also the vocabulary suggested in https://www.researchobject.org/ro-crate/1.0/#provenance-software-used-to-create-files.

The example above could be translated into:

          "@context": "https://schema.org",
          "@type": "SoftwareApplication",
          "name": "omero-cli-zarr",
          "version": "0.0.2.dev79+gb361c09"

To also cover the discussion around additional software information in #76 (comment), softwareAddOn would be an option:

          "@context": "https://schema.org",
          "@type": "SoftwareApplication",
          "name": "omero-cli-zarr",
          "version": "0.0.2.dev79+gb361c09",
          "softwareAddOn": {
               "@type": "SoftwareApplication",
               "name": "bioformats2raw",
               "version": "0.3.0"
          }
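
As a rough illustration of the translation above, here is a minimal Python sketch that builds the same structure from an existing `_creator` entry. The helper names `software_application` and `from_creator` are hypothetical and not part of any existing tool; the only vocabulary used is the schema.org `SoftwareApplication` type and its `softwareAddOn` property shown in the examples.

```python
import json


def software_application(name, version, addon=None, with_context=True):
    """Build a schema.org SoftwareApplication description as a plain dict.

    Hypothetical helper: the key names follow the JSON fragments in this
    thread, nothing here is defined by a specification yet.
    """
    app = {}
    if with_context:
        # Nested entries inherit the context of their parent, so they omit it.
        app["@context"] = "https://schema.org"
    app["@type"] = "SoftwareApplication"
    app["name"] = name
    app["version"] = version
    if addon is not None:
        app["softwareAddOn"] = addon
    return app


def from_creator(creator, addon=None):
    """Translate an existing `_creator` entry ({"name": ..., "version": ...})."""
    return software_application(creator["name"], creator["version"], addon=addon)


legacy = {"name": "omero-zarr", "version": "0.0.2.dev79+gb361c09"}
nested = software_application("bioformats2raw", "0.3.0", with_context=False)
print(json.dumps(from_creator(legacy, addon=nested), indent=2))
```

Building plain dicts keeps the sketch independent of any JSON-LD library; whether the result replaces `_creator` or lives under another key is still the open question in this issue.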

@joshmoore

Generally looks interesting, but we'll need to figure out where it's attached. Only at the top level? (Do we have a standard structure there?) Or on each multiscale, in case they are generated by different software? Etc.

@sbesson

sbesson commented Aug 20, 2021

#48 (comment) is a use case where there is a one-to-one mapping between the software and the specification i.e.

multiscales -> bioformats2raw
omero -> omero-cli-zarr

So although it could be at the top-level, there is a case for defining it (or including a reference via @id) at the level of each specification. This is what the multiscales specification currently attempts to do via metadata. Maybe we want to generalize this to allow all specifications to inject provenance metadata in a metadata field?

For more granular provenance i.e. each dataset being generated by different software, maybe we want to allow metadata fields to be defined further down the path e.g.

{
   "multiscales":[
      {
         "version":"0.2",
         "name":"example",
         "datasets":[
            {
               "path":"0",
               "metadata":{
                  "@context":"https://schema.org",
                  "@type":"SoftwareApplication",
                  "name":"bioformats2raw",
                  "version":"0.3.0"
               }
            },
            {
               "path":"1",
               "metadata":{
                  "@context":"https://schema.org",
                  "@type":"SoftwareApplication",
                  "name":"mydownsampler",
                  "version":"0.1.0"
               }
            }
         ]
      }
   ]
}
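
If provenance can be attached at several levels like this, a reader needs some rule for picking the applicable entry. The sketch below assumes a "most specific wins" order (dataset `metadata`, then multiscale `metadata`, then a top-level `_creator`); that order and the function name `resolve_provenance` are assumptions for illustration only, nothing in this thread or the specification defines them.

```python
def resolve_provenance(attrs, multiscale_index, dataset_path):
    """Return the most specific provenance entry for one dataset, or None.

    Assumed lookup order (not defined by any specification):
    dataset-level "metadata", then multiscale-level "metadata",
    then the top-level "_creator".
    """
    multiscale = attrs["multiscales"][multiscale_index]
    for dataset in multiscale.get("datasets", []):
        if dataset.get("path") == dataset_path and "metadata" in dataset:
            return dataset["metadata"]
    if "metadata" in multiscale:
        return multiscale["metadata"]
    return attrs.get("_creator")


attrs = {
    "_creator": {"name": "omero-zarr", "version": "0.0.2.dev79+gb361c09"},
    "multiscales": [
        {
            "version": "0.2",
            "name": "example",
            "datasets": [
                {
                    "path": "0",
                    "metadata": {
                        "@context": "https://schema.org",
                        "@type": "SoftwareApplication",
                        "name": "bioformats2raw",
                        "version": "0.3.0",
                    },
                },
                # No dataset-level metadata: falls back to a coarser level.
                {"path": "1"},
            ],
        }
    ],
}

print(resolve_provenance(attrs, 0, "0")["name"])
print(resolve_provenance(attrs, 0, "1")["name"])
```

In this example dataset `0` resolves to its own entry, while dataset `1` has neither dataset-level nor multiscale-level metadata and so falls back to the top-level `_creator`.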
