Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unspecified status of extra content in the JSON files #637

Open
Remi-Gau opened this issue Oct 2, 2020 · 12 comments
Open

unspecified status of extra content in the JSON files #637

Remi-Gau opened this issue Oct 2, 2020 · 12 comments
Labels
discussion ongoing discussion opinions wanted Please read and offer your opinion on this matter

Comments

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented Oct 2, 2020

This relates to an issue raised on the bids-validator but that should probably be discussed here before we decide to change the specs: https://github.com/bids-standard/bids-validator/issues/1044

TL;DR

For most JSON files the validator does not allow extra content to be added. This is not mentioned in the specs and probably should be.


Problem

For example, the validator will throw an error if this json file for an iEEG file.

{
    "Instructions": "do this",
    "PowerLineFrequency": 50,
    "SamplingFrequency": 2400,
    "SoftwareFilters": "n/a",
    "TaskName": "implanted Target Practice",
    "iEEGReference": "Cz",
    "extraInfo": {
        "nestedExtraInfo": "something extra"
    }
}

But the specs does not mention that users cannot do that.

From the validator side, it allows for stronger validation (see here).

From the community and user side, this is can seen as a way to try to incite users to ask for their extraInfo to be supported by the specs.

Suggestion

  • makes this rule and its motivations explicit in the specs.
  • change the error message on the validator to point users to open issues to have extra content supported by the specs

Related issue on the validator

https://github.com/bids-standard/bids-validator/issues/1074

@satra
Copy link
Collaborator

satra commented Oct 2, 2020

i don't think the spec should restrict the set of properties in the metadata file. for example, over the years heudiconv has allowed putting in more metadata into the json files than supported by bids. in general the validator could operate in strict (restricted to schema) or non strict mode (validates only the properties in the schema).

since each dataset carries a bids version, one can always validate with respect to that version of the schema, rather than the current state of the spec. this could also be a choice the validator makes (i.e. to validate against a specific version of the schema).

allowing schema + schemaless is more general and perhaps should not be discouraged (see: https://www.martinfowler.com/articles/schemaless/)

@sappelhoff
Copy link
Member

I like the restriction of metadata files for the reasons I outlined in https://github.com/bids-standard/bids-validator/issues/1044 (note: I didn't read the slides linked by Satra yet)

So I would be reluctant to allow any amount of metadata for Ephys ... however given that additional arbitrary metadata was allowed for MRI for a long time now, I would be fine to keep it that way.

In any case, documenting this explicitly would be a welcome addition to the spec. (after more discussion has happened in this issue)

since each dataset carries a bids version, one can always validate with respect to that version of the schema, rather than the current state of the spec. this could also be a choice the validator makes (i.e. to validate against a specific version of the schema).

yes, this is an often requested feature that would indeed be sensible to have

@Remi-Gau
Copy link
Collaborator Author

Just realized that the followings are in the specs when it comes to the data dictionary for TSV files. I wonder why this is not partly the reason why I assumed this applied to all JSON files

If a data dictionary is provided, it MAY contain one or more fields describing the columns found in the TSV file (in addition to any other metadata one wishes to include that describe the file as a whole).

Bold is mine.

https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#tabular-files

See also:

Additional metadata may be included as in any TSV file to specify, for example, the units of the recorded time series.
https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/06-physiological-and-other-continuous-recordings.html#physiological-and-other-continuous-recordings

@neuromechanist
Copy link
Member

To throw my support in relaxing this and provide an example: I am finishing the curation of the Child Mind HBN dataset, and for the movie-watching tasks I made the following description:

{
  "InstitutionName": "Child Mind Institute",
  "InstitutionAddress": "101 E 56th St, New York, NY 10022",
  "Manufacturer": "Magtism EGI",
  "ManufacturersModelName":"128-channel GSN 200 v.2.1",
  "TaskDescription": {
    "GeneralDescription": "Participants viewed four short, age-appropriate video clips taken from television and movies. (Prior to this task, parents were given the opportunity to review the full list of clips and exclude any video clips they deemed unsuitable for their children; no parents had any objections to the clips.).",
    "VideoDescription":{
      "Rating": "No parental guideline rating",
      "Description": "Mother gives her son a poppy as a present, but the poppy has a missing limb.",
      "Rationale": "The clip is included to probe for interpersonal empathy functioning.",
      "Link": "https://youtu.be/3XA0bB79oGc",
      "Length": "3:23"
    }
},
  "Instructions": "Just maintain fixation on the central spot at all times. Press to begin. First, we have to measure the position of your eyes. Just follow the circle with your eyes."
}

This fails because additional tags are not allowed, and everything under TaskDescription is additional.

I understand that not allowing additional properties helps with more stringent validation and prevents people from making typos or making mistakes with the key names.

A happy medium could probably be enforcing the explicit properties on the first level but relaxing the additional properties on the second level and so on. Also, having an ExtraInfo key can be a good choice.

@effigies
Copy link
Collaborator

This fails because TaskDescription is defined to be a string, not because extra fields are not permitted. Extra fields have been permitted for MRI data the entire time.

@neuromechanist
Copy link
Member

Very true, I thought since the same exception is thrown, it might be the same condition.

FWIW, opening the additional properties or allowing key-value objects where there is unregulated string would resolve this issue.

For my datasets, I just slap all the info together in a single string to become compliant, but I think these additional info are worhty especially when there are thousands of subjects.

@VisLab
Copy link
Member

VisLab commented Aug 30, 2024

Maybe a warning rather than an error can be issued if extra fields are included. This would alert users if they made a typo, but still allow additional information to be included.

@effigies
Copy link
Collaborator

Just a note that the schema validator treats all sidecars uniformly, as we do not construct one omnibus JSON schema, but a collection of partial schemas.

An additional check could be devised to look for unvalidated sidecar fields, and I look forward to the contribution. :-)

As to this, I believe we could extend the unspecified data text to include unspecified metadata:

https://bids-specification.readthedocs.io/en/stable/common-principles.html#unspecified-data

Non-standard files and directories should be named with care. Future BIDS efforts may standardize new entities and suffixes, changing the meaning of filenames and setting requirements on their contents or metadata. Validation and parsing tools MAY treat the presence of non-standard files and directories as an error, so consult the details of these tools for mechanisms to suppress warnings or provide interpretations of your filenames.

We could just as easily write:

Non-standard metadata should be named with care. Future BIDS efforts may standardize new metadata fields, changing the meaning or setting constraints on their values.

@effigies
Copy link
Collaborator

I believe @yarikoptic had a proposal to reserve the X- prefix to permit nonstandard fields to be used with no possibility of clashing with future standardization. You could also imagine a DICOM- prefix that dcm2niix and friends could use for fields that are not yet standardized.

@yarikoptic
Copy link
Collaborator

+1 on X- prefix. Here is an issue for that

Please chime in! there

Could be x-dicom- prefix indeed ;-)

@mharms
Copy link

mharms commented Oct 24, 2024

I'm not a fan of requiring nonstandard fields to all start with a common prefix to satisfy a validator.

To give an example, we just had to manually alter the polarity of PhaseEncodingDirection in a set of json's. To represent/capture this, we added a field PhaseEncodingDirectionNote which seemed like a natural way to handle this.

BIDS is just one user of the json metadata, so requiring that all nonstandard fields start with a common preface to avoid an error in a BIDS validator doesn't seem appropriate to me.

@yarikoptic
Copy link
Collaborator

yarikoptic commented Oct 25, 2024

Thanks for chiming in @mharms ! IMHO it is not to "satisfy" validator at all . But better to continue on this in the

which describes some rationale (I extended it a bit now) behind it and since clearly this not to happen in BIDS 1.0 (would brake too many datasets, requires helper to auto-migrate/rename them all etc).

@yarikoptic yarikoptic moved this to Todo in BIDS 2.0 Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion ongoing discussion opinions wanted Please read and offer your opinion on this matter
Projects
Status: Todo
Development

No branches or pull requests

8 participants