[ENH] BEP044 - Stim-BIDS #2022

Open: wants to merge 24 commits into `master`


@neuromechanist (Member) commented Dec 22, 2024

This PR addresses #153 by introducing specifications for handling and standardizing stimulus files and their annotations within the BIDS specification. The changes focus on improving the organization, referencing, and metadata of stimulus files to enhance consistency, reusability, and efficiency.

Here are the relevant links and documents:

Known issues:

  • The validator fails because the file rules in `/src/schema/rules/files/raw` require a `datatype` field. Stimuli might be a special datatype that can only be present at the root of the dataset, so the `datatype` field is missing for now.
  • There are some style errors reported by the remark validators.

cc: @bids-standard/bep044 and @monique2208

Implement the standardization of stimulus files and their annotations within the BIDS specifications.

* **Add new file `src/modality-specific-files/stimuli.md`**
  - Describe the specifications for the stimuli directory.
  - Include guidelines for storing stimulus files and their annotations.
  - Define what goes into `stimuli.tsv/json`, `annotations.tsv/json`, and `stim-<label>.json`.
  - Use the same style as other modality-specific docs to design the tables, variables, and examples.

* **Modify `src/modality-specific-files/task-events.md`**
  - Add a section detailing the standardization of stimulus files and their annotations within the BIDS specification.
  - Include examples of how to use the `stim_file` and `stim_id` columns in `events.tsv` files.
  - Provide guidelines for storing stimulus files in the `/stimuli` directory.
  - Expand the definition of the `stim_file` column to include `stim_id`.

* **Modify `src/schema/objects/columns.yaml`**
  - Update the definition of the `stim_file` column to ensure consistency in stimulus file references.
  - Add the `stim_id` column definition for `events.tsv` files.

* **Modify `src/schema/rules/checks/events.yaml`**
  - Add a check for missing stimulus files declared in `events.tsv`.
  - Add a check for missing `stim_id` references in `events.tsv`.

* **Modify `src/schema/rules/sidecars/events.yaml`**
  - Specify the `StimulusPresentation` metadata field for `events.tsv` files.
  - Include the `stim_id` column in the metadata field specifications.

* **Modify `src/schema/objects/entities.yaml`**
  - Add entities described in the document with proper requirement levels and descriptions.

* **Modify `src/schema/objects/suffixes.yaml`**
  - Add suffixes for `{audio, image, video, audiovideo}`.
  - Include the file extensions and descriptions for each suffix.

* **Add new file `src/schema/rules/sidecars/stimulus.yaml`**
  - Define sidecar tables for `stimuli.tsv/json`, `annotations.tsv/json`, and `stim-<label>.json`.
  - Use the same style as other modality-specific docs to design the tables.
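The two `events.tsv` checks listed above (missing stimulus files, missing `stim_id` references) can be sketched in Python. This is a hypothetical helper, not the schema's actual check expressions: it assumes `stim_file` paths are relative to `/stimuli`, that valid `stim_id` values are the ones declared in a `stim_id` column of `stimuli.tsv`, that `n/a` marks an empty cell, and that the caller supplies the set of files found under `/stimuli`.

```python
import csv
import io


def find_missing_stimuli(events_tsv: str, stimuli_tsv: str, stimuli_files: set[str]) -> dict[str, list[str]]:
    """Hypothetical sketch: report stim_file and stim_id values in events.tsv
    that cannot be resolved against the /stimuli directory."""
    events = csv.DictReader(io.StringIO(events_tsv), delimiter="\t")
    declared_ids = {row["stim_id"] for row in csv.DictReader(io.StringIO(stimuli_tsv), delimiter="\t")}

    missing_files: list[str] = []
    missing_ids: list[str] = []
    for row in events:
        # stim_file is a path relative to /stimuli; "n/a" means no file was presented.
        stim_file = row.get("stim_file", "n/a")
        if stim_file != "n/a" and stim_file not in stimuli_files:
            missing_files.append(stim_file)
        # stim_id must refer to a row declared in stimuli.tsv.
        stim_id = row.get("stim_id", "n/a")
        if stim_id != "n/a" and stim_id not in declared_ids:
            missing_ids.append(stim_id)
    return {"missing_files": missing_files, "missing_ids": missing_ids}


# Hypothetical example data.
events_tsv = (
    "onset\tduration\tstim_file\tstim_id\n"
    "1.0\t2.0\tstim-face01_image.png\tface01\n"
    "3.0\t2.0\tstim-house02_image.png\thouse02\n"
    "5.0\t1.0\tn/a\tn/a\n"
)
stimuli_tsv = "stim_id\nface01\n"
report = find_missing_stimuli(events_tsv, stimuli_tsv, {"stim-face01_image.png"})
print(report)
```

Here the second event is flagged twice: its file is absent from `/stimuli` and its `stim_id` is not declared in `stimuli.tsv`, while the `n/a` row is skipped.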

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/neuromechanist/bids-specification?shareId=XXXX-XXXX-XXXX-XXXX).
Resolved review threads on `src/modality-specific-files/stimuli.md`.

The `stimuli.json` file provides detailed descriptions of the columns in the `stimuli.tsv` file. There can be extra entries in the `stimuli.json` in addition to the columns in the `stimuli.tsv` to provide more details about the stimulus.
In cases where the stimulus is not shared, the `stimuli.tsv` file can be used to provide metadata about the stimuli, including the license, copyright, URL, and description. This is similar to the use of `stim-<label>_<suffix>.json` files for individual stimulus files. In the case of a conflict between the metadata in the `stimuli.tsv` and `stim-<label>_<suffix>.json` files, the metadata in the `stim-<label>_<suffix>.json` file takes precedence.
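The precedence rule in the quoted text (per-file sidecar over `stimuli.tsv`) can be illustrated with a small Python sketch. The function name is hypothetical, and it assumes the TSV column names have already been mapped onto the sidecar field names (for example, a `license` column onto `License`); treating `n/a` cells as absent is a choice of this sketch, not something the text specifies.

```python
def resolve_stimulus_metadata(tsv_row: dict[str, str], sidecar: dict[str, str]) -> dict[str, str]:
    """Hypothetical sketch: combine one stimuli.tsv row with the matching
    stim-<label>_<suffix>.json sidecar, letting the sidecar win on conflict."""
    # Start from the TSV row, skipping empty ("n/a") cells.
    merged = {key: value for key, value in tsv_row.items() if value != "n/a"}
    # dict.update overwrites existing keys, so sidecar values take precedence.
    merged.update(sidecar)
    return merged


row = {"stim_id": "face01", "License": "CC-BY-4.0", "Copyright": "n/a"}
sidecar = {"License": "CC0-1.0", "Description": "Neutral face, frontal view"}
resolved = resolve_stimulus_metadata(row, sidecar)
print(resolved)
```

Because `License` appears in both sources, the sidecar's value is kept; fields present only in the TSV row (here `stim_id`) survive the merge.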
Review comment (Member):

Again, here with `_`. It should be consistent: the `_` form appears in several places, as does the form without the underscore. I'm not going to mark them further; they just need to be consistent.

Resolved review thread on `src/modality-specific-files/stimuli.md`.
@Remi-Gau (Collaborator) commented Jan 9, 2025

Will do a bit of cleanup to get less red in CI, and maybe see if we can get the HTML version of the BEP to render.

@Remi-Gau (Collaborator) commented Jan 9, 2025

HTML preview of the stimuli page:
https://bids-specification--2022.org.readthedocs.build/en/2022/modality-specific-files/stimuli.html

@yarikoptic (Collaborator) left a comment

Initial quick pass with a few small recommendations.

| suffix | extensions | description |
| ----------- | ------------------------------- | ---------------------------- |
| audio | `.wav`, `.mp3`, `.aac`, `.ogg` | Audio-only stimulus files |
| image | `.jpg`, `.png`, `.svg` | Static visual stimulus files |
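As a rough illustration of how the suffix table could be enforced, the Python sketch below checks that a stimulus file's extension is allowed for its suffix. It covers only the `audio` and `image` rows quoted here, uses a simplified, hypothetical file-name pattern (`stim-<label>[_<key>-<value>...]_<suffix><extension>`), and matches extensions case-sensitively, which is one possible design choice rather than a rule stated in the BEP.

```python
import re

# Only the audio and image rows of the suffix table are reproduced here.
ALLOWED_EXTENSIONS = {
    "audio": {".wav", ".mp3", ".aac", ".ogg"},
    "image": {".jpg", ".png", ".svg"},
}

# Hypothetical simplified pattern: stim-<label>[_<key>-<value>...]_<suffix><extension>
STIM_NAME = re.compile(
    r"^stim-[A-Za-z0-9]+(?:_[a-z]+-[A-Za-z0-9]+)*_(?P<suffix>[a-z]+)(?P<ext>\.[A-Za-z0-9]+)$"
)


def extension_matches_suffix(filename: str) -> bool:
    """Return True if the file's extension is listed for its suffix."""
    match = STIM_NAME.match(filename)
    if match is None:
        return False
    allowed = ALLOWED_EXTENSIONS.get(match["suffix"], set())
    # Case-sensitive membership test: ".WAV" is not the same as ".wav".
    return match["ext"] in allowed


for name in ["stim-face01_image.png", "stim-tone01_audio.WAV", "stim-face01_image.webp"]:
    print(name, extension_matches_suffix(name))
```

With the table as quoted, a `.webp` image is rejected; supporting it would only require adding the extension to the `image` row.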
Review comment (Collaborator):

We do include `.webm` for video below, and there is increasing use of the newer WebP format (https://en.wikipedia.org/wiki/WebP) as "the best" of both JPG and PNG, since it combines features of both worlds:

  • supports lossy and lossless compression
  • supports transparency (alpha channel) with lossy compression too, not only with lossless compression as in PNG
  • supports animation

So I would expect studies to start using it.

But maybe it is premature, since at the moment I found not a single `.webp` file among OpenNeuro datasets.

```JSON
{
  "License": "CC-BY-4.0",
  "Copyright": "Lab 2023",
```
Review comment (Collaborator):

We might want to clarify what to state here, whether it should include the year, and the overall format.

Maybe we should follow https://reuse.software/tutorial/#step-2 and SPDX (which REUSE also uses) for license definitions.


The `stim_id` in the events file links to corresponding files:

```Text
```
Review comment (Collaborator):

This needs `MACROS___make_filetree_example` until someone smart implements parsing (ref: #2014 (comment)).

```YAML
- audio
extensions:
- .wav
- .WAV
```
Review comment (Collaborator):

I do not think we should allow or encourage mixed casing! Let's stick to lower case. We have `.json` and `.tsv`, and no `.JSON` or `.TSV`. I do not see why these files need to be different.

Comment on lines +885 to +886
The JSON file associated with each media file should contain information such as License (RECOMMENDED), Copyright
(RECOMMENDED), URL (OPTIONAL), and Description (OPTIONAL) to describe the origin and the nature of the media.
Review comment (Collaborator):

I do not think it is advisable to describe all the fields and whether they are required or not: this duplicates information formalized elsewhere, so it is prone to getting out of date.

Suggested change:

```diff
- The JSON file associated with each media file should contain information such as License (RECOMMENDED), Copyright
- (RECOMMENDED), URL (OPTIONAL), and Description (OPTIONAL) to describe the origin and the nature of the media.
+ The JSON file associated with each media file should contain information to describe the origin and the nature of the media.
```

Here, and similarly below for the others.

}
}) }}

Note: The presence of a `stimuli.tsv` file indicates that the content of the `/stimuli` directory follows this BIDS specification for stimulus organization. This structure is planned to become mandatory in BIDS 2.0.
Review comment (Collaborator):

Is this the first time we have something like this? I believe we do not even have similar conditioning for `derivatives/` yet. I just wonder whether the validator is "ready", or what needs to be done. WDYT @effigies?
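To make the question concrete, here is a minimal Python sketch of how a tool might branch on the proposed marker file. It assumes `stimuli.tsv` sits directly inside `/stimuli` (the quoted text does not state its location), and the function and mode names are hypothetical.

```python
import tempfile
from pathlib import Path


def stimuli_mode(dataset_root: Path) -> str:
    """Hypothetical sketch: decide how to treat /stimuli based on the marker file."""
    if (dataset_root / "stimuli" / "stimuli.tsv").is_file():
        # Marker present: apply the stimulus organization rules from this BEP.
        return "bep044"
    # No marker: do not apply the BEP's structure rules to /stimuli.
    return "legacy"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = stimuli_mode(root)
    (root / "stimuli").mkdir()
    (root / "stimuli" / "stimuli.tsv").write_text("stim_id\n")
    after = stimuli_mode(root)

print(before, after)
```

A validator would need exactly this kind of switch before its stimulus rules run, which is the readiness question raised above.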

@@ -20,6 +20,7 @@ nav:

```YAML
- Near-Infrared Spectroscopy: modality-specific-files/near-infrared-spectroscopy.md
- Motion: modality-specific-files/motion.md
- Magnetic Resonance Spectroscopy: modality-specific-files/magnetic-resonance-spectroscopy.md
- Stimuli: modality-specific-files/stimuli.md
```
Review comment (Collaborator):

I put it there for now because the file is in the modality-specific folder, but I think this stimuli BEP should be "modality agnostic".

@oesteban

Review comment (Collaborator):

I agree @Remi-Gau

@neuromechanist (Member, Author) commented Jan 10, 2025

+1.
IMHO, the same goes for (task) events. Hopefully, if stimuli move to the modality-agnostic section, events will move too.

Comment on lines +197 to +202
This entity is used to indicate which component of a complex
representation is being stored. For MRI data, it indicates which component
of the complex signal is represented in voxel data. For stimulus files, it can
be used to distinguish different parts of a single stimulus, such as chapters
in an audiobook or segments of a long movie (for example, `part-1`, `part-2`,
`part-epilog`, `part-chapter1`).
Review comment (Collaborator):

We probably do not want to erase all the details from the previous definition.

Can we have several definitions for an entity, that is, have it mean different things depending on the datatype or suffix?
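For the stimulus use of `part` in the proposed definition (chapters of an audiobook, segments of a movie), here is a hypothetical Python sketch of how a tool could collect the parts of each stimulus from file names. The parsing is a simplified assumption (underscore-separated `key-value` pairs followed by a suffix), and the file names are invented for illustration.

```python
from collections import defaultdict


def parse_entities(filename: str) -> dict[str, str]:
    """Simplified parser: 'stim-x_part-1_audio.wav' -> {'stim': 'x', 'part': '1', 'suffix': 'audio'}."""
    stem = filename.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    return entities


def group_parts(filenames: list[str]) -> dict[str, list[str]]:
    """Map each stim label to the part labels found for it, in input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        entities = parse_entities(name)
        if "part" in entities:  # files without a part entity are whole stimuli
            groups[entities["stim"]].append(entities["part"])
    return dict(groups)


files = [
    "stim-audiobook_part-1_audio.wav",
    "stim-audiobook_part-2_audio.wav",
    "stim-audiobook_part-epilog_audio.wav",
    "stim-face01_image.png",
]
print(group_parts(files))
```

The parser itself treats `part` like any other key-value entity; only the interpretation of its label differs between MRI and stimulus files.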

Review comment (Collaborator):

Very hacky ways to make the tests pass.

Labels: none yet
Projects: none yet
Participants: 5