diff --git a/content/resources/tutorials/data-dictionary.md b/content/resources/tutorials/data-dictionary.md index 5fb0db8..1033807 100644 --- a/content/resources/tutorials/data-dictionary.md +++ b/content/resources/tutorials/data-dictionary.md @@ -1,10 +1,151 @@ --- -title: Data dictionary +title: Creating a BIDS Data Dictionary (Annotation) linkTitle: Data dictionary type: docs weight: 5 --- -TODO: [GH 122](https://github.com/ReproNim/repronim.org/issues/122) +**ReproNim principles** -https://docs.google.com/document/d/19NE1fRhaBNpu90JJ7nWIjC5WnLimg_l4Sz1FDTxZ-bE/edit?tab=t.0#heading=h.k1s2szmakxpb +- 2b. Data and Metadata management: Use standard data formats and extend them for your needs + +**Actions** + +- Standards +- Annotation + +**Standards** + +- BIDS + +**Tools** + +- PyNIDM + +## Challenge + +Converting your DICOM data to BIDS is a great first step in making your data more reusable. Part of the BIDS dataset are optional files containing demographics or other assessment variables collected as part of the study and provided in the “participants.tsv” file included in the BIDS dataset (sometimes called phenotype information). Without a file describing the variables included with the “participants.tsv” file, it is unclear what each variable means, the data type expected, and, if categorical, how to interpret the categories and their meaning. This file is named “participants.json” in BIDS and its description can be found in the [BIDS manual](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files.html#participants-file). This could result in the BIDS phenotype data being uninterpretable by others beyond the person(s) who created this BIDS dataset and thus making it less reusable. Here we show you a simple python command line method for creating a **data dictionary** describing the variables contained within the “phenotype.tsv” file and any other assessment files contained within a BIDS phenotype directory. Finally, we’d like to be able to find this BIDS dataset when querying for datasets that contain specific types of variables (e.g. a dataset that has a measure of age) and since each dataset may specify a variable such as age using a different variable name (e.g. “age at study intake”, “MRI age”, etc.) we can associate selected variables with **concepts** that describe their meaning at a high level which allows us to find datasets containing a variable measured that is associated with, in this example, the concept of age. Below is an exercise to help you do this. + +## Exercise + +Here we will learn how to create a data dictionary / annotation file “participants.json” for a simple BIDS dataset containing 2 variables: “Gender” and “Age”. Gender is a categorical variable encoding “M” for male and “F” for female. Age is an integer variable encoding the age of the participant in years after birth. Further, because we want to enable searching for this dataset across varying BIDS datasets, we’ll associate concepts with the “gender” and “age” variables, creating a “participants.json” data dictionary file to make your BIDS dataset somewhat self-describing and annotated for other uses and for search engines. + +![image](/images/data_dictionary.png) + +[BIDS with participants.json data dictionary](https://bids.neuroimaging.io/assets/img/dicom-reorganization-transparent-white_1000x477.png) + +## Before you start + +Before diving into the conversion process, it's important to familiarize yourself with the following: + +* **BIDS (Brain Imaging Data Structure):** A widely adopted standard for organizing neuroimaging data. BIDS enforces a specific directory structure and file naming conventions, promoting data organization, shareability, and compatibility with various analysis tools. We recommend visiting the [BIDS website](https://bids.neuroimaging.io/index.html) and browsing the [BIDS starter kit](https://bids-standard.github.io/bids-starter-kit/) +* **Python**: The PyNIDM bidsmri2nidm.py requires the ability to run a python script from the terminal window. + +## Step by step guide + +### Step 1: Installing the Necessary Tools + +* **PyNIDM tools:** To install PyNIDM tools, complete the following steps: +* Install the dependencies here: [https://pynidm.readthedocs.io/en/latest/](https://pynidm.readthedocs.io/en/latest/) +* [Create an empty python environment with Python 3.9 using Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) +* From a terminal window run with the new Conda environment active, run: $ pip install pynidm + +### Step 2: Run bidsmri2nidm.py PyNIM Tool + +``` +bidsmri2nidm.py \-d \~/…/bids\_dataset \-bidsignore \-o \~/…/bids\_dataset/nidm.ttl +``` + +### Step 3: Annotation Process for a variable in participants.tsv called "gender" + +#### Step 3a: Define the Term + +Enter a full name and a description for the term. Example: + +``` +Full Name: Gender +Description: Gender of participant +``` + +#### Step 3b: Choose a Datatype + +Select a datatype from the predefined list: + +``` +1: string +2: categorical +3: boolean +4: integer +``` + +Example selection: 2 (categorical). + +#### Step 3c: Enter Categories and Labels + +Input the categories and their corresponding labels: + +``` +Category 1: Male +Category 2: Female +``` + +#### Step 3d: Concept Association + +Search for an ontology concept related to the term (e.g., gender). Example: + +Query String: gender Results: 1\. Label: gender Definition: Sex or gender of the subject 2\. … + +Select the appropriate concept: 1\. + +### Step 4: Annotation Process for a variable in participants.tsv called "age" + +#### Step 4a: Define the Term + +Enter a full name and a description for the term. Example: + +``` +Full Name: Age +Description: Age of participant at time of scan +``` + +#### **Step 4b: Choose a Datatype + +Select a datatype from the predefined list: + +``` +1: string +2: categorical +3: boolean +4: integer +``` + +Example selection: 4 (integer). + +#### Step 4c: Set Value Range + +Specify minimum and maximum values for your variable "age": + +``` +Minimum Value: 18 +Maximum Value: 120 +``` + +#### Step 4d: Specify Units + +Provide units for the term: + +``` +Units: years +``` + +#### Step 4e: Concept Association + +Search for an ontology concept related to the term (e.g., age). Example: + +Query String: age Results: 1\. Label: Age Definition: A time quality…. 2\. … + +Select the appropriate concept: 1\. + +### Step 4: Validation and Next Steps + +After the conversion completes, look at the “participants.json” file in a text browser that is now included in your BIDS directory. It should contain annotations of all the variables in your participants.tsv file. If you made any typos, you can modify this file in a text editor and save it. In addition, a “nidm.ttl” file has been added to your BIDS directory which can be used for search in the figure. Don’t worry, it has been added to a .bidsignore file so your BIDS dataset is still valid. For a full video showing the annotation process see [PyNIDM Annotations](https://www.youtube.com/watch?v=egFGzdna0eo). diff --git a/static/images/data_dictionary.png b/static/images/data_dictionary.png new file mode 100644 index 0000000..e0e0b64 Binary files /dev/null and b/static/images/data_dictionary.png differ