Dawn E. Smith edited this page May 31, 2018 · 4 revisions

This configuration file helps locate the study config files, configures servers that Datman may need to communicate with, configures the structure of datman managed study folders, and details expected scan tags and how to handle them.

NOTE: All keys are case sensitive. If Datman raises KeyErrors for any of these settings, check that your spelling and case are correct.

Projects Block

The Projects block maps a short-hand code for each study (the key) to the name of the file holding that study's configuration (the value). A study code is usually, but not necessarily, the same as the study code that begins the Datman ID for sessions in that study. Each code must be unique, even if the study field of the Datman ID is not.

New studies must be added to this list in order for Datman to manage them. You can change the name of the file here, but you cannot set its location: all config files for a 'system' are expected to be found in a single directory pointed to by the CONFIG_DIR setting in SystemSettings.

Example Projects

Note the capital "P" on projects, which is case sensitive. Study codes do not need to be all caps, but however you type them in the Projects block is how you must type them at the command line since the 'Projects' block is how Datman locates a study and all of its data.

Projects:
  STUDY1: <study_1 filename here>
  STUDY2: <study_2 filename here>
   ...
  STUDYN: <study_n filename here>

And an example excerpt from our lab's tigrlab_config.yaml

Projects:
  ANDT: ANDT_settings.yml
  ASDD: ASDD_settings.yml
  COGBDO: COGBDO_settings.yml
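
As a rough illustration of the lookup described in this section, the sketch below joins a Projects entry with a CONFIG_DIR value to get the expected location of a study config file. It is not Datman's own code: the function name study_config_path is hypothetical, and the config is assumed to have already been parsed into plain Python dicts.

```python
from pathlib import PurePosixPath

# Assumed: already-parsed excerpts of the site config (values from the examples on this page).
projects = {
    "ANDT": "ANDT_settings.yml",
    "ASDD": "ASDD_settings.yml",
    "COGBDO": "COGBDO_settings.yml",
}
config_dir = "/archive/code/config/"


def study_config_path(study_code, projects, config_dir):
    """Hypothetical helper: return the expected path of a study's config file.

    The lookup is case sensitive, so 'andt' will not match 'ANDT'.
    """
    if study_code not in projects:
        raise KeyError(f"Study '{study_code}' is not listed in the Projects block")
    # Only the file name comes from Projects; the folder always comes from CONFIG_DIR.
    return PurePosixPath(config_dir) / projects[study_code]


print(study_config_path("ANDT", projects, config_dir))
```

Because only the file name is taken from Projects, moving a study config file means changing CONFIG_DIR for the whole system rather than editing a single Projects entry.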

SystemSettings Block

At least one system must be configured within this block. Multiple systems can be defined so that several users can each manage their own set of Datman projects from one installation, or so that machines with different folder structures can work with the same NFS-mounted installation.

When running scripts, the system configuration to use must be specified by setting the environment variable DM_SYSTEM to its name (e.g. in many shells this should work: export DM_SYSTEM=my_system_name_here).

Required Keys

  • DATMAN_PROJECTSDIR: Must be the full path to the folder where a set of datman managed projects will be kept. For example, on our local system this is /archive/data/
  • DATMAN_ASSETSDIR: The full path to datman's assets folder on your system. For example, on our local system this is /archive/code/datman/assets/
  • CONFIG_DIR: The full path to the folder of study config files to use for this system. For example, on our system this is /archive/code/config/

Optional Keys

  • QUEUE: This defines the queue type that jobs will be submitted to if a queue is available. Currently this can be either 'sge' or 'pbs'.

Example SystemSettings

Note the capitalization of 'SystemSettings', and that the required and optional keys are all caps. The system name can be in whatever case you prefer, but you must match its spelling and capitalization exactly when you set DM_SYSTEM in your shell.

SystemSettings:
  system_name:
    DATMAN_PROJECTSDIR: <your projects path here>
    DATMAN_ASSETSDIR: <full path to the assets folder here>
    CONFIG_DIR: <full path to your study config file folder>
    QUEUE: <'pbs' or 'sge' goes here>

And an example excerpt from our tigrlab_config.yaml where we have two systems, 'kimel' and 'scc', configured:

SystemSettings:
  kimel:
    DATMAN_PROJECTSDIR: '/archive/data/'
    DATMAN_ASSETSDIR: '/archive/code/datman/assets/'
    CONFIG_DIR: '/archive/code/config/'
    QUEUE: 'sge'
  scc:
    DATMAN_ASSETSDIR: '/KIMEL/quarantine/datman/latest/src/datman/assets/'
    DATMAN_PROJECTSDIR: '/external/rprshnas01/tigrlab/archive/data/'
    CONFIG_DIR: '/external/rprshnas01/tigrlab/archive/code/config/'
    QUEUE: 'pbs'
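
The rules in this section (a system chosen through DM_SYSTEM, three required keys, an optional QUEUE) can be sketched as a small check. This is a minimal sketch and not Datman's implementation: select_system is a hypothetical name, and the SystemSettings block is assumed to be already parsed into a dict.

```python
import os

REQUIRED_KEYS = ("DATMAN_PROJECTSDIR", "DATMAN_ASSETSDIR", "CONFIG_DIR")
ACCEPTED_QUEUES = ("sge", "pbs")


def select_system(system_settings, environ=os.environ):
    """Hypothetical helper: pick the entry named by DM_SYSTEM and sanity-check it."""
    name = environ.get("DM_SYSTEM")
    if name is None:
        raise RuntimeError("DM_SYSTEM is not set")
    if name not in system_settings:  # exact, case-sensitive match
        raise KeyError(f"System '{name}' is not defined in SystemSettings")
    system = system_settings[name]
    missing = [key for key in REQUIRED_KEYS if key not in system]
    if missing:
        raise KeyError(f"System '{name}' is missing required keys: {missing}")
    queue = system.get("QUEUE")  # optional key
    if queue is not None and queue not in ACCEPTED_QUEUES:
        raise ValueError(f"QUEUE must be one of {ACCEPTED_QUEUES}, got '{queue}'")
    return system


# Assumed: the 'kimel' entry from the excerpt above, already parsed.
system_settings = {
    "kimel": {
        "DATMAN_PROJECTSDIR": "/archive/data/",
        "DATMAN_ASSETSDIR": "/archive/code/datman/assets/",
        "CONFIG_DIR": "/archive/code/config/",
        "QUEUE": "sge",
    },
}

# A dict stands in for the real environment so the example does not depend on your shell.
print(select_system(system_settings, {"DM_SYSTEM": "kimel"})["CONFIG_DIR"])
```

Passing {"DM_SYSTEM": "Kimel"} instead would raise a KeyError, which is the kind of case mismatch the note above warns about.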

Paths Block

This block determines the structure of each Datman managed study. Each time a new pipeline folder or other resource is added to your projects a new entry needs to be added to the list. The keys determine how the path will be referenced within the code and the values are a relative path that gets appended to the study folder (which is itself the path from DATMAN_PROJECTSDIR with the PROJECTDIR from the Study Config appended).

Required Paths

Below is a list of paths that must be configured for Datman to function correctly. Most of the core scripts read from or write to the directories configured here.

  • meta: Points to the folder meant to hold metadata like scans.csv, blacklist.csv, checklist.csv, etc.
  • data: Parent folder for the original dicom data and its other raw formats like nifti, mnc, etc.
  • dcm: The folder that will hold raw dicom data. Only one dicom image per series is stored here
  • dicom: The folder that will hold the raw zip files of dicoms before the site naming convention is applied
  • zips: The folder that holds correctly named links that point to the raw zip files in data/dicom
  • resources: The folder that holds all non-dicom data that was present in the raw zip files
  • nii, mnc, nrrd: Folders that hold the converted data in nifti, mnc and nrrd formats respectively
  • qc: Holds all the QC pipeline outputs
  • log: The folder that will store log output from various scripts and nightly pipelines

Optional Paths

These paths must be configured if the scripts listed are in use, but may be omitted otherwise.

  • std
    • Description: Points to the folder that will hold any defined 'standards' for use in the QC pipeline. Usually these are DICOMs or json files that contain expected DICOM header fields.
    • Used by:
      • dm_qc_report.py - Reads from this folder
  • dtiprep
    • Description: Points to the destination folder for dtiprep pipeline outputs
    • Used by:
      • dm_proc_dtiprep.py - Generates the contents
      • dm_proc_tractmap.py - Reads from this folder
  • freesurfer
    • Description: Points to the destination for freesurfer outputs
    • Used by:
      • dm_proc_freesurfer.py - Generates the contents
      • dm_proc_fs2hcp.py - Reads from this folder
  • hcp
    • Description: Points to the folder holding ciftify's HCP format outputs. These are a subset of the HCP Pipelines outputs, with temp files and folders deleted to save space. Unlike the original HCP Pipelines code, ciftify can generate them from legacy datasets.
    • Used by:
      • dm_proc_fs2hcp.py - Generates the contents
      • dm_proc_ea.py - Reads from this folder
      • dm_proc_fmri.py - Reads from this folder
  • fmri
    • Description: Points to the folder that holds epitome pipeline fmri outputs (e.g. rest, imobs, ea)
    • Used by:
      • dm_proc_ea.py - Generates contents of 'ea' subfolder
      • dm_proc_fmri.py - Generates contents of 'fmri' subfolder
      • dm_proc_imob.py - Generates contents of 'imob' subfolder
      • dm_proc_rest.py - Generates contents of 'rest' subfolder
  • hcp_fs
    • Description: Points to the folder that holds the HCP Pipelines full FreeSurfer pipeline outputs.
    • Used by:
      • dm_hcp_freesurfer.py - Generates the contents
  • unring
    • Description: Points to the folder that holds the outputs of unring.
    • Used by:
      • dm_proc_unring.py - Generates the contents
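
To work out which optional paths a site needs, the 'Used by' entries above can be turned into a lookup table. The sketch below is only an illustration: missing_optional_paths is a hypothetical helper, and the table is copied by hand from this list, so it will go stale if the scripts change.

```python
# Optional path key -> scripts that read or write it (copied from the list above).
OPTIONAL_PATH_USERS = {
    "std": {"dm_qc_report.py"},
    "dtiprep": {"dm_proc_dtiprep.py", "dm_proc_tractmap.py"},
    "freesurfer": {"dm_proc_freesurfer.py", "dm_proc_fs2hcp.py"},
    "hcp": {"dm_proc_fs2hcp.py", "dm_proc_ea.py", "dm_proc_fmri.py"},
    "fmri": {"dm_proc_ea.py", "dm_proc_fmri.py", "dm_proc_imob.py", "dm_proc_rest.py"},
    "hcp_fs": {"dm_hcp_freesurfer.py"},
    "unring": {"dm_proc_unring.py"},
}


def missing_optional_paths(scripts_in_use, configured_paths):
    """Hypothetical helper: list optional path keys the given scripts need
    but the Paths block does not define."""
    in_use = set(scripts_in_use)
    needed = {key for key, users in OPTIONAL_PATH_USERS.items() if users & in_use}
    return sorted(needed - set(configured_paths))


# Assumed: a parsed Paths block that defines 'freesurfer' but not 'hcp'.
paths = {"meta": "metadata/", "freesurfer": "pipelines/freesurfer/"}
print(missing_optional_paths(["dm_proc_fs2hcp.py"], paths))
```

Here dm_proc_fs2hcp.py reads from 'freesurfer' and writes to 'hcp', so only 'hcp' is reported as missing.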

Example Paths

This example is a subset of all keys available.

Paths:
  meta: metadata/
  std:  metadata/standards/
  data: data/
  dcm:  data/dcm/
  nii:  data/nii/
  nrrd: data/nrrd/
  qc:   qc/
  log:  logs/
  fmri: pipelines/fmri/
  freesurfer: pipelines/freesurfer/

Assuming a configuration where the DATMAN_PROJECTSDIR is /archive/data (as it is in ours) and a PROJECTDIR of SPINS the above settings would generate a project with the following folder structure:

/archive/data/SPINS/   
                   │
                   └─── metadata
                   │   │
                   │   └─── standards
                   │
                   └─── data
                   │   │
                   │   └─── dcm
                   │   │
                   │   └─── nii
                   │   │
                   │   └─── nrrd
                   │   
                   └─── qc
                   │   
                   └─── logs
                   │   
                   └─── pipelines
                       │
                       └─── fmri
                       │
                       └─── freesurfer
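
The way each full path is assembled (DATMAN_PROJECTSDIR, then the study's PROJECTDIR, then the relative path from the Paths block) can be sketched as follows. The function name study_paths is hypothetical and the Paths block is assumed to be already parsed into a dict; Datman's own path handling may differ in detail.

```python
from pathlib import PurePosixPath


def study_paths(projects_dir, project_dir, paths):
    """Hypothetical helper: resolve every Paths entry for one study."""
    study_dir = PurePosixPath(projects_dir) / project_dir
    # Each value in Paths is relative to the study folder.
    return {key: study_dir / relative for key, relative in paths.items()}


# Assumed: a few entries from the Example Paths above, already parsed.
paths = {
    "meta": "metadata/",
    "std": "metadata/standards/",
    "nii": "data/nii/",
    "fmri": "pipelines/fmri/",
}

resolved = study_paths("/archive/data", "SPINS", paths)
for key, full_path in resolved.items():
    print(f"{key}: {full_path}")
```

The keys (meta, std, nii, ...) are what scripts use to ask for a location, so a folder can be renamed on disk by changing only the value in the Paths block.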

ExportSettings Block

This block defines the expected tags. Each tag has its own dictionary of config values that defines which formats to convert to, which QC function to use for human data, and which QC function to use for phantoms with that tag.

NOTE: Any of the settings in this block can be overridden by the ExportInfo settings in the Study Config file

Required tag settings

The following settings must be used to define site wide defaults for each tag

  • formats
    • Description: This should be a list of formats to convert any series matching this tag to.
    • Accepted values: 'nii', 'dcm', 'mnc', 'nrrd'
  • qc_type
    • Description: This defines the QC function to use in dm_qc_report.py to process human data with the matching tag
    • Accepted values: 'anat', 'fmri', 'dti', 'ignore'
  • qc_pha
    • Description: This setting defines the QC function to use in dm_qc_report.py to process phantom data with a matching tag
    • Accepted values: 'qa_dti' or 'default'. You can also omit 'qc_pha' entirely, in which case the default is used.
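
The accepted values listed above lend themselves to a quick check of a single tag's settings. This is a minimal sketch under stated assumptions: check_tag is a hypothetical helper, the settings are assumed to be already parsed into a dict, and the key is spelled qc_type as in the ExportSettings excerpts on this page.

```python
ACCEPTED_FORMATS = {"nii", "dcm", "mnc", "nrrd"}
ACCEPTED_QC_TYPES = {"anat", "fmri", "dti", "ignore"}
ACCEPTED_QC_PHA = {"qa_dti", "default"}


def check_tag(tag, settings):
    """Hypothetical helper: return a list of problems found in one tag's settings."""
    problems = []
    if "formats" not in settings:
        problems.append(f"{tag}: 'formats' is missing")
    else:
        unknown = sorted(set(settings["formats"]) - ACCEPTED_FORMATS)
        if unknown:
            problems.append(f"{tag}: unknown formats {unknown}")
    if settings.get("qc_type") not in ACCEPTED_QC_TYPES:
        problems.append(f"{tag}: qc_type {settings.get('qc_type')!r} is not accepted")
    # qc_pha may be omitted, in which case the default phantom QC applies.
    if settings.get("qc_pha", "default") not in ACCEPTED_QC_PHA:
        problems.append(f"{tag}: qc_pha {settings['qc_pha']!r} is not accepted")
    return problems


print(check_tag("T2", {"formats": ["nii", "dcm"], "qc_type": "anat"}))
print(check_tag("BAD", {"formats": ["nii", "png"], "qc_type": "anat"}))
```

The first call returns an empty list; the second reports the unknown 'png' format.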

Optional tag settings

Site wide default values can optionally be set for any of the keys usually set in ExportInfo. To see available keys and advice for setting them see the ExportInfo section in 'Sites Block'.

Example ExportSettings

The following is a small excerpt from our own ExportSettings

ExportSettings:
  T1:         {formats: ['nii', 'dcm', 'mnc'], qc_type: anat, qc_pha: default}
  T2:         {formats: ['nii', 'dcm'], qc_type: anat}
  RST:        {formats: ['nii', 'dcm'], qc_type: fmri}
  SPRL:       {formats: ['nii'], qc_type: fmri}
  DTI60-1000: {formats: ['nii', 'dcm', 'nrrd'], qc_type: dti, qc_pha: qa_dti}
  FMAP:       {formats: ['nii', 'dcm'], qc_type: ignore}
  DTI-ABCD:   {formats: ['nii', 'dcm'], qc_type: dti, Pattern: 'ABCD_dMRI$'}

In this example all of the tags except DTI60-1000 will use the default phantom QC for their respective qc_type. The last tag (DTI-ABCD) provides an example of using one of the optional settings to set up a site wide default series description pattern that will match this tag. Any study with the same tag in their ExportInfo can override this by including their own 'Pattern' setting.
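
The override behaviour described above can be pictured as a dictionary merge in which study-level values win over site-wide defaults. The sketch below assumes a shallow, key-by-key merge, which this page does not confirm; effective_tag_settings is a hypothetical name and the study-level 'Pattern' value is made up for illustration.

```python
def effective_tag_settings(site_defaults, study_overrides):
    """Hypothetical helper: combine ExportSettings defaults with ExportInfo overrides.

    Assumption: a shallow merge where any key set by the study replaces the
    site-wide value for that key.
    """
    merged = dict(site_defaults)
    merged.update(study_overrides)
    # An omitted qc_pha falls back to the default phantom QC.
    merged.setdefault("qc_pha", "default")
    return merged


# Site-wide default for DTI-ABCD, taken from the excerpt above.
site_defaults = {"formats": ["nii", "dcm"], "qc_type": "dti", "Pattern": "ABCD_dMRI$"}
# Made-up study-level ExportInfo entry that overrides only the pattern.
study_overrides = {"Pattern": "dMRI_custom$"}

print(effective_tag_settings(site_defaults, study_overrides))
```

Under this assumption the study keeps the site-wide formats and qc_type and changes only the series description pattern.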

Misc. Settings

There are a few site wide settings that are not part of any configuration block and that are only needed for a few datman scripts. These are documented here, along with the name of the scripts that use these values.

  • FTPSERVER
    • Description: This should contain the fully qualified domain name of an sftp server that new scans will be pulled from.
    • Used by:
      • dm_sftp.py
  • XNATSERVER
    • Description: Contains the full URL to the XNAT server this site will use to archive its data
    • Used by:
      • datman/xnat.py - Reads this value to find a server to read from / write to
      • dm_xnat_upload.py - Uploads new scans to this server
      • dm_xnat_extract.py - Downloads new scans from this server
      • dm_link_shared_ids.py - Adds shared ID / study alias info to this server
  • XNATPORT
    • Description: Only used alongside XNATSERVER. Specifies which port to connect to on the server.
    • Used by:
      • Same scripts as XNATSERVER
  • REDCAPAPI
    • Description: The full URL to the site's REDCap server where 'scan completed' forms are stored
    • Used by:
      • dm_link_shared_ids.py - Reads shared IDs / session aliases from this server
  • LOGSERVER
    • Description: The IP of the machine that will run the logging server.
    • Used by:
      • dm_log_server.py - This is the actual log server; it needs this IP to start up
      • The following scripts read this setting to get the destination for the log output ONLY when the remote option is used:
        • dm_hcp_freesurfer.py
        • dm_proc_freesurfer.py
        • dm_qc_report.py
        • nii_to_bids.py
        • xnat_fetch_sessions.py
  • SERVER_LOG_DIR
    • Description: The full path to the folder where all server logs will be stored. Should be accessible to the machine running the log server (i.e. a path local to that machine or a path NFS mounted to it). Only needed if LOGSERVER is set.
    • Used by:
      • dm_log_server.py
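
Two of the settings above only make sense alongside another one: XNATPORT with XNATSERVER, and SERVER_LOG_DIR with LOGSERVER. The sketch below flags a top-level config where one half of a pair is missing. It is a hypothetical check, not part of Datman, and it reads "only needed if LOGSERVER is set" as meaning that a configured LOGSERVER should come with a SERVER_LOG_DIR, which is an interpretation.

```python
def check_misc_settings(config):
    """Hypothetical helper: return warnings about incomplete pairs of misc settings."""
    warnings = []
    if "XNATPORT" in config and "XNATSERVER" not in config:
        warnings.append("XNATPORT is set but XNATSERVER is not")
    # Interpretation: a log server needs somewhere to write its logs.
    if "LOGSERVER" in config and "SERVER_LOG_DIR" not in config:
        warnings.append("LOGSERVER is set but SERVER_LOG_DIR is not")
    return warnings


# Assumed: the top-level keys of a parsed site config, with XNATSERVER left out on purpose.
site_config = {
    "XNATPORT": 443,
    "LOGSERVER": "1.1.1.1",
    "SERVER_LOG_DIR": "/some/path/on/1.1.1.1/logs",
}

print(check_misc_settings(site_config))
```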

Example Template

This template can be copied, modified and expanded when setting up a new site config file. Remember to save it as a '.yml' or '.yaml' file and to add the full path to it in the environment variable DM_CONFIG. Also set DM_SYSTEM to the name of one of your configured systems.

# Misc settings can go here (or anywhere really)
FTPSERVER: some.sftpserver.ca
XNATSERVER: xnat.somedomain.ca
XNATPORT: 443
REDCAPAPI: somedomain.ca/redcap/api/
LOGSERVER: 1.1.1.1
SERVER_LOG_DIR: /some/path/on/1.1.1.1/logs

# Projects defined here
Projects:
  ANDT: ANDT_settings.yml
  ASDD: ASDD_settings.yml
  COGBDO: COGBDO_settings.yml
  COGBDY: COGBDY_settings.yml
  DBDC: DBDC_settings.yml

# Add in Systems (be sure to set one as your current DM_SYSTEM when running datman scripts)
SystemSettings:
  kimel:
    DATMAN_PROJECTSDIR: '/archive/data/'
    DATMAN_ASSETSDIR: '/archive/code/datman/assets/'
    CONFIG_DIR: '/archive/code/config/'
    QUEUE: 'sge'

# Set up the structure of your installation
Paths:
  meta: metadata/
  data: data/
  dcm:  data/dcm/
  nii:  data/nii/
  mnc:  data/mnc/
  nrrd: data/nrrd/
  dicom: data/dicom/
  resources: data/RESOURCES/
  qc:   qc/
  std:  metadata/standards/
  log:  logs/
  zips: data/zips/
  dtiprep: pipelines/dtiprep/
  fmri: pipelines/fmri/
  hcp:  pipelines/hcp/
  hcp_fs: pipelines/hcp_freesurfer/
  freesurfer: pipelines/freesurfer/
  unring: pipelines/unring/

# Configure tags available at this site
ExportSettings:
  T1:         {formats: ['nii', 'dcm', 'mnc'], qc_type: anat, qc_pha: default}
  T2:         {formats: ['nii', 'dcm'], qc_type: anat}
  RST:        {formats: ['nii', 'dcm'], qc_type: fmri}
  SPRL:       {formats: ['nii'], qc_type: fmri}
  DTI60-1000: {formats: ['nii', 'dcm', 'nrrd'], qc_type: dti, qc_pha: qa_dti}
  FMAP:       {formats: ['nii', 'dcm'], qc_type: ignore}
  DTI-ABCD:   {formats: ['nii', 'dcm'], qc_type: dti, Pattern: 'ABCD_dMRI$'}