
02. Unpack raw data

Xin Niu edited this page Oct 15, 2024 · 8 revisions

This step reads the raw CSC (Continuously Sampled Channel) data (.ncs for Neuralynx) from Long Term Storage (LTS), fills incomplete blocks with NaNs, and saves the result as .mat files on Hoffman.

Input file structure

We should avoid hard-coding the structure of the input file into the pipeline, as it may change in the future, or new data sources with different structures might be introduced.

(G[A-D] is not necessary in the output files)

Neuralynx:

UCLA:

D<patient_id>
 - EXP<exp_id>_<exp_name>
   - <exp_date>
     - G[A-D][1-8]-<channel_name>.ncs (micro data)
     - G[A-D][1-8]-<channel_name>_<suffix>.ncs (micro data)
     - <channel_name>.ncs (macro data)
     - <channel_name>_<suffix>.ncs (macro data)
     - Events_<suffix>.nev

Iowa:

For Iowa data, we need the montage setting to rename the macro and micro channels:

montage_Patient-<patientId>_exp-<expId>_<datetime>.json
<patient_id>[R,L]
 - <patient_id>-<exp_id><exp_name>
   - <exp_date>
     - CSC[193-288].ncs (this is not unpacked)
     - PDes[1-96].ncs
     - LFPx[1-192].ncs
     - Events.nev

Use the connection table to map the Iowa channel names to the names defined by the montage setting.

Header file: start time (Unix time)

Order files for each channel: Both micro and macro channels may be split across multiple files for a single channel, distinguished by a suffix such as ‘_00001’. Unfortunately, the suffix order does not always match the temporal order of recording, so we need to check the start time in each file header and reorder the files for each channel.

Event files must also be ordered, and their temporal order may differ from that of the channel files. We assume the suffix order is the same across all channels within an experiment, so only the first channel is checked to determine the order; checking every file would be too time-consuming.
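The reordering described above can be sketched as follows. This is a minimal Python illustration, not the pipeline's MATLAB code: it assumes the start time of each file of the first channel has already been read from its header, and the function names, channel name, and timestamps are all hypothetical.

```python
def order_suffixes_by_start_time(start_times):
    """Return the suffixes sorted by recording start time.

    start_times: dict mapping a file suffix (e.g. "" or "_00001") to the
    start time (Unix time) read from the header of the first channel's file.
    """
    return sorted(start_times, key=start_times.get)


def ordered_files(channel_name, suffix_order, extension=".ncs"):
    """Build the file names of one channel following the reference suffix order."""
    return [f"{channel_name}{suffix}{extension}" for suffix in suffix_order]


# Hypothetical header start times for the first channel. Note that the
# suffix order does not follow the recording order.
first_channel_start_times = {
    "": 1700003600.0,
    "_00001": 1700000000.0,
    "_00002": 1700007200.0,
}

# The order is determined once and then reused for every other channel.
suffix_order = order_suffixes_by_start_time(first_channel_start_times)
print(ordered_files("GA1-RA1", suffix_order))
```

Only the first channel's headers are inspected; every other channel reuses `suffix_order`, which reflects the assumption that the suffix order is shared within an experiment.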

Incomplete blocks in .ncs files:

The Neuralynx system sends signals in blocks of 512 samples, each with a single timestamp marking the beginning of the block. Sometimes a block is incomplete. In this case, the IO software we use (Nlx2MatCSC_v3, developed by Ueli Rutishauser) fills the rest of the block with duplicated values. (See a similar discussion here: https://groups.google.com/g/neuralensemble/c/m7BJlJRKuI0)

Emily detects incomplete blocks and fills them with NaNs in her function: Nlx_readCSC.m in PDM. We use her solution to unpack the raw data.
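The idea of marking the padded part of an incomplete block as missing can be illustrated with a short Python sketch. It is not the logic of Nlx_readCSC.m: it assumes the number of valid samples per block is already known to the reader, and the function and variable names are hypothetical.

```python
import math

BLOCK_SIZE = 512  # samples per Neuralynx block, as described above


def fill_incomplete_blocks(blocks, n_valid):
    """Replace the padded tail of each incomplete block with NaN.

    blocks:  list of blocks, each a list of BLOCK_SIZE samples.
    n_valid: number of valid samples in each block (assumed to be available
             from the reader; this input is an assumption of the sketch).
    """
    signal = []
    for block, valid in zip(blocks, n_valid):
        # Keep the valid samples and mark the rest of the block as missing.
        signal.extend(float(x) for x in block[:valid])
        signal.extend([math.nan] * (BLOCK_SIZE - valid))
    return signal


# Two blocks: the first is complete, the second has only 500 valid samples
# followed by duplicated padding values.
blocks = [list(range(BLOCK_SIZE)), [7] * BLOCK_SIZE]
signal = fill_incomplete_blocks(blocks, n_valid=[512, 500])
print(sum(math.isnan(x) for x in signal))  # 12 samples marked as missing
```

Keeping the block length fixed means the sample positions still line up with the block timestamps, while the NaNs make the gap visible to downstream steps.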

Blackrock:

Example data:

montage_Patient-<patientId>_exp-<expId>_<datetime>.json
D<patient_id>
 - <exp_date>-<unknown_id>
   - <exp_date>-<unknown_id>.ns3 (macro, 2 kHz)
   - <exp_date>-<unknown_id>.ns5 (micro, 30 kHz)
   - <exp_date>-<unknown_id>.nev

Output file structure:

The output files are consistent across devices and stored in the Hoffman directory: HoffmanMount/data/PIPELINE_vc/ANALYSIS. The file paths for all channels are saved in channelFileNames.csv. Each row may contain one or more file paths, one for each segment/experiment of the same channel. As timestamps are consistent across channels, they are extracted from only one channel.

After the spike sorting and LFP extraction, raw CSC data (CSC_micro and CSC_macro) are moved to LTS to save storage on Hoffman.

<exp_group> (e.g., MovieParadigm, Screening)
  - <patient_id>_<exp_group>
    - Experiment<exp_id> (as patients typically attend different experiments, exp_id is not consistent across patients)
      - channelFileNames.csv
      - CSC_micro (can be moved to LTS after analysis)
        - G[A-D][1-8]-<channel_name>_<suffix1>.mat
        - G[A-D][1-8]-<channel_name>_<suffix2>.mat
        - lfpTimestamps_<suffix1>.mat
        - lfpTimestamps_<suffix2>.mat
      - CSC_macro (can be moved to LTS after analysis)
        - <channel_name>_<suffix1>.mat 
        - <channel_name>_<suffix2>.mat
        - lfpTimestamps_<suffix1>.mat
        - lfpTimestamps_<suffix2>.mat
      - CSC_events
        - Events_<suffix>.mat 

G[A-D][1-8]-<channel_name>.mat (micro), <channel_name>.mat (macro): data will be saved in int16 type with a scaling variable ADBitVolts. The raw signal can be recovered by:

signal = data * ADBitVolts;

Timestamps:

The raw data timestamps in the .ncs files are in Unix time (in seconds). They are read directly by Nlx_readCSC.m and saved in the file lfpTimeStamps_001.mat, separately for macro and micro channels.

Note: Unix time must be saved as double to keep precision. (Relative timestamps can be saved as single, which reduces memory usage.)
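The precision loss can be checked with a small Python sketch that round-trips a value through IEEE 754 single precision. The timestamps are hypothetical; the helper `to_single` is introduced only for this illustration.

```python
import struct


def to_single(value):
    """Round-trip a Python float (double) through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", value))[0]


unix_time = 1700000050.25  # hypothetical absolute timestamp in seconds
relative_time = 3600.5     # hypothetical offset from the recording start

# At this magnitude single precision only resolves steps of 128 s, so the
# absolute timestamp is rounded to 1700000000.0.
print(abs(to_single(unix_time) - unix_time))          # 50.25

# A small relative timestamp keeps sub-millisecond resolution.
print(abs(to_single(relative_time) - relative_time))  # 0.0
```

Single precision has a 24-bit significand, so values between 2^30 and 2^31 (which covers current Unix times in seconds) are spaced 128 s apart, far too coarse for sample timestamps.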

samplingInterval returned by Nlx_readCSC is in milliseconds. It is converted to a MATLAB duration object to make it easier to use and understand. The original double-typed samplingInterval is also saved as samplingIntervalSeconds.

Note: To compute firingRateAroundSpikeTime, the timestamps are converted to milliseconds.

Blackrock does not use Unix time.

Blackrock does not save long files in multiple segments. I need to figure out how to stitch timestamps of multiple segments.

One more thing:

For microelectrodes, we may remove power line interference (removePLI) to improve data quality for spike detection. In this case, signalRemovePLI is added to the unpacked output files, which takes extra storage and running time. This step can be disabled by setting arguments in readCSC.m. See:

https://github.com/NxNiki/nwbPipeline/blob/968f3323bc6627e8d43ce32b3c6375719ce6abaa/src/utils/readCSC.m#L50-L53

Storage usage: the added signal is stored as single, so it takes twice the space of the raw data (int16). Running time: for two hours of recording, removePLI takes an additional 5 minutes per channel.
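As a rough worked example of the storage cost, the Python sketch below compares the two arrays for a single channel. The 30 kHz sampling rate is an assumption borrowed from the Blackrock micro rate listed above, and the function name is hypothetical; file overhead and compression in the .mat format are ignored.

```python
INT16_BYTES = 2   # raw data type
SINGLE_BYTES = 4  # type of the added signalRemovePLI


def removepli_storage_bytes(duration_seconds, sampling_rate_hz):
    """Return (raw_bytes, pli_bytes) for one channel, ignoring file overhead."""
    n_samples = int(duration_seconds * sampling_rate_hz)
    return n_samples * INT16_BYTES, n_samples * SINGLE_BYTES


# Two hours of recording at an assumed 30 kHz sampling rate.
raw_bytes, pli_bytes = removepli_storage_bytes(2 * 3600, 30000)
print(raw_bytes / 1e6, pli_bytes / 1e6)  # 432.0 864.0 (MB per channel)
```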
