The spike sorting part of this repository borrows many ideas from WaveClus (Journal Article, Website) by Rodrigo Quian Quiroga.
Combinato Spike Sorting consists of the following main executables:

Executable | Used for |
---|---|
`css-extract` | automatic spike extraction |
`css-prepare-sorting`, `css-cluster`, `css-combine` | automatic spike sorting (suited for very long recordings) |
`css-gui` | visualization and manual improvement of sorting |
They come along with a few other convenient tools:

Executable | Used for |
---|---|
`css-find-concurrent`, `css-mask-artifacts` | pre-sorting artifact detection |
`css-plot-rawsignal` | generating raw data quality plots |
`css-plot-extracted` | plotting overview figures of extracted spikes and artifacts |
`css-plot-sorted` | plotting overview figures of sorted spikes and artifacts |
`css-overview-gui` | displaying various overview plots |
If you expect all channels to be usable, simply extract spikes from all channels (see below).
If instead you think a number of channels should be excluded from spike extraction, do a raw data overview plot first by calling `css-plot-rawsignal`. When the plots are ready, use `css-overview-gui` to view them and to deselect broken/noisy channels from spike extraction.
Important to know: Spike extraction in Combinato is currently implemented for Neuralynx .ncs and Matlab .mat files only. In the following, the workflow for .ncs files is described.
Use `css-extract` to extract spikes from channels, either from all channels (`css-extract --files *.ncs`) or from the ones selected in `css-overview-gui` before (`css-extract --jobs do_extract.txt`). `css-extract --help` displays more information.

`css-extract` creates a folder for each file from which it extracts spikes. Inside each folder, a file `data_xyz.h5` is created, where `xyz` is the channel name. These files contain the extracted, re-aligned spikes. Of course, you can also create these files yourself and then continue with spike sorting.
The file format of the `data_xyz.h5` files is the versatile HDF5 format. Combinato accesses these files through the PyTables library. A main advantage of HDF5 is that it allows access to parts of arrays (closed-source technologies such as Matlab have only rudimentary support for partial array manipulation). Furthermore, HDF5 is open source, cross-platform, and optimized for large data files.
The data is stored in the `data_xyz.h5` files in the following way:
Node | Type | Meaning |
---|---|---|
pos | Group | For spikes, times with “positive” amplitude |
neg | Group | For spikes, times with “negative” amplitude |
pos.spikes | (n_spikes, n_samples) array | Spikes with positive amplitude |
pos.times | (n_spikes) array | Times of spikes with positive amplitude |
pos.artifacts | (n_spikes, 2) array | Artifact types per spike |
neg.spikes | same for negative | ... |
neg.times | ... | ... |
neg.artifacts | ... | ... |
thr | (n_windows, 3) array | Extraction thresholds with their start and end times |
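The layout above can be explored with any HDF5 library. The following sketch creates a toy file with the documented node structure and reads part of it back; Combinato itself uses PyTables, but h5py is used here only for brevity, and the channel name `demo` is made up:

```python
# Sketch: write and read a file with the documented data_xyz.h5 layout.
# Combinato itself uses PyTables; h5py is used here only for illustration.
import h5py
import numpy as np

n_spikes, n_samples = 100, 64

with h5py.File("data_demo.h5", "w") as f:  # hypothetical channel "demo"
    pos = f.create_group("pos")
    pos.create_dataset("spikes", data=np.zeros((n_spikes, n_samples)))  # microvolts
    pos.create_dataset("times", data=np.arange(n_spikes, dtype=float))  # milliseconds
    pos.create_dataset("artifacts", data=np.zeros((n_spikes, 2), dtype=int))
    f.create_dataset("thr", data=np.zeros((1, 3)))  # threshold, start time, end time

# HDF5 allows partial reads: only the requested rows are loaded from disk.
with h5py.File("data_demo.h5", "r") as f:
    first_ten = f["pos/spikes"][:10]  # ten spikes, not the whole array
    times = f["pos/times"][:]

print(first_ten.shape)
```

Partial reads like `[:10]` are exactly the feature that makes HDF5 practical for very long recordings.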
The following rules apply:
- Spikes are in microvolts, one row per spike, one column per sample
- Times are in milliseconds

Artifact types in the `artifacts` arrays are encoded as follows:
Value | Artifact type | Default criterion |
---|---|---|
0 | No artifact | |
1 | Firing rate too high | > 200 Hz |
2 | Amplitude too high | > 1 millivolt |
4 | Too many active channels | > 50 % of all channels |
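The values 0, 1, 2, and 4 are spaced like bit flags, which suggests that several artifact types may combine into one code (e.g. 3 = rate and amplitude both too high). Treating them as combinable flags is an assumption, not documented behavior; under that assumption, a decoder could look like this:

```python
# Sketch: decode artifact codes, ASSUMING the values 0/1/2/4 act as bit
# flags that may combine (e.g. 5 = firing rate + too many channels).
ARTIFACT_FLAGS = {
    1: "firing rate too high (> 200 Hz)",
    2: "amplitude too high (> 1 mV)",
    4: "too many active channels (> 50 %)",
}

def decode_artifact(code: int) -> list[str]:
    """Return the artifact reasons encoded in `code` (empty list = no artifact)."""
    return [name for bit, name in ARTIFACT_FLAGS.items() if code & bit]

print(decode_artifact(0))  # []
print(decode_artifact(5))  # firing rate + too many channels
```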
It is debatable whether it is a good idea to store positive and negative spikes in one file. Of course, for spike sorting and other analyses it suffices to have only one of them. The main advantage of this structure is that for bipolar recordings, a spike extraction procedure has to filter the data only once and can then extract both positive and negative spikes at the same time.
`css-extract` uses 8 cores by default. Change the variable `nWorkers` if you run into memory issues. If the extraction takes a long time, check the data transfer rate of your network, which is the most likely bottleneck, especially on 100 Mbit/s networks.
For long-term recordings, it's often important to remove extracted events that are not neural in origin, such as movement artifacts or periods of excessive 50/60 Hz noise. To identify these artifacts, run `css-find-concurrent`, which creates a file `concurrent_times.h5` containing information on events that occur on several channels concurrently, and then `css-mask-artifacts`. Use `css-mask-artifacts --help` to read more about command-line options.
Use `css-plot-extracted` to create plots of the spikes after artifact removal. These plots display the different artifact types and also visualize cumulative spike counts over time. `css-overview-gui` displays these plots in a convenient way.
Combinato Spike Sorting works with job files, but using them is not mandatory.
If you simply want to sort all extracted spikes, go to a folder containing a `data_xyz.h5` file and create a job file by calling `css-prepare-sorting [--neg] --datafile data_xyz.h5`. Then run `css-cluster --jobs sort_pos_abc.txt` or `css-cluster --jobs sort_neg_abc.txt`, where `pos` or `neg` is the amplitude polarity of the spikes, and `abc` is the first three letters of your username (added automatically so that several users can smoothly analyze the same datasets at the same time without interference).
Note that `sort_pos_abc.txt` is just a text file in the current folder, created by `css-prepare-sorting`. You can read, edit, and rename it as you prefer.
Warning: If you run `css-prepare-sorting` more than once, it appends new jobs to `sort_pos_abc.txt`. Be careful not to end up with a `sort_pos_abc.txt` containing several identical jobs, which would disrupt the clustering process. Therefore, `css-cluster` and `css-combine` raise an error when duplicate jobs are requested.
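Because duplicate jobs cause an error, it can be handy to sanity-check a job file before clustering. A minimal sketch, comparing lines verbatim (the exact line format of css job files is not specified here, so the file contents below are hypothetical):

```python
# Sketch: report job lines that occur more than once in a css job file.
from collections import Counter

def find_duplicate_jobs(path):
    """Return the job lines that appear more than once in the file at `path`."""
    with open(path) as f:
        jobs = [line.strip() for line in f if line.strip()]
    return [job for job, count in Counter(jobs).items() if count > 1]

# Hypothetical job file contents, for demonstration only:
with open("sort_pos_abc.txt", "w") as f:
    f.write("CSC1/data_CSC1.h5\nCSC2/data_CSC2.h5\nCSC1/data_CSC1.h5\n")

print(find_duplicate_jobs("sort_pos_abc.txt"))  # ['CSC1/data_CSC1.h5']
```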
If you want to exclude channels from sorting (e.g. because they contain only artifacts), call `css-plot-extracted` and then `css-overview-gui` to deselect the channels that you do not want to sort.
Some parameters in `options.py` control details of the sorting process. The most relevant ones are listed here.
Name in `options.py` | Meaning | Default | When to change |
---|---|---|---|
`MinSpikesPerClusterMultiSelect` | Minimal size of a cluster | 15 | Raise when you see too many small clusters |
`MaxClustersPerTemp` | Maximal number of clusters at one temperature | 5 | Raise (up to about 7-10) when you see under-clustering |
`RecursiveDepth` | Clustering iterations | 1 | Set to 2 or 3 when you see under-clustering, especially under-clustered large clusters |
`ReclusterClusters` | Should large clusters be re-clustered? | True | Only set to False when you see substantial over-clustering |
`MinInputSizeRecluster` | Minimal size of a cluster to be re-clustered | 2000 | Lower when you see under-clustering of medium-size clusters |
`FirstMatchFactor` | Acceptance radius for first template matching step | 0.75 | Set to higher values when you need more template matching |
`SecondMatchFactor` | Acceptance radius for second template matching step | 3 | Set to lower values when you see overly liberal template matching |
`MaxDistMatchGrouping` | Stop threshold for automatic cluster grouping | 1.8 | Most important parameter. Set to lower values (down to about 1.5) when you see too much grouping |
Negative spikes: use `css-overview-gui` also to mark channels where you would like to sort "negative" spikes.
You will see that `css-prepare-sorting` creates a folder for each block of spikes that you want to sort. By default, each folder contains the sorting result for 20000 (non-artifact) spikes. The number 20000 seems to be a good trade-off between speed (sorting scales super-linearly with the number of spikes) and the ability to detect small clusters (typically, you need at least 15-25 spikes for a cluster to be found). `css-cluster` runs separate processes in parallel for each of these folders, which in turn enables efficient use of multi-core machines.

Of course, `css-prepare-sorting` creates only one folder if the total number of spikes is smaller than the block size.
After sorting, use `css-combine` to concatenate the data contained in the various folders. If you already have a job file, simply call `css-combine --jobs sort_pos_abc.txt`, where `sort_pos_abc.txt` is the job file you used for clustering. If you do not use job files, you have to specify the options `--datafile`, `--sessions`, and `--label`.

Even if all sorted spikes are contained in one folder, you have to call `css-combine` to identify artifact clusters and to produce plots of the individual clusters.
Each concatenated session consists of one folder `sort_SIGN_USER`, where `SIGN` indicates the spike polarity and `USER` is a shorthand for the user name. Combinato stores plots of individual clusters and general sorting information in this folder. The structure of the file `sort_cat.h5` is as follows:
Node | Array size | Meaning |
---|---|---|
index | (n_spikes) | Index to the data_xyz.h5 file |
classes | (n_spikes) | Cluster number for each spike |
matches | (n_spikes) | Template matching status for each spike |
distance | (n_spikes) | Distance at which template matching occurred |
artifacts | (n_clusters, 2) | Artifact status for each cluster |
groups | (n_clusters, 2) | Group membership for each cluster |
types | (n_groups, 2) | Group type for each group |
- The `classes`, `matches`, and `distance` nodes contain information about spikes indexed by the `index` node. The reason for this additional indexing layer is that spikes from the `data_xyz.h5` file can be ignored without the need to delete them, by simply skipping them in the `index` node.
- `distance` contains the distance at which the second template matching step was performed for each spike.
- `artifacts` contains, for each cluster, whether it is an artifact cluster (1) or not (0).
- `groups` contains the group to which a cluster belongs.
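The indexing layer described above can be illustrated with plain lists (hypothetical numbers; the real files store these as HDF5 arrays):

```python
# Sketch: sort_cat.h5 stores per-spike information only for spikes listed
# in `index`; spikes absent from `index` are ignored without ever touching
# the underlying data_xyz.h5 file.
all_times = [10.0, 12.5, 13.1, 20.4, 25.9]  # times node of data_xyz.h5 (ms)
index = [0, 1, 3, 4]                         # spike 2 was skipped
classes = [7, 7, 9, 9]                       # cluster number per indexed spike

# Recover the spike times of cluster 9 via the index node:
cluster9_times = [all_times[i] for i, c in zip(index, classes) if c == 9]
print(cluster9_times)  # [20.4, 25.9]
```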
Values in the `matches` node are encoded as follows:

Value | Spike assigned to cluster during |
---|---|
0 | SPC clustering |
1 | First template matching step |
2 | Second template matching step |
Group types in the `types` node are encoded as follows:

Value | Group contains |
---|---|
-1 | Artifacts |
0 | Unassigned spikes |
1 | A multi-unit |
2 | A single-unit |
One of the main advantages of Combinato is that it has a relatively efficient GUI that enables you to modify sorting results and displays a variety of visual cluster quality checks. These plots are updated on the fly, so that you can instantly see how a change in the clustering influences its various properties.
Start the GUI by calling `css-gui`. Make sure to start the GUI in a folder containing the folders that contain the `data_xyz.h5` files. For example, if you have a file `/home/myname/experiment1/channel5/data_channel5.h5`, first move to `experiment1`, then start `css-gui`.
To load a sorting session, either open it with File-Open or use job files. Here, a job file is simply a list of paths to `data_xyz.h5` files, saved as `do_manual_pos.txt` or `do_manual_neg.txt`, depending on the polarity of the spikes. If you use job files, you can move from one sorting session to the next by key press, which can speed up the sorting review considerably if you have many recording channels.
Menu | Command | Shortcut | Description |
---|---|---|---|
File | Open | Ctrl+O | Load a sorted session for optimization |
File | Open Jobs | | Load a list of sorting jobs |
Actions | Next Job | Ctrl+J | Load next job from joblist |
Actions | Goto Job | | Load a specific job from joblist |
File | Save | Ctrl+S | Save changes to file |
Once a sorting session is loaded, the various groups of clusters appear as a pull-down menu. If you select a group in the pull-down menu, information about it is displayed in the different tabs.
This tab collects informative plots about one group of clusters. Some of these are inspired by this article by D. Hill, S. Mehta, and D. Kleinfeld.
Press Enter to additionally highlight information about a single cluster within a group. The information in this tab helps decide if a group (or a cluster in a group) is an artifact, or a multi or single-unit. Use the information in the following way.
Subplot | How to use |
---|---|
(log) Density | If the density is inhomogeneous across sampling points, it is more likely that the group is a multi-unit. |
Mean spikes | If some of the mean spikes differ a lot in shape, consider splitting the group into two or more groups. |
Inter-spike intervals | If you see a clear comb-like pattern with peaks at 20, 40, 60 ms, the unit is an artifact or contains clusters that are artifacts. If the percentage of lags < 3 ms is high (> 5 %), it is unlikely that the group is a single-unit. |
Cumulative spike count | If the cumulative spike count contains clear jumps or plateaus, the unit was probably not recorded stably over time. |
This tab displays all groups, by plotting the mean spikes of all clusters in a group to represent that group. The main function of this tab is to reorganize clusters into groups, using the following commands.
Menu | Command | Shortcut | Description |
---|---|---|---|
Actions | Merge all | Ctrl+M | Merge all clusters into group 1 |
Actions | New Group | Ctrl+N | Create a new group, to which clusters can be added |
| | Enter | Highlight information about selected cluster |
| | N | Move to next group in group list |
| | A | Move selected cluster to artifacts group |
| | G | Move selected cluster to closest group |
Let's start with a file `CSC1.ncs`, placed in a folder `testdata`. Open a shell and go to the folder `testdata`. All following calls assume that `testdata` is your current working directory.
- *optional* If you are using Linux, you can plot the continuous signal. Call `css-plot-rawsignal` and look at the figure in `testdata/overview/`. On Windows, plotting the raw signal is currently not implemented.
- Extract spikes by calling `css-extract --files CSC1.ncs`. The process takes some time. Check that there is now a folder `testdata/CSC1` containing the file `data_CSC1.h5`.
- *optional, but recommended* Mask artifact spikes by calling `css-mask-artifacts --no-concurrent`. The option `--no-concurrent` indicates that artifact masking should not try to use information from several channels, because you extracted spikes from one channel only. If you have multiple channels, you can call `css-find-concurrent` first, and then `css-mask-artifacts` with no option.
- *optional* To plot the extracted spikes, call `css-plot-extracted`. The figure is placed in `testdata/overview`; its name begins with `spikes_`. The different artifact types are marked in the figure.
- Now you have to tell Combinato which spikes to sort. Call `css-prepare-sorting --data CSC1/data_CSC1.h5`. This step might seem unnecessarily complicated, but it is extremely handy when you would like to sort only spikes from certain time periods, e.g. for sleep recordings or multi-part experiments. There is now a file `sort_pos_abc.txt`, where `abc` is a shorthand for your username; in the following, use your real shorthand instead of the `abc` appearing in this manual. This file lists all sessions that you can now begin to sort. There is also a folder `CSC1/sort_pos_abc_000...`, where temporary files from the spike sorting process will reside.
- Ready to sort the spikes: call `css-cluster --jobs sort_pos_abc.txt`. This starts the clustering process.
- Combine the many clusters into cluster groups by calling `css-combine --jobs sort_pos_abc.txt`. This step may again seem overly complicated, but it is useful to separate clustering and cluster combining, especially when you would like to optimize the automatic combination process by parameter tuning, without repeating the whole clustering procedure all over. There is now a folder `testdata/CSC1/sort_pos_abc`, which contains plots of all clusters.
- *optional* Plot the sorting result by calling `css-plot-sorted --label sort_pos_abc`. The `label` parameter exists because Combinato allows you to store several alternative "sortings", which can then be plotted independently. The resulting plot is again stored in `testdata/overview`.
- *optional* Use the GUIs to optimize results. Call `css-overview-gui`, enter the sorting label `sort_pos_abc`, and initialize the folder (from the menu or by pressing `Ctrl+I`). You will see a representation of the sorting result. If you would like to change something, call `css-gui`, load the channel, and change the clustering as you like.
I'm always happy to see pull requests, feature suggestions and bug reports. Please see the GitHub site for more information.