Tutorial Synthetic Data II

Johannes Niediek edited this page May 18, 2016 · 4 revisions

Tutorial Part II

In this part we will tune some parameters to optimize the automatic sorting result from Part I.

1. What's the Problem?

We saw that Combinato created the following clustering result:

Clustering result of Simulation 5

The problems are (the numbers refer to the red numbers in the plot):

  1. A cluster was wrongly designated an artifact.
  2. Some spikes were not assigned to any cluster.
  3. This is a multi-unit that should be further split apart.
  4. There are some spikes in this unit that should not be part of it.

2. Fixing the Problems by Parameter Tuning

Create a file called local_options.py in the folder that contains the simulation_5 folder (not inside simulation_5 itself). The file should have the following content:

options = {'MaxClustersPerTemp': 7,
           'RecursiveDepth': 2,
           'MinInputSizeRecluster': 1000,
           'MaxDistMatchGrouping': 1.6,
           'MarkArtifactClasses': False,
           'RecheckArtifacts': False}
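If you prefer to create the file from a script, the following is a minimal sketch. It writes the options shown above to local_options.py and then loads the file again to confirm that it is valid Python and defines a dict named options. The helper name write_local_options and the use of a temporary folder for the demonstration are assumptions made for this example; they are not part of Combinato.

```python
import runpy
import tempfile
from pathlib import Path

# Content of local_options.py, copied from the tutorial text above.
LOCAL_OPTIONS_SOURCE = """\
options = {'MaxClustersPerTemp': 7,
           'RecursiveDepth': 2,
           'MinInputSizeRecluster': 1000,
           'MaxDistMatchGrouping': 1.6,
           'MarkArtifactClasses': False,
           'RecheckArtifacts': False}
"""


def write_local_options(folder):
    """Write local_options.py into `folder` and return the parsed options dict.

    Hypothetical helper for illustration only; not a Combinato function.
    """
    path = Path(folder) / "local_options.py"
    path.write_text(LOCAL_OPTIONS_SOURCE)
    # Execute the file in isolation to confirm that it is valid Python
    # and that it defines a dict named `options`.
    namespace = runpy.run_path(str(path))
    options = namespace["options"]
    if not isinstance(options, dict):
        raise TypeError("local_options.py must define a dict named 'options'")
    return options


# Demonstration in a throwaway folder. For real use, pass the folder
# that contains simulation_5 instead.
with tempfile.TemporaryDirectory() as tmp:
    loaded = write_local_options(tmp)

for name, value in sorted(loaded.items()):
    print(f"{name} = {value!r}")
```

Loading the file back catches typos such as a missing comma or brace before you start a long clustering run.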

Then re-run the clustering procedure, this time with a different label. Labels are the names under which clustering results are stored. By using different labels, you can save several clustering results from the same data and compare them later. Enter

css-simple-clustering --datafile simulation_5/data_simulation_5.h5 --label optimized

When the process is finished, enter

css-plot-sorted --label sort_pos_optimized

(The prefix sort_pos_ is automatically prepended to the label when the clustering results are stored, so the plotting command needs the full name.)
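The relation between the label passed to css-simple-clustering and the label passed to css-plot-sorted can be sketched in a few lines of Python. This is only an illustration of the naming rule described above: the function build_commands is hypothetical, and the sketch prints the two commands rather than running them.

```python
import shlex

# Prefix stated in the tutorial text above.
SORT_PREFIX = "sort_pos_"


def build_commands(datafile, label):
    """Return the clustering and plotting commands as argument lists.

    Hypothetical helper for illustration only; not a Combinato function.
    """
    cluster_cmd = ["css-simple-clustering", "--datafile", datafile,
                   "--label", label]
    # The plotting command takes the stored name, i.e. prefix + label.
    plot_cmd = ["css-plot-sorted", "--label", SORT_PREFIX + label]
    return cluster_cmd, plot_cmd


cluster_cmd, plot_cmd = build_commands(
    "simulation_5/data_simulation_5.h5", "optimized")

print(shlex.join(cluster_cmd))
print(shlex.join(plot_cmd))

# To actually run the commands (requires Combinato on your PATH):
# import subprocess
# subprocess.run(cluster_cmd, check=True)
# subprocess.run(plot_cmd, check=True)
```

Keeping the label in one variable makes it easy to run the same pair of commands for several labels and compare the results afterwards.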

The sorting results are much better now:

Optimized clustering results from Simulation 5

With the optimized options, Combinato generated 10 units. Each unit is displayed as a density plot along with its cumulative spike count (see the red frame for an example). Next to each density plot is a list of all subclusters that the unit consists of. Three units still need attention:

  1. Unit 1 consists of 8 subclusters. The 5th and 7th subclusters should probably form a separate unit.
  2. Unit 3 consists of 2 subclusters. These are very different and should be split into two units using css-gui.
  3. Unit 7 consists of 2 subclusters. The first of these could be split further.

3. Manual Optimization

As explained in Part I, use css-gui to split under-clustered units further. You can also set all units to Single Unit in css-gui (all units are considered multi-units by default).