
Create Segmentation


The algorithm space for image segmentation is vast, and so are the datasets. Because of this, the CreateSegmentation workflow implements a straightforward strategy: divide the volume into overlapping regions, segment each one in parallel, and stitch the results into a single segmentation with globally consistent labels.

Architectural Decisions

It is important to provide reasonable context for a segmentation algorithm to perform well. The package currently defaults to 512x512x512 regions, a size that fits in a single machine's memory while still providing reasonable context. It is also important to provide some overlap between subvolumes to enable stitching.

The current workflow is restricted to this batch-and-stitch technique. However, implementing this simple infrastructure in Spark opens the door to future enhancements, such as solving large global constraints or performing operations that require fast, distributed in-memory computation. The long-term goal is for this workflow to become more modular and componentized, enabling more sophisticated pipelines. It should also make large-volume segmentation and exploration accessible to a larger audience.

Generic segmentation config options

Besides the Segmentor customization options explained below, the CreateSegmentation workflow supports several options that can be applied regardless of plugin behavior. See the explanations in the CreateSegmentation schema for details.

Iterations

Instead of scheduling all the work as one big job, you may request that the segmentation workload (the set of blocks to process) be split into multiple batches, a.k.a. "iterations". If you set the "iteration-size" config parameter, each batch of blocks is completely processed (grayscale -> voxel prediction -> supervoxels -> agglomeration) before the next batch starts. This is particularly useful in combination with checkpoints.
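For example, to process the blocks in batches of 10 (the batch size here is illustrative, and this sketch assumes "iteration-size" sits alongside the other settings in the "options" section -- check the CreateSegmentation schema for the exact placement):

...
    "options": {
        "iteration-size": 10,
        ...
    }
...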

Checkpoints

The CreateSegmentation workflow has a concept of "checkpoints", in which the block-wise segmentation (before stitching) is serialized to disk. To enable saving of the checkpoint, provide a "checkpoint-dir" in your config file. To enable using a previously saved checkpoint, provide both a "checkpoint-dir" and specify "checkpoint": "segmentation" in your config.

In a future version of DVIDSparkServices, checkpointing after the voxel predictions step will also be supported.

Note: The checkpoint is updated at the end of each iteration.
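For example (paths here are illustrative, and this sketch assumes both settings sit in the "options" section -- check the CreateSegmentation schema for the exact placement), to save a checkpoint:

...
    "options": {
        "checkpoint-dir": "/path/to/my-checkpoints",
        ...
    }
...

To re-use it on a later run:

...
    "options": {
        "checkpoint-dir": "/path/to/my-checkpoints",
        "checkpoint": "segmentation",
        ...
    }
...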

Defining a Custom Segmentation Plugin

The CreateSegmentation workflow implements the basic outline of the computation, but delegates the actual segmentation and stitching work to a different class, the Segmentor.

The basic outline of CreateSegmentation is as follows (a rough driver sketch in Python follows the list):

  1. Divide the ROI into a list of (overlapping) blocks (e.g. 512^3 each). Store this as a list of lightweight Subvolume objects, which specify the block geometries.
  2. Split the list of Subvolumes into groups ("iterations"), to be processed in sequence.
  3. For each iteration group:
     - Fetch the grayscale data for each block from DVID.
     - Call Segmentor.segment() with the list of Subvolumes and the corresponding list of grayscale data.
     - Retain the results (the segmentation blocks), as well as the list of max_ids for each block.
  4. Once all iterations have completed, call Segmentor.stitch().
  5. Write the stitched results to DVID.
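The outline above corresponds roughly to the following driver sketch, ignoring Spark RDD details. The helper names fetch_grayscale_from_dvid and write_to_dvid are hypothetical placeholders, not actual DVIDSparkServices functions:

def create_segmentation(subvolumes, segmentor, iteration_size):
    results = []  # (subvolume, segmentation, max_id) per block
    for start in range(0, len(subvolumes), iteration_size):
        batch = subvolumes[start:start + iteration_size]
        # Fetch grayscale for this batch and segment it (in parallel, via Spark)
        gray_blocks = [fetch_grayscale_from_dvid(sv) for sv in batch]  # hypothetical helper
        results.extend(segmentor.segment(batch, gray_blocks))
    # Stitch all blocks into one globally consistent label volume
    stitched = segmentor.stitch(results)
    write_to_dvid(stitched)  # hypothetical helper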

In the Segmentor.segment() method, the workload is divided into the following steps (see the dataflow sketch after the list):

  1. Detect large 'background' regions that lie outside the area of interest for segmentation.
  2. Predict voxel classes for every grayscale pixel (return an N-channel volume, float32).
  3. Create a label volume of supervoxels (uint32).
  4. Aggregate supervoxels into final segments (uint32).
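Per block, the default implementation chains those steps roughly as follows (a sketch based on the function signatures listed further below):

mask = background_mask(gray)                                            # 1. 3D zyx mask
predictions = predict_voxels(gray, mask)                                # 2. 4D zyxc, float32
supervoxels = create_supervoxels(predictions, mask)                     # 3. 3D zyx, uint32
segmentation = agglomerate_supervoxels(gray, predictions, supervoxels)  # 4. 3D zyx, uint32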

Each of the above steps can be customized with an arbitrary Python function. The function to use for each step is specified in your config file for the CreateSegmentation workflow. For example:

{
    "dvid-info": {
        "dvid-server": "emdata1:7000",
        "uuid": "deadbeef",
        "segmentation-name": "my_segmentation",
        "roi": "seven_column_roi",
        "grayscale": "grayscale"
    },
    "options": {
        "segmentor": {
            "class" : "DVIDSparkServices.reconutils.Segmentor.Segmentor",
            "configuration": {
                "background-mask" : {
                    "function": "mymodule.my_special_function_for_finding_non_neuropil"
                },
                "predict-voxels" : {
                    "function": "mymodule.awesome_voxel_prediction_function"
                },
                "create-supervoxels" : {
                    "function": "mymodule.awesome_watershed"
                },
                "agglomerate-supervoxels" : {
                    "function": "mymodule.agglomerate_with_gusto"
                }
            }
        },
        "stitch-algorithm" : "medium"
    }
}

Using the above config, your custom segmentation functions will be called, assuming mymodule is available somewhere on Python's sys.path (e.g. via PYTHONPATH).
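Resolving such a dotted name to a callable works along these lines (a simplified sketch, not the library's exact code):

import importlib

def resolve_function(dotted_name):
    # e.g. "mymodule.awesome_voxel_prediction_function"
    module_name, _, func_name = dotted_name.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, func_name)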

The expected signature for each of the above functions is as follows:

# gray (3D zyx) -> mask (3D zyx)
background_mask(gray, **parameters)

# gray (3D zyx), mask (3D zyx) -> predictions (4D zyxc)
predict_voxels(gray, mask, **parameters)

# predictions (4D zyxc), mask (3D zyx) -> supervoxels (3D zyx)
create_supervoxels(predictions, mask, **parameters)

# gray (3D zyx), predictions (4D zyxc), supervoxels (3D zyx) -> segmentation (3D zyx)
agglomerate_supervoxels(gray, predictions, supervoxels, **parameters)

Parameters are optional. If you provide them in your config as shown below, they are passed to your function as keyword arguments (as shown above):

...
    "predict-voxels" : {
      "function": "mymodule.awesome_voxel_prediction_function",
      "parameters": {
        "classifier_file": "/path/to/myclassifier.h5",
        "beta": 0.5
      }
    },
...
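Putting the two together, a custom voxel-prediction function might look like the following. This is a minimal placeholder that merely satisfies the expected signature; a real implementation would load the classifier from classifier_file and predict per-voxel classes:

import numpy as np

def awesome_voxel_prediction_function(gray, mask, classifier_file, beta):
    """gray (3D zyx), mask (3D zyx) -> predictions (4D zyxc, float32).

    classifier_file and beta arrive as keyword arguments, taken
    directly from the "parameters" section of the config.
    """
    # Placeholder: emit a constant two-channel prediction volume.
    # A real implementation would also honor the mask (which marks
    # voxels outside the area of interest).
    predictions = np.zeros(gray.shape + (2,), dtype=np.float32)
    predictions[..., 0] = 1.0
    return predictions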

If the above pipeline (mask, predictions, supervoxels, agglomeration) is not suitable for your needs, the current architecture also allows you to define your own plugin from scratch -- as long as it inherits from the Segmentor class in DVIDSparkServices.reconutils.Segmentor. In your config, replace the "segmentor: class" value with the fully-qualified name of your own Segmentor subclass. A simple placeholder example, DefaultGrayOnly, is located in DVIDSparkServices/reconutils/plugins.

Simply define a class with the same name as its file and add it to the plugin directory. That plugin can then be specified, along with any custom options, in the JSON config file for CreateSegmentation. The main function that needs customization is segment():

def segment(self, gray_chunks):
    """Top-level pipeline (can override) -- gray RDD => label RDD.

    Defines a segmentation workflow consisting of voxel prediction,
    watershed, and agglomeration.  One can override specific functions
    or the entire workflow as long as RDD input and output constraints
    are satisfied.  RDD transforms should preserve the partitioner --
    subvolume id is the key.

    Args:
        gray_chunks (RDD) = (subvolume key, (subvolume, numpy grayscale))
    Returns:
        segmentation (RDD) as (subvolume key, (subvolume, numpy compressed array))
    """

Built-in plugins

Several plugins are already implemented in the DVIDSparkServices repo itself. See DVIDSparkServices/reconutils/plugins/README.md for details.

Example config files

FIB-25

FIB-25 was originally segmented using a different system (not DVIDSparkServices). An example config file for segmenting FIB-25 can be found at the following location: /groups/flyem/data/scratchspace/classifiers/fib25-july/fib25-config.json

FIB-19

FIB-19 was segmented using this config: /groups/flyem/data/scratchspace/classifiers/fib19_experimental/two-stage-ilp/production-run-config.json

CX/PB

The current version of the CX segmentation used the following config: /groups/flyem/data/scratchspace/classifiers/pb-june2016/pb-config-simple-predict.json