From b1c5bf446161e6daf3a28705df4e7f16455e48a1 Mon Sep 17 00:00:00 2001
From: GFleishman
Date: Tue, 11 Feb 2025 18:12:47 -0500
Subject: [PATCH 1/6] adding distributed.rst

---
 docs/distributed.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 docs/distributed.rst

diff --git a/docs/distributed.rst b/docs/distributed.rst
new file mode 100644
index 00000000..dccffbb5
--- /dev/null
+++ b/docs/distributed.rst
@@ -0,0 +1,10 @@
+Distributed Cellpose for Larger-Than-Memory Data
+------------------------------------------------
+
+The `cellpose.contrib.distributed_cellpose` module is intended to help run cellpose on datasets
+that are too large to fit in system memory. The dataset is divided into overlapping blocks and
+each block is segmented separately. Results are stitched back together into a seamless segmentation
+of the whole dataset.
+
+Blocks can be run in parallel, in series, or both. Compute resources (GPUs, CPUs, and RAM) can be
+arbitrarily partitioned.

From 3eb864a44133a6d35878259af809ddf537fbedc0 Mon Sep 17 00:00:00 2001
From: GFleishman
Date: Tue, 11 Feb 2025 18:16:38 -0500
Subject: [PATCH 2/6] updated index

---
 docs/index.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/index.rst b/docs/index.rst
index 9df60a80..9e6af4d6 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -55,6 +55,7 @@ Cellpose: a generalist algorithm for cellular segmentation
    restore
    train
    benchmark
+   distributed
    openvino
    faq
 

From d46a232186d32ff39078083e1ecae472503fe0fc Mon Sep 17 00:00:00 2001
From: GFleishman
Date: Tue, 11 Feb 2025 18:42:16 -0500
Subject: [PATCH 3/6] updated distributed.rst

---
 docs/distributed.rst | 59 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/docs/distributed.rst b/docs/distributed.rst
index dccffbb5..ae9b6163 100644
--- a/docs/distributed.rst
+++ b/docs/distributed.rst
@@ -1,10 +1,61 @@
-Distributed Cellpose for Larger-Than-Memory Data
+Big Data
 ------------------------------------------------
 
-The `cellpose.contrib.distributed_cellpose` module is intended to help run cellpose on datasets
+Distributed Cellpose for larger-than-memory data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``cellpose.contrib.distributed_cellpose`` module is intended to help run cellpose on 3D datasets
 that are too large to fit in system memory. The dataset is divided into overlapping blocks and
 each block is segmented separately. Results are stitched back together into a seamless segmentation
 of the whole dataset.
 
-Blocks can be run in parallel, in series, or both. Compute resources (GPUs, CPUs, and RAM) can be
-arbitrarily partitioned.
+Built to run on workstations or clusters. Blocks can be run in parallel, in series, or both.
+Compute resources (GPUs, CPUs, and RAM) can be arbitrarily partitioned for parallel computing.
+Currently workstations (your own machine) and LSF clusters are supported. SLURM clusters are
+an easy addition - if you need this to run on a SLURM cluster `please post a feature request issue
+to the github repository`_ and tag @GFleishman.
+
+The input data format must be a zarr array. Some functions are provided in the module to help
+convert your data to a zarr array, but not all formats or situations are covered. These are
+good opportunities to submit pull requests. Currently, the module must be run via the Python API,
+but making it available in the GUI is another good PR or feature request.
+
+Examples
+~~~~~~~~
+
+Run distributed Cellpose on half the resources of a workstation with 16 cpus, 1 gpu, and 128GB system memory:
+............................
+
+.. code-block:: python
+    from cellpose.contrib.distributed_segmentation import distributed_eval
+
+    # parameterize cellpose however you like
+    model_kwargs = {'gpu':True, 'model_type':'cyto3'}  # can also use 'pretrained_model'
+    eval_kwargs = {'diameter':30,
+                   'z_axis':0,
+                   'channels':[0,0],
+                   'do_3D':True,
+    }
+
+    # define compute resources for local workstation
+    cluster_kwargs = {
+        'n_workers':1,  # if you only have 1 gpu, then 1 worker is the right choice
+        'ncpus':8,
+        'memory_limit':'64GB',
+        'threads_per_worker':1,
+    }
+
+    # run segmentation
+    # outputs:
+    #    segments: zarr array containing labels
+    #    boxes: list of bounding boxes around all labels (very useful for navigating big data)
+    segments, boxes = distributed_eval(
+        input_zarr=large_zarr_array,
+        blocksize=(256, 256, 256),
+        write_path='/where/zarr/array/containing/results/will/be/written.zarr',
+        model_kwargs=model_kwargs,
+        eval_kwargs=eval_kwargs,
+        cluster_kwargs=cluster_kwargs,
+    )
+
+
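The overlap-and-stitch scheme introduced in this patch is worth picturing concretely. Below is a
minimal sketch of how overlapping crops over a blocked 3D volume can be enumerated;
``block_crops`` is a hypothetical illustration written for this note, not the module's actual
implementation:

.. code-block:: python

    import itertools
    import math

    def block_crops(shape, blocksize, overlap):
        # one overlapping crop per block, clipped to the array bounds
        grid = [math.ceil(s / b) for s, b in zip(shape, blocksize)]
        for index in itertools.product(*map(range, grid)):
            starts = [i * b for i, b in zip(index, blocksize)]
            yield index, tuple(
                slice(max(0, s - overlap), min(d, s + b + overlap))
                for s, d, b in zip(starts, shape, blocksize))

    # a (512, 512, 512) volume in (256, 256, 256) blocks -> 8 overlapping crops
    crops = list(block_crops((512, 512, 512), (256, 256, 256), overlap=60))

The single-block test added in patch 5 builds one such crop by hand with the same slice arithmetic.
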
From 501a0865e0fb40e290dbae02dbbe1329f777928b Mon Sep 17 00:00:00 2001
From: GFleishman
Date: Tue, 11 Feb 2025 18:51:33 -0500
Subject: [PATCH 4/6] distributed.rst

---
 docs/distributed.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/distributed.rst b/docs/distributed.rst
index ae9b6163..8b68697f 100644
--- a/docs/distributed.rst
+++ b/docs/distributed.rst
@@ -13,7 +13,7 @@ Built to run on workstations or clusters. Blocks can be run in parallel, in seri
 Compute resources (GPUs, CPUs, and RAM) can be arbitrarily partitioned for parallel computing.
 Currently workstations (your own machine) and LSF clusters are supported. SLURM clusters are
 an easy addition - if you need this to run on a SLURM cluster `please post a feature request issue
-to the github repository`_ and tag @GFleishman.
+to the github repository <https://github.com/MouseLand/cellpose/issues>`_ and tag @GFleishman.
 
 The input data format must be a zarr array. Some functions are provided in the module to help
 convert your data to a zarr array, but not all formats or situations are covered. These are
@@ -23,10 +23,11 @@ but making it available in the GUI is another good PR or feature request.
 Examples
 ~~~~~~~~
 
-Run distributed Cellpose on half the resources of a workstation with 16 cpus, 1 gpu, and 128GB system memory:
+Run distributed Cellpose on half the resources of a workstation that has 16 cpus, 1 gpu, and 128GB system memory:
 ............................
 
 .. code-block:: python
+
     from cellpose.contrib.distributed_segmentation import distributed_eval
 
     # parameterize cellpose however you like
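Because the module requires zarr input, non-zarr data needs a one-time conversion before
``distributed_eval`` can run. As a minimal sketch, assuming zarr-python v2 and an image that
already fits in memory, the ``zarr`` package can be used directly (the module's own helpers,
added in patch 6, cover the common cases):

.. code-block:: python

    import numpy as np
    import zarr

    # placeholder volume; in practice this comes from your acquisition or file reader
    volume = np.zeros((512, 1024, 1024), dtype=np.uint16)

    # chunks aligned with the blocksize passed to distributed_eval are a sensible choice
    z = zarr.open('/path/to/input.zarr', mode='w', shape=volume.shape,
                  chunks=(256, 256, 256), dtype=volume.dtype)
    z[:] = volume
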
From d093f4266dfe5463f8d15f602f347ae91dd06ddb Mon Sep 17 00:00:00 2001
From: GFleishman
Date: Tue, 11 Feb 2025 19:00:56 -0500
Subject: [PATCH 5/6] distributed.rst

---
 docs/distributed.rst | 73 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/docs/distributed.rst b/docs/distributed.rst
index 8b68697f..5bd5c282 100644
--- a/docs/distributed.rst
+++ b/docs/distributed.rst
@@ -23,8 +23,8 @@ but making it available in the GUI is another good PR or feature request.
 Examples
 ~~~~~~~~
 
-Run distributed Cellpose on half the resources of a workstation that has 16 cpus, 1 gpu, and 128GB system memory:
-............................
+Run distributed Cellpose on half the resources of a workstation that has 16 cpus, 1 gpu,
+and 128GB system memory:
 
 .. code-block:: python
 
@@ -60,3 +60,72 @@ Run distributed Cellpose on half the resources of a workstation that has 16 cpus
     )
 
 
+Test run a single block before distributing the whole dataset (always a good idea):
+
+.. code-block:: python
+
+    from cellpose.contrib.distributed_segmentation import process_block
+
+    # parameterize cellpose however you like
+    model_kwargs = {'gpu':True, 'model_type':'cyto3'}
+    eval_kwargs = {'diameter':30,
+                   'z_axis':0,
+                   'channels':[0,0],
+                   'do_3D':True,
+    }
+
+    # define a crop as the distributed function would
+    starts = (128, 128, 128)
+    blocksize = (256, 256, 256)
+    overlap = 60
+    crop = tuple(slice(s-overlap, s+b+overlap) for s, b in zip(starts, blocksize))
+
+    # call the segmentation
+    segments, boxes, box_ids = process_block(
+        block_index=(0, 0, 0),  # when test_mode=True this is just a dummy value
+        crop=crop,
+        input_zarr=my_zarr_array,
+        model_kwargs=model_kwargs,
+        eval_kwargs=eval_kwargs,
+        blocksize=blocksize,
+        overlap=overlap,
+        output_zarr=None,
+        test_mode=True,
+    )
+
+
+Run distributed Cellpose on an LSF cluster with 128 GPUs (e.g. Janelia cluster):
+
+.. code-block:: python
+
+    from cellpose.contrib.distributed_segmentation import distributed_eval
+
+    # parameterize cellpose however you like
+    model_kwargs = {'gpu':True, 'model_type':'cyto3'}
+    eval_kwargs = {'diameter':30,
+                   'z_axis':0,
+                   'channels':[0,0],
+                   'do_3D':True,
+    }
+
+    # define LSFCluster parameters
+    cluster_kwargs = {
+        'ncpus':2,        # cpus per worker
+        'min_workers':8,  # cluster adapts number of workers based on number of blocks
+        'max_workers':128,
+        'queue':'gpu_l4', # flags required to specify a gpu job may differ between clusters
+        'job_extra_directives':['-gpu "num=1"'],
+    }
+
+    # run segmentation
+    # outputs:
+    #    segments: zarr array containing labels
+    #    boxes: list of bounding boxes around all labels (very useful for navigating big data)
+    segments, boxes = distributed_eval(
+        input_zarr=large_zarr_array,
+        blocksize=(256, 256, 256),
+        write_path='/where/zarr/array/containing/results/will/be/written.zarr',
+        model_kwargs=model_kwargs,
+        eval_kwargs=eval_kwargs,
+        cluster_kwargs=cluster_kwargs,
+    )
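The ``cluster_kwargs`` names in these examples (``n_workers``, ``threads_per_worker``,
``memory_limit``, ``queue``, ``job_extra_directives``) mirror ``dask.distributed`` and
``dask-jobqueue`` parameters. Assuming the module schedules blocks through dask, which is an
inference from those names rather than something the patches state, the workstation
configuration corresponds roughly to:

.. code-block:: python

    from dask.distributed import Client, LocalCluster

    # roughly what {'n_workers': 1, 'threads_per_worker': 1, 'memory_limit': '64GB'}
    # would configure on a single machine
    cluster = LocalCluster(n_workers=1, threads_per_worker=1, memory_limit='64GB')
    client = Client(cluster)
    print(client.dashboard_link)  # live view of worker and task activity

    client.close()
    cluster.close()

Under the same assumption, the LSF example's ``queue`` and ``job_extra_directives`` entries
would pass through to ``dask_jobqueue.LSFCluster``.
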
From ded9f7e83ffb01b72f6349bc5b5fa9659cda3420 Mon Sep 17 00:00:00 2001
From: GFleishman
Date: Tue, 11 Feb 2025 19:10:56 -0500
Subject: [PATCH 6/6] distributed.rst

---
 docs/distributed.rst | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/docs/distributed.rst b/docs/distributed.rst
index 5bd5c282..452c67eb 100644
--- a/docs/distributed.rst
+++ b/docs/distributed.rst
@@ -11,7 +11,7 @@ of the whole dataset.
 
 Built to run on workstations or clusters. Blocks can be run in parallel, in series, or both.
 Compute resources (GPUs, CPUs, and RAM) can be arbitrarily partitioned for parallel computing.
-Currently workstations (your own machine) and LSF clusters are supported. SLURM clusters are
+Currently workstations and LSF clusters are supported. SLURM clusters are
 an easy addition - if you need this to run on a SLURM cluster `please post a feature request issue
 to the github repository <https://github.com/MouseLand/cellpose/issues>`_ and tag @GFleishman.
 
@@ -20,6 +20,14 @@ convert your data to a zarr array, but not all formats or situations are covered
 good opportunities to submit pull requests. Currently, the module must be run via the Python API,
 but making it available in the GUI is another good PR or feature request.
 
+All user-facing functions in the module have verbose docstrings that explain inputs and outputs.
+You can access these docstrings like this (the trailing ``?`` requires IPython or Jupyter):
+
+.. code-block:: python
+
+    from cellpose.contrib.distributed_segmentation import distributed_eval
+    distributed_eval?
+
 Examples
 ~~~~~~~~
 
@@ -94,6 +102,31 @@ Test run a single block before distributing the whole dataset (always a good ide
     )
 
 
+Convert a single large (but still smaller than system memory) tiff image to a zarr array:
+
+.. code-block:: python
+
+    # Note: the full image will be loaded into system memory
+    import tifffile
+    from cellpose.contrib.distributed_segmentation import numpy_array_to_zarr
+
+    data_numpy = tifffile.imread('/path/to/image.tiff')
+    data_zarr = numpy_array_to_zarr('/path/to/output.zarr', data_numpy, chunks=(256, 256, 256))
+    del data_numpy  # the data is assumed large, so don't keep an in-memory copy around
+
+
+Wrap a folder of tiff images/tiles into a single zarr array without duplicating any data:
+
+.. code-block:: python
+
+    # Note: tiff filenames must indicate the position of each file in the overall tile grid
+    from cellpose.contrib.distributed_segmentation import wrap_folder_of_tiffs
+    reconstructed_virtual_zarr_array = wrap_folder_of_tiffs(
+        filename_pattern='/path/to/folder/of/*.tiff',
+        block_index_pattern=r'_(Z)(\d+)(Y)(\d+)(X)(\d+)',
+    )
+
+
 Run distributed Cellpose on an LSF cluster with 128 GPUs (e.g. Janelia cluster):
 
 .. code-block:: python
@@ -129,3 +162,4 @@ Run distributed Cellpose on an LSF cluster with 128 GPUs (e.g. Janelia cluster):
         eval_kwargs=eval_kwargs,
         cluster_kwargs=cluster_kwargs,
     )
+
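Once ``distributed_eval`` finishes, the labels live on disk at ``write_path`` and the returned
``boxes`` make it cheap to jump to individual cells. A short sketch, assuming each entry of
``boxes`` is a tuple of slices in the style of ``scipy.ndimage.find_objects`` (the
``distributed_eval`` docstring documents the exact format):

.. code-block:: python

    import zarr

    # reopen the labels written by distributed_eval
    segments = zarr.open('/where/zarr/array/containing/results/will/be/written.zarr', mode='r')

    # read one cell's neighborhood without loading the whole volume,
    # assuming boxes[0] is a tuple of slices from the earlier call
    first_cell = segments[boxes[0]]
    print(first_cell.shape, first_cell.max())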