giotto-ph version 0.1.0
Major Features and Improvements
Introduction
We introduce giotto-ph, a high-performance, open-source software package for the computation of Vietoris-Rips barcodes. giotto-ph is based on Morozov and Nigmetov's lockfree (multicore) implementation of Ulrich Bauer's Ripser package. It also contains a re-implementation of Boissonnat and Pritam's "Edge Collapser", implemented so far only in the GUDHI library. Our contribution is twofold: on the one hand, we integrate existing state-of-the-art ideas coherently in a single library and provide Python bindings to the C++ code. On the other hand, we increase parallelization opportunities and improve overall performance by adopting higher performance data structures. The final implementation of our persistent homology backend establishes a new state of the art, surpassing even GPU-accelerated implementations such as Ripser++ when using as few as 5-10 CPU cores. Furthermore, our implementation of the edge collapser algorithm has reduced dependencies and significantly improved run-times.
Python API
The package exposes a single main function, ripser_parallel.
To call it from Python:
from gph.python import ripser_parallel
import numpy as np
# generate your data
data = np.random.rand(100, 3)
# compute the persistence diagram
dgm = ripser_parallel(data)
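Uniformly random points have no prominent topological features, so a structured input can make a first experiment easier to interpret. The sketch below only generates such an input: `noisy_circle` is a hypothetical helper written for this illustration (it is not part of giotto-ph), and the sample size and noise level are arbitrary choices.

```python
import numpy as np


def noisy_circle(n_points=100, noise=0.05, seed=0):
    """Hypothetical helper: sample points near the unit circle in the plane."""
    rng = np.random.default_rng(seed)
    # Draw angles uniformly, then map them onto the unit circle.
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_points)
    circle = np.column_stack((np.cos(angles), np.sin(angles)))
    # Perturb every coordinate with Gaussian noise.
    return circle + rng.normal(scale=noise, size=circle.shape)


# A point cloud of shape (n_samples, n_features), like the random one above.
data = noisy_circle()
```

The resulting array can be passed to ripser_parallel in place of the random data. A circle has a single loop, so one would expect one long-lived feature in dimension 1.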
Here is a description of the parameters of this function:
Compute persistence diagrams for the data array X using a high-performance, parallel version of Ripser.
If X is a point cloud, it will be converted to a distance matrix using the chosen metric.
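As a rough illustration of that conversion, the sketch below builds square distance matrices from a point cloud with SciPy, once with a named metric and once with a callable. It is a sketch of the idea under stated assumptions, not the library's internal code path, which may differ. The function `max_coordinate_gap` is a hypothetical metric written for this example.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
data = rng.random((100, 3))  # point cloud: 100 samples, 3 features

# Named metric: any string accepted by scipy.spatial.distance.pdist.
dist_euclidean = squareform(pdist(data, metric="euclidean"))


def max_coordinate_gap(u, v):
    """Hypothetical callable metric: largest coordinate-wise difference."""
    return float(np.max(np.abs(u - v)))


# Callable metric: takes two 1D arrays and returns a scalar.
dist_custom = squareform(pdist(data, metric=max_coordinate_gap))
```

Both results are symmetric (n_samples, n_samples) arrays with a zero diagonal, which is the kind of input that the metric parameter described below treats as a distance matrix when set to 'precomputed'.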
- X: ndarray of shape (n_samples, n_features)
  A NumPy array containing either the data or a distance matrix. Can also be a sparse distance matrix in scipy.sparse format.
- maxdim: int, optional, default: 1
  Maximum homology dimension computed. All dimensions lower than or equal to this value are computed. For 1, both H_0 and H_1 are computed.
- thresh: float, optional, default: numpy.inf
  Maximum distance considered when constructing the filtration. If numpy.inf, the entire filtration is computed.
- coeff: int prime, optional, default: 2
  Compute homology with coefficients in the prime field Z/pZ for p=coeff.
- metric: string or callable, optional, default: 'euclidean'
  The metric to use when calculating the distance between instances in a feature array. If set to 'precomputed', the input data is interpreted as a distance matrix or as the adjacency matrix of a weighted undirected graph. If a string, it must be one of the options allowed by scipy.spatial.distance.pdist() for its metric parameter, or a metric listed in sklearn.metrics.pairwise.PAIRWISE_DISTANCE_FUNCTIONS, including 'euclidean', 'manhattan' or 'cosine'. If a callable, it should take a pair of vectors (1D arrays) as input and return a scalar indicating the distance/dissimilarity between them.
- metric_params: dict, optional, default: {}
  Additional parameters to be passed to the distance function.
- weights: "DTM", ndarray or None, optional, default: None
  If not None, the persistence of a weighted Vietoris-Rips filtration is computed as described in [3], and this parameter determines the vertex weights in the modified adjacency matrix. "DTM" denotes the empirical distance-to-measure function.
- weight_params: dict or None, optional, default: None
  Parameters to be used in the case of weighted filtrations, see weights. In this case, the key "p" determines the power to be used in computing edge weights from vertex weights. It can be one of 1, 2 or np.inf and defaults to 1. If weights is "DTM", the additional keys "r" (default: 2) and "n_neighbors" (default: 3) are available (see weights, where the latter corresponds to n).
- collapse_edges: bool, optional, default: False
  Whether to apply the edge collapse algorithm described in [2] prior to running the persistent homology computation.
- n_threads: int, optional, default: 1
  Maximum number of threads available for the persistent homology computation. When passing -1, the function tries to use the maximal number of threads available on the host machine.
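To make the roles of "r" and "n_neighbors" more concrete, here is a minimal sketch of one common textbook form of the empirical distance-to-measure: for each point, the r-th power mean of the distances to its n_neighbors nearest other points. The helper `empirical_dtm` is hypothetical and written for this illustration. Whether giotto-ph counts a point as its own neighbour or applies additional scaling is not stated here, so the values may differ from those computed with weights="DTM".

```python
import numpy as np
from scipy.spatial.distance import cdist


def empirical_dtm(points, n_neighbors=3, r=2):
    """Hypothetical helper: r-th power mean of nearest-neighbour distances.

    Assumption: a point is not counted among its own neighbours.
    """
    dist = cdist(points, points)
    # After sorting each row, column 0 holds the zero distance from a point
    # to itself; skip it and keep the next n_neighbors columns.
    nearest = np.sort(dist, axis=1)[:, 1:n_neighbors + 1]
    return np.mean(nearest ** r, axis=1) ** (1.0 / r)


rng = np.random.default_rng(0)
points = rng.random((100, 3))
# One non-negative weight per vertex.
vertex_weights = empirical_dtm(points, n_neighbors=3, r=2)
```

Since weights also accepts an ndarray, an array like vertex_weights illustrates the kind of per-vertex input the parameter expects. Points in dense regions receive small weights and isolated points receive large ones.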
For more details about the performance and benchmarks, you can have a look here
Bug Fixes
None.
Backwards-Incompatible Changes
None.
Thanks to our Contributors
This release contains contributions from many people:
Julián Burella Pérez, Sydney Hauke, Umberto Lupo, Matteo Caorsi.
We are also grateful to all of those who filed issues or helped resolve them, asked and
answered questions, and were part of inspiring discussions.