HDBSCAN's flat clustering relies on a cluster stability calculation, which is prone to producing erroneous partitions when the data contains duplicate objects. With duplicates, some data objects can have a core distance of zero, which makes the mutual reachability distance between those duplicates zero as well.
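The current implementation handles this case as follows to avoid a division-by-zero error:

```rust
let info = mst[node - n];
// A zero distance would divide by zero, so fall back to A::max_value().
let lambda = if info.2 > A::zero() {
    A::one() / info.2
} else {
    A::max_value()
};
```

Python HDBSCAN uses infinity instead:

```cython
children = hierarchy[node - num_points]
left = <np.intp_t> children[0]
right = <np.intp_t> children[1]
# A zero distance would divide by zero, so fall back to infinity.
if children[2] > 0.0:
    lambda_value = 1.0 / children[2]
else:
    lambda_value = INFTY
```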
When lambda values are set to infinity, the extraction of flat clusters from the cluster hierarchy is meaningless: clusters with infinite lambdas are always selected during the cluster stability comparison in the hierarchy.
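To illustrate why an infinite lambda dominates the comparison, here is a minimal sketch. It assumes the usual stability definition (the sum of `lambda_p - lambda_birth` over the points of a cluster); the `stability` helper and the lambda values are made up for illustration and are not taken from either implementation.

```python
import math


def stability(lambda_birth, point_lambdas):
    """Sum of (lambda_p - lambda_birth) over the points of one cluster."""
    return sum(lam - lambda_birth for lam in point_lambdas)


# Hypothetical parent cluster: born at lambda = 0.5, points leave at finite lambdas.
parent = stability(0.5, [1.0, 1.25, 2.0, 2.0])

# Hypothetical child made of duplicates: merge distance 0, so lambda = infinity.
child = stability(2.0, [math.inf, math.inf])

print(parent)          # finite stability
print(child)           # infinite stability
print(child > parent)  # the duplicate cluster wins regardless of the parent's value
```

Any cluster containing such a merge ends up with infinite stability, so the comparison no longer reflects how persistent the cluster actually is.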
This behavior essentially depends on the choice of minPts: as long as no core distance is zero, the flat clustering results should not be affected.
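The dependence on minPts can be seen in a small sketch. It assumes the core distance of a point is the distance to its `min_pts`-th nearest neighbor, counting the point itself; the `core_distances` helper and the sample points are hypothetical and only meant to show the effect.

```python
import math


def core_distances(points, min_pts):
    """Distance from each point to its min_pts-th nearest neighbor (self included)."""
    result = []
    for p in points:
        # The sorted list starts with 0.0, the distance from p to itself.
        dists = sorted(math.dist(p, q) for q in points)
        result.append(dists[min_pts - 1])
    return result


# Three exact duplicates plus two distinct points.
points = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

low = core_distances(points, 3)   # min_pts <= number of duplicates
high = core_distances(points, 4)  # min_pts > number of duplicates

print(low)   # the three duplicates have core distance 0
print(high)  # no zero core distances remain
```

Once `min_pts` exceeds the size of the largest group of duplicates, every core distance is positive and the division by zero never arises.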
Do you think it would be a good idea to warn users about this behavior? The original Java implementation of HDBSCAN emits a warning message advising the user to increase minPts. The Python version is silent about this (and so is this Rust version), which may lead users to trust the flat clustering results, or confuse them into searching for other alternatives.
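As a rough idea of what such a warning could look like, here is a sketch in Python. The function name `warn_on_zero_core_distances` and the message wording are my own assumptions; this does not reproduce the Java implementation's message or any existing API.

```python
import warnings


def warn_on_zero_core_distances(core_distances, min_pts):
    """Warn if any core distance is zero; return how many were found."""
    zero_count = sum(1 for d in core_distances if d == 0)
    if zero_count:
        warnings.warn(
            f"{zero_count} points have a core distance of zero "
            f"(likely duplicates); flat clustering may be unreliable. "
            f"Consider increasing min_pts (currently {min_pts}).",
            UserWarning,
            stacklevel=2,
        )
    return zero_count


# Example: two of the three core distances are zero, so one warning is issued.
found = warn_on_zero_core_distances([0.0, 0.0, 1.5], min_pts=2)
print(found)
```

The check is a single pass over the core distances, so it could run right after they are computed without noticeable cost.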