fix statistics for coverage #684
rio_tiler/utils.py
) -> float:
    i = numpy.argsort(values)
    c = numpy.cumsum(weights[i])
    return values[i[numpy.searchsorted(c, numpy.array(quantiles) * c[-1])]]
This helper will be removed with numpy 2.0.
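For readers following the diff, here is a minimal standalone sketch of the same argsort / cumsum / searchsorted technique. The function name, signature, and sample arrays are illustrative assumptions, not the code merged in this PR.

```python
import numpy


def weighted_quantiles(values, weights, quantiles=0.5):
    """Illustrative stand-in for the helper in the diff (name and signature assumed)."""
    values = numpy.asarray(values)
    weights = numpy.asarray(weights, dtype=float)

    order = numpy.argsort(values)               # indices that sort the values
    cum_weights = numpy.cumsum(weights[order])  # running weight total in sorted order

    # Cumulative weight to reach for each requested quantile
    targets = numpy.array(quantiles) * cum_weights[-1]

    # First sorted position whose cumulative weight reaches the target
    return values[order[numpy.searchsorted(cum_weights, targets)]]


# Made-up sample data, not the PR's test fixture
values = numpy.array([1, 2, 3, 4])
equal_median = weighted_quantiles(values, numpy.array([1.0, 1.0, 1.0, 1.0]))
heavy_median = weighted_quantiles(values, numpy.array([5.0, 1.0, 1.0, 1.0]))
```

With equal weights the weighted median lands on 2. Giving the first cell a weight of 5 pulls it down to 1. The returned value is always one of the input values, so no interpolation happens. The numpy 2.0 remark presumably refers to the `weights` argument that release adds to `numpy.quantile` and `numpy.percentile`, but that reading is an assumption.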
b1e8d73 to 699a902
# Avoid non masked nan/inf values
numpy.ma.fix_invalid(data, copy=False)

for b in range(data.shape[0]):
    keys, counts = numpy.unique(data[b].compressed(), return_counts=True)
    data_comp = data[b].compressed()
data[b].compressed() was called multiple times
# Population standard deviation of cell values, taking into account coverage fraction.
"std": _weighted_stdev(data_comp, masked_coverage.compressed()),
# Median value of cells, weighted by the percent of each cell that is covered.
"median": _weighted_quantiles(data_comp, masked_coverage.compressed()),
std and median are now weighted by the coverage array
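As a rough illustration of what "weighted by the coverage array" means for std, here is a sketch of a population standard deviation where each cell counts in proportion to its coverage fraction. The helper name and sample arrays are assumptions. The actual `_weighted_stdev` in rio_tiler/utils.py may be written differently.

```python
import numpy


def weighted_stdev(values, weights):
    """Population standard deviation with per-cell weights (illustrative helper)."""
    mean = numpy.sum(values * weights) / numpy.sum(weights)
    variance = numpy.sum(weights * (values - mean) ** 2) / numpy.sum(weights)
    return float(numpy.sqrt(variance))


# Made-up band: the last cell is masked, so it must drop out of both arrays
data = numpy.ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, False, False, True])
coverage = numpy.array([0.5, 1.0, 0.5, 1.0])

# Share the data mask so values and weights stay aligned after compressing
masked_coverage = numpy.ma.MaskedArray(coverage, mask=data.mask)

data_comp = data.compressed()        # [1.0, 2.0, 3.0]
weights = masked_coverage.compressed()  # [0.5, 1.0, 0.5]

std = weighted_stdev(data_comp, weights)
```

Here the weighted mean is 2.0 and the weighted variance is 0.5, so std is sqrt(0.5). Applying the data mask to the coverage array before calling `compressed()` is what keeps each value paired with its own weight.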
assert stats[0]["count"] == 1.75
assert stats[0]["median"] == 3  # 2 in exactextract
I have no idea why median gives a different result. I've tested a new numpy 2.0 method and it gives 3, while exactextract gives 2. I don't want to over-engineer the median calculation.
For a raster of type T, exactextract is returning type T for the quantile and median calculations. T here is int64, so the median of 2.5 is getting truncated to 2. Maybe quantile/median should be returning float64 instead.
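The truncation described above is easy to reproduce in numpy alone. This is only a sketch of the casting behaviour, with made-up values. It does not call exactextract.

```python
import numpy

# A fractional median computed as float64
median = numpy.float64(2.5)

# Storing the result in the raster's own dtype (int64) drops the fraction
as_raster_dtype = numpy.array(median).astype("int64")

# Keeping float64 preserves the value
as_float = numpy.array(median).astype("float64")
```

Casting to int64 truncates toward zero, so 2.5 becomes 2, which matches the value reported for exactextract in the test comment.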
oO that makes sense 🙏 thanks for having a look
assert stats[0]["max"] == 9
# exactextract takes coverage into account, we don't
assert stats[0]["minority"] == 1  # 1 in exactextract
assert stats[0]["majority"] == 1  # 5 in exactextract
minority and majority do not take coverage into account. We might do this later if needed.
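To show why the two approaches can disagree, here is a sketch comparing plain counts with coverage-weighted counts. The arrays are made up and this PR does not implement the weighted variant. It is one possible way to do it, using `numpy.bincount` with weights.

```python
import numpy

# Made-up cell values and coverage fractions (not the PR's test fixture)
values = numpy.array([1, 1, 5])
coverage = numpy.array([0.25, 0.25, 1.0])

# Unweighted: every cell counts as 1, as the statistics in this PR do
keys, counts = numpy.unique(values, return_counts=True)
unweighted_majority = keys[numpy.argmax(counts)]
unweighted_minority = keys[numpy.argmin(counts)]

# Weighted: sum the coverage fraction per distinct value
keys, inverse = numpy.unique(values, return_inverse=True)
weighted_counts = numpy.bincount(inverse, weights=coverage)
weighted_majority = keys[numpy.argmax(weighted_counts)]
weighted_minority = keys[numpy.argmin(weighted_counts)]
```

In this example the value 1 appears twice but covers only 0.5 cells in total, while 5 appears once with full coverage, so the majority flips from 1 to 5 once coverage is taken into account.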
Note: I've been working on making exactextract available on PyPI so we can integrate it into our CI: isciences/exactextract#87
closes #680
takes over #681
Better take coverage into account. This PR tries to match exactextract results!
cc @j08lue