Exception during histogram add operation #220
The line that you point to in your comment refers to this code:

    public double cdf(double x) {
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            throw new IllegalArgumentException(String.format("Invalid value: %f", x));
        }

As you can see, the test checks whether the argument you are passing in is a valid number. Thus, it appears that the number you are giving to the code is invalid, which means that the exception is a correct response. But I also think that you are worried about the validity of the t-digest itself. That makes me wonder whether the line that you pointed to is really the line where you are getting an exception. Or maybe I am misunderstanding entirely. Can you provide a stack trace of the exception you are seeing? Do you have some code that replicates the error that you can post here? |
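For illustration (not part of the original thread): a minimal sketch of how that guard looks from the caller's side, assuming the t-digest 3.x public API:

    import com.tdunning.math.stats.TDigest;

    public class CdfGuardDemo {
        public static void main(String[] args) {
            TDigest digest = TDigest.createMergingDigest(100);
            digest.add(1.0);
            System.out.println(digest.cdf(0.5)); // a probability in [0, 1]
            digest.cdf(Double.NaN);              // throws IllegalArgumentException: "Invalid value: NaN"
        }
    }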
Sorry, it was my mistake. Actually, the exception is thrown from this line (https://github.com/tdunning/t-digest/blob/main/core/src/main/java/com/tdunning/math/stats/MergingDigest.java#L256):

    private void add(double x, int w, List<Double> history) {
        if (Double.isNaN(x)) {
            throw new IllegalArgumentException("Cannot add NaN to t-digest");
        }

The above function is called from the function below in AbstractTDigest.java:

    @Override
    public void add(TDigest other) {
        for (Centroid centroid : other.centroids()) {
            add(centroid.mean(), centroid.count(), centroid);
        }
    }

I am wondering if values are getting added from an invalid t-digest. The exception is thrown when a Double.NaN value is encountered. Here is the full stack trace:

    java.lang.IllegalArgumentException: Cannot add NaN to t-digest
        at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:199) ~[flink-job.jar:?]
        at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:189) ~[flink-job.jar:?]
        at com.tdunning.math.stats.AbstractTDigest.add(AbstractTDigest.java:143) ~[flink-job.jar:?]
        at visibility.mabs.src.main.java.com.pinterest.utils.MabsBaseMetric.plus(MabsBaseMetric.java:350) ~[flink-job.jar:?]
        at visibility.mabs.src.main.java.com.pinterest.mabs.MabsFlinkJob$1.reduce(MabsFlinkJob.java:92) ~[flink-job.jar:?]
        at visibility.mabs.src.main.java.com.pinterest.mabs.MabsFlinkJob$1.reduce(MabsFlinkJob.java:88) ~[flink-job.jar:?]
        at …

We are currently not sure how a t-digest instance ends up in this state when a … |
Aha. This sounds much more problematic.
Which version? |
We are using version 3.2 |
We are looking for a solution that handles an invalid t-digest without throwing an exception, as we process a large number of t-digests in a Flink job. The exceptions cause high CPU usage in the job and result in instability. We can add a check like the one I mentioned before adding a t-digest, but I am wondering if we can avoid that too, as it is expensive to run the check for every t-digest instance. Thanks for your help on this issue. |
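A minimal sketch of the exception-tolerant merge being asked for here, assuming MergingDigest accumulators; mergeSafely and the dropped-digest counter are hypothetical names, not part of the library:

    import com.tdunning.math.stats.MergingDigest;
    import com.tdunning.math.stats.TDigest;
    import java.util.concurrent.atomic.AtomicLong;

    public class SafeMerge {
        // Hypothetical counter so dropped digests can be monitored instead of crashing the job.
        static final AtomicLong droppedDigests = new AtomicLong();

        // Merge `next` into `acc`; if `next` contains a NaN centroid, keep `acc` and count the drop.
        static MergingDigest mergeSafely(MergingDigest acc, TDigest next) {
            try {
                acc.add(next); // throws IllegalArgumentException on a NaN centroid mean
            } catch (IllegalArgumentException e) {
                droppedDigests.incrementAndGet();
            }
            return acc;
        }
    }

One caveat: add(TDigest) may already have merged some centroids before hitting the NaN, so the accumulator can be left partially updated; validating the digest before merging (as sketched further down) avoids that.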
The existence of any invalid digest is itself evidence of a bug. Can you say more about how you construct these digests?

Scanning the digest for NaN or Inf entries in centroids and counts should eliminate the exceptions, but as you point out, this scanning will be a pretty expensive way to find a problem that doesn't occur particularly commonly.

If possible, I would prefer to dig at the root cause and eliminate the problem itself.

Have you tried upgrading to 3.3? |
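A minimal sketch of the scan described above, assuming the public TDigest and Centroid API; isValid is a hypothetical helper, not a library method:

    import com.tdunning.math.stats.Centroid;
    import com.tdunning.math.stats.TDigest;

    public class DigestValidation {
        // Returns false if any centroid carries a non-finite mean or a non-positive weight.
        static boolean isValid(TDigest digest) {
            for (Centroid c : digest.centroids()) {
                if (!Double.isFinite(c.mean()) || c.count() <= 0) {
                    return false;
                }
            }
            return true;
        }
    }

The scan touches the same centroids the merge would, so it roughly doubles the per-digest work rather than changing its asymptotic cost.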
We are running a metric collector on hosts. The metric collector, which is part of the application, aggregates t-digests on each host over a 1-minute interval. The aggregated t-digests are enqueued in Kafka. The Flink job consumes from the Kafka queue and aggregates the t-digests further by cluster to reduce cardinality. At the collector level, we have added a check to prevent … But this check is still not able to prevent the exception in the Flink job. It looks like when we aggregate at the collector on hosts, the aggregation results in … |
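One plausible collector-side guard (an assumption about where the NaNs might enter, not a confirmed fix): reject non-finite samples before they ever reach a digest.

    import com.tdunning.math.stats.TDigest;

    public class HostCollector {
        private final TDigest digest = TDigest.createMergingDigest(100);
        private long droppedSamples; // hypothetical counter for monitoring

        // Record one observation, dropping NaN/Infinity instead of poisoning the digest.
        void record(double sample) {
            if (Double.isFinite(sample)) {
                digest.add(sample);
            } else {
                droppedSamples++;
            }
        }
    }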
BTW, we are using … |
We are adding histograms in a Flink job that aggregates histograms from a Kafka queue. During the aggregation we are seeing an exception that is thrown from this function. We see Flink job instability when it encounters too many such exceptions. I am thinking of checking the centroid.mean() value before calling the merge operation. I wanted to check whether this would be the right logic or whether it will result in an inaccurate histogram calculation.
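A sketch of the check proposed above; addFiltered is a hypothetical helper, and whether it is the right logic depends on how much weight gets dropped:

    import com.tdunning.math.stats.Centroid;
    import com.tdunning.math.stats.TDigest;

    public class FilteredMerge {
        // Merge `other` into `target`, skipping centroids whose mean is NaN.
        static void addFiltered(TDigest target, TDigest other) {
            for (Centroid c : other.centroids()) {
                if (!Double.isNaN(c.mean())) {
                    target.add(c.mean(), c.count());
                }
            }
        }
    }

Skipping a centroid discards its count, so the merged histogram under-weights whatever range those samples came from, and quantile estimates are biased in proportion to the dropped weight.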