Is there any way to change the algorithm or approach to handle deletion of data points? I was reading through the algorithm, but I haven't yet gotten a sense of whether introducing deletion would affect its optimization.
Hey there ... sorry to be slow responding.
There are two issues with deletions:
a) There is no known way to ensure that the digest invariant is preserved. As a key example, if you deleted a bunch of data from the left half of a normal distribution, you would be left with a really big centroid on the left edge. You can try to guess how to split centroids, but the point of a centroid is that it loses information, and the point of the digest invariant is that this loss is non-critical for estimating tails. If you delete, you may have to split, and splitting accurately is impossible without that lost information. It is conceivable that you could keep a second digest containing the distribution of deleted points, but I don't think that preserves the accuracy that you want.
b) In practice, people keep digests of relatively short time periods (typically 5-minute intervals) and then combine these short intervals with a merge when necessary, rather than keeping long intervals and subtracting. This makes deletion much less necessary (a sketch of this windowed approach follows below).
So given that there is no known way to do it accurately and people don't
seem to need it, we haven't ever tried to do this.
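As a minimal sketch of the windowed approach in (b), assuming the Java t-digest library (com.tdunning.math.stats): the WindowedDigests class name, the 5-minute bucketing, and the compression value of 100 are illustrative choices, not part of the library. Merging at query time by re-adding each window's centroids is one simple way to combine digests.

```java
import com.tdunning.math.stats.Centroid;
import com.tdunning.math.stats.TDigest;

import java.util.Map;
import java.util.TreeMap;

// Keep one digest per 5-minute window instead of one long-lived digest.
// "Deleting" old data then means dropping whole windows; queries merge
// only the windows that are still of interest.
public class WindowedDigests {
    private static final long WINDOW_MILLIS = 5 * 60 * 1000L;
    private final Map<Long, TDigest> windows = new TreeMap<>();

    public void add(long timestampMillis, double value) {
        long bucket = timestampMillis / WINDOW_MILLIS;
        windows.computeIfAbsent(bucket, b -> TDigest.createMergingDigest(100))
               .add(value);
    }

    // Drop windows that end before the cutoff -- the only "deletion" needed.
    public void expireBefore(long cutoffMillis) {
        windows.keySet().removeIf(bucket -> (bucket + 1) * WINDOW_MILLIS <= cutoffMillis);
    }

    // Merge the surviving windows into a fresh digest and query it.
    public double quantile(double q) {
        TDigest merged = TDigest.createMergingDigest(100);
        for (TDigest d : windows.values()) {
            for (Centroid c : d.centroids()) {
                merged.add(c.mean(), c.count());
            }
        }
        return merged.quantile(q);
    }
}
```

Because merging only ever combines centroids (it never tries to split them), the accuracy guarantees of the digest invariant are preserved, which is exactly what subtraction cannot offer.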