
Research - An Evaluation of Change Point Detection Algorithms @bekahma #168

Open
Tracked by #103
RobotPsychologist opened this issue Nov 20, 2024 · 2 comments
Labels
research Background research and research publication team.

Comments

@RobotPsychologist
Owner

No description provided.

@RobotPsychologist
Owner Author

@bekahma I'll assign this to you, feel free to put your comments/info in a comment here.

@RobotPsychologist RobotPsychologist changed the title - An Evaluation of Change Point Detection Algorithms @bekahma Research - An Evaluation of Change Point Detection Algorithms @bekahma Nov 22, 2024
@bekahma
Contributor

bekahma commented Nov 27, 2024

An Evaluation of Change Point Detection Algorithms

Summary:

  • A change point marks an abrupt/significant change in the data-generating process
  • The goal is to evaluate existing algorithms on real-world data
  • The paper conducts a benchmark study of 14 algorithms
  • Two types of evaluation metrics: 1) clustering, which treats detection as dividing the time series into distinct regions, each with a constant data-generating process, and 2) classification, which scores each point as change point vs. non-change point (a sketch of this kind of scoring is below)
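
For the classification-style evaluation, detected change points are scored against human annotations with a small margin of tolerance. Here is a minimal sketch of that kind of scoring; the margin value and example indices are illustrative, not the paper's exact definition:

```python
def f1_with_margin(detected, annotated, margin=5):
    """Precision/recall/F1 where a detection counts as correct if it falls
    within `margin` samples of an annotated change point (each annotation
    can be matched at most once)."""
    matched, tp = set(), 0
    for d in detected:
        for i, a in enumerate(annotated):
            if i not in matched and abs(d - a) <= margin:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(detected) if detected else 1.0
    recall = tp / len(annotated) if annotated else 1.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Two of the three detections land within 5 samples of an annotation.
print(f1_with_margin(detected=[48, 120, 300], annotated=[50, 118, 210]))
```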

Change Point Detection Algorithms (14):
Table 1 in the paper has hyperlinks to explain each algorithm in more detail & what each abbreviation stands for.

| Name | Description | Applicable |
| --- | --- | --- |
| AMOC | Assumes there is at most one change point in the data and focuses on detecting a single change. Compares the likelihood of the data being split into two segments vs. remaining unchanged. | N, we expect more than one change point |
| BINSEG | Recursively splits the data into segments by detecting the most significant change within the entire dataset. Efficient when detecting multiple change points, but can struggle in noisy environments when changes are subtle. | Maybe? |
| BOCPD | Probabilistic method that models the likelihood of a change point occurring at each time step using Bayesian inference. Suited for online/real-time detection, but can be computationally expensive for large datasets. | Y |
| BOCPDMS | Extends BOCPD. Incorporates multiple timescales to detect both short-term and long-term changes. Handles datasets with varying levels of granularity in change patterns. | Y, but prioritize BOCPD first |
| CPNP | Makes minimal assumptions about the data's underlying distribution. Can detect a variety of change types, such as changes in variance or higher-order moments, but requires more data to achieve reliable results. | N |
| ECP | Compares similarity between segments. Non-parametric and can detect both univariate and multivariate changes without strong distributional assumptions. | Y |
| KCPA | Uses kernel methods to map data into a high-dimensional feature space, where change points are detected based on changes in the distribution. Detects subtle changes in non-linear relationships. | N, our data is not high-dimensional |
| PELT | Optimization-based method that uses dynamic programming (DP) to minimize a penalized cost function, allowing detection of multiple change points. Performs best with clear, distinct changes. | Y |
| PROPHET | Forecasting tool; analyzes residuals from trend and seasonality components. Detects changes in time-series structure such as shifts in trend. | N, interesting for predictions but not for detecting change points |
| RBOCPDMS | Enhances BOCPDMS. Adds robustness to outliers and noise, making it well suited for real-world data. Combines Bayesian inference with multiple timescales to provide reliable, multi-granularity change detection. | Maybe? challenging implementation |
| RFPOP | Improves on PELT. Adds robustness to outliers and non-standard data distributions. Minimizes a penalized cost function to enhance segmentation accuracy in noisy datasets. | Maybe? prioritize PELT |
| SEGNEIGH | Uses DP to optimize segmentation of the data into predefined neighbourhoods. Well suited for detecting multiple change points, but can be computationally expensive for large datasets. | N |
| WBS | Randomized selection of subintervals to identify changes and increase sensitivity to subtle changes. May require careful parameter tuning for optimized performance. | Maybe? |
| ZERO | N/A, this is their baseline | N/A |

Code and data used in benchmark study

In general: BOCPD and WBS for online detection of meal-induced spikes / insulin-induced drops; PELT and ECP for analyzing historical data and identifying long-term patterns or events. If the data has complex distributional changes or high variability, use ECP or WBS.

From a quick Google search on how to implement the algorithms, it should be pretty straightforward if we use existing libraries and don't need to make additional modifications. The algorithms can handle many types of data (e.g. multivariate), and I'm assuming our BGL data is fairly simple in comparison.

Here are some existing libraries/implementations for the algorithms I highlighted above (rough usage sketches follow the table):

| Name | Implementation | Libraries |
| --- | --- | --- |
| BOCPD | Define a prior distribution for change points and use a predictive distribution to update beliefs as new data arrives. Compute run-length probabilities using recursive updates. | Python: bocpd |
| ECP | Compute energy statistics between segments and optimize the segmentation to maximize between-segment differences. | Python: ecp |
| PELT | Define a cost function (e.g. variance or MSE) to quantify the "fit" of a segment. Use a penalty term to balance model complexity and apply DP to find the optimal segmentation. | Python: ruptures |
| WBS | Randomly select subintervals of the data and apply a segmentation algorithm within each subinterval. Aggregate results to detect overall change points. | Python: ruptures |
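
As a starting point, here is a minimal sketch of offline detection with PELT via ruptures on a BGL series. The file name, column name, `min_size`, and penalty are assumptions that would need tuning on our actual data:

```python
import pandas as pd
import ruptures as rpt

# Hypothetical CGM export: one BGL reading every 5 minutes.
# File name and column name are placeholders for whatever our pipeline produces.
bgl = pd.read_csv("cgm_readings.csv")["glucose_mg_dl"].to_numpy()

# RBF cost can pick up changes in mean and variance; min_size=12 (~1 hour at
# 5-minute sampling) avoids very short segments. The penalty controls how many
# change points are returned and needs tuning on our data.
algo = rpt.Pelt(model="rbf", min_size=12, jump=1).fit(bgl)
change_points = algo.predict(pen=10)

print("Detected segment boundaries (sample indices):", change_points)
```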
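For BOCPD, since I haven't confirmed the exact API of the Python bocpd package, here is a minimal from-scratch sketch of the run-length recursion described in the table above (Adams & MacKay-style, with a Gaussian / Normal-Inverse-Gamma model). All hyperparameter values and names are illustrative:

```python
import numpy as np
from scipy import stats

def bocpd_gaussian(data, hazard=1 / 250.0, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Online run-length posterior for a univariate series, assuming a
    Normal-Inverse-Gamma conjugate model (Student-t predictive)."""
    T = len(data)
    # R[t, r] = P(run length == r after seeing the first t observations).
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0

    # Posterior hyperparameters, one entry per possible run length.
    mu, kappa = np.array([mu0]), np.array([kappa0])
    alpha, beta = np.array([alpha0]), np.array([beta0])

    for t, x in enumerate(data):
        # Predictive probability of x under each current run length.
        scale = np.sqrt(beta * (kappa + 1) / (alpha * kappa))
        pred = stats.t.pdf(x, df=2 * alpha, loc=mu, scale=scale)

        # Either the run continues (growth) or a change point resets it to 0.
        R[t + 1, 1 : t + 2] = R[t, : t + 1] * pred * (1 - hazard)
        R[t + 1, 0] = np.sum(R[t, : t + 1] * pred * hazard)
        R[t + 1] /= R[t + 1].sum()

        # Conjugate updates; run length 0 restarts from the prior.
        beta_new = beta + kappa * (x - mu) ** 2 / (2 * (kappa + 1))
        mu = np.concatenate(([mu0], (kappa * mu + x) / (kappa + 1)))
        kappa = np.concatenate(([kappa0], kappa + 1))
        alpha = np.concatenate(([alpha0], alpha + 0.5))
        beta = np.concatenate(([beta0], beta_new))
    return R

# Toy example: a synthetic "BGL-like" series whose mean jumps halfway through.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(100, 5, 150), rng.normal(160, 5, 150)])
R = bocpd_gaussian(series, hazard=1 / 100.0, mu0=120.0, beta0=25.0)
map_run_length = R[1:].argmax(axis=1)  # sudden drops back toward 0 flag change points
```

A change point shows up as the MAP run length collapsing back toward zero; for real-time use we would only keep the most recent row of R rather than the full matrix.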
