[Question/Docs] how to access S-R trend analysis statistical details #894

Closed
lisphilar opened this issue Jul 19, 2021 · 10 comments

Comments

@lisphilar
Owner

Summary of question

@AnujTiwari wrote in issue #851:

Which trend analysis technique does S-R trend analysis use? Is it possible to access the trend type, p-value, z-value, slope, and other parameters related to the trend results? Also, is it possible to collect the resulting parameters for homogeneity analysis?

@lisphilar lisphilar added the question Further information is requested label Jul 19, 2021
@lisphilar
Owner Author

Dear @AnujTiwari,
Could you share the details of the trend analysis technique and the homogeneity analysis you have in mind?

@lisphilar
Owner Author

lisphilar commented Jul 19, 2021

We can get scores such as MSE, MAPE, and RMSLE with the TrendDetector class.

import covsirphy as cs

# Load the JHU dataset and take the records of one country
data_loader = cs.DataLoader()
jhu_data = data_loader.jhu()
country = "Italy"
record_df, _ = jhu_data.records(country=country)

# Run S-R trend analysis, then summarize the phases with the MSE metric
detector = cs.TrendDetector(record_df, min_size=7)
_ = detector.sr(algo="Binseg-normal")
df = detector.summary(metrics="MSE")
print(df)

[Update] simplified the script.

Please also refer to #670.

@AnujTiwari

AnujTiwari commented Jul 19, 2021

A change point is usually related to an abrupt or structural change in the distributional properties of data, whereas trend detection looks for a gradual departure of the data from its past. Change-point and trend detection are both long-standing research questions that have been raised in statistical and non-statistical communities for decades. There are parametric trend assessment techniques (which assume the data follow a normal distribution) and non-parametric ones (which do not make that assumption). In the case of COVID-19, many articles favor one or the other after weighing their pros and cons.

We can discuss them further, but my current understanding of CovsirPhy is that you have implemented a change point detection algorithm (the ruptures package) to find abrupt changes, and that the data between two change points then represents a trend. However, no trend assessment technique is implemented for assessing the nature of each trend. So we are using these short time series to compute the reproduction number and other SIR-specific results.

@lisphilar
Owner Author

Yes, change point detection is a challenging task, and it is time to bring our trend analysis up to an academic level. Because the accuracy of trend analysis affects the outcome of the subsequent analysis, improvements and detailed assessments are necessary.

However, please note the background of how our trend analysis was implemented. We treat the following series of steps as one workflow.

snl.trend()
snl.estimate(cs.SIRF)
snl.score()

S-R trend analysis was created only to improve the accuracy of parameter estimation by splitting the time series data into phases. Its purpose is to find change points such that the records between two change points follow a SIR-derived model with stable parameter values.
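As background, here is a sketch of why log10(S) plotted against R is expected to be a straight line within one phase. It assumes the standard SIR equations with total population N and parameters β and γ held constant inside the phase; these symbols are not defined in this thread and are introduced only for illustration.

```latex
\begin{align}
\frac{dS}{dt} &= -\frac{\beta S I}{N}, \qquad \frac{dR}{dt} = \gamma I \\
\frac{dS}{dR} &= -\frac{\beta}{\gamma N}\, S \\
\log_{10} S &= -\frac{\beta}{\gamma N \ln 10}\, R + \text{const}
\end{align}
```

Under these assumptions, a change in the slope of the line corresponds to a change in β/γ, which is why a break in the S-R line is read as the start of a new phase.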

(I think we can see the "trend of the outbreak" from the history of the model parameter values. S-R trend analysis itself is not a tool for finding the "trend of the outbreak", and perhaps the name is confusing for experts?)

The accuracy of trend analysis plus parameter estimation is assessed with Scenario.score(), and I thought the accuracy of trend analysis alone could be assessed with TrendDetector.summary() using MSE etc., as I mentioned in the previous comment.
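For readers unfamiliar with the metric names, the following is a minimal sketch of what MSE and RMSLE measure, written in plain Python. It uses the textbook definitions and is not CovsirPhy's implementation; the function names are chosen here for illustration only.

```python
import math


def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmsle(actual, predicted):
    """Root mean squared logarithmic error, using log(1 + x) so zeros are allowed."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(actual))


# Hypothetical observed and fitted values for one phase
observed = [100.0, 120.0, 150.0]
fitted = [98.0, 125.0, 149.0]
print(mse(observed, fitted), rmsle(observed, fitted))
```

RMSLE penalizes relative rather than absolute error, which can matter when case counts span several orders of magnitude.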

I'm not familiar with the scientific background of change point analysis (I only studied it through the ruptures documentation and a quick reading of its papers). I have two questions.

  1. It is possible to get the slope by updating TrendDetector, but what information would the slope give us?
  2. How can we calculate the p-value and z-score with S-R trend analysis?

@AnujTiwari

Yes, it is possible to compute the slope and the other statistical parameters (p-value and z-score) using non-parametric trend analysis techniques such as the Mann-Kendall test and Sen's slope estimator. I can try this if I can get access to the S-R time series. Is that possible?
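To make the proposal concrete, here is a minimal sketch of the Mann-Kendall test combined with Sen's slope, using only the Python standard library. It follows the textbook definitions, applies no correction for tied values, and is not part of CovsirPhy; the function name and the returned keys are assumptions made for this example.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(values, alpha=0.05):
    """Mann-Kendall trend test with Sen's slope (no tie correction).

    Needs at least two values, assumed to be equally spaced in time.
    """
    n = len(values)
    pairs = list(combinations(range(n), 2))  # all (i, j) with i < j
    # S statistic: increasing pairs minus decreasing pairs
    s = sum((values[j] > values[i]) - (values[j] < values[i]) for i, j in pairs)
    # Variance of S under the null hypothesis of no trend, assuming no ties
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # z-score with continuity correction
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Sen's slope: median of the slopes over all pairs of points
    slope = median((values[j] - values[i]) / (j - i) for i, j in pairs)
    if p_value >= alpha:
        trend = "no trend"
    else:
        trend = "increasing" if z > 0 else "decreasing"
    return {"trend": trend, "p_value": p_value, "z": z, "slope": slope}


# Hypothetical daily values within one phase
print(mann_kendall([3.0, 4.0, 6.0, 7.0, 9.0, 12.0, 13.0, 15.0]))
```

Applied to the values of a single phase, this would give the trend type, p-value, z-score, and slope asked about at the top of this issue.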

@lisphilar
Owner Author

The S-R time series has three fields, Date, log10(Susceptible), and Recovered, and we can create it (sr_df) with JHUData as follows.

import covsirphy as cs
import numpy as np
import pandas as pd

# Load the JHU dataset and select one area (replace the placeholder names)
loader = cs.DataLoader()
jhu_data = loader.jhu()
subset_df, _ = jhu_data.records(country="country name", province="province name")

# Build the S-R series: Recovered and log10(Susceptible), indexed by Date
sr_dict = {
    "Date": subset_df["Date"],
    "R": subset_df["Recovered"],
    "log10S": np.log10(subset_df["Susceptible"].astype(np.float64)),
}
sr_df = pd.DataFrame(sr_dict).set_index("Date")
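As an illustration of what a change point in this series means, the sketch below searches for the single split that lets two straight lines fit log10S against R best. This brute-force search is a simplified stand-in, not the algorithm that ruptures or TrendDetector uses, and the helper names and synthetic data are assumptions made for the example.

```python
def line_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Fall back to a flat line when x does not vary within the segment
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def two_segment_break(xs, ys, min_size=7):
    """Row index where splitting into two fitted lines gives the smallest error."""
    candidates = range(min_size, len(xs) - min_size + 1)
    return min(
        candidates,
        key=lambda k: line_sse(xs[:k], ys[:k]) + line_sse(xs[k:], ys[k:]),
    )


# Synthetic stand-in for the R and log10S columns: the slope changes at row 30.
# With real data, the lists could come from sr_df["R"].tolist() and
# sr_df["log10S"].tolist().
r_values = [100.0 * i for i in range(60)]
log10s_values = [
    7.0 - 1e-5 * r if r < 3000 else 6.97 - 4e-5 * (r - 3000) for r in r_values
]
break_row = two_segment_break(r_values, log10s_values)
print(break_row)
```

Each side of the detected break can then be passed to a trend test to obtain the slope and its significance for that phase.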

@AnujTiwari

Thanks for the code, Lisphilar. I will provide an update on the trend analysis soon.

@lisphilar
Owner Author

This may be off topic, but time series clustering (finding patterns) of S-R trends could also be a new subject of study. (This is also related to #396.)

Time Series Clustering — Deriving Trends and Archetypes from Sequential Data
https://towardsdatascience.com/time-series-clustering-deriving-trends-and-archetypes-from-sequential-data-bb87783312b4

@lisphilar
Owner Author

I'm preparing for the version 3 release after some more 2.x versions, and I'm revising the class structures and analysis methods.
We use the ruptures package to find structural changes in the ODE parameter value sets indirectly, by detecting change points of logS and R. We have called this "S-R trend analysis," but should we call it "S-R change point analysis" instead?

@lisphilar lisphilar added this to the Release v2.25.0 milestone Aug 9, 2022
@lisphilar
Owner Author

"S-R change point analysis" will be used from the next stable version 2.25.0.
