brier_score and cumulative_dynamic_auc fail when there is a test time greater than a training time #511

Open
aliciaolivaresgil opened this issue Jan 31, 2025 · 2 comments

Comments

@aliciaolivaresgil

I'm opening a new issue related to #478, which is currently closed because no reproducible example was provided.

Another user (@dpellow) reported that "brier_score" produces a ValueError when test time is greater than training time. I experience the same issue with "cumulative_dynamic_auc". This does not happen systematically but only in some cases.

The documentation says that time points in "times" must be within the range of times in "survival_test", but says nothing about times in "survival_test" having to be within the range of times in "survival_train". In fact, this happens in many other examples and no error occurs.

Here is a reproducible example where this error happens (sksurv version 0.22.2): https://github.com/aliciaolivaresgil/Reproduce_errors/blob/main/Error_example.ipynb

I could not find any difference between this example and the others where a time in survival_test is greater than every time in survival_train but no error occurs. Am I doing something wrong in the example? Why is this error occurring?

@sebp
Owner

sebp commented Jan 31, 2025

Thanks for providing the example.

Let me explain what is happening. The estimator of the time-dependent ROC (and the Brier score) relies on inverse probability of censoring weighting. This means that for each time point in survival_test, it estimates how likely it is to get censored, based on the data provided as survival_train. If survival_test has a time point larger than what survival_train contains, the probability of censoring at that time point is unknown, because it wasn't seen before and extrapolation is typically not possible.
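To make the mechanism concrete, here is a minimal pure-Python sketch of a Kaplan-Meier estimate of the censoring distribution. It is a simplified illustration, not scikit-survival's actual implementation: the helper names `censoring_survival` and `censoring_prob_at` are hypothetical, and the choice to raise for any query beyond the largest training time is an assumption made for the example.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t) = P(censoring time > t).

    times  : observed times
    events : True if the event occurred, False if the sample was censored
    Returns a sorted list of (time, G(time)) pairs, one per distinct time.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    g = 1.0
    steps = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_tied = 0
        n_censored = 0
        # Collect all samples sharing this time point.
        while i < len(pairs) and pairs[i][0] == t:
            n_tied += 1
            if not pairs[i][1]:
                n_censored += 1
            i += 1
        # For the censoring distribution, censored samples are the "events".
        g *= 1.0 - n_censored / n_at_risk
        steps.append((t, g))
        n_at_risk -= n_tied
    return steps


def censoring_prob_at(steps, t):
    """Look up G(t); undefined beyond the largest training time (assumption)."""
    last_time = steps[-1][0]
    if t > last_time:
        raise ValueError(
            f"time {t} is beyond the largest training time {last_time}"
        )
    g = 1.0
    for step_time, step_g in steps:
        if step_time > t:
            break
        g = step_g
    return g


# Toy training data (made up for illustration).
steps = censoring_survival([2, 4, 5, 7, 9], [True, False, True, False, True])
g_inside = censoring_prob_at(steps, 6)  # within the training range
```

A query inside the training range returns a value that can be inverted into a weight, while something like `censoring_prob_at(steps, 12)` raises, because no training sample says anything about censoring after time 9.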

You can just pass the concatenation of survival_train and survival_test as survival_train, and this issue won't arise. This is often acceptable, because the data passed as survival_train is only used to estimate censoring weights.
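The workaround could look roughly like the sketch below. The helper names `pool_for_censoring` and `auc_with_pooled_censoring` and the toy arrays are hypothetical; the only scikit-survival call used is `cumulative_dynamic_auc(survival_train, survival_test, estimate, times)`, imported lazily so the pooling step can be tried with NumPy alone.

```python
import numpy as np


def pool_for_censoring(survival_train, survival_test):
    """Stack train and test outcomes into one structured array."""
    return np.concatenate([survival_train, survival_test])


def auc_with_pooled_censoring(survival_train, survival_test, risk_scores, times):
    """Evaluate on the test set, estimating censoring weights from train + test."""
    # Imported here so the pooling helper works without scikit-survival.
    from sksurv.metrics import cumulative_dynamic_auc

    pooled = pool_for_censoring(survival_train, survival_test)
    return cumulative_dynamic_auc(pooled, survival_test, risk_scores, times)


# Toy structured arrays with the (event, time) layout that sksurv expects.
dtype = [("event", bool), ("time", float)]
y_train = np.array([(True, 2.0), (False, 4.0), (True, 5.0)], dtype=dtype)
y_test = np.array([(True, 3.0), (False, 8.0)], dtype=dtype)

pooled = pool_for_censoring(y_train, y_test)
# The pooled array now covers the test time 8.0 that y_train alone did not.
```

The same pooled array can be passed as the first argument of `brier_score`. The model itself is still fitted on the training data only; pooling affects just the censoring estimate.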

I agree that the documentation should be improved to explain this.

@aliciaolivaresgil
Author

Thank you! That worked, but I still believe the documentation should explain this, especially in the description of the parameter survival_train.
