
Validation statistics for SDMs #138

Open
tpoisot opened this issue Dec 22, 2023 · 6 comments
Labels
pipeline Linked to a pipeline or underlying scripts

Comments

@tpoisot

tpoisot commented Dec 22, 2023

I was going to bring this up at the next meeting in 2024, but we should indeed make sure that there is systematic validation of the SDMs with a series of reliable measures. Recent ML literature shows that AUC (ROC-AUC, anyway) can be high even for "bad" models.

(cc. @glaroc -- #137)
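A minimal sketch (not part of the pipeline) of what a battery of complementary statistics could look like, assuming binary presence/absence labels and predicted suitability scores; the function name and the 0.5 threshold are illustrative:

```python
import numpy as np
from sklearn.metrics import (
    cohen_kappa_score,
    confusion_matrix,
    matthews_corrcoef,
    roc_auc_score,
)

def validation_statistics(y_true, y_score, threshold=0.5):
    """Report several measures instead of relying on ROC-AUC alone."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "roc_auc": roc_auc_score(y_true, y_score),
        "mcc": matthews_corrcoef(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "tss": sensitivity + specificity - 1,  # true skill statistic
    }
```

MCC and TSS are computed from a thresholded confusion matrix, so they can disagree with ROC-AUC when the ranking of scores looks good but the actual predictions do not.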

@frousseu
Contributor

@glaroc While validation statistics would be cool, I don't think there is currently any reliable way to validate models built with biased data unless external standardized data, or data not subject to the same bias, are available. I see a lot of models with better AUC when bias is ignored than when it is taken into account (even though the former may look like complete crap). As such, I think performance measures may be misleading and give a false sense of confidence in the outputs. Perhaps performance measures are less misleading under more severe block CV (maybe Laura would have something to say about this)? I may be wrong, but I don't think other measures would be any better if the validation is done on the same biased data. Perhaps this https://doi.org/10.1016/j.ecolind.2022.109487 offers some ideas.
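For context, a rough sketch of the block CV idea, assuming point occurrences with longitude/latitude coordinates; the 1° block size and function name are illustrative, and the R package blockCV offers a much fuller treatment:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

def spatial_block_folds(lon, lat, block_size_deg=1.0, n_splits=5):
    """Yield (train, test) index pairs that keep spatial blocks intact."""
    # Assign each occurrence to a coarse grid cell; the cell id is the
    # cross-validation group, so a block never spans train and test.
    col = np.floor(np.asarray(lon) / block_size_deg).astype(np.int64)
    row = np.floor(np.asarray(lat) / block_size_deg).astype(np.int64)
    groups = row * 10_000 + col
    # GroupKFold only needs the group labels; X is a placeholder here.
    placeholder = np.zeros((len(groups), 1))
    return GroupKFold(n_splits=n_splits).split(placeholder, groups=groups)
```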

@tpoisot
Author

tpoisot commented Dec 22, 2023

I think we are talking about two different purposes for measures of model performance here. What I have in mind is a measure of how well the model learned from the data, and I see it more as "absolutely mandatory" than "cool", but that's because I'm looking at this problem (in part) from an applied ML point of view. This requires some sort of validation, which can be done using crossCV or other packages.

Maybe a script is not the correct way to approach this problem; instead, we could mandate a series of statistics, reported as an average + 95% CI, from some form of cross-validation.
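As a sketch of that reporting convention, assuming scikit-learn and a placeholder model (none of these choices are pipeline decisions):

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def cv_summary(X, y, n_splits=5):
    """Cross-validated ROC-AUC, reported as mean + 95% CI over folds."""
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(model, X, y, cv=n_splits, scoring="roc_auc")
    # t-interval over the fold scores (few folds, unknown variance).
    half_width = stats.t.ppf(0.975, df=n_splits - 1) * stats.sem(scores)
    return {
        "mean": scores.mean(),
        "ci_low": scores.mean() - half_width,
        "ci_high": scores.mean() + half_width,
    }
```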

@frousseu
Contributor

OK, what I had in mind for a performance measure was more in the sense of measuring how well a model likely represents the actual distribution of a species.

@tpoisot
Author

tpoisot commented Dec 22, 2023

Agreed, but this is a different (second?) step. Ideally, before having this discussion, it's important to know whether the training of the model went well at all, and that's what a standard set of performance measures would indicate. I don't think it's an either/or situation; both can be built in parallel.

And we can inform pipeline users / developers that if the model doesn't have good performance, the question of its fit to the actual distribution shouldn't even be considered.

@frousseu
Contributor

OK, I understand now that what you refer to is how well the model (and its predictors) is able to explain/predict the patterns in the data. My concern here would be whether good performance in this sense is positively correlated with good performance in representing the true distribution of species.

@glaroc
Contributor

glaroc commented Dec 22, 2023

In any case, I think the SDM pipelines should provide the fit statistics as outputs. Whether or not we use them for selecting the right model is an open debate.
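A minimal sketch of that: the fit statistics get serialized as a machine-readable output regardless of how they are used downstream (the path and keys are assumptions, not the pipeline's actual schema):

```python
import json

def write_fit_statistics(statistics: dict,
                         path: str = "output/fit_statistics.json") -> None:
    """Persist fit statistics so downstream steps or users can inspect them."""
    with open(path, "w") as handle:
        json.dump(statistics, handle, indent=2)
```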

@jmlord jmlord added the pipeline Linked to a pipeline or underlying scripts label Aug 21, 2024
@jmlord jmlord moved this to Backlog in BON in a Box pipelines Aug 21, 2024