Validation statistics for SDMs #138
Comments
@glaroc While validation statistics would be cool, I don't think there is currently any reliable way to validate models built with biased data if no external standardized data, or data not subject to the same bias, is available. I see a lot of models with better AUC when bias is ignored than when it is taken into account (even though the former may look like complete crap). As such, I think that performance measures may be misleading and give a false sense of confidence in the outputs. Perhaps with more severe block CV, performance measures are less misleading (maybe Laura would have something to say about this)? I may be wrong, but I don't think that other measures would be any better if the validation is done on the same biased data. Perhaps this https://doi.org/10.1016/j.ecolind.2022.109487 offers some ideas.
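Illustrative note (not from the thread): a minimal sketch of the spatial block CV idea mentioned above, written in Python/scikit-learn with entirely hypothetical data (`X`, `y`, `coords` are made up). The point is only the mechanics: folds are grouped by coarse spatial blocks rather than drawn at random, so an evaluation fold is less likely to share the exact sampling footprint of its training folds.

```python
# Sketch only: spatial block cross-validation for an SDM-style classifier.
# X (n x p predictors), y (0/1 presences) and coords (n x 2 lon/lat) are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n = 1000
coords = rng.uniform(0, 10, size=(n, 2))                        # hypothetical lon/lat
X = rng.normal(size=(n, 4))                                     # hypothetical predictors
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)   # hypothetical presences

# Assign each point to a coarse spatial block (2-degree grid cells here).
block_id = (coords[:, 0] // 2).astype(int) * 100 + (coords[:, 1] // 2).astype(int)

aucs = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=block_id):
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    p = model.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], p))

print("block-CV ROC-AUC per fold:", np.round(aucs, 3))
```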
I think we are talking about two different purposes for measures of model performance here. What I have in mind is a measure of how well the model learned from the data, and I see it more as "absolutely mandatory" than "cool", but that's because I'm looking at this problem (in part) from an applied ML point of view. This requires some sort of validation, which can be done using crossCV or other packages. Maybe a script is not the correct way to approach this problem, and instead we could mandate a series of statistics with average + 95% CI for some form of cross-validation.
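Illustrative note (not from the thread): a minimal sketch, in Python rather than any of the packages mentioned above, of reporting a cross-validated statistic as an average with a 95% CI. The interval uses a simple t-approximation over fold-level scores; the fold values are hypothetical.

```python
# Sketch only: summarise cross-validated AUC as a mean with an approximate 95% CI.
import numpy as np
from scipy import stats

def mean_ci(scores, level=0.95):
    """Mean and t-based confidence interval of per-fold scores."""
    scores = np.asarray(scores, dtype=float)
    m = scores.mean()
    se = stats.sem(scores)                                  # standard error over folds
    half = se * stats.t.ppf((1 + level) / 2, df=len(scores) - 1)
    return m, (m - half, m + half)

fold_aucs = [0.81, 0.78, 0.84, 0.80, 0.76]                  # hypothetical per-fold ROC-AUC
mean, (lo, hi) = mean_ci(fold_aucs)
print(f"ROC-AUC: {mean:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```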
Ok, what I had in mind for a performance measure was more in the sense of measuring how well a model likely represents the actual distribution of a species.
Agreed, but this is a different (second?) step. Ideally, before having this discussion, it's important to know whether the training of the model went well at all, and that's what a standard set of performance measures would indicate. I don't think it's an either/or situation at all, and both can be built in parallel. And we can inform pipeline users/developers that if the model doesn't have good performance, the question of its fit to the actual distribution shouldn't even be considered.
Ok, I understand now that what you refer to is how well the model (and its predictors) is able to explain/predict the patterns in the data. My concern here would be whether good performance in this sense is positively correlated with good performance in representing the true distribution of the species.
I think in any case, the SDM pipelines should provide the fit statistics as outputs. Whether we use them or not for selecting the right model is an open debate. |
I was going to bring this up at the next meeting in 2024, but we should indeed make sure that there's a systematic validation of the SDMs with a series of reliable measures. There's been recent literature in ML showing that AUC (ROC-AUC anyway) can be high even for "bad" models.
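Illustrative note (not from the thread): a small battery of complementary measures on the same held-out predictions, sketched in Python/scikit-learn with hypothetical labels and probabilities. The intent is to show why ROC-AUC alone can be misleading: discrimination, imbalance-sensitive ranking, probability calibration, and threshold-based agreement can each tell a different story.

```python
# Sketch only: several validation measures computed on hypothetical held-out predictions.
import numpy as np
from sklearn.metrics import (average_precision_score, brier_score_loss,
                             cohen_kappa_score, roc_auc_score)

rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.2, size=500)                          # imbalanced presences
y_prob = np.clip(0.2 + 0.3 * y_true + rng.normal(scale=0.2, size=500), 0, 1)
y_pred = (y_prob >= 0.5).astype(int)

metrics = {
    "ROC-AUC": roc_auc_score(y_true, y_prob),
    "PR-AUC": average_precision_score(y_true, y_prob),           # sensitive to class imbalance
    "Brier score": brier_score_loss(y_true, y_prob),             # calibration of probabilities
    "Cohen's kappa": cohen_kappa_score(y_true, y_pred),          # agreement at a fixed threshold
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```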
(cc. @glaroc -- #137)