Validation statistics for SDMs #138
Comments
@glaroc While validation statistics would be cool, I don't think there is currently any reliable way to validate models built with biased data if no external standardized data, or data not subject to the same bias, is available. I see a lot of models with better AUC when bias is ignored than when it is taken into account (even though the former may look like complete crap). As such, I think that performance measures may be misleading and give a false sense of confidence in the outputs. Perhaps with more severe block CV, performance measures are less misleading (maybe Laura would have something to say about this)? I may be wrong, but I don't think that other measures would be any better if the validation is done on the same biased data. Perhaps this https://doi.org/10.1016/j.ecolind.2022.109487 offers some ideas.
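Illustrative note (not from the thread): a minimal sketch of the spatial block CV idea mentioned above, written in Python/scikit-learn with entirely hypothetical data (`X`, `y`, `coords` are made up). The point is only the mechanics: folds are grouped by coarse spatial blocks rather than drawn at random, so an evaluation fold is less likely to share the exact sampling footprint of its training folds.

```python
# Sketch only: spatial block cross-validation for an SDM-style classifier.
# X (n x p predictors), y (0/1 presences) and coords (n x 2 lon/lat) are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n = 1000
coords = rng.uniform(0, 10, size=(n, 2))                        # hypothetical lon/lat
X = rng.normal(size=(n, 4))                                     # hypothetical predictors
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)   # hypothetical presences

# Assign each point to a coarse spatial block (2-degree grid cells here).
block_id = (coords[:, 0] // 2).astype(int) * 100 + (coords[:, 1] // 2).astype(int)

aucs = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=block_id):
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    p = model.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], p))

print("block-CV ROC-AUC per fold:", np.round(aucs, 3))
```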
I think we are talking about two different purposes for measures of model performance here. What I have in mind is a measure of how well the model learned from the data, and I see it more as "absolutely mandatory" than "cool", but that's because I'm looking at this problem (in part) from an applied ML point of view. This requires some sort of validation, which can be done using crossCV or other packages. Maybe a script is not the correct way to approach this problem, and instead we could mandate a series of statistics with average + 95% CI for some form of cross-validation.
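Illustrative note (not from the thread): a minimal sketch, in Python rather than any of the packages mentioned above, of reporting a cross-validated statistic as an average with a 95% CI. The interval uses a simple t-approximation over fold-level scores; the fold values are hypothetical.

```python
# Sketch only: summarise cross-validated AUC as a mean with an approximate 95% CI.
import numpy as np
from scipy import stats

def mean_ci(scores, level=0.95):
    """Mean and t-based confidence interval of per-fold scores."""
    scores = np.asarray(scores, dtype=float)
    m = scores.mean()
    se = stats.sem(scores)                                  # standard error over folds
    half = se * stats.t.ppf((1 + level) / 2, df=len(scores) - 1)
    return m, (m - half, m + half)

fold_aucs = [0.81, 0.78, 0.84, 0.80, 0.76]                  # hypothetical per-fold ROC-AUC
mean, (lo, hi) = mean_ci(fold_aucs)
print(f"ROC-AUC: {mean:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```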
Ok, what I had in mind for a performance measure was more in the sense of measuring how well a model likely represents the actual distribution of a species.
Agreed, but this is a different (second?) step. Ideally, before having this discussion, it's important to know whether the training of the model went well at all, and that's what a standard set of performance measures would indicate. I don't think it's an either/or situation at all, and both can be built in parallel. And we can inform pipeline users/developers that if the model doesn't have good performance, the question of its fit to the actual distribution shouldn't even be considered.
Ok, I understand now that what you refer to is how well the model (and its predictors) is able to explain/predict the patterns in the data. My concern here would be whether good performance in this sense is positively correlated with good performance in representing the true distribution of the species.
I think in any case, the SDM pipelines should provide the fit statistics as outputs. Whether we use them or not for selecting the right model is an open debate. |
I was going to bring this up at the next meeting in 2024, but we should indeed make sure that there's a systematic validation of the SDMs with a series of reliable measures. There's been recent literature in ML showing that AUC (ROC-AUC anyway) can be high even for "bad" models.
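Illustrative note (not from the thread): a small battery of complementary measures on the same held-out predictions, sketched in Python/scikit-learn with hypothetical labels and probabilities. The intent is to show why ROC-AUC alone can be misleading: discrimination, imbalance-sensitive ranking, probability calibration, and threshold-based agreement can each tell a different story.

```python
# Sketch only: several validation measures computed on hypothetical held-out predictions.
import numpy as np
from sklearn.metrics import (average_precision_score, brier_score_loss,
                             cohen_kappa_score, roc_auc_score)

rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.2, size=500)                          # imbalanced presences
y_prob = np.clip(0.2 + 0.3 * y_true + rng.normal(scale=0.2, size=500), 0, 1)
y_pred = (y_prob >= 0.5).astype(int)

metrics = {
    "ROC-AUC": roc_auc_score(y_true, y_prob),
    "PR-AUC": average_precision_score(y_true, y_prob),           # sensitive to class imbalance
    "Brier score": brier_score_loss(y_true, y_prob),             # calibration of probabilities
    "Cohen's kappa": cohen_kappa_score(y_true, y_pred),          # agreement at a fixed threshold
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```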
(cc. @glaroc -- #137)