kuenm_cal_swd does not produce omission rate statistics #17

Open
taprs opened this issue May 1, 2021 · 22 comments
taprs commented May 1, 2021

Hi, I am once again struggling to make friends with kuenm_cal_swd. This time I could not retrieve any omission rate statistics with it.

My SWD files look like this:

Occurrence dataset (all of them look like this):

species,lon,lat,bio02,bio03,bio10,bio11,bio15,bio18,bio19
Vaccinium_myrtillus,37.633,54.867,70,202,185,-77,31,235,98
Vaccinium_myrtillus,-115.39306,40.5925,114,309,141,-96,32,58,141
Vaccinium_myrtillus,37.563163,55.574375,68,199,182,-79,29,229,106
Vaccinium_myrtillus,16.561161,67.96668,47,184,137,-59,28,247,251

Bias files (two of them in the folder):

background,lon,lat,bio02,bio03,bio10,bio11,bio15,bio18,bio19
background,-101.67930586895,47.01236056295,101,239,208,-95,68,162,31
background,-101.68763920225,38.68736059625,122,313,245,-11,65,194,27
background,-117.76263913795,49.62902721915,93,285,158,-68,28,151,313
background,-93.60430590125,39.71236059215,92,245,253,-16,39,276,98

My command is the following:
kuenm_cal_swd('Vaccinium_myrtillus_joint_swd.csv', 'Vaccinium_myrtillus_train_swd.csv', 'Vaccinium_myrtillus_test_swd.csv', './background', 'kuenm_cal_swd.sh', 'vm_mod', c(seq(0.1, 1, 0.1), seq(2, 6, 1), 8, 10), c('lqpth', 'lq'), 2000, maxent.path = '.', out.dir.eval = 'vm_mod/eval')

And this is the output:

If asked, RUN as administrator

A total of 68 candidate models will be created

Starting evaluation process
bash: /home/tapirus/miniconda3/lib/libtinfo.so.6: no version information available (required by bash)
Evaluation using partial ROC, omission rates, and AICc
  |==========================================================================================================| 100%None of the significant candidate models met the AICc criterion,
delta AICc will be recalculated for significant models

Writing calibration results
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(ku_enm_best[, 5], na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
4: In min(x) : no non-missing arguments to min; returning Inf
5: In max(x) : no non-missing arguments to max; returning -Inf

calibration_results.csv looks like this:

"Model","Mean_AUC_ratio","pval_pROC","Omission_rate_at_5%","AICc","delta_AICc","W_AICc","N_parameters"
"M_0.1_F_lqpth_bias_kde_xy_swd",NA,NA,NA,672.317380352645,660.179222457908,4.14263618252283e-145,217
"M_0.1_F_lqpth_bias_lat_xy_swd",NA,NA,NA,639.356435643564,627.218277748828,5.95189137881709e-138,210
...
"M_8_F_lq_bias_kde_xy_swd",NA,NA,NA,12.1381578947368,0,0.0940531801013416,6
"M_8_F_lq_bias_lat_xy_swd",NA,NA,NA,12.1381578947368,0,0.0940531801013416,6
"M_10_F_lqpth_bias_kde_xy_swd",NA,NA,NA,24.5182724252492,12.3801145305123,0.000192781684793773,12
"M_10_F_lqpth_bias_lat_xy_swd",NA,NA,NA,45.7094594594595,33.5713015647226,4.82456291758231e-09,22
"M_10_F_lq_bias_kde_xy_swd",NA,NA,NA,12.1381578947368,0,0.0940531801013416,6
"M_10_F_lq_bias_lat_xy_swd",NA,NA,NA,12.1381578947368,0,0.0940531801013416,6

Am I doing something wrong? Why does the output lack these statistics? (By the way, is it really possible for two dissimilar background samples to yield identical AICc values?) These results differ greatly from what I got for similar data with kuenm_cal...

@marlonecobos (Owner)

Can you tell me if the data (complete set, training and testing sets, as well as the background folder) for running the analysis were prepared using the function prepare_swd?

taprs commented May 2, 2021

No, I prepared the SWD files myself (I used my own background points and test dataset, which is, to my knowledge, currently impossible with prepare_swd; in addition, extracting raster values without random row selection is orders of magnitude faster). However, I did produce the '_joint.csv' file and dummy background data with prepare_swd to check whether they had an identical structure -- and yes, the column set, names, and values extracted from the rasters (for occurrences) were exactly the same as what my script produced.

@marlonecobos (Owner)

OK, that is the problem.

You are right: for now you cannot use your own set of background points. The problem with preparing your occurrences yourself is that one required step is missing. When you prepare the data with prepare_swd, longitude and latitude are slightly changed so they coincide exactly with the closest background point. That is required to measure omission rates and pROC. I should do something about this in the future, but right now preparing your data with the function above is the only way to guarantee that the analyses will run correctly.

The other thing you can do is modify the coordinates of your occurrences yourself so they coincide with the closest background point. I haven't done that before, but it should not be that hard: measure the geographic distance from each occurrence to all your background points, find the closest background point for each record, and replace the record's longitude and latitude with that point's coordinates (a sketch of this idea follows).
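
Something along these lines should work (untested; it assumes the geosphere package for the distances, the lon/lat columns shown in the SWD files above, and placeholder file names):

library(geosphere)

occ  <- read.csv("occurrences_swd.csv")   # species, lon, lat, predictors
back <- read.csv("background_swd.csv")    # background, lon, lat, predictors

# great-circle distances (m) from every occurrence to every background point
d <- distm(occ[, c("lon", "lat")], back[, c("lon", "lat")])

# index of the closest background point for each occurrence
nearest <- apply(d, 1, which.min)

# snap occurrence coordinates to the closest background point
occ[, c("lon", "lat")] <- back[nearest, c("lon", "lat")]

write.csv(occ, "occurrences_swd_snapped.csv", row.names = FALSE)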

Sorry I cannot help more now.

Best,

taprs commented May 2, 2021

Isn't simply adding the occurrence points to the background the best solution? I seem to remember an always-ticked option in the Java Maxent GUI that sounded like that.

@marlonecobos (Owner)

Simple things are not always better. Adding samples to the background could be problematic because you would have duplicate information in terms of environmental conditions but not in terms of geographic coordinates. I think that introduces bias into the background. You can try it, but I am not totally sure how Maxent will deal with that.

taprs commented May 7, 2021

The following three lines of code did the job for me. Adding the samples' locations to the background influenced the selection of optimal model parameters, so I am now disenchanted with that option.

Assuming f1 is the SWD with occurrences' locations and f2 is the SWD with background data:

# squared distance (in degrees) from one occurrence to every background point
nearpt <- function(coords) which.min(colSums((t(f2[, 2:3]) - coords)^2))
# index of the closest background point for each occurrence record
nearest.ids <- apply(f1[, 2:3], 1, nearpt)
# replace occurrence coordinates with those of the closest background point
f1[, 2:3] <- f2[nearest.ids, 2:3]
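
For completeness, a usage sketch around those lines (file names are placeholders):

f1 <- read.csv("occurrences_swd.csv")   # occurrences: species, lon, lat, predictors
f2 <- read.csv("background_swd.csv")    # background: background, lon, lat, predictors
# ... run the three lines above ...
write.csv(f1, "occurrences_swd_snapped.csv", row.names = FALSE)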

I finally obtained final models with (hopefully) a reasonable parametrization. Thank you for this package!

@marlonecobos (Owner)

Glad you found a workaround. I am leaving this issue open so I remember to work on this part later on.
Thanks for sharing your question and solution.

taprs commented Jun 6, 2021

Hi Marlon, it's me once again.

I found a sort of justification for adding the sample points to the background. Considering Appendix 1b in Guillera-Arroita et al., 2014, I think Maxent is intended to work that way. (I did not look into the formulas; I hope these authors know what they are saying.)

Another concern is that, when it comes to the continent scale (with the same 10000 background points) and/or when the grid cells are small, the slight distortion of the presence locations may have a notable effect on the model.

I tried both 'distorting' and 'adding' the presence locations and finally got better models for Eurasia with the latter approach (at least, they are closer to the known species ranges).

@marlonecobos (Owner)

Happy to hear that your results got better.

You are right about the number of background points and I am glad you played with that and experienced the effects on models.

I have to make significant improvements in kuenm regarding the SWD format, probably this month. I will add a comment on all relevant issues to let you know when that happens.

@jmburgos

Hi Marlon,
I am having a similar issue, in my case not getting AICc values. I am also making my own SWD files, and I have included my presence locations in the background points. When I run kuenm_cal_swd, I get this:

A total of 35 candidate models will be created

Starting evaluation process
Evaluation using partial ROC, omission rates, and AICc
  |======================================================================| 100%None of the significant candidate models met the omission rate criterion,
models with the lowest omission rate and lowest AICc will be presented


Writing calibration results
Error in plot.window(...) : need finite 'xlim' values
In addition: There were 43 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In aicc(oc, mod, par_num) :
  AICc not valid: too many parameters, or likelihood = Inf... returning NA.
2: `mutate_()` was deprecated in dplyr 0.7.0.
Please use `mutate()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
3: In aicc(oc, mod, par_num) :
  AICc not valid: too many parameters, or likelihood = Inf... returning NA.
4: In aicc(oc, mod, par_num) :
  AICc not valid: too many parameters, or likelihood = Inf... returning NA
.... etc.

The calibration_results.csv file looks like this:

"Model","Mean_AUC_ratio","pval_pROC","Omission_rate_at_10%","AICc","delta_AICc","W_AICc","N_parameters"
"M_0.1_F_l_Set_1",1.89739428190759,0,0.096551724137931,NA,NA,NA,16
"M_0.1_F_lq_Set_1",1.92706060526074,0,0.113793103448276,NA,NA,NA,32
"M_0.1_F_lqp_Set_1",1.92316380061621,0,0.1,NA,NA,NA,89
"M_0.1_F_lqpt_Set_1",3.81794578017398,0,0.244827586206897,NA,NA,NA,344
"M_0.1_F_lqpth_Set_1",7.06266209131053,0,0.262068965517241,NA,NA,NA,339
...etc.

Do you have any idea what could be happening?
Many thanks!
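
For reference, the warning refers to the sample-size-corrected AIC, which is only defined when the number of parameters is small relative to the number of occurrence records. A minimal sketch of that validity check (not kuenm's actual code; names are illustrative):

# AICc = 2k - 2*logLik + 2k(k+1)/(n - k - 1), where k = number of parameters
# and n = number of occurrence records; the correction term requires n > k + 1
aicc_sketch <- function(logLik, k, n) {
  if (!is.finite(logLik) || (n - k - 1) <= 0) return(NA_real_)
  2 * k - 2 * logLik + (2 * k * (k + 1)) / (n - k - 1)
}

Candidate models with N_parameters of 339 or 344, as in the table above, can easily violate that condition when there are fewer occurrences than parameters, which is one way to trigger "AICc not valid: too many parameters".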

@jmburgos

I tried running the model after using prepare_swd(), and everything works. So there must be something missing from my hand-made SWD files. I will dig into prepare_swd to understand what it is doing.

@marlonecobos (Owner)

Hi @jmburgos,
Yes, something must be different in your data. There should not be a problem if you add your occurrences to the background; that may be an issue on my side. I am working on a major update to kuenm, and I think it should solve this kind of issue. By the end of July it should be ready. I hope you can wait.

@jmburgos

Thanks Marlon, I am looking forward to the updated kuenm. I think it is important to allow users to provide their own background points, for example to account for sampling bias in the occurrence data.

SDMENM commented Oct 12, 2021

(quoting @jmburgos's earlier comment about missing AICc values)

Hello jmburgos, did you find any solution to the AICc problem?
I made the SWD background files myself. When I do not add all occurrences to the background, it produces AICc values but not the omission rate, pROC, and mean AUC, which I understand. Interestingly, when I add all my occurrences to the background sets, it produces the omission rate, pROC, and mean AUC but not AICc. Does anyone have any idea?

@jmburgos

No Arif, I have not found a solution; I get the same results as you. If I add occurrences to the background points, I get AICc but not the omission rates and the other statistics. Hopefully Marlon will have some way around this.

SDMENM commented Oct 13, 2021

Thanks Julian,
I changed the coordinates of the occurrence data to match the background sets exactly, and it worked. However, in my case the independent evaluation data are on a different continent, and I can't match their coordinates with the background (which is on another continent). Also, if instead of matching coordinates I add them to the background, it won't give me AICc. And if I don't match them, the "kuenm_feval_swd" function won't work, because, like "kuenm_mod_swd", it requires occurrences whose coordinates are exactly the same as in the background. Every solution leaves me with at least one thing that is not possible in the end.
By the way, I know this is a simple question, but it is driving me crazy: I set args="togglelayertype=grid_code" and for some reason it still plots grid_code as continuous. Is there anything wrong with this args value?

fbocean commented Nov 18, 2021

Hi, just a minor note, also on the kuenm_cal_swd function.

I found this thread while trying to find out why, in one case, the function returned "incorrect" npar and AICc values for me and, as a result, selected unsuitable models.

I now understand that this happened because I had run the function a second time with exactly the same settings after modifying the input data (one of my predictor variables had faulty values, which I corrected for the second run). My mistake was that I used kept=TRUE and did not delete the models from the first run before starting the second run, since I knew they would be overwritten. However, I didn't consider that the function runs the Maxent batch and the R-based evaluation in parallel, the latter being quicker. Since all of my model names remained the same, the evaluation process did not wait for the new models to be created but simply evaluated a mix of the old models and the new ones that had already been created, leading to my confusing results. This may not happen to many users, and of course the function already gives a warning when the directories already exist, but I wonder if it would be possible to make a modification along the lines of either

  • adding a clearer warning saying that rerunning a calibration without deleting the previous models can lead to confounded evaluation statistics (or even warning the user when the evaluation process finishes before the Maxent batch, as an indication that something might be off),
  • or having the function delete the existing models at the start (a manual sketch of this is below), or, if the flexibility of including "old" models is desired, adding an option to evaluate only models that are not older than the start time of the calibration.

Anyway, I thought I'd document it in case someone ever runs into the same issue.
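
In the meantime, simply removing the previous candidate models before rerunning avoids the mix-up; a minimal sketch (the directory name is only an example, use the out.dir of your previous run):

# remove stale candidate models so a rerun cannot evaluate old results
old_models <- "Candidate_models"   # example path, not a kuenm default
if (dir.exists(old_models)) unlink(old_models, recursive = TRUE)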

@SDMENM I am not sure this will solve it for you, and I realise it has been a while since you asked; however, I remember having trouble with this before too, and for me the args parameter works when I store it as a character value first, i.e.
args <- "togglelayertype=grid_code"
and then run the function:
kuenm_cal_swd([...], args = args, [...])

@jocelynvelazquezmaira

Hello, good morning. Excuse me, could you help me? I have this problem with the kuenm package; it shows the following:
Writing kuenm_ceval results...
Warning messages:
1: package ‘dplyr’ was built under R version 4.2.3
2: mutate_() was deprecated in dplyr 0.7.0.
ℹ Please use mutate() instead.
ℹ See vignette('programming') for more help
ℹ The deprecated feature was likely used in the kuenm package.
Please report the issue to the authors.
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this warning was
generated.

How do I fix it?

@jmburgos

Jocelyn, that is simply a warning indicating that kuenm is using a deprecated function, mutate_(). It is something Marlon should eventually fix, but it should not affect your use of the package.

jocelynvelazquezmaira commented Jun 21, 2023 via email

jmburgos commented Jun 21, 2023 via email

@Galagb06
Hello, good evening.

Excuse me, I wanted to see if you could help me.

When I use kuenm_ceval, I get the following message: 'parallel' is a deprecated argument, even though I did not include parallel among the arguments, and it does not let me go any further.
