NA/NaN gradient evaluation #383
Comments
Ben,
I'm guessing that you have a mismatch between Matrix and TMB. Have you confirmed that TMB is working using the example here (https://github.com/pfmc-assessments/geostatistical_delta-GLMM/wiki/Steps-to-install-TMB)?
Also, feel free to email me about tinyVAST (https://vast-lib.github.io/tinyVAST/). I'm maintaining VAST for the next couple of years, but my development efforts are focused on tinyVAST, which has similar functionality using a smaller code base and a regression interface.
Jim
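Hi Jim,
I just ran the code in the install TMB link and confirmed that TMB is working.
Do you think I need a different version of the Matrix package?
My sessionInfo shows Matrix_1.6-1.1 and TMB_1.9.6:
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.1.3 VAST_3.10.1 FishStatsUtils_2.12.1
[4] marginaleffects_0.15.1 units_0.8-4 TMB_1.9.6
loaded via a namespace (and not attached):
[1] utf8_1.2.4 R6_2.5.1 tidyselect_1.2.0 Matrix_1.6-1.1
[5] lattice_0.20-44 magrittr_2.0.3 INLA_23.05.30-1 splines_4.3.2
[9] glue_1.6.2 tibble_3.2.1 pkgconfig_2.0.3 generics_0.1.2
[13] lifecycle_1.0.4 cli_3.6.1 fansi_1.0.5 vctrs_0.6.4
[17] grid_4.3.2 data.table_1.14.4 compiler_4.3.2 sp_1.5-0
[21] pillar_1.9.0 Rcpp_1.0.11 rlang_1.1.2
[1] "input is"
Thanks for your help!
Ben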
When you start a new session and type library(TMB) do you get a warning
message?
Hi Jim,
To clarify, this is being run in a High Performance Computing (HPC) environment, which means I am running command line inputs that call R files and then looking at .out files.
I don't think there is a warning when library(TMB) is called, but I just searched the outfile for "warning" and found the following:
In addition: Warning messages:
1: In checkMatrixPackageVersion() :
Package version inconsistency detected.
TMB was built with Matrix version 1.5.4.1
Current Matrix version is 1.6.1.1
Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
2: In dir.create(file.path(paste0(getwd(), "/sim_", sim_num, "/", CN, :
'/mnt/research/b.levy/VAST_Stuff/VAST_MixFishSim/10xKnots_MFS/sim_1/YTF/ConTemp/WCov_MoveCov/FALL/AllStrata' already exists
3: In file(file, "rt") :
cannot open file 'Index_wYearSeason.csv': No such file or directory
Do you agree that this implies I should have Matrix 1.5.4.1 instead of 1.6.1.1? I just want to be clear because this is an HPC environment, so I need to communicate with the HPC administrator to coordinate package installations/changes.
If the Matrix package is off, any idea why I have never had a problem running models on the HPC previously? I have run thousands of models over the last few months on the HPC and only this one covariate type is causing an issue. I just want to try to understand the problem.
thanks Jim!
Ben

There's a huge number of issue threads about the Matrix / TMB mismatch in glmmTMB, sdmTMB etc. It's been a headache :0 I'm guessing that TMB got updated and now the version mismatch is causing some problem with sparse-matrix stuff. Sorry that the HPC stuff is hard to debug! I'm hoping that this Matrix/TMB issue doesn't happen again.
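For anyone hitting the same warning, here is a minimal sketch, in base R only, of how one might print the installed versions to compare against the "TMB was built with Matrix version ..." line. The helper name installed_version is made up for this example and is not part of TMB or VAST; the reinstall command in the final comment is the one the warning itself recommends.

```r
# Hypothetical helper (not part of TMB or VAST): return the installed version
# of a package as a string, or NA if the package is not installed.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Print the versions that the mismatch warning is comparing.
for (pkg in c("Matrix", "TMB")) {
  cat(pkg, ":", installed_version(pkg), "\n")
}

# If the installed Matrix differs from the version TMB reports it was built
# with, the warning suggests rebuilding TMB against the current Matrix:
# install.packages("TMB", type = "source")
```

On an HPC it is probably worth running this inside the same batch job environment as the model, since the library paths seen by a job may differ from those of an interactive session.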
To follow up on this issue, I worked with the HPC administrator to try two different combinations of Matrix and TMB that were shown in some of the issues related to this problem. Unfortunately, they both produced the same error. I then tried a different covariate combination in the model, and that model also did not converge. In this case I was using both a static (
I then removed (
There is no problem when no covariate information is included. So the problem only shows up for specific combinations of covariate input.
Hi @James-Thorson-NOAA,
I think I am seeing something else related to this issue, which is why I am posting here. I may open a new issue though, as it could be something different. Do you think this problem is related to a package incompatibility? In some models with covariates I am getting the error:
<simpleError in if (any(On_bounds)) { problem_found = TRUE if (quiet == FALSE) { stop(paste0("\nCheck bounds for the following parameters: ", parameter_estimates$diagnostics[which(On_bounds), ])) }}: missing value where TRUE/FALSE needed>
I went ahead and printed out On_bounds and parameter_estimates at this step and see the following:
[1] "On_bounds is"
$objective
$iterations
$evaluations
$time_for_MLE
$max_gradient
$Convergence_check
$number_of_coefficients
$AIC
$diagnostics
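The message "missing value where TRUE/FALSE needed" is what R reports when the condition of an if evaluates to NA. any() returns NA when no element is TRUE and at least one is NA, so an On_bounds vector containing NA would produce exactly this error. A small sketch with made-up values (the actual On_bounds printout was not preserved above):

```r
# Made-up stand-in for On_bounds: no TRUE entries, one NA.
on_bounds <- c(FALSE, NA, FALSE)

# With no TRUE present, the NA propagates through any().
any_result <- any(on_bounds)

# Using that NA as an if() condition raises an error.
outcome <- tryCatch({
  if (any(on_bounds)) "parameter on bound" else "no parameter on bound"
}, error = function(e) "error")

# Two NA-safe ways to write the same check.
safe_isTRUE <- isTRUE(any(on_bounds))
safe_na_rm <- any(on_bounds, na.rm = TRUE)

print(any_result)   # NA
print(outcome)      # "error"
print(safe_isTRUE)  # FALSE
print(safe_na_rm)   # FALSE
```

If that is what is happening here, the NA values are more likely a symptom (for example NaN estimates or gradients coming out of the optimizer) than a separate package problem, but that is only a guess from the error text.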
Hi Jim!
I am running into an issue that I am having trouble understanding that I wanted to run by you.
I am running VAST models on samples taken from spatial population simulation output for fish species that I have developed. In the spatial population models, the probability a fish moves to discrete cell $(i,j)$ in week $w$ is given by probability $Move_{w,i,j}$, where $Move_{w,i,j}$ depends on factors such as the water temperature in the given cell, $Temp_{w,i,j}$.
I want to compare the performance of VAST models with and without covariates, so I could reasonably provide VAST with either $Move_{w,i,j}$ and/or $Temp_{w,i,j}$ as the covariate. Since $Temp_{w,i,j}$ is just one component of the actual movement probability $Move_{w,i,j}$, my assumption is that $Move_{w,i,j}$ would provide more information to VAST about species spatial preferences and thus provide a more accurate estimate compared to $Temp_{w,i,j}$. I am using a second degree polynomial response when including covariates. For example, if using $Move$ as the covariate I input
X2_formula = ~ poly(Move, degree=2 )
X1_formula = ~ poly(Move, degree=2 )
My models without covariates are all converging fine and my models that use $Temp_{w,i,j}$ as the covariate are also converging, but models that use $Move_{w,i,j}$ as the covariate all seem to run nearly to completion before they produce the error
<simpleError in nlminb(start = startpar, objective = fn, gradient = gr, control = nlminb.control, lower = lower, upper = upper): NA/NaN gradient evaluation>
There are a few GitHub issues for VAST that involve NA/NaN function evaluation, but not NA/NaN gradient evaluation. The closest thing I could find related to this is this issue thread in glmmTMB: glmmTMB/glmmTMB#164
Based on the discussion in the above issue link our best guess is that maybe the gradient function created in VAST has a term something like log(exp(X)) where X is some parameter. During the final stage of a VAST run, possibly when the final Hessian is being calculated, the X value gets really really small (e.g., 1E-320) and the exp(X) goes to zero due to computer rounding and thus the log(exp(X)) is undefined. Does that sound reasonable?
Are you familiar with this error? Do you know how to fix this so we can use $Move$ as a covariate?
Here is my sessionInfo():
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.1.3 VAST_3.10.1 FishStatsUtils_2.12.1
[4] marginaleffects_0.15.1 units_0.8-4 TMB_1.9.6
loaded via a namespace (and not attached):
[1] utf8_1.2.4 R6_2.5.1 tidyselect_1.2.0 Matrix_1.6-1.1
[5] lattice_0.20-44 magrittr_2.0.3 INLA_23.05.30-1 splines_4.3.2
[9] glue_1.6.2 tibble_3.2.1 pkgconfig_2.0.3 generics_0.1.2
[13] lifecycle_1.0.4 cli_3.6.1 fansi_1.0.5 vctrs_0.6.4
[17] grid_4.3.2 data.table_1.14.4 compiler_4.3.2 sp_1.5-0
[21] pillar_1.9.0 Rcpp_1.0.11 rlang_1.1.2
[1] "input is"
Thanks for your help!
Ben
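A quick numerical check of the log(exp(X)) idea in the post above, as a sketch in base R. The function grad_naive is a hypothetical stand-in for a chain-rule gradient and is not taken from VAST's code. One detail worth noting: in double precision, exp(X) rounds to zero only when X is below roughly -745, so the underflow would come from a large negative X rather than from a tiny positive one such as 1E-320.

```r
x_tiny <- 1e-320      # tiny positive value, as in the example in the post
x_negative <- -800    # large negative value, below the underflow threshold

exp_tiny <- exp(x_tiny)          # exactly 1: no underflow here
exp_negative <- exp(x_negative)  # exactly 0: underflow
log_of_exp <- log(exp_negative)  # -Inf rather than -800

# Hypothetical chain-rule gradient of log(exp(x)): (1 / exp(x)) * exp(x).
# Mathematically this is 1, but once exp(x) underflows it becomes Inf * 0.
grad_naive <- function(x) (1 / exp(x)) * exp(x)
grad_at_negative <- grad_naive(x_negative)

print(exp_tiny)          # 1
print(exp_negative)      # 0
print(log_of_exp)        # -Inf
print(grad_at_negative)  # NaN
```

This only shows that a term of this shape can turn into a NaN gradient; whether VAST's objective actually contains such a term is not established in this thread.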