Performance of recruitment dev_vector with MCMC using ADNUTS #593

alexpbell · 2024-04-24T22:55:09Z

alexpbell
Apr 24, 2024

Describe the bug

NUTS MCMC fails because no suitable epsilon (step size) can be found during the hybrid Monte Carlo epsilon search. SS3 input files have been provided to @Rick-Methot-NOAA.

To Reproduce

ss3_win.exe -hbf -nuts -mcmc 15

Expected behavior

MCMC iterations to proceed

Screenshots

Screenshot for the command above:

In this screenshot we have -verbose_find_epsilon added to the command, showing more detail:

Which OS are you seeing the problem on?

Windows

Which version of SS3 are you seeing the problem on?

3.30.22.1

Additional Context

This issue has not come up for this model when Q_extra_SD is not an estimated parameter, but it is not clear whether this is exclusively a Q_extra_SD issue.
The verbose find epsilon flag output suggests the issue is occurring on the first leapfrog jump where epsilon is at its default value of 1.
In hmc_functions.cpp (ADMB source) there is this check after the first jump: "if(alpha < 0.5 || std::isnan(alpha)){". Here it seems that alpha is Inf rather than NaN, because in this case the likelihood is getting vastly better rather than worse (worse being what you might expect).
I am not sure why the likelihood is getting so much better, whether that is meant to be possible, and whether upgrading this check to include std::isinf might resolve things or whether there is some deeper issue with the input files that is causing this.
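
To make the NaN-versus-Inf point concrete, here is a minimal Python sketch of how the quoted condition classifies different values of alpha. Only the condition itself comes from hmc_functions.cpp; the function names, the -1/+1 labels for the two branches, and the isinf variant are assumptions for illustration, not ADMB code.

```python
import math

def branch_quoted(alpha):
    # Condition as quoted from hmc_functions.cpp: alpha < 0.5 || std::isnan(alpha).
    # Labelling the two branches -1 and +1 is an illustrative assumption.
    return -1 if (alpha < 0.5 or math.isnan(alpha)) else 1

def branch_with_isinf(alpha):
    # Hypothetical variant that also treats an infinite alpha as a failed jump.
    return -1 if (alpha < 0.5 or math.isnan(alpha) or math.isinf(alpha)) else 1

for alpha in (0.1, 0.9, float("nan"), float("inf")):
    print(alpha, branch_quoted(alpha), branch_with_isinf(alpha))
```

An infinite alpha is neither below 0.5 nor NaN, so the quoted check sends it down the same branch as an ordinary good jump, while the variant routes it with the NaN case. Whether that would be the right fix is the open question above.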

alexpbell · 2024-04-25T01:31:21Z

alexpbell
Apr 25, 2024
Author

Now tested on a Linux platform (Ubuntu 22.04.4 LTS, Intel processor), and the same issue occurs.

0 replies

Rick-Methot-NOAA · 2024-04-25T03:10:27Z

Rick-Methot-NOAA
Apr 25, 2024
Maintainer

@kellijohnson-NOAA and all associated with hake. Do you see this happening in your situation?

0 replies

alexpbell · 2024-04-25T03:11:52Z

alexpbell
Apr 25, 2024
Author

I think what is happening is this: when extra s.e. is being estimated for Q and hbf (hybrid bounded flag) is 1, the initiation of the hybrid Monte Carlo step size search with epsilon at 1 makes a big jump through "unbounded" (or at least differently bounded) parameter space, such that the s.d. term Svy_se_use(f,i) in the surv_like component of the likelihood becomes very large:
Svy_like_I(f,i) = 0.5 * square((Svy_obs_log(f, i) - Svy_est(f, i) + 0.5 * square(Svy_se_use(f, i))) / Svy_se_use(f, i)) + sd_offset * log(Svy_se_use(f, i));
so the NLL becomes unexpectedly small ("good"). The fact that a minimum with a positive definite Hessian is still found suggests that the NLL surface is locally convex but not globally convex, and that some kind of bounds (or a variation to the bounds when hbf is on) is needed for this kind of "extra s.e." parameter.
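
The quoted survey likelihood expression can be evaluated on its own to see how a single term responds to the s.d. value. This is a minimal Python sketch that transcribes the expression for one observation; the function name and all input values are made up for illustration, and it says nothing about the rest of the objective function.

```python
import math

def svy_like(obs_log, est, se, sd_offset=1.0):
    # Transcription of the quoted Svy_like_I(f,i) expression for one observation.
    resid = (obs_log - est + 0.5 * se**2) / se
    return 0.5 * resid**2 + sd_offset * math.log(se)

# Made-up inputs: log observation 0.3 above the expected value.
for se in (0.1, 0.2, 0.5, 1.0, 2.0, 5.0):
    print(f"se={se:4.1f}  nll={svy_like(0.3, 0.0, se):10.4f}")
```

Tabulating the term over a range of s.d. values like this is a quick way to check which direction of movement in the extra s.e. parameter would actually lower this component.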

0 replies

Cole-Monnahan-NOAA · 2024-04-25T15:59:47Z

Cole-Monnahan-NOAA
Apr 25, 2024
Collaborator

The first thing I notice is that the NLL after optimization (374) does not match the initial NLL during NUTS (31415). If you turn on SS3 printing to see the SSB and others it probably would indicate that you're starting in some unrealistic part of the posterior, potentially with a crashed population. Why this occurs I'm not sure, and I don't remember seeing this before for other models (but this is precisely why I print it out in ADMB). Does the population get very close to 0 at some point? Do you use any dev_vectors? If so I'd turn those off.

I would try a few tests:
(1) Set it to start from the .par file and use phasing and max phasing to turn off most parameters, but leave estimation of Q_extra_SD on. See how far back you have to strip the model for it to work fine.
(2) Try adding the flag -mcdiag to the command (see here). This will tell ADMB to start from a diagonal mass matrix (ignore the admodel.cov file).
(3) Try the old RWM algorithm to see what it does. If the initial NLL matches the optimizer then that's really helpful to know.
(4) Try specifying initial values (active pars only) using e.g. -mcpin init.txt

My intuition is that this is a model configuration/SS3 issue. Catching serious issues like this in the initial stepsize calcs would probably be prudent. If need be we can file an issue at the ADMB repo. But for now I don't recommend it because ADMB is no longer actively being developed, this is a pretty extreme case, and the error message and console output are sufficient to identify where to look for issues.

0 replies

alexpbell · 2024-04-26T05:20:28Z

alexpbell
Apr 26, 2024
Author

Thanks Cole, it seems you've hit on the main underlying issue. I did not expect the NLL to jump like that at the onset of MCMC. The reason appears to be related to dev vectors, as you suspect, and the best documentation of the issue seems to be from the starter file section of the SS3 manual:

"A bias adjustment of 1.0 is applied to recruitment deviations in the MCMC phase, which could result in reduced recruitment estimates relative to the MLE when a lower bias adjustment value is applied."

In this case we have 45 rec devs estimated and a strong bias adjustment ramp applied, and a generally low biomass. The combination produced a strong "bias shock" and reduced relative recruitment at the onset of MCMC such that crash penalties kicked in.
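
The size of this "bias shock" can be sketched with a little arithmetic. The sketch assumes the usual SS3 form in which expected recruitment in year y is scaled by exp(dev_y - 0.5 * b_y * sigmaR^2), with b_y the bias adjustment fraction for that year. The function name, the sigmaR value and the ramp values are made up and are not taken from this model.

```python
import math

def recruitment_ratio(b_mle, sigma_r, b_mcmc=1.0):
    # Ratio of recruitment under b_mcmc to recruitment under b_mle,
    # holding the recdev and the spawner-recruitment expectation fixed.
    return math.exp(-0.5 * (b_mcmc - b_mle) * sigma_r**2)

sigma_r = 0.6  # hypothetical sigmaR
for b in (0.0, 0.25, 0.5, 0.9, 1.0):  # hypothetical bias adjustment values along a ramp
    print(f"b_mle={b:4.2f}  R_mcmc/R_mle={recruitment_ratio(b, sigma_r):.3f}")
```

Under this assumption, years with a low adjustment in the ramp take the largest proportional drop when the adjustment switches to 1.0, which fits the manual's statement about reduced recruitment relative to the MLE.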

I am finding that the SS3 manual suggestion for handling this ("A small value, called the “bump”, is added to the ln(R0) for the first call to MCMC in order to prevent the stock from hitting the lower bounds when switching from MLE to MCMC.") still creates quite a difference between MLE NLL and MCMC NLL when using nuts (whereas it doesn't seem to with rwm). So I am handling the bias shock by setting my "Max Bias Adjustment" value in rec devs settings to -1, which sets it to 1.0 for all years. I am not using this for any MLE inference, just to prime the nuts MCMC. Let me know if you think there is a better way to go. With this done, nuts appears to be operating correctly with q extra as an estimated parameter, though I still have more testing to do to be sure.

RWM NLL difference due to a small ln(R0) "bump":

NUTS NLL difference with same bump:

NUTS NLL difference with no bump:

0 replies

alexpbell · 2024-04-26T05:39:14Z

alexpbell
Apr 26, 2024
Author

I should clarify that the above screenshots all relate to Max Bias Adjustment=-1 models, so they deal with "other" (non rec dev) bias adjustments. They are therefore not testing whether I can find a suitable bump that turns crash penalties off (crash is zero in all these cases). I am just noticing that small changes in ln(R0) seem to make a big difference to the likelihood components in the hbf/nuts case, so I'm staying away from the bump option for now:

0 replies

Cole-Monnahan-NOAA · 2024-04-26T16:03:44Z

Cole-Monnahan-NOAA
Apr 26, 2024
Collaborator

Note that the concept of dev_vector options for recdevs and the bias ramp are different.

I don't understand why the MLE ramp would be different since SS3 detects the -mcmc flag internally and disables it. The differences could be from using dev_vectors in other model components. These are not compatible with MCMC and should not be used. See the manual for how to change this option.

The NUTS code is robust to initial values so as long as it can find an epsilon and get started then all should be good. It just failed this time b/c the crashed portion of the likelihood space is not reliable. You don't need to (and ideally shouldn't) start from the MLE. So don't worry about those small differences, as long as they're not caused by dev_vectors.

0 replies

iantaylor-NOAA · 2024-04-26T16:08:10Z

iantaylor-NOAA
Apr 26, 2024
Maintainer

I had forgotten about the bias adjustment bump option. It was created long before Cole developed adnuts in response to models that crashed when the -mcmc flag replaced the bias adjustment ramp with 100% bias adjustment for all recdevs. Perhaps it was around the time of this Stewart et al. paper: https://doi.org/10.1016/j.fishres.2012.07.003.

I think running the initial estimation with Max Bias Adjustment=-1 is a very reasonable solution and only would NOT work well if the resulting parameter estimates and associated correlation structure led to worse mixing than the MLE without that change. I see Cole's response just now. Yes, adnuts is so efficient that I think poor mixing is unlikely to be a problem.

0 replies

Cole-Monnahan-NOAA · 2024-04-26T16:12:37Z

Cole-Monnahan-NOAA
Apr 26, 2024
Collaborator

@iantaylor-NOAA yeah thanks. I don't see that option being super useful b/c the user should just optimize with the -mcmc flag, or turn bias correction off completely so it would not be needed.

@alexpbell I also recommend running from R through the adnuts R package. Command line is fine for diagnosing issues but for real runs you can't pass up parallel execution and the tie in with rstan diagnostics and other outputs.

Wasn't there an overhaul of the SS3 manual to include some of this advice? I can't remember where we left off with that.

0 replies

Rick-Methot-NOAA · 2024-04-26T16:19:20Z

Rick-Methot-NOAA
Apr 26, 2024
Maintainer

dev_vector is only used for recruitment deviations. All other parameter deviation vectors are simple devs.
Recruitment uses dev_vector because at the time (a very long time ago), it was deemed desirable that the mean recruitment as derived from the spawner-recruitment curve's log(R0) would be the same as the mean recruitment from the time series of recruitments.
That gets broken easily when using regular dev vectors and SS3 MAKES NO INTERNAL ADJUSTMENT for the situations in which log(R0) moves one direction while the mean of the recdevs moves the other direction as an offset. MSY-like quantities are based on the spawner-recruitment parameters, not the mean of the recruitments after devs are applied. Note that the SUM of the recdevs is reported as a stepping stone towards such an adjustment.
I think this internal adjustment to base MSY on the arithmetic mean of the derived recruitments should be a priority for SS3 development because we already are advocating that users switch away from the dev_vector option.
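
The offset between log(R0) and the mean of the recdevs can be illustrated with a deliberately simplified Python sketch. It assumes recruitment is just exp(ln(R0) + dev_y), ignoring the spawner-recruitment curve and bias adjustment; the function name and all numbers are made up.

```python
import math

def recruitments(ln_r0, devs):
    # Simplified: R_y = exp(ln(R0) + dev_y); no spawner-recruitment curve, no bias adjustment.
    return [math.exp(ln_r0 + d) for d in devs]

devs = [0.4, -0.1, 0.3, 0.2]   # made-up simple devs with a non-zero sum
shift = sum(devs) / len(devs)  # mean dev

a = recruitments(5.0, devs)
b = recruitments(5.0 + shift, [d - shift for d in devs])  # zero-sum devs, higher ln(R0)

print(all(math.isclose(x, y) for x, y in zip(a, b)))  # compares the two recruitment series
print(math.exp(5.0), math.exp(5.0 + shift))           # the two implied R0 values
```

Both parameterizations give the same recruitment series but a different R0, so any quantity derived from the spawner-recruitment parameters alone would differ between them. A zero-sum constraint removes that ambiguity, and the reported sum of the recdevs indicates how large the offset is when the constraint is not applied.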

0 replies

alexpbell · 2024-04-28T21:21:24Z

alexpbell
Apr 28, 2024
Author

@iantaylor-NOAA @Cole-Monnahan-NOAA The bias adjustment issue is happening with -mcmc flag and rec devs = simple devs, hence the use of Max Bias Adjustment = -1 (which then still allows quick switch to/from MLE).

@Rick-Methot-NOAA did you just explain why? I.e. simple devs are a "regular" deviation as opposed to a "dev_vector" and the mcmc flag is switching off bias adjustment during MLE phase only for the latter?

@Cole-Monnahan-NOAA Yes real runs are in parallel with rstan diagnostics.

1 reply

Rick-Methot-NOAA Apr 29, 2024
Maintainer

@alexpbell - Bias adjustment should be turned off in MCMC for regular devs and for dev_vector. Is it staying on for regular devs?

Rick-Methot-NOAA · 2024-04-29T18:20:50Z

Rick-Methot-NOAA
Apr 29, 2024
Maintainer

Given sigmaR, we could calculate the se for the mean dev and use it to add an additional component to the logL.
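
One way to read this suggestion, sketched in Python: if the n recdevs are treated as independent draws from N(0, sigmaR^2), the standard error of their mean is sigmaR / sqrt(n), and a normal penalty on the mean dev could be added to the negative log-likelihood. The independence assumption, the function names and the exact form of the penalty are assumptions, not SS3 code, and the dev values are made up.

```python
import math

def mean_dev_se(sigma_r, n_devs):
    # Standard error of the mean of n independent N(0, sigma_r^2) deviations.
    return sigma_r / math.sqrt(n_devs)

def mean_dev_penalty(devs, sigma_r):
    # Hypothetical extra NLL component: normal penalty on the mean deviation.
    mean_dev = sum(devs) / len(devs)
    se = mean_dev_se(sigma_r, len(devs))
    return 0.5 * (mean_dev / se) ** 2

devs = [0.4, -0.1, 0.3, 0.2]       # made-up devs
print(mean_dev_se(0.6, 45))        # 45 recdevs, as in the model discussed here
print(mean_dev_penalty(devs, 0.6))
```

The penalty is zero when the devs sum to zero and grows as the mean dev drifts away from zero relative to its standard error, so it acts as a soft version of the zero-sum constraint.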

0 replies

alexpbell · 2024-04-29T21:03:47Z

alexpbell
Apr 29, 2024
Author

@Rick-Methot-NOAA and @Cole-Monnahan-NOAA The issue we see in the first screenshot above, where the NLL jumps a huge amount (~300 to ~30,000) at the onset of MCMC, occurs when using -hbf -nuts -mcmc unless I set Max Bias Adjustment to -1. I was putting this down to crash penalties, but now that I look at the likelihood components, crash=0 both before and after:

1 reply

Rick-Methot-NOAA Apr 29, 2024
Maintainer

Alex,
Are you using the decimal component of this feature to attempt to minimize that jump? What you are seeing could simply be due to the fact that arithmetic mean recruitment (hence the number of fish available to be caught) shifts abruptly when bias adjustment gets turned off after MLE and going into MCMC.
0 # MCMC output detail: integer part (0=default; 1=adds obj func components; 2= +write_report_for_each_mceval); and decimal part (added to SR_LN(R0) on first call to mcmc)
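
As a small illustration of how this starter-file entry combines two settings in one number, the following Python sketch splits a value into its integer part (output detail) and decimal part (the ln(R0) bump). The function name and example values are hypothetical, and how SS3 itself performs the split is not shown in this thread.

```python
from decimal import Decimal

def split_mcmc_output_detail(value):
    # value is the starter-file entry as text, e.g. "1.02".
    d = Decimal(value)
    integer_part = int(d)    # output detail level (0, 1 or 2)
    bump = d - integer_part  # decimal part, added to SR_LN(R0) on the first call to MCMC
    return integer_part, float(bump)

print(split_mcmc_output_detail("0"))     # default entry with no bump
print(split_mcmc_output_detail("1.02"))  # hypothetical entry with a bump
```

Parsing from text with Decimal avoids the floating-point residue that a plain subtraction such as 1.02 - 1 would leave in the bump value.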

Cole-Monnahan-NOAA · 2024-04-29T21:17:18Z

Cole-Monnahan-NOAA
Apr 29, 2024
Collaborator

But shouldn't the -mcmc flag negate this effect completely, @Rick-Methot-NOAA ? I would think that optimizing with that flag would ensure consistency between the MLE and MCMC inits.

2 replies

Rick-Methot-NOAA Apr 29, 2024
Maintainer

Optimizing MLE without bias adjustment produces a biased result relative to the MPD of the MCMC for all the reasons that Ian and I demonstrated with our 2011 paper. This was first noticed while reviewing the west coast Pacific Ocean Perch assessment in the late 1990s. It is less of a problem if the time series is short with uniform data regarding recdevs throughout the time series because the estimated log(R0) compensates. But with long time series of heterogeneous data quality the bias adjustment allows the MLE and MCMC to be consistent.

iantaylor-NOAA Apr 29, 2024
Maintainer

I think some of the confusion here is about when the bias adjustment gets set to 1.0 for all years. It looks like it gets triggered here: https://github.com/nmfs-ost/ss3-source-code/blob/main/SS_popdyn.tpl#L22-L26, so even if you have -mcmc set in the call to SS3, the standard bias adjustment inputs apply during optimization. I just confirmed this by looking at the ParmTrace.sso.

Cole-Monnahan-NOAA · 2024-04-29T21:34:16Z

Cole-Monnahan-NOAA
Apr 29, 2024
Collaborator

Oh wow, yeah, that was a misunderstanding on my part. I thought it turned it off during optimization too, but that's clearly not the case. That's good to know; I've been misleading people for a while now. E.g., my advice in the 2019 paper to optimize with the -mcmc flag to get better NUTS performance is inaccurate.

0 replies
