-
-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature request: alternative approach for censored data #1657
Comments
Thank you for opening this issue. Once I have more time, I will take a detailed look. |
Just to note: One real performance issue with the current brms implementation for censoring is to loop line by line through the data set and calling the LPDF/CDF or CCDF function every time. This avoids using the vectorised versions of these functions, which is really slow. One can easily use the vectored versions if one creates first an index vector which contains all indices of data rows which are not censored, right/left censored and then uses the vectorised function calls. This should solve to some extent the performance issue seen with large data sets. |
Makes sense. How would we combine this with threading though, where we need to obtain the write subsets of these index vectors. We would need some kind of Stan version of Rs "intersect". Do we have something like this? |
I do not think an intersect command exists, but a first version which just loops over the data set to find out which indices are from which observation type in range of the current slice will do. Not really efficient, but since this is a loop over integers it's basically for free. That should still be faster as we gain vectorized statements. ..or you make one reduce sum call per censoring type...then this can be done fast. |
Actually it's simpler: the current approach loops over all rows and adds line by line to the log lik. The new approach should loop again line by line over the data and collect all the indices for each censoring type. Then one adds at the end of that the entire log lik for all the observations in vectorized form. When doing reduce sum then the loop is not over the entire data, but only over the sub slice. If it helps I can share some dummy code demonstrating the above. |
Thanks! Some dummy code would be helpful I think! |
Something along these lines: / code pattern now (with reduce_sum):
for (n in 1:N) {
int nn = n + start - 1;
// special treatment of censored data
if (cens[nn] == 0) {
ptarget += cox_log_lpdf(Y[nn] | mu[n], bhaz[n], cbhaz[n]);
} else if (cens[nn] == 1) {
ptarget += cox_log_lccdf(Y[nn] | mu[n], bhaz[n], cbhaz[n]);
} else if (cens[nn] == -1) {
ptarget += cox_log_lcdf(Y[nn] | mu[n], bhaz[n], cbhaz[n]);
}
}
// better code pattern:
{
int num_event = 0;
int num_censor_right = 0;
int num_censor_left = 0;
array[N] int idx_event;
array[N] int idx_censor_right;
array[N] int idx_censor_left;
for (n in 1:N) {
int nn = n + start - 1;
// special treatment of censored data
if (cens[nn] == 0) {
num_event += 1;
idx_event[num_event] = nn;
} else if (cens[nn] == 1) {
num_censor_right += 1;
idx_censor_right[num_censor_right] = nn;
} else if (cens[nn] == -1) {
num_censor_left += 1;
idx_censor_left[num_censor_left] = nn;
}
}
ptarget += cox_log_lpdf(Y[idx_event[1:num_event] - start + 1] | mu[idx_event[1:num_event]], bhaz[idx_event[1:num_event]], cbhaz[idx_event[1:num_event]]);
ptarget += cox_log_lccdf(Y[idx_censor_right[1:num_censor_right] - start + 1] | mu[idx_censor_right[1:num_censor_right]], bhaz[idx_censor_right[1:num_censor_right]], cbhaz[idx_censor_right[1:num_censor_right]]);
ptarget += cox_log_lcdf(Y[idx_censor_left[1:num_censor_left] - start + 1] | mu[idx_censor_left[1:num_censor_left]], bhaz[idx_censor_left[1:num_censor_left]], cbhaz[idx_censor_left[1:num_censor_left]]);
} |
makes sense! Thank you! |
Shouldn't the pattern rather be:(?)
This way, |
I guess you are right here on the indexing stuff. I did not test my code, but the key idea is to collect the indices during the loop and then doing the appropriate vectored statements when calling the lpdfs. |
Okay. The feature is almost complete. However, when I parse it with threading, I get the error |
no idea... brute force would be to create at the same time two index vectors per case... one for the shifted and one for the not shifted stuff.. not nice, but should work for sure... |
or you create a function which does the offset to the first array in a loop element wise. That's another option. |
yeah that is what I will go for I guess.
wds15 ***@***.***> schrieb am Mi., 18. Sept. 2024, 17:36:
… or you create a function which does the offset to the first array in a
loop element wise. That's another option.
—
Reply to this email directly, view it on GitHub
<#1657 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADCW2AEWIWNPT5VJ2SM4KPDZXGFVZAVCNFSM6AAAAABH6LDEMKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNJYGY2TEMZZGI>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
This feature is now implemented. I am pretty sure it is correct, but I would appreciate if you would double check both the non-threaded and the threaded Stan code, for example by running: make_stancode(time | cens(censored) ~ age * sex + disease + (1|patient),
data = kidney, family = lognormal())
make_stancode(time | cens(censored) ~ age * sex + disease + (1|patient),
data = kidney, family = lognormal(), threads = threading(2)) |
will do |
The Stan code looks good to me in both cases. I also ran as a sanity check this one:
Doing so makes the second fit run the very same likelihood, since the parallelisation is turned off via the grain size. I am getting matching numerical results. I have not checked interval censoring, which is not part of this example.... but I guess it's just fine. Very cool to have this coded now like this. This will be super useful for truncated data as well! |
Thank you for checking! Truncation does not yet support auto-vectorized code I think. Perhaps you can check and open a new issue with an example if you think there is something to be optimized as well. |
Uhm... this is not enabled for Cox? I also confirmed that we gain speed here library(brms)
set.seed(12345)
covs <- data.frame(id = 1:10000, trt = stats::rbinom(10000, 1L, 0.5))
d1 <- simsurv::simsurv(lambdas = 0.1, gammas = 1.5, betas = c(trt = -0.5),
x = covs, maxt = 5)
d1 <- merge(d1, covs)
d1$g <- sample(c("a", "b"), nrow(d1), TRUE)
.cache_dir <- here::here("_brms-cache", Sys.info()["machine"])
options(brms.backend="cmdstanr", cmdstanr_write_stan_file_dir=.cache_dir)
# create cache directory if not yet available
dir.create(.cache_dir, FALSE, TRUE)
fit1 <- brm(eventtime | cens(1 - status) ~ 1 + trt,
data = d1, family = lognormal(), refresh = 250, chains=1, seed=3546546)
## 6s
.cache_dir <- here::here("_brms-cache-dev", Sys.info()["machine"])
options(brms.backend="cmdstanr", cmdstanr_write_stan_file_dir=.cache_dir)
# create cache directory if not yet available
dir.create(.cache_dir, FALSE, TRUE)
pkgload::load_all("/home/rdocker/rwork/brms")
fit1_new <- brm(eventtime | cens(1 - status) ~ 1 + trt,
data = d1, family = lognormal(), refresh = 250, chains=1, seed=3546546)
## 3.6s I will check truncation... pretty sure the same pattern will be fruitful there as well...next issue coming once ready.. |
Not enabled for cox since we would need vectorized lcdf etc functions for it. Does Stan already support function overloading (same function name with different signatures) for functions defined by the user? |
Cox models now also have a vectorized version and the speed up seems to be clearly noticeable from my checks. I am pretty sure the math is correct, but would you mind double checking in https://github.com/paul-buerkner/brms/blob/master/inst/chunks/fun_cox.stan ? |
Thanks for working on this, however, the current approach may have issues. I installed brms 2.22 (current status on main branch) and reran the original reprex (first post in this thread). I observed the following:
I attach the brms generated stan files that are needed to run this reprex locally (that was the easiest approach I came up with to deal with multiple brms versions - the reprex uses cmdstanr to fit the models). library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.22.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#>
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#>
#> ar
## generate data from beta distribution
set.seed(2154)
mu <- 0.5
phi <- 2
true_data <- tibble(
y = rbeta(100, shape1 = mu * phi, shape2 = (1 - mu) * phi)
)
## Censoring
censored_data <- true_data %>%
mutate(
y_left = floor(y * 10) / 10,
y_right = ceiling(y * 10) / 10,
y_left = ifelse(y_left == 0, 1e-7, y_left),
y_right = ifelse(y_right == 1, 1 - 1e-7, y_right),
censoring = case_when(
y_right == 0.1 ~ "left",
y_left == 0.9 ~ "right",
TRUE ~ "interval"
),
y_cens = case_when(
censoring == "left" ~ y_right,
censoring == "right" ~ y_left,
TRUE ~ y_left
)
)
beta_stan_data <- brms::make_standata(
y_cens | cens(censoring, y_right) ~ 1,
family = brms::Beta(),
data = censored_data)
# compile models
integrate_out_not_vectorized <- cmdstanr::cmdstan_model(
"beta_censored_brms_not_vectorized.stan" # this is brms 2.21
)
integrate_out_vectorized <- cmdstanr::cmdstan_model(
"beta_censored_brms_vectorized.stan" # this is brms 2.22
)
augmented_not_vectorized <- cmdstanr::cmdstan_model(
"bda.stan" # this is data augmentation model, not vectorized
)
# fit models
integrated_out_not_vectorized <- integrate_out_not_vectorized$sample(
data = beta_stan_data,
seed = 123,
chains = 4,
parallel_chains = 1
)
#> Running MCMC with 4 sequential chains...
#>
#> Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 20, column 2 to column 41)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 1 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 1 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 1 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 1 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 1 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 1 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 1 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 1 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 1 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 1 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 1 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 1 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 1 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 1 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 1 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 1 finished in 35.3 seconds.
#> Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: beta_lccdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 34, column 8 to column 68)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: beta_lccdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 34, column 8 to column 68)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 2 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 2 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 2 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 2 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 2 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 2 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 2 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 2 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 2 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 2 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 2 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 2 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 2 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 2 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 2 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 2 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 2 finished in 23.3 seconds.
#> Chain 3 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 3 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 20, column 2 to column 41)
#> Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 3
#> Chain 3 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 3 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 3 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 3 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 3 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 3 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 3 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 3 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 3 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 3 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 3 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 3 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 3 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 3 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 3 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 3 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 3 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 3 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 3 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 3 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 3 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 3 finished in 22.8 seconds.
#> Chain 4 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: beta_lccdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 34, column 8 to column 68)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: beta_lccdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 34, column 8 to column 68)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> Chain 4 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 4 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 4 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 4 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 4 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 4 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 4 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 4 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 4 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 4 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 4 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 4 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 4 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 4 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 4 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 4 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 4 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 4 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 4 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 4 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 4 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 4 finished in 20.9 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 25.6 seconds.
#> Total execution time: 103.1 seconds.
integrated_out_vectorized <- integrate_out_vectorized$sample(
data = beta_stan_data,
seed = 123,
chains = 4,
parallel_chains = 1
)
#> Running MCMC with 4 sequential chains...
#>
#> Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: gamma_lpdf: Random variable is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp80ePEl/model-1cdc46c86c94.stan', line 47, column 2 to column 41)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: beta_lccdf: Second shape parameter[1] is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp80ePEl/model-1cdc46c86c94.stan', line 58, column 4 to column 109)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 1 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 1 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 1 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 1 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 1 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 1 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 1 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 1 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 1 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 1 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 1 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 1 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 1 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 1 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 1 finished in 23.6 seconds.
#> Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: beta_lccdf: First shape parameter[1] is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp80ePEl/model-1cdc46c86c94.stan', line 58, column 4 to column 109)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 2 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 2 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 2 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 2 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 2 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 2 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 2 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 2 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 2 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 2 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 2 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 2 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 2 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 2 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 2 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 2 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 2 finished in 20.2 seconds.
#> Chain 3 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 3 Exception: beta_lccdf: Second shape parameter[1] is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp80ePEl/model-1cdc46c86c94.stan', line 58, column 4 to column 109)
#> Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 3
#> Chain 3 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 3 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 3 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 3 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 3 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 3 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 3 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 3 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 3 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 3 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 3 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 3 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 3 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 3 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 3 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 3 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 3 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 3 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 3 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 3 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 3 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 3 finished in 22.4 seconds.
#> Chain 4 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: beta_lccdf: Second shape parameter[1] is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp80ePEl/model-1cdc46c86c94.stan', line 58, column 4 to column 109)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> Chain 4 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 4 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 4 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 4 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 4 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 4 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 4 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 4 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 4 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 4 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 4 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 4 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 4 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 4 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 4 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 4 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 4 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 4 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 4 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 4 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 4 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 4 finished in 22.8 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 22.2 seconds.
#> Total execution time: 89.5 seconds.
augmented_not_vectorized <- augmented_not_vectorized$sample(
data = beta_stan_data,
seed = 123,
chains = 4,
parallel_chains = 1
)
#> Running MCMC with 4 sequential chains...
#>
#> Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: beta_lpdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 113, column 6 to column 78)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 1 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 1 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 1 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 1 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 1 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 1 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 1 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 1 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 1 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 1 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 1 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 1 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 1 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 1 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 1 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 1 finished in 1.2 seconds.
#> Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: beta_lpdf: First shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 113, column 6 to column 78)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 2 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 2 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 2 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 2 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 2 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 2 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 2 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 2 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 2 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 2 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 2 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 2 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 2 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 2 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 2 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 2 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 2 finished in 1.1 seconds.
#> Chain 3 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 3 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 3 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 3 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 3 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 3 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 3 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 3 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 3 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 3 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 3 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 3 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 3 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 3 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 3 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 3 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 3 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 3 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 3 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 3 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 3 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 3 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 3 finished in 1.3 seconds.
#> Chain 4 Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: beta_lpdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 113, column 6 to column 78)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> Chain 4 Iteration: 100 / 2000 [ 5%] (Warmup)
#> Chain 4 Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 4 Iteration: 300 / 2000 [ 15%] (Warmup)
#> Chain 4 Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 4 Iteration: 500 / 2000 [ 25%] (Warmup)
#> Chain 4 Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 4 Iteration: 700 / 2000 [ 35%] (Warmup)
#> Chain 4 Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 4 Iteration: 900 / 2000 [ 45%] (Warmup)
#> Chain 4 Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 4 Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 4 Iteration: 1100 / 2000 [ 55%] (Sampling)
#> Chain 4 Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 4 Iteration: 1300 / 2000 [ 65%] (Sampling)
#> Chain 4 Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 4 Iteration: 1500 / 2000 [ 75%] (Sampling)
#> Chain 4 Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 4 Iteration: 1700 / 2000 [ 85%] (Sampling)
#> Chain 4 Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 4 Iteration: 1900 / 2000 [ 95%] (Sampling)
#> Chain 4 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 4 finished in 1.2 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.2 seconds.
#> Total execution time: 5.3 seconds.
# compare model timings
integrated_out_not_vectorized$time()
#> $total
#> [1] 103.0832
#>
#> $chains
#> chain_id warmup sampling total
#> 1 1 14.599 20.725 35.324
#> 2 2 10.685 12.591 23.276
#> 3 3 11.129 11.637 22.766
#> 4 4 10.009 10.877 20.886
integrated_out_vectorized$time()
#> $total
#> [1] 89.54901
#>
#> $chains
#> chain_id warmup sampling total
#> 1 1 11.491 12.068 23.559
#> 2 2 10.855 9.370 20.225
#> 3 3 11.770 10.626 22.396
#> 4 4 11.594 11.174 22.768
augmented_not_vectorized$time()
#> $total
#> [1] 5.272309
#>
#> $chains
#> chain_id warmup sampling total
#> 1 1 0.626 0.529 1.155
#> 2 2 0.601 0.544 1.145
#> 3 3 0.659 0.613 1.272
#> 4 4 0.628 0.539 1.167
# compare model fits
integrated_out_not_vectorized$summary(variables = c("b_Intercept", "phi"))
#> # A tibble: 2 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 b_Intercept 0.187 0.185 0.106 0.105 0.0160 0.364 1.00 3398. 2949.
#> 2 phi 2.43 2.41 0.338 0.334 1.90 3.02 1.00 3527. 2585.
integrated_out_vectorized$summary(variables = c("b_Intercept", "phi"))
#> # A tibble: 2 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 b_Inter… -1.83 -1.82 0.291 0.282 -2.32 -1.36 1.01 2370. 1974.
#> 2 phi 0.0387 0.0260 0.0396 0.0271 0.00180 0.118 1.00 1602. 984.
augmented_not_vectorized$summary(variables = c("b_Intercept", "phi"))
#> # A tibble: 2 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 b_Intercept 0.187 0.188 0.110 0.110 0.00514 0.365 1.00 9034. 3182.
#> 2 phi 2.42 2.40 0.329 0.321 1.91 3.00 1.00 6914. 2742.
drw_ionv <- posterior::as_draws_rvars(integrated_out_not_vectorized)
drw_iov <- posterior::as_draws_rvars(integrated_out_vectorized)
drw_anv <- posterior::as_draws_rvars(augmented_not_vectorized)
ionv_mu_phi <- drw_ionv %>%
as_draws_df() %>%
select(Intercept, phi) %>%
mutate(
mu = plogis(Intercept),
model = "integrate out (not vectorized)"
)
#> Warning: Dropping 'draws_df' class as required metadata was removed.
iov_mu_phi <- drw_iov %>%
as_draws_df() %>%
select(Intercept, phi) %>%
mutate(
mu = plogis(Intercept),
model = "integrate out (vectorized)"
)
#> Warning: Dropping 'draws_df' class as required metadata was removed.
anv_mu_phi <- drw_anv %>%
as_draws_df() %>%
select(Intercept, phi) %>%
mutate(
mu = plogis(Intercept),
model = "data augmentation"
)
#> Warning: Dropping 'draws_df' class as required metadata was removed.
bind_rows(ionv_mu_phi, iov_mu_phi, anv_mu_phi) %>%
pivot_longer(cols = c("phi", "mu")) %>%
ggplot() +
geom_density(aes(x = value, colour = model)) +
geom_vline(
data = data.frame(
name = c("mu", "phi"),
value = c(mu, phi)
),
aes(xintercept = value)) +
facet_wrap(~name, scales = "free") Created on 2024-09-22 with reprex v2.1.1 Session infosessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14 ucrt)
#> os Windows 10 x64 (build 19045)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Dutch_Belgium.utf8
#> ctype Dutch_Belgium.utf8
#> tz Europe/Brussels
#> date 2024-09-22
#> pandoc 3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> ! package * version date (UTC) lib source
#> abind 1.4-8 2024-09-12 [1] CRAN (R 4.4.1)
#> backports 1.5.0 2024-05-23 [1] CRAN (R 4.4.0)
#> bayesplot 1.11.1.9000 2024-09-01 [1] https://stan-dev.r-universe.dev (R 4.4.1)
#> bridgesampling 1.1-2 2021-04-16 [1] CRAN (R 4.4.0)
#> brms * 2.22.0 2024-09-22 [1] Github (paul-buerkner/brms@9053e9f)
#> Brobdingnag 1.2-9 2022-10-19 [1] CRAN (R 4.4.0)
#> checkmate 2.3.2 2024-07-29 [1] CRAN (R 4.4.1)
#> cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.1)
#> cmdstanr 0.8.1 2024-08-05 [1] https://stan-dev.r-universe.dev (R 4.4.0)
#> coda 0.19-4.1 2024-01-31 [1] CRAN (R 4.4.0)
#> codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.1)
#> colorspace 2.1-1 2024-07-26 [1] CRAN (R 4.4.1)
#> curl 5.2.1 2024-03-01 [1] CRAN (R 4.4.0)
#> data.table 1.15.4 2024-03-30 [1] CRAN (R 4.4.0)
#> digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
#> distributional 0.5.0 2024-09-17 [1] CRAN (R 4.4.1)
#> dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
#> emmeans 1.10.4 2024-08-21 [1] CRAN (R 4.4.1)
#> estimability 1.5.1 2024-05-12 [1] CRAN (R 4.4.0)
#> evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.1)
#> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
#> farver 2.1.2 2024-05-13 [1] CRAN (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
#> forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.4.0)
#> fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
#> ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
#> glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
#> gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
#> highr 0.11 2024-05-26 [1] CRAN (R 4.4.0)
#> hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#> jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
#> knitr 1.48 2024-07-07 [1] CRAN (R 4.4.1)
#> labeling 0.4.3 2023-08-29 [1] CRAN (R 4.4.0)
#> lattice 0.22-6 2024-03-20 [1] CRAN (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
#> loo 2.8.0.9000 2024-08-05 [1] https://stan-dev.r-universe.dev (R 4.4.0)
#> lubridate * 1.9.3.9000 2024-05-28 [1] https://inbo.r-universe.dev (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
#> MASS 7.3-61 2024-06-13 [2] CRAN (R 4.4.1)
#> Matrix 1.7-0 2024-03-22 [1] CRAN (R 4.4.0)
#> matrixStats 1.4.1 2024-09-08 [1] CRAN (R 4.4.1)
#> multcomp 1.4-26 2024-07-18 [1] CRAN (R 4.4.1)
#> munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
#> mvtnorm 1.3-1 2024-09-03 [1] CRAN (R 4.4.1)
#> nlme 3.1-166 2024-08-14 [2] CRAN (R 4.4.1)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
#> posterior 1.6.0 2024-07-03 [1] CRAN (R 4.4.1)
#> processx 3.8.4 2024-03-16 [1] CRAN (R 4.4.0)
#> ps 1.8.0 2024-09-12 [1] CRAN (R 4.4.1)
#> purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
#> Rcpp * 1.0.13 2024-07-17 [1] CRAN (R 4.4.1)
#> D RcppParallel 5.1.9 2024-08-19 [1] CRAN (R 4.4.1)
#> readr * 2.1.5 2024-01-10 [1] CRAN (R 4.4.0)
#> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.4.1)
#> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.1)
#> rmarkdown 2.28 2024-08-17 [1] CRAN (R 4.4.1)
#> rstantools 2.4.0.9000 2024-08-18 [1] https://stan-dev.r-universe.dev (R 4.4.0)
#> rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
#> sandwich 3.1-0 2023-12-11 [1] CRAN (R 4.4.0)
#> scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
#> stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
#> stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
#> survival 3.7-0 2024-06-05 [2] CRAN (R 4.4.1)
#> tensorA 0.36.2.1 2023-12-13 [1] CRAN (R 4.4.0)
#> TH.data 1.1-2 2023-04-17 [1] CRAN (R 4.4.0)
#> tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
#> tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
#> tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.4.0)
#> timechange 0.3.0 2024-01-18 [1] CRAN (R 4.4.0)
#> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
#> withr 3.0.1 2024-07-31 [1] CRAN (R 4.4.1)
#> xfun 0.47 2024-08-17 [1] CRAN (R 4.4.1)
#> xml2 1.3.6 2023-12-04 [1] CRAN (R 4.4.0)
#> xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
#> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.1)
#> zoo 1.8-12 2023-04-13 [1] CRAN (R 4.4.0)
#>
#> [1] C:/R/library
#> [2] C:/R/R-4.4.1/library
#>
#> D ── DLL MD5 mismatch, broken installation.
#>
#> ────────────────────────────────────────────────────────────────────────────── |
Thank you for your detailed reprex! I will look into it and fix it! |
Got it. The issue was in the interval censored case. There, I treated the PDFs as returning a vector when being passed vectors as inputs, which of course the don't (they just return a scalar). I think the best approach is to deactivate the vectorization for interval censored data. As for the data augmention, this indeed seems to help a whole lot in terms of speed but may be more cumbersome to implement. I will leave this issue open in case I or someone else wants to tackle it in the future. Thank you again for your help! |
For future reference, here is a vectorized version of the beta-distribution data augmentation model bda_vectorized.zip. Vectorization is trivial in this case, also for interval censoring. To try it, use the code in #1657 (comment) |
I will certainly try the data augmentation approach. In fact, I‘d expect that it should be simple to implement in brms given that brms supports missing data already. As I understand this approach, it is basically missing data with a lower bound. Talking to colleague he said that this approach is also used by JAGS. |
@paul-buerkner reading through the cox code I don‘t find any glitch. The only thing I noticed is that Log1m_exp(- xy) Can obviously be written as Log1p_exp(xy) But I would really leave in as a comment the original log1m_exp(-xy) line in a commented way, since it is more direct in turning the ccdf into a cdf…will also test the new approach on an example data set next. |
Context:
I often work with data that are censored. The approach for censored data implemented in
brms
is really useful, but I have found it to be slow - especially when a large part of the data are censored and with distributions where it is costly to integrate out censored data.The approach taken by
brms
is explained in the stan user guide: https://mc-stan.org/docs/stan-users-guide/truncation-censoring.html#integrating-out-censored-valuesThis involves integrating out the PDF between the censoring bounds and therefore requires the CDF and/or CCDF functions depending on left, right or interval censoring.
Alternative:
The stan user guide also mentions an alternative approach: https://mc-stan.org/docs/stan-users-guide/truncation-censoring.html#estimating-censored-values
I have found this approach to be much faster (about 30 times faster in below reprex) and it recovers the same parameter estimates. In this approach, the censored data are declared as parameters to be estimated. I think this approach is called a data augmentation (imputation) approach and is discussed in https://academic.oup.com/jrsssb/article-abstract/55/1/3/7035892 (section 6.3). It only requires the PDF function.
Notes:
The below reprex is for a beta distribution. I have extracted the brms generated stan code and changed this according to the User's guide, but without changing how brms requires the data to be formatted (so standata() is the same in both implementations). I am not very familiar with programming in stan, so some further efficiency gains are probably possible.
I have also tested this with a Gamma distribution and this worked too, but the speed-up is not that spectacular. I can imagine that for a Gaussian distribution there would be no speed-up.
Feature request:
Would you be willing to consider implementing this approach in
brms
? Maybe not as a replacement to the current implementation, but as an alternative. Perhaps an extra argument in thecens()
function to switch between both alternatives?I think this approach might not work for discrete distributions because stan cannot sample discrete parameters? Although I have a feeling this might be related: https://users.aalto.fi/~johnsoa2/notebooks/CopulaIntro.html#bayesian-estimation-data-augmentation. (haven't thought this through)
Reproducible example:
Created on 2024-05-19 with reprex v2.1.0
Session info
The text was updated successfully, but these errors were encountered: