Introduction to Simulation Based Calibration

Target

  • a. Prediction, p(y.new): predictive checks, which compare the distribution of replicated y with that of the real y.

  • b. Posterior, p(theta.new|theta = c): the prior and likelihood are tested assuming the computation is consistent. Any Bayesian model has a self-recovering property: averaging the posterior distributions fitted to datasets drawn from the prior predictive distribution always recovers the prior distribution. SBC uses this principle to evaluate the combination of prior and likelihood model under a fixed computation algorithm; users should choose one algorithm in advance, such as full HMC, ADVI, or a Laplace approximation. Properties of the posterior distributions produced by these calibration simulations allow us to identify common pathologies, such as overfitting and poor identifiability, that limit the utility of any resulting inferences. (A numerical sketch of this check follows the list.)

  • c. Posterior computations: this quantifies how faithfully our computational tools represent the model they are fitting. One use case is testing approximation algorithms: approximation-based Bayesian computation is very promising, but one limitation is that its reliability can be hard to diagnose; for example, a full-HMC benchmark is typically needed to measure its error. SBC, which evaluates how well an algorithm samples from the posterior distribution given a model and a prior, can serve as an alternative tool for measuring reliability.
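
A minimal numerical sketch of the self-recovering (rank-uniformity) check from point b, assuming a conjugate normal-normal model so the exact posterior is available in closed form; the model, sample sizes, and histogram binning are illustrative choices, not part of any particular package:

```python
import numpy as np

# SBC sketch for a conjugate model:
#   theta ~ Normal(0, prior_sd),  y_i | theta ~ Normal(theta, 1)
# The closed-form posterior stands in for whichever computation
# algorithm (full HMC, ADVI, Laplace, ...) is being calibrated.

rng = np.random.default_rng(0)
prior_sd, n_obs = 1.0, 10
n_sims, n_draws = 1000, 99      # 99 posterior draws -> ranks in 0..99

ranks = np.empty(n_sims, dtype=int)
for s in range(n_sims):
    theta = rng.normal(0.0, prior_sd)            # draw from the prior
    y = rng.normal(theta, 1.0, size=n_obs)       # prior predictive data
    post_var = 1.0 / (1.0 / prior_sd**2 + n_obs)   # conjugate update
    post_mean = post_var * y.sum()
    draws = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
    ranks[s] = np.sum(draws < theta)  # rank of theta among the draws

# If prior, likelihood, and algorithm are mutually consistent, the ranks
# are uniform on {0, ..., n_draws}; gross deviations flag a problem.
hist, _ = np.histogram(ranks, bins=10, range=(0, n_draws + 1))
print(hist)
```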

Note: calibration matters, but so does sharpness; the ideal is unbiasedness together with high precision.

Method

With the quantities of interest (QI) determined from the targets above, different modes of QI calibration exist depending on the space in which coverage is tested. Each space is derived by marginalizing or conditioning the full joint space: a is fully marginalized, while b and c are conditional on the data and on the parameter, respectively. When the QI is a forecast probability, y.new serves as the QI and is compared with p.hat = E(y.new|y) on the different spaces. E[QI|y], i.e. E(y.new|y), can be thought of as the ground truth for the simulation.

  • a. Unconditional coverage, E[QI] = E[E[QI|y]]: both a Bayesian and a frequentist property; unconditional coverage holds when all the assumptions of both are true. e.g. E(y.new) = E(p.hat). Checks a and b are sketched after this list.

  • b. Coverage conditional on data, E[QI|E[QI|y]] = E[QI|y] for all values of E[QI|y]: Bayesian calibration. It tests how well the recovered posteriors are centered around the ground-truth value E[QI|y]. e.g. E(y.new|p.hat) = p.hat for any p.hat

  • c. Coverage conditional on theta, E[QI|theta] = E[E[QI|y]|theta]: frequentist calibration. e.g. E[y.new|theta] = E[p.hat|theta]
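
A minimal sketch of checks a and b for a forecast probability, assuming simulated binary outcomes that are calibrated by construction (so both checks should pass here); the Beta distribution for p.hat and the ten-bin reliability layout are illustrative choices:

```python
import numpy as np

# Coverage checks a and b for a forecast probability p.hat.
# Outcomes are drawn as y.new ~ Bernoulli(p.hat), i.e. the forecasts
# are calibrated by construction.

rng = np.random.default_rng(1)
n = 100_000
p_hat = rng.beta(2, 2, size=n)       # stand-in for E(y.new | y)
y_new = rng.binomial(1, p_hat)       # outcomes at the forecast rate

# (a) Unconditional coverage: E(y.new) = E(p.hat)
print(f"E(y.new) = {y_new.mean():.4f}, E(p.hat) = {p_hat.mean():.4f}")

# (b) Coverage conditional on data: E(y.new | p.hat) = p.hat,
#     checked within bins of p.hat (a reliability diagram).
edges = np.linspace(0.0, 1.0, 11)
idx = np.digitize(p_hat, edges) - 1
for b in range(10):
    in_bin = idx == b
    if in_bin.any():
        print(f"bin {b}: mean p.hat = {p_hat[in_bin].mean():.3f}, "
              f"mean y.new = {y_new[in_bin].mean():.3f}")
```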

See Andrew Gelman's writing for further detail.
