In this approach, we use bootstrapping to estimate prediction uncertainty (for example, in the form of confidence intervals) by resampling the training data, training multiple models, and then aggregating their predictions.
- Description: Create many bootstrapped samples (with replacement) from your training data.
- Purpose: Each bootstrapped sample provides a slightly different version of the training data, allowing you to capture the variability and uncertainty inherent in the data.
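As a rough sketch of this resampling step (the NumPy arrays, their shapes, and the seed below are illustrative assumptions, not part of the procedure itself):

```python
import numpy as np

# Illustrative stand-in training data; shapes, values, and seed are assumptions.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
y_train = (X_train[:, 0] > 0).astype(int)

# One bootstrapped sample: draw n row indices with replacement, then index into the data.
idx = rng.integers(0, len(X_train), size=len(X_train))
X_boot, y_boot = X_train[idx], y_train[idx]
```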
- Description: For each bootstrapped sample, train your classification (or regression) model.
- Purpose: This process results in a collection of models, each reflecting variations in the training data. The ensemble of models helps to capture the uncertainty in the model's predictions.
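Continuing the sketch above (reusing the assumed `X_train`, `y_train`, and `rng`), one way to build such an ensemble is to refit the same model class on each resample; the logistic-regression choice and the number of bootstrap rounds are assumptions:

```python
from sklearn.linear_model import LogisticRegression

# Fit one model per bootstrapped sample; together they form the ensemble.
n_bootstraps = 200
models = []
for _ in range(n_bootstraps):
    idx = rng.integers(0, len(X_train), size=len(X_train))  # resample with replacement
    models.append(LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx]))
```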
- Description: For a given input, collect the predictions from all the models trained on the different bootstrapped samples.
- Purpose: The variability among these predictions provides an empirical estimate of the model's uncertainty on that input.
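Continuing the same sketch, the ensemble's predictions for a single query point can be gathered into one array (here the predicted probability of the positive class; `x_new` is an assumed new input):

```python
# Predicted positive-class probability from every bootstrapped model for one input.
x_new = rng.normal(size=(1, 4))  # assumed query point with the same feature layout
preds = np.array([m.predict_proba(x_new)[0, 1] for m in models])
```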
- Description: Analyze the distribution of predictions to compute quantiles; for instance, calculate the 2.5th and 97.5th percentiles of the predictions.
- Example: The 2.5th and 97.5th percentiles can serve as the bounds of a 95% prediction interval.
- Purpose: These quantiles form a confidence interval around the predicted value, giving an estimate of prediction uncertainty.
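Continuing the sketch, the interval falls out of the empirical percentiles of the collected predictions:

```python
# Empirical 95% interval from the bootstrap distribution of predictions.
lower, upper = np.percentile(preds, [2.5, 97.5])
point_estimate = preds.mean()  # central estimate to report alongside the interval
print(f"prediction {point_estimate:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```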
Using bootstrapping to assess prediction uncertainty involves:
- Resampling your training data to create multiple bootstrapped datasets.
- Training a separate model on each bootstrapped sample.
- Aggregating the predictions for a specific input across all models.
- Computing quantiles of those predictions (e.g., the 2.5th and 97.5th percentiles) to form a prediction interval.
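The same recipe extends to a whole batch of inputs at once; continuing the sketch above, the percentiles are simply taken per column of the (n_bootstraps, n_test) prediction matrix (`X_test` is an assumed held-out set):

```python
# Per-input 95% bounds for a batch of assumed held-out inputs.
X_test = rng.normal(size=(10, 4))
all_preds = np.array([m.predict_proba(X_test)[:, 1] for m in models])  # (n_bootstraps, n_test)
lower_b, upper_b = np.percentile(all_preds, [2.5, 97.5], axis=0)       # one bound pair per input
```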