Feature/explain pred #158

Merged
merged 21 commits on Jan 18, 2024
Changes from 16 commits
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -10,10 +10,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

- Added `plot.state` function in pylace to render PCC states
- Added `analysis.explain_prediction` in pylace to explain predictions
- Added `plot.prediction_explanation` in pylace to render prediction explanations
- Added `analysis.held_out_uncertainty` in pylace
- Added `analysis.attributable_[neglogp | inconsistency | uncertainty]` in pylace to quantify the amount of surprisal (neglogp), inconsistency, and uncertainty attributable to other features
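
The entries above only name the new entry points. As a rough orientation, the sketch below shows how they might be called; the argument names, the `given` parameter, and the value passed to `plot.prediction_explanation` are assumptions for illustration, not the documented pylace signatures, so consult the pylace API reference for the real interface.

```python
from lace import analysis, examples, plot

satellites = examples.Satellites()

# Hypothetical call: the target column and `given` evidence are assumed
# parameters; the real signature may differ.
explanation = analysis.explain_prediction(
    satellites,
    "Period_minutes",
    given={"Class_of_Orbit": "GEO"},
)

# Hypothetical rendering call, assumed to accept whatever
# `explain_prediction` returns.
plot.prediction_explanation(explanation)
```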

### Changed

- Updated all packages to have the correct SPDX for the Business Source License
- Removed internal implementation of `logsumexp` in favor of `rv::misc::logsumexp`
- Updated to rv 0.16.2
- Impute and prediction uncertainty are the mean total variation distance between each state's distribution and the average distribution divided by the potential max: `(n-1) / n`, where `n` is the number of states. This normalization is meant to ensure that the interpretation is the same regardless of the number of states -- zero is lowest, one is highest.
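
For concreteness, here is a minimal sketch of that normalization for discrete (categorical) state-level distributions; `normalized_uncertainty` is a hypothetical helper written for this note, not the lace implementation.

```python
import numpy as np

def normalized_uncertainty(state_dists: np.ndarray) -> float:
    """Mean total variation distance (TVD) between each state's distribution
    and the average distribution, divided by the potential max (n - 1) / n."""
    n = state_dists.shape[0]
    avg = state_dists.mean(axis=0)
    # TVD between two discrete distributions is half their L1 distance
    tvds = 0.5 * np.abs(state_dists - avg).sum(axis=1)
    return float(tvds.mean() / ((n - 1) / n))

# Three states' predictive distributions over the same two categories:
# 0 means all states agree, 1 means maximal disagreement.
print(normalized_uncertainty(np.array([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]])))
```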

### Fixed

6 changes: 6 additions & 0 deletions book/src/pcc/html/sats-high-unc.html

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions book/src/pcc/html/sats-low-unc.html

Large diffs are not rendered by default.

45 changes: 42 additions & 3 deletions book/src/pcc/pred.md
@@ -18,10 +18,49 @@ Determining how certain the model is in its ability to capture a prediction is d

Mathematically, uncertainty is formalized as the Jensen-Shannon divergence (JSD) between the state-level predictive distributions. Uncertainty ranges from 0 to 1: 0 means there is only one way to model the prediction, and 1 means there are many ways to model the prediction and they all completely disagree.
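
As a point of reference, the generalized Jensen-Shannon divergence between the n state-level predictive distributions with uniform weights can be written as below; this explicit form is spelled out here for illustration and is an assumption, not a formula quoted from lace's documentation.

$$
\mathrm{JSD}(p_1, \ldots, p_n) = H\!\left(\frac{1}{n}\sum_{i=1}^{n} p_i\right) - \frac{1}{n}\sum_{i=1}^{n} H(p_i)
$$

Here H denotes (differential) entropy. The quantity is bounded above by the log of the number of states, so normalizing by that bound keeps the reported uncertainty between 0 and 1.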

![Prediction uncertainty in unimodal data](prediction-uncertainty.png)

**Above.** Prediction uncertainty when predicting *Period_minutes* of a satellite in the satellites data set. Note that the uncertainty value here is driven mostly by the differing variances of the state-level predictive distributions.

<div class=tabbed-blocks>

```python
from lace import examples, plot

satellites = examples.Satellites()

plot.prediction_uncertainty(
    satellites,
    "Period_minutes",
    given={"Class_of_Orbit": "GEO"},
)
```
</div>

{{#include html/sats-low-unc.html}}

**Above.** Prediction uncertainty when predicting *Period_minutes* of a geosynchronous satellite in the satellites dataset. Uncertainty is low. Though the state distributions differ slightly in their variance, they're relatively close, with similar means.

To visualize a higher-uncertainty prediction, we'll use `given` conditions from a record with a known data entry error.

<div class=tabbed-blocks>

```python
import pandas as pd

given = satellites["Intelsat 902", :].to_dicts()[0]

# remove all missing data
given = {k: v for k, v in given.items() if not pd.isnull(v)}

# remove the index and the target value
_ = given.pop("index")
_ = given.pop("Period_minutes")

plot.prediction_uncertainty(
    satellites,
    "Period_minutes",
    given=given,
)
```
</div>

{{#include html/sats-high-unc.html}}

**Above.** Prediction uncertainty when predicting *Period_minutes* of Intelsat 902. Though the mean predictive distribution (black line) has a relatively low variance, there is a lot of disagreement between some of the samples, leading to high epistemic uncertainty.

Certain ignorance is when the model has zero data by which to make a prediction and instead falls back to the prior distribution. This is rare, but when it happens it will be apparent. To keep the model as general as possible, the priors for a column's component distributions are generally much broader than the predictive distribution, so if you see a predictive distribution that is senselessly wide and does not look like the marginal distribution of that variable (which should follow the histogram of the data), you have certain ignorance. The fix is to fill in the data for items similar to the one you are predicting.
Binary file removed book/src/pcc/prediction-uncertainty.png
Binary file not shown.