Docs/2023 09 28 updates and fixes #134

Merged (3 commits) on Sep 29, 2023
101 changes: 89 additions & 12 deletions README.md

@@ -68,6 +68,26 @@ distribution over your dataset, which enables users to...

and more, all in one place, without any explicit model building.

```python
import pandas as pd
import lace

# Create an engine from a dataframe
df = pd.read_csv("animals.csv", index_col=0)
engine = lace.Engine.from_df(df)

# Fit a model to the dataframe over 5000 steps of the fitting procedure
engine.update(5000)

# Show the statistical structure of the data -- which features are likely
# dependent on (predictive of) each other
engine.clustermap("depprob", zmin=0, zmax=1)
```

![Animals dataset dependence probability](assets/animals-depprob.png)



## The Problem

The goal of lace is to fill some of the massive chasm between standard machine
@@ -105,36 +125,62 @@ themselves from scratch, meaning they must know (or at least guess) the model.
PPL users must also know how to specify such a model in a way that is
compatible with the underlying inference procedure.

-### Who should not use lace
+### Example use cases

- **Combine data sources and understand how they interact.** For example, we
  may wish to predict cognitive decline from demographics, survey or task
  performance, EKG data, and other clinical data. Combined, this data would
  typically be very sparse (most patients will not have all fields filled
  in), and it is difficult to know how to explicitly model the interaction of
  these data layers. In Lace, we would just concatenate the layers and run
  them through.

  > **Reviewer (Contributor):** "Concatenate the layers and run them
  > through" -- I don't understand this; it sounds too simple, like how could
  > it be that easy? Maybe a rewording that explains why you can't do that
  > yourself? Or maybe I'm not getting it.
  >
  > **Author (Contributor):** It is that easy 😉
- **Understanding the amount and causes of uncertainty over time.** For
example, a farmer may wish to understand the likelihood of achieving a
specific yield over the growing season. As the season progresses, new
weather data can be added to the prediction in the form of conditions.
Uncertainty can be visualized as variance in the prediction, disagreement
between posterior samples, or multi-modality in the predictive distribution
(see [this blog post](https://redpoll.ai/blog/ml-uncertainty/) for more
information on uncertainty).
- **Data quality control.** Use `surprisal` to find anomalous data in the table
and use `-logp` to identify anomalies before they enter the table. Because
Lace creates a model of the data, we can also contrive methods to find data
that are *inconsistent* with that model, which we have used to good effect
in error finding.
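To make the `surprisal` idea concrete: surprisal is just negative log-likelihood, -log p(x), so values the model considers improbable score high. Below is a minimal self-contained sketch under an assumed Gaussian model (illustrative only -- this is not the Lace API, which scores values under its own learned model of the table):

```python
import math

def gaussian_surprisal(x: float, mean: float, std: float) -> float:
    """Surprisal -log p(x) under a normal model; high values flag anomalies."""
    log_p = -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)
    return -log_p

# Values the model finds improbable get large surprisal scores
data = [9.8, 10.1, 10.0, 9.9, 42.0]
scores = [gaussian_surprisal(x, mean=10.0, std=0.2) for x in data]
anomalies = [x for x, s in zip(data, scores) if s > 10.0]
print(anomalies)  # → [42.0]
```

In Lace itself the model is learned from the table, so no distribution has to be specified by hand.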

### Who should not use Lace

There are a number of use cases for which Lace is not suited:

- Non-tabular data such as images and text
- Highly optimizing specific predictions
  - Lace would rather over-generalize than overfit.


## Quick start

-Install the CLI and pylace (requires [rust and
-cargo](https://www.rust-lang.org/tools/install))
+### Installation
+
+Lace requires rust.

> **Reviewer (Contributor):** pylace shouldn't require Rust if you install it
> with pip.

+To install the CLI:

```console
$ cargo install --locked lace
-$ pip install py-lace
```

-First, use the CLI to fit a model to your data
+To install pylace

```console
-$ lace run --csv satellites.csv -n 5000 -s 32 --seed 1337 satellites.lace
+$ pip install pylace
```

-Then load the model and start asking questions
+### Examples
+
+Lace comes with two pre-fit example data sets: Satellites and Animals.

```python
->>> from lace import Engine
->>> engine = Engine(metadata='satellites.lace')
+>>> from lace.examples import Satellites
+>>> engine = Satellites()

# Predict the class of orbit given the satellite has a 75-minute
# orbital period and that it has a missing value of geosynchronous
@@ -176,9 +222,13 @@ And similarly in rust:

```rust,noplayground
use lace::prelude::*;
+use lace::examples::Example;

fn main() {
-    let mut engine = Engine::load("satellites.lace").unrwap();
+    // In rust, you can create an Engine or an Oracle. The Oracle is an
+    // immutable version of an Engine; it has the same inference functions as
+    // the Engine, but you cannot train or edit data.
+    let mut engine = Example::Satellites.engine().unwrap();

// Predict the class of orbit given the satellite has a 75-minute
// orbital period and that it has a missing value of geosynchronous
@@ -196,6 +246,33 @@ fn main() {
}
```

### Fitting a model

To fit a model to your own data, you can use the CLI:

```console
$ lace run --csv my-data.csv -n 1000 my-data.lace
```

...or initialize an engine from a file or dataframe.

```python
>>> import pandas as pd # Lace supports polars as well
>>> from lace import Engine
>>> engine = Engine.from_df(pd.read_csv("my-data.csv", index_col=0))
>>> engine.update(1_000)
>>> engine.save("my-data.lace")
```

You can monitor the progress of training using diagnostic plots:

```python
>>> from lace.plot import diagnostics
>>> diagnostics(engine)
```

![Animals MCMC convergence](assets/animals-convergence.png)

## License

Lace is licensed under Server Side Public License (SSPL), which is a copyleft
Binary file added assets/animals-convergence.png
Binary file added assets/animals-depprob.png
Binary file added assets/sats-depprob.png
Binary file added assets/sats-period-uncertainty.png
2 changes: 1 addition & 1 deletion book/src/appendix/references.md

@@ -51,4 +51,4 @@ and examples, see Mansinghka et al [^pcc-jmlr].
[^pcc-jmlr]: Mansinghka, V., Shafto, P., Jonas, E., Petschulat, C., Gasner,
M., & Tenenbaum, J. B. (2016). Crosscat: A fully bayesian nonparametric
method for analyzing heterogeneous, high dimensional data.
-[(PDF)](jmlr.org/papers/volume17/11-392/11-392.pdf)
+[(PDF)](https://jmlr.org/papers/volume17/11-392/11-392.pdf)
2 changes: 1 addition & 1 deletion book/src/appendix/stats-primer.md

@@ -76,5 +76,5 @@ The CRP metaphor works like this: you are on your lunch break and, as one often

where \\(z_i\\) is the table of customer i, \\(n_k\\) is the number of customers currently seated at table \\(k\\), and \\(N_{-i}\\) is the total number of seated customers, not including customer i (who is still deciding where to sit).

-Under the CRP formalism, we make inferences about what datum belongs to which category. The weight vector is implicit. That's it. For information on how inference is done in DPMMs check out the [literature recommendations](#literature-recommendations).
+Under the CRP formalism, we make inferences about what datum belongs to which category. The weight vector is implicit. That's it. For information on how inference is done in DPMMs check out the [literature recommendations](stats-primer.md).
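The seating rule above is simple enough to simulate directly. The sketch below (illustrative only, not part of the Lace codebase) seats each customer at an existing table \\(k\\) with probability proportional to \\(n_k\\) and at a new table with probability proportional to \\(\alpha\\), each normalized by \\(N_{-i} + \alpha\\):

```python
import random

def crp_seating(n_customers: int, alpha: float, seed: int = 0) -> list[int]:
    """Simulate table assignments z_i under the Chinese Restaurant Process."""
    rng = random.Random(seed)
    counts: list[int] = []      # counts[k] = n_k, customers at table k
    assignment: list[int] = []  # assignment[i] = z_i
    for i in range(n_customers):
        # Existing table k has weight n_k; a new table has weight alpha.
        # N_{-i} is simply i: everyone seated so far.
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, w in enumerate(counts + [alpha]):
            acc += w
            if r < acc:
                break
        if k == len(counts):
            counts.append(1)    # open a new table
        else:
            counts[k] += 1
        assignment.append(k)
    return assignment

tables = crp_seating(100, alpha=1.0)
# Larger alpha tends to produce more occupied tables.
```

This is the generative view; inference in a DPMM runs it in reverse, resampling each \\(z_i\\) given the assignments of the other data.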

Binary file added book/src/assets/animals-convergence.png
Binary file added book/src/assets/animals-depprob.png
2 changes: 1 addition & 1 deletion book/src/workflow/workflow.md

@@ -29,7 +29,7 @@ Open the model in lace
```python
import lace

-engine = lace.Engine(metadata='metadata.lace')
+engine = lace.Engine.load('metadata.lace')
```

```rust,noplayground
2 changes: 1 addition & 1 deletion book/theme/index.hbs

@@ -133,7 +133,7 @@
<i class="fa fa-paint-brush"></i>
</button>
<ul id="theme-list" class="theme-popup" aria-label="Themes" role="menu">
-<!-- <li role="none"><button role="menuitem" class="theme" id="light">{{ theme_option "Light" }}</button></li> -->
+<li role="none"><button role="menuitem" class="theme" id="light">{{ theme_option "Light" }}</button></li>
<!-- <li role="none"><button role="menuitem" class="theme" id="rust">{{ theme_option "Rust" }}</button></li> -->
<!-- <li role="none"><button role="menuitem" class="theme" id="coal">{{ theme_option "Coal" }}</button></li> -->
<!-- <li role="none"><button role="menuitem" class="theme" id="navy">{{ theme_option "Navy" }}</button></li> -->