From 567181b0b479289e3789e75a8d732207d2ad670f Mon Sep 17 00:00:00 2001
From: beingPro007 <120173992+beingPro007@users.noreply.github.com>
Date: Fri, 11 Oct 2024 04:25:21 +0530
Subject: [PATCH] Solved Meta main #524 (#530)
* Solved Meta main #524
* Restore the 'probability-interface' line as per feedback in PR review [#530](https://github.com/TuringLang/docs/pull/530#discussion_r1789984305)
* Fixes #524: Add Quarto Meta Variables
- Added `` to replace `..` or `../..` usage across tutorials.
- As per feedback from @penelopeysm, added an anchor to the specific part of the docs for the tutorial.
- Included the `probability-interface` tutorial in the context.
- Ensured no unnecessary whitespace changes to keep the pull request clean and focused.
* Fixes #524: Added Necessary Quarto Meta Variables
- Implemented all required Quarto meta variables, as suggested by @yebai.
- These changes include the addition of all necessary meta variables identified up to this point.
- Future adjustments can be made as needed to accommodate any further requirements.
* Fix docs base link
* Remove trailing slashes, add prob interface variable
* Re-add site-url variable
* Use doc-base-url throughout
---------
Co-authored-by: Penelope Yong
---
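A minimal sketch of the pattern this patch applies, using only values that appear in the `_quarto.yml` hunk below: each tutorial path is declared once as a top-level metadata variable, and `.qmd` pages build links from those variables instead of relative `..` paths.

```yaml
# _quarto.yml (excerpt): top-level keys are readable from any .qmd file
# through the `meta` shortcode.
# No trailing slashes, so the two parts can be joined with a single "/".
doc-base-url: https://turinglang.org/docs
gaussian-mixture-model: tutorials/01-gaussian-mixture-model
```

In a `.qmd` file, a link written as `[Gaussian mixture model tutorial]({{< meta doc-base-url >}}/{{< meta gaussian-mixture-model >}})` should then resolve to `https://turinglang.org/docs/tutorials/01-gaussian-mixture-model`, and an anchor such as `#sampling-multiple-chains` can be appended after the second shortcode.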
_quarto.yml | 52 ++++++++++++++++---
tutorials/01-gaussian-mixture-model/index.qmd | 24 ++++-----
tutorials/04-hidden-markov-model/index.qmd | 4 +-
tutorials/05-linear-regression/index.qmd | 2 +-
tutorials/06-infinite-mixture-model/index.qmd | 4 +-
.../index.qmd | 4 +-
tutorials/09-variational-inference/index.qmd | 14 ++---
tutorials/14-minituring/index.qmd | 4 +-
tutorials/docs-00-getting-started/index.qmd | 6 +--
.../index.qmd | 22 ++++----
.../index.qmd | 6 +--
.../docs-09-using-turing-advanced/index.qmd | 8 +--
.../docs-12-using-turing-guide/index.qmd | 8 +--
.../index.qmd | 18 +++----
14 files changed, 106 insertions(+), 70 deletions(-)
diff --git a/_quarto.yml b/_quarto.yml
index 354fa9269..449dbf50b 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -12,7 +12,7 @@ website:
site-url: https://turinglang.org/
site-path: "/"
favicon: "assets/favicon.ico"
- search:
+ search:
location: navbar
type: overlay
navbar:
@@ -50,7 +50,7 @@ website:
sidebar:
- text: documentation
collapse-level: 1
- contents:
+ contents:
- section: "Users"
# href: tutorials/index.qmd, This page will be added later so keep this line commented
contents:
@@ -59,7 +59,7 @@ website:
- section: "Usage Tips"
collapse-level: 1
- contents:
+ contents:
- tutorials/docs-10-using-turing-autodiff/index.qmd
- tutorials/usage-custom-distribution/index.qmd
- tutorials/usage-probability-interface/index.qmd
@@ -72,7 +72,7 @@ website:
- tutorials/docs-16-using-turing-external-samplers/index.qmd
- section: "Tutorials"
- contents:
+ contents:
- tutorials/00-introduction/index.qmd
- text: Gaussian Mixture Models
href: tutorials/01-gaussian-mixture-model/index.qmd
@@ -129,7 +129,7 @@ website:
background: "#073c44"
left: |
Turing is created by Hong Ge, and lovingly maintained by the core team of volunteers.
- The contents of this website are © 2024 under the terms of the MIT License.
+ The contents of this website are © 2024 under the terms of the MIT License.
right:
- icon: twitter
@@ -162,6 +162,42 @@ execute:
# Global Variables to use in any qmd files using:
# {{< meta site-url >}}
-site-url: https://turinglang.org/
-get-started: docs/tutorials/docs-00-getting-started/
-tutorials-intro: docs/tutorials/00-introduction/
+
+site-url: https://turinglang.org
+doc-base-url: https://turinglang.org/docs
+
+get-started: tutorials/docs-00-getting-started
+tutorials-intro: tutorials/00-introduction
+gaussian-mixture-model: tutorials/01-gaussian-mixture-model
+logistic-regression: tutorials/02-logistic-regression
+bayesian-neural-network: tutorials/03-bayesian-neural-network
+hidden-markov-model: tutorials/04-hidden-markov-model
+linear-regression: tutorials/05-linear-regression
+infinite-mixture-model: tutorials/06-infinite-mixture-model
+poisson-regression: tutorials/07-poisson-regression
+multinomial-logistic-regression: tutorials/08-multinomial-logistic-regression
+variational-inference: tutorials/09-variational-inference
+bayesian-differential-equations: tutorials/10-bayesian-differential-equations
+probabilistic-pca: tutorials/11-probabilistic-pca
+gplvm: tutorials/12-gplvm
+seasonal-time-series: tutorials/13-seasonal-time-series
+contexts: tutorials/16-contexts
+miniature: tutorials/14-minituring
+contributing-guide: tutorials/docs-01-contributing-guide
+using-turing-abstractmcmc: tutorials/docs-04-for-developers-abstractmcmc-turing
+using-turing-compiler: tutorials/docs-05-for-developers-compiler
+using-turing-interface: tutorials/docs-06-for-developers-interface
+using-turing-variational-inference: tutorials/docs-07-for-developers-variational-inference
+using-turing-advanced: tutorials/docs-09-using-turing-advanced
+using-turing-autodiff: tutorials/docs-10-using-turing-autodiff
+using-turing-dynamichmc: tutorials/docs-11-using-turing-dynamichmc
+using-turing: tutorials/docs-12-using-turing-guide
+using-turing-performance-tips: tutorials/docs-13-using-turing-performance-tips
+using-turing-sampler-viz: tutorials/docs-15-using-turing-sampler-viz
+using-turing-external-samplers: tutorials/docs-16-using-turing-external-samplers
+using-turing-implementing-samplers: tutorials/docs-17-implementing-samplers
+using-turing-mode-estimation: tutorials/docs-17-mode-estimation
+usage-probability-interface: tutorials/usage-probability-interface
+usage-custom-distribution: tutorials/usage-custom-distribution
+usage-generated-quantities: tutorials/usage-generated-quantities
+usage-modifying-logprob: tutorials/usage-modifying-logprob
diff --git a/tutorials/01-gaussian-mixture-model/index.qmd b/tutorials/01-gaussian-mixture-model/index.qmd
index 7e7e528f2..096d13a58 100755
--- a/tutorials/01-gaussian-mixture-model/index.qmd
+++ b/tutorials/01-gaussian-mixture-model/index.qmd
@@ -130,7 +130,7 @@ chains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initia
::: {.callout-warning collapse="true"}
## Sampling With Multiple Threads
-The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
+The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.](https://turinglang.org/dev/docs/using-turing/guide/#sampling-multiple-chains)
:::
@@ -161,7 +161,7 @@ It can happen that the modes of $\mu_1$ and $\mu_2$ switch between chains.
For more information see the [Stan documentation](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html). This is because it's possible for either model parameter $\mu_k$ to be assigned to either of the corresponding true means, and this assignment need not be consistent between chains.
That is, the posterior is fundamentally multimodal, and different chains can end up in different modes, complicating inference.
-One solution here is to enforce an ordering on our $\mu$ vector, requiring $\mu_k > \mu_{k-1}$ for all $k$.
+One solution here is to enforce an ordering on our $\mu$ vector, requiring $\mu_k > \mu_{k-1}$ for all $k$.
`Bijectors.jl` [provides](https://turinglang.org/Bijectors.jl/dev/transforms/#Bijectors.OrderedBijector) an easy transformation (`ordered()`) for this purpose:
```{julia}
@@ -255,7 +255,7 @@ scatter(
## Marginalizing Out The Assignments
-We can write out the marginal posterior of (continuous) $w, \mu$ by summing out the influence of our (discrete) assignments $z_i$ from
+We can write out the marginal posterior of (continuous) $w, \mu$ by summing out the influence of our (discrete) assignments $z_i$ from
our likelihood:
$$
p(y \mid w, \mu ) = \sum_{k=1}^K w_k p_k(y \mid \mu_k)
@@ -299,11 +299,11 @@ end
::: {.callout-warning collapse="false"}
## Manually Incrementing Probablity
-When possible, use of `Turing.@addlogprob!` should be avoided, as it exists outside the
+When possible, use of `Turing.@addlogprob!` should be avoided, as it exists outside the
usual structure of a Turing model. In most cases, a custom distribution should be used instead.
Here, the next section demonstrates the perfered method --- using the `MixtureModel` distribution we have seen already to
-perform the marginalization automatically.
+perform the marginalization automatically.
:::
@@ -312,8 +312,8 @@ perform the marginalization automatically.
We can use Turing's `~` syntax with anything that `Distributions.jl` provides `logpdf` and `rand` methods for. It turns out that the
`MixtureModel` distribution it provides has, as its `logpdf` method, `logpdf(MixtureModel([Component_Distributions], weight_vector), Y)`, where `Y` can be either a single observation or vector of observations.
-In fact, `Distributions.jl` provides [many convenient constructors](https://juliastats.org/Distributions.jl/stable/mixture/) for mixture models, allowing further simplification in common special cases.
-
+In fact, `Distributions.jl` provides [many convenient constructors](https://juliastats.org/Distributions.jl/stable/mixture/) for mixture models, allowing further simplification in common special cases.
+
For example, when mixtures distributions are of the same type, one can write: `~ MixtureModel(Normal, [(μ1, σ1), (μ2, σ2)], w)`, or when the weight vector is known to allocate probability equally, it can be ommited.
The `logpdf` implementation for a `MixtureModel` distribution is exactly the marginalization defined above, and so our model becomes simply:
@@ -330,7 +330,7 @@ end
model = gmm_marginalized(x);
```
-As we've summed out the discrete components, we can perform inference using `NUTS()` alone.
+As we've summed out the discrete components, we can perform inference using `NUTS()` alone.
```{julia}
#| output: false
@@ -352,7 +352,7 @@ let
end
```
-`NUTS()` significantly outperforms our compositional Gibbs sampler, in large part because our model is now Rao-Blackwellized thanks to
+`NUTS()` significantly outperforms our compositional Gibbs sampler, in large part because our model is now Rao-Blackwellized thanks to
the marginalization of our assignment parameter.
```{julia}
@@ -360,13 +360,13 @@ plot(chains[["μ[1]", "μ[2]"]], legend=true)
```
## Inferred Assignments - Marginalized Model
-As we've summed over possible assignments, the associated parameter is no longer available in our chain.
+As we've summed over possible assignments, the associated parameter is no longer available in our chain.
This is not a problem, however, as given any fixed sample $(\mu, w)$, the assignment probability — $p(z_i \mid y_i)$ — can be recovered using Bayes rule:
$$
p(z_i \mid y_i) = \frac{p(y_i \mid z_i) p(z_i)}{\sum_{k = 1}^K \left(p(y_i \mid z_i) p(z_i) \right)}
$$
-This quantity can be computed for every $p(z = z_i \mid y_i)$, resulting in a probability vector, which is then used to sample
+This quantity can be computed for every $p(z = z_i \mid y_i)$, resulting in a probability vector, which is then used to sample
posterior predictive assignments from a categorial distribution.
For details on the mathematics here, see [the Stan documentation on latent discrete parameters](https://mc-stan.org/docs/stan-users-guide/latent-discrete.html).
```{julia}
@@ -399,7 +399,7 @@ chains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initia
Given a sample from the marginalized posterior, these assignments can be recovered with:
```{julia}
-assignments = mean(generated_quantities(gmm_recover(x), chains));
+assignments = mean(generated_quantities(gmm_recover(x), chains));
```
```{julia}
diff --git a/tutorials/04-hidden-markov-model/index.qmd b/tutorials/04-hidden-markov-model/index.qmd
index 38004e044..40ff269e2 100755
--- a/tutorials/04-hidden-markov-model/index.qmd
+++ b/tutorials/04-hidden-markov-model/index.qmd
@@ -14,7 +14,7 @@ This tutorial illustrates training Bayesian [Hidden Markov Models](https://en.wi
In this tutorial, we assume there are $k$ discrete hidden states; the observations are continuous and normally distributed - centered around the hidden states. This assumption reduces the number of parameters to be estimated in the emission matrix.
-Let's load the libraries we'll need. We also set a random seed (for reproducibility) and the automatic differentiation backend to forward mode (more [here](https://turinglang.org/dev/docs/using-turing/autodiff) on why this is useful).
+Let's load the libraries we'll need. We also set a random seed (for reproducibility) and the automatic differentiation backend to forward mode (more [here]( {{< meta doc-base-url >}}/{{< meta using-turing-autodiff >}} ) on why this is useful).
```{julia}
# Load libraries.
@@ -125,7 +125,7 @@ We will use a combination of two samplers ([HMC](https://turinglang.org/dev/docs
In this case, we use HMC for `m` and `T`, representing the emission and transition matrices respectively. We use the Particle Gibbs sampler for `s`, the state sequence. You may wonder why it is that we are not assigning `s` to the HMC sampler, and why it is that we need compositional Gibbs sampling at all.
-The parameter `s` is not a continuous variable. It is a vector of **integers**, and thus Hamiltonian methods like HMC and [NUTS](https://turinglang.org/dev/docs/library/#Turing.Inference.NUTS) won't work correctly. Gibbs allows us to apply the right tools to the best effect. If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation](https://turinglang.org/dev/docs/using-turing/autodiff#compositional-sampling-with-differing-ad-modes) backends for each parameter space.
+The parameter `s` is not a continuous variable. It is a vector of **integers**, and thus Hamiltonian methods like HMC and [NUTS](https://turinglang.org/dev/docs/library/#Turing.Inference.NUTS) won't work correctly. Gibbs allows us to apply the right tools to the best effect. If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]( {{< meta doc-base-url >}}/{{< meta using-turing-autodiff >}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space.
Time to run our sampler.
diff --git a/tutorials/05-linear-regression/index.qmd b/tutorials/05-linear-regression/index.qmd
index 94ee80a87..9bc5e5a84 100755
--- a/tutorials/05-linear-regression/index.qmd
+++ b/tutorials/05-linear-regression/index.qmd
@@ -164,7 +164,7 @@ end
## Comparing to OLS
-A satisfactory test of our model is to evaluate how well it predicts. Importantly, we want to compare our model to existing tools like OLS. The code below uses the [GLM.jl]() package to generate a traditional OLS multiple regression model on the same data as our probabilistic model.
+A satisfactory test of our model is to evaluate how well it predicts. Importantly, we want to compare our model to existing tools like OLS. The code below uses the [GLM.jl](https://juliastats.org/GLM.jl/stable/) package to generate a traditional OLS multiple regression model on the same data as our probabilistic model.
```{julia}
# Import the GLM package.
diff --git a/tutorials/06-infinite-mixture-model/index.qmd b/tutorials/06-infinite-mixture-model/index.qmd
index 0a9adc34d..51b10e03a 100755
--- a/tutorials/06-infinite-mixture-model/index.qmd
+++ b/tutorials/06-infinite-mixture-model/index.qmd
@@ -81,7 +81,7 @@ x &\sim \mathrm{Normal}(\mu_z, \Sigma)
\end{align}
$$
-which resembles the model in the [Gaussian mixture model tutorial](https://turinglang.org/stable/tutorials/01-gaussian-mixture-model/) with a slightly different notation.
+which resembles the model in the [Gaussian mixture model tutorial]( {{< meta doc-base-url >}}/{{< meta gaussian-mixture-model >}}) with a slightly different notation.
## Infinite Mixture Model
@@ -149,7 +149,7 @@ end
```{julia}
using Plots
-# Plot the cluster assignments over time
+# Plot the cluster assignments over time
@gif for i in 1:Nmax
scatter(
collect(1:i),
diff --git a/tutorials/08-multinomial-logistic-regression/index.qmd b/tutorials/08-multinomial-logistic-regression/index.qmd
index 8809d969e..4bbfcebff 100755
--- a/tutorials/08-multinomial-logistic-regression/index.qmd
+++ b/tutorials/08-multinomial-logistic-regression/index.qmd
@@ -144,8 +144,8 @@ chain
::: {.callout-warning collapse="true"}
## Sampling With Multiple Threads
-The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
-will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.](https://turinglang.org/dev/docs/using-turing/guide/#sampling-multiple-chains)
+The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
+will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.]( {{< meta doc-base-url >}}/{{< meta using-turing >}}#sampling-multiple-chains )
:::
Since we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.
diff --git a/tutorials/09-variational-inference/index.qmd b/tutorials/09-variational-inference/index.qmd
index 1c212750b..a66354e0e 100755
--- a/tutorials/09-variational-inference/index.qmd
+++ b/tutorials/09-variational-inference/index.qmd
@@ -13,7 +13,7 @@ Pkg.instantiate();
In this post we'll have a look at what's know as **variational inference (VI)**, a family of _approximate_ Bayesian inference methods, and how to use it in Turing.jl as an alternative to other approaches such as MCMC. In particular, we will focus on one of the more standard VI methods called **Automatic Differentation Variational Inference (ADVI)**.
Here we will focus on how to use VI in Turing and not much on the theory underlying VI.
-If you are interested in understanding the mathematics you can checkout [our write-up](../../tutorials/docs-07-for-developers-variational-inference/) or any other resource online (there a lot of great ones).
+If you are interested in understanding the mathematics you can checkout [our write-up]( {{< meta doc-base-url >}}/{{< meta using-turing-variational-inference >}} ) or any other resource online (there a lot of great ones).
Using VI in Turing.jl is very straight forward.
If `model` denotes a definition of a `Turing.Model`, performing VI is as simple as
@@ -26,7 +26,7 @@ q = vi(m, vi_alg) # perform VI on `m` using the VI method `vi_alg`, which retur
Thus it's no more work than standard MCMC sampling in Turing.
-To get a bit more into what we can do with `vi`, we'll first have a look at a simple example and then we'll reproduce the [tutorial on Bayesian linear regression](../../tutorials/05-linear-regression/) using VI instead of MCMC. Finally we'll look at some of the different parameters of `vi` and how you for example can use your own custom variational family.
+To get a bit more into what we can do with `vi`, we'll first have a look at a simple example and then we'll reproduce the [tutorial on Bayesian linear regression]( {{< meta doc-base-url >}}/{{< meta linear-regression >}}) using VI instead of MCMC. Finally we'll look at some of the different parameters of `vi` and how you for example can use your own custom variational family.
We first import the packages to be used:
@@ -248,7 +248,7 @@ plot(p1, p2; layout=(2, 1), size=(900, 500))
## Bayesian linear regression example using ADVI
-This is simply a duplication of the tutorial on [Bayesian linear regression](../../tutorials/05-linear-regression/) (much of the code is directly lifted), but now with the addition of an approximate posterior obtained using `ADVI`.
+This is simply a duplication of the tutorial on [Bayesian linear regression]({{< meta doc-base-url >}}/{{< meta linear-regression >}}) (much of the code is directly lifted), but now with the addition of an approximate posterior obtained using `ADVI`.
As we'll see, there is really no additional work required to apply variational inference to a more complex `Model`.
@@ -599,7 +599,7 @@ println("Training set:
VI loss: $vi_loss1
Bayes loss: $bayes_loss1
OLS loss: $ols_loss1
-Test set:
+Test set:
VI loss: $vi_loss2
Bayes loss: $bayes_loss2
OLS loss: $ols_loss2")
@@ -765,8 +765,8 @@ plot(p1, p2; layout=(1, 2), size=(800, 2000))
So it seems like the "full" ADVI approach, i.e. no mean-field assumption, obtain the same modes as the mean-field approach but with greater uncertainty for some of the `coefficients`. This
```{julia}
-# Unfortunately, it seems like this has quite a high variance which is likely to be due to numerical instability,
-# so we consider a larger number of samples. If we get a couple of outliers due to numerical issues,
+# Unfortunately, it seems like this has quite a high variance which is likely to be due to numerical instability,
+# so we consider a larger number of samples. If we get a couple of outliers due to numerical issues,
# these kind affect the mean prediction greatly.
z = rand(q_full_normal, 10_000);
```
@@ -795,7 +795,7 @@ println("Training set:
VI loss: $vi_loss1
Bayes loss: $bayes_loss1
OLS loss: $ols_loss1
-Test set:
+Test set:
VI loss: $vi_loss2
Bayes loss: $bayes_loss2
OLS loss: $ols_loss2")
diff --git a/tutorials/14-minituring/index.qmd b/tutorials/14-minituring/index.qmd
index bb8d35049..8efaa8856 100755
--- a/tutorials/14-minituring/index.qmd
+++ b/tutorials/14-minituring/index.qmd
@@ -82,7 +82,7 @@ Thus depending on the inference algorithm we want to use different `assume` and
We can achieve this by providing this `context` information as a function argument to `assume` and `observe`.
**Note:** *Although the context system in this tutorial is inspired by DynamicPPL, it is very simplistic.
-We expand this mini Turing example in the [contexts](../16-contexts) tutorial with some more complexity, to illustrate how and why contexts are central to Turing's design. For the full details one still needs to go to the actual source of DynamicPPL though.*
+We expand this mini Turing example in the [contexts]( {{< meta doc-base-url >}}/{{< meta contexts >}} ) tutorial with some more complexity, to illustrate how and why contexts are central to Turing's design. For the full details one still needs to go to the actual source of DynamicPPL though.*
Here we can see the implementation of a sampler that draws values of unobserved variables from the prior and computes the log-probability for every variable.
@@ -189,7 +189,7 @@ function assume(context::SamplingContext{<:MHSampler}, varinfo, dist, var_id)
sampler = context.sampler
old_value = varinfo.values[var_id]
- # propose a random-walk step, i.e, add the current value to a random
+ # propose a random-walk step, i.e, add the current value to a random
# value sampled from a Normal distribution centered at 0
value = rand(context.rng, Normal(old_value, sampler.sigma))
logp = Distributions.logpdf(dist, value)
diff --git a/tutorials/docs-00-getting-started/index.qmd b/tutorials/docs-00-getting-started/index.qmd
index 629c5743b..8612ff481 100644
--- a/tutorials/docs-00-getting-started/index.qmd
+++ b/tutorials/docs-00-getting-started/index.qmd
@@ -1,7 +1,7 @@
---
title: Getting Started
engine: julia
-aliases:
+aliases:
- ../../
---
@@ -82,5 +82,5 @@ The underlying theory of Bayesian machine learning is not explained in detail in
A thorough introduction to the field is [*Pattern Recognition and Machine Learning*](https://www.springer.com/us/book/9780387310732) (Bishop, 2006); an online version is available [here (PDF, 18.1 MB)](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf).
:::
-The next page on [Turing's core functionality](../../tutorials/docs-12-using-turing-guide/) explains the basic features of the Turing language.
-From there, you can either look at [worked examples of how different models are implemented in Turing](../../tutorials/00-introduction/), or [specific tips and tricks that can help you get the most out of Turing](../../tutorials/docs-17-mode-estimation/).
+The next page on [Turing's core functionality]( {{< meta doc-base-url >}}/{{< meta using-turing >}} ) explains the basic features of the Turing language.
+From there, you can either look at [worked examples of how different models are implemented in Turing]( {{< meta doc-base-url >}}/{{< meta tutorials-intro >}} ), or [specific tips and tricks that can help you get the most out of Turing]( {{< meta doc-base-url >}}/{{< meta using-turing-mode-estimation >}} ).
diff --git a/tutorials/docs-04-for-developers-abstractmcmc-turing/index.qmd b/tutorials/docs-04-for-developers-abstractmcmc-turing/index.qmd
index fa90d8f27..9106a1447 100755
--- a/tutorials/docs-04-for-developers-abstractmcmc-turing/index.qmd
+++ b/tutorials/docs-04-for-developers-abstractmcmc-turing/index.qmd
@@ -10,7 +10,7 @@ using Pkg;
Pkg.instantiate();
```
-Prerequisite: [Interface guide](../interface/).
+Prerequisite: [Interface guide](../{{< meta using-turing-interface >}}).
## Introduction
@@ -33,7 +33,7 @@ n_samples = 1000
chn = sample(mod, alg, n_samples, progress=false)
```
-The function `sample` is part of the AbstractMCMC interface. As explained in the [interface guide](https://turinglang.org/dev/docs/for-developers/interface), building a sampling method that can be used by `sample` consists in overloading the structs and functions in `AbstractMCMC`. The interface guide also gives a standalone example of their implementation, [`AdvancedMH.jl`]().
+The function `sample` is part of the AbstractMCMC interface. As explained in the [interface guide]( {{< meta doc-base-url >}}/{{< meta using-turing-interface >}} ), building a sampling method that can be used by `sample` consists in overloading the structs and functions in `AbstractMCMC`. The interface guide also gives a standalone example of their implementation, [`AdvancedMH.jl`]().
Turing sampling methods (most of which are written [here](https://github.com/TuringLang/Turing.jl/tree/master/src/mcmc)) also implement `AbstractMCMC`. Turing defines a particular architecture for `AbstractMCMC` implementations, that enables working with models defined by the `@model` macro, and uses DynamicPPL as a backend. The goal of this page is to describe this architecture, and how you would go about implementing your own sampling method in Turing, using Importance Sampling as an example. I don't go into all the details: for instance, I don't address selectors or parallelism.
@@ -182,7 +182,7 @@ The following diagram summarizes the hierarchy presented above.
//| echo: false
digraph G {
node [shape=box];
-
+
spl [label=Sampler
<:AbstractSampler>, style=rounded, xlabel="", shape=box];
state [label=State
<:AbstractSamplerState>, style=rounded, xlabel="", shape=box];
alg [label=Algorithm
<:InferenceAlgorithm>, style=rounded, xlabel="", shape=box];
@@ -191,7 +191,7 @@ digraph G {
placeholder2 [label="...", width=1];
placeholder3 [label="...", width=1];
placeholder4 [label="...", width=1];
-
+
spl -> state;
spl -> alg;
spl -> placeholder1;
@@ -265,7 +265,7 @@ This begs the question: how can these functions access model information during
Consider an instance `m` of `Model` and a sampler `spl`, with associated `VarInfo` `vi = spl.state.vi`. At some point during the sampling process, an AbstractMCMC function such as `step!` calls `m(vi, ...)`, which calls the model evaluation function `m.f(vi, ...)`.
- for every tilde statement in the `@model` macro, `m.f(vi, ...)` returns model-related information (samples, value of the model density, etc.), and adds it to `vi`. How does it do that?
-
+
+ recall that the code for `m.f(vi, ...)` is automatically generated by compilation of the `@model` macro
+ for every tilde statement in the `@model` declaration, this code contains a call to `assume(vi, ...)` if the variable on the LHS of the tilde is a **model parameter to infer**, and `observe(vi, ...)` if the variable on the LHS of the tilde is an **observation**
@@ -303,25 +303,25 @@ It simply returns the density (in the discrete case, the probability) of the obs
We focus on the AbstractMCMC functions that are overridden in `is.jl` and executed inside `mcmcsample`: `step!`, which is called `n_samples` times, and `sample_end!`, which is executed once after those `n_samples` iterations.
- During the $i$-th iteration, `step!` does 3 things:
-
+
+ `empty!!(spl.state.vi)`: remove information about the previous sample from the sampler's `VarInfo`
-
+
+ `model(rng, spl.state.vi, spl)`: call the model evaluation function
-
+
* calls to `assume` add the samples from the prior $s_i$ and $m_i$ to `spl.state.vi`
* calls to `assume` or `observe` are followed by the line `acclogp!!(vi, lp)`, where `lp` is an output of `assume` and `observe`
-
+
* `lp` is set to 0 after `assume`, and to the value of the density at the observation after `observe`
* When all the tilde statements have been covered, `spl.state.vi.logp[]` is the sum of the `lp`, i.e., the likelihood $\log p(x, y \mid s_i, m_i) = \log p(x \mid s_i, m_i) + \log p(y \mid s_i, m_i)$ of the observations given the latent variable samples $s_i$ and $m_i$.
+ `return Transition(spl)`: build a transition from the sampler, and return that transition
-
+
* the transition's `vi` field is simply `spl.state.vi`
* the `lp` field contains the likelihood `spl.state.vi.logp[]`
- When the `n_samples` iterations are completed, `sample_end!` fills the `final_logevidence` field of `spl.state`
-
+
+ It simply takes the logarithm of the average of the sample weights, using the log weights for numerical stability
diff --git a/tutorials/docs-07-for-developers-variational-inference/index.qmd b/tutorials/docs-07-for-developers-variational-inference/index.qmd
index 00c17b456..2568fefe1 100755
--- a/tutorials/docs-07-for-developers-variational-inference/index.qmd
+++ b/tutorials/docs-07-for-developers-variational-inference/index.qmd
@@ -7,7 +7,7 @@ engine: julia
In this post, we'll examine variational inference (VI), a family of approximate Bayesian inference methods. We will focus on one of the more standard VI methods, Automatic Differentiation Variational Inference (ADVI).
-Here, we'll examine the theory behind VI, but if you're interested in using ADVI in Turing, [check out this tutorial](../../tutorials/09-variational-inference).
+Here, we'll examine the theory behind VI, but if you're interested in using ADVI in Turing, [check out this tutorial]( {{< meta doc-base-url >}}/{{< meta variational-inference >}} ).
# Motivation
@@ -181,7 +181,7 @@ With all this nailed down, we eventually reach the section on **Automatic Differ
So let's revisit the assumptions we've made at this point:
1. The variational posterior $q\_{\theta}$ is in a parameterized family of densities denoted $\mathscr{Q}\_{\Theta}$, with $\theta \in \Theta$.
-
+
2. $\mathscr{Q}\_{\Theta}$ is a space of _reparameterizable_ densities with $\bar{q}(z)$ as the base-density.
3. The parameterization function $g\_{\theta}$ is differentiable wrt. $\theta$.
@@ -380,4 +380,4 @@ $$
And maximizing this wrt. $\mu$ and $\Sigma$ is what's referred to as **Automatic Differentiation Variational Inference (ADVI)**!
-Now if you want to try it out, [check out the tutorial on how to use ADVI in Turing.jl](../../tutorials/09-variational-inference/)!
+Now if you want to try it out, [check out the tutorial on how to use ADVI in Turing.jl]( {{< meta doc-base-url >}}/{{< meta variational-inference >}} )!
diff --git a/tutorials/docs-09-using-turing-advanced/index.qmd b/tutorials/docs-09-using-turing-advanced/index.qmd
index 6e67e9d22..85ed53f66 100755
--- a/tutorials/docs-09-using-turing-advanced/index.qmd
+++ b/tutorials/docs-09-using-turing-advanced/index.qmd
@@ -5,7 +5,7 @@ engine: julia
This page has been separated into new sections. Please update any bookmarks you might have:
- - [Custom Distributions](../../tutorials/usage-custom-distribution)
- - [Modifying the Log Probability](../../tutorials/usage-modifying-logprob/)
- - [Defining a Model without `@model`](../../tutorials/dev-model-manual/)
- - [Reparametrization and Generated Quantities](../../tutorials/usage-generated-quantities/)
+ - [Custom Distributions]({{< meta doc-base-url >}}/tutorials/usage-custom-distribution)
+ - [Modifying the Log Probability]({{< meta doc-base-url >}}/tutorials/usage-modifying-logprob/)
+ - [Defining a Model without `@model`]({{< meta doc-base-url >}}/tutorials/dev-model-manual/)
+ - [Reparametrization and Generated Quantities]({{< meta doc-base-url >}}/tutorials/usage-generated-quantities/)
diff --git a/tutorials/docs-12-using-turing-guide/index.qmd b/tutorials/docs-12-using-turing-guide/index.qmd
index aff57541c..a595323c8 100755
--- a/tutorials/docs-12-using-turing-guide/index.qmd
+++ b/tutorials/docs-12-using-turing-guide/index.qmd
@@ -427,7 +427,7 @@ mle_estimate = maximum_likelihood(model)
map_estimate = maximum_a_posteriori(model)
```
-For more details see the [mode estimation page](../docs-17-mode-estimation/index.qmd).
+For more details see the [mode estimation page]( {{}}/{{}} ).
## Beyond the Basics
@@ -453,7 +453,7 @@ simple_choice_f = simple_choice([1.5, 2.0, 0.3])
chn = sample(simple_choice_f, Gibbs(HMC(0.2, 3, :p), PG(20, :z)), 1000)
```
-The `Gibbs` sampler can be used to specify unique automatic differentiation backends for different variable spaces. Please see the [Automatic Differentiation]({{< meta site-url >}}/dev/docs/using-turing/autodiff) article for more.
+The `Gibbs` sampler can be used to specify unique automatic differentiation backends for different variable spaces. Please see the [Automatic Differentiation]( {{}}/{{}} ) article for more.
For more details of compositional sampling in Turing.jl, please check the corresponding [paper](https://proceedings.mlr.press/v84/ge18b.html).
@@ -516,13 +516,13 @@ ForwardDiff (Turing's default AD backend) uses forward-mode chunk-wise AD. The c
Turing supports four automatic differentiation (AD) packages in the back end during sampling. The default AD backend is [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD. Three reverse-mode AD backends are also supported, namely [Tracker](https://github.com/FluxML/Tracker.jl), [Zygote](https://github.com/FluxML/Zygote.jl) and [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl). `Zygote` and `ReverseDiff` are supported optionally if explicitly loaded by the user with `using Zygote` or `using ReverseDiff` next to `using Turing`.
-For more information on Turing's automatic differentiation backend, please see the [Automatic Differentiation]({{< meta site-url >}}/dev/docs/using-turing/autodiff) article.
+For more information on Turing's automatic differentiation backend, please see the [Automatic Differentiation]({{< meta doc-base-url >}}/{{}}) article.
#### Progress Logging
`Turing.jl` uses ProgressLogging.jl to log the sampling progress. Progress
logging is enabled as default but might slow down inference. It can be turned on
-or off by setting the keyword argument `progress` of `sample` to `true` or `false`.
+or off by setting the keyword argument `progress` of `sample` to `true` or `false`.
Moreover, you can enable or disable progress logging globally by calling `setprogress!(true)` or `setprogress!(false)`, respectively.
Turing uses heuristics to select an appropriate visualization backend. If you
diff --git a/tutorials/docs-13-using-turing-performance-tips/index.qmd b/tutorials/docs-13-using-turing-performance-tips/index.qmd
index 2d5c15cfd..5be99b98c 100755
--- a/tutorials/docs-13-using-turing-performance-tips/index.qmd
+++ b/tutorials/docs-13-using-turing-performance-tips/index.qmd
@@ -43,25 +43,25 @@ end
## Choose your AD backend
Automatic differentiation (AD) makes it possible to use modern, efficient gradient-based samplers like NUTS and HMC, and that means a good AD system is incredibly important. Turing currently
-supports several AD backends, including [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) (the default), [Zygote](https://github.com/FluxML/Zygote.jl),
-[ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl), and [Tracker](https://github.com/FluxML/Tracker.jl). Experimental support is also available for
-[Tapir](https://github.com/withbayes/Tapir.jl).
+supports several AD backends, including [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) (the default), [Zygote](https://github.com/FluxML/Zygote.jl),
+[ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl), and [Tracker](https://github.com/FluxML/Tracker.jl). Experimental support is also available for
+[Tapir](https://github.com/withbayes/Tapir.jl).
-For many common types of models, the default ForwardDiff backend performs great, and there is no need to worry about changing it. However, if you need more speed, you can try
+For many common types of models, the default ForwardDiff backend performs great, and there is no need to worry about changing it. However, if you need more speed, you can try
different backends via the standard [ADTypes](https://github.com/SciML/ADTypes.jl) interface by passing an `AbstractADType` to the sampler with the optional `adtype` argument, e.g.
-`NUTS(adtype = AutoZygote())`. See [Automatic Differentiation](autodiff) for details. Generally, `adtype = AutoForwardDiff()` is likely to be the fastest and most reliable for models with
+`NUTS(adtype = AutoZygote())`. See [Automatic Differentiation]( {{}}/{{}} ) for details. Generally, `adtype = AutoForwardDiff()` is likely to be the fastest and most reliable for models with
few parameters (say, less than 20 or so), while reverse-mode backends such as `AutoZygote()` or `AutoReverseDiff()` will perform better for models with many parameters or linear algebra
operations. If in doubt, it's easy to try a few different backends to see how they compare.
### Special care for Zygote and Tracker
Note that Zygote and Tracker will not perform well if your model contains `for`-loops, due to the way reverse-mode AD is implemented in these packages. Zygote also cannot differentiate code
-that contains mutating operations. If you can't implement your model without `for`-loops or mutation, `ReverseDiff` will be a better, more performant option. In general, though,
+that contains mutating operations. If you can't implement your model without `for`-loops or mutation, `ReverseDiff` will be a better, more performant option. In general, though,
vectorized operations are still likely to perform best.
-Avoiding loops can be done using `filldist(dist, N)` and `arraydist(dists)`. `filldist(dist, N)` creates a multivariate distribution that is composed of `N` identical and independent
-copies of the univariate distribution `dist` if `dist` is univariate, or it creates a matrix-variate distribution composed of `N` identical and independent copies of the multivariate
-distribution `dist` if `dist` is multivariate. `filldist(dist, N, M)` can also be used to create a matrix-variate distribution from a univariate distribution `dist`. `arraydist(dists)`
+Avoiding loops can be done using `filldist(dist, N)` and `arraydist(dists)`. `filldist(dist, N)` creates a multivariate distribution that is composed of `N` identical and independent
+copies of the univariate distribution `dist` if `dist` is univariate, or it creates a matrix-variate distribution composed of `N` identical and independent copies of the multivariate
+distribution `dist` if `dist` is multivariate. `filldist(dist, N, M)` can also be used to create a matrix-variate distribution from a univariate distribution `dist`. `arraydist(dists)`
is similar to `filldist` but it takes an array of distributions `dists` as input. Writing a [custom distribution](advanced) with a custom adjoint is another option to avoid loops.
### Special care for ReverseDiff with a compiled tape