Skip to content

Commit

Permalink
Merge pull request #42 from minaskar/dev
Browse files Browse the repository at this point in the history
Dev
  • Loading branch information
minaskar authored May 31, 2024
2 parents 062e9a6 + ca0d697 commit 77fa21e
Show file tree
Hide file tree
Showing 34 changed files with 2,077 additions and 1,159 deletions.
7 changes: 6 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,12 @@

## Brief introduction

``pocoMC`` utilises a *Normalising Flow* in order to precondition the target distribution by removing any correlations between its parameters. The code then generates posterior samples, that can be used for parameter estimation, using a powerful adaptive *Sequential Monte Carlo* algorithm manifesting a sampling effiency that can be orders of magnitude higher than without precondition. Furthermore, ``pocoMC`` also provides an unbiased estimate of the *model evidence* that can be used for the task of *Bayesian model comparison*.
``pocoMC`` is a Python package for fast Bayesian posterior and model evidence estimation. It leverages
the Preconditioned Monte Carlo (PMC) algorithm, offering significant speed improvements over
traditional methods like MCMC and Nested Sampling. Ideal for large-scale scientific problems
with expensive likelihood evaluations, non-linear correlations, and multimodality, ``pocoMC``
provides efficient and scalable posterior sampling and model evidence estimation. Widely used
in cosmology and astronomy, pocoMC is user-friendly, flexible, and actively maintained.

## Documentation

Expand Down
434 changes: 0 additions & 434 deletions docs/source/advanced.rst

This file was deleted.

4 changes: 4 additions & 0 deletions docs/source/api/tools.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,10 @@ Compute effective sample size

.. autofunction:: pocomc.tools.compute_ess

.. autofunction:: pocomc.tools.effective_sample_size

.. autofunction:: pocomc.tools.unique_sample_size


Progress bar
------------
Expand Down
45 changes: 45 additions & 0 deletions docs/source/background.rst
Original file line number Diff line number Diff line change
Expand Up @@ -64,3 +64,48 @@ Bayes factor on the other hand is simply the ratio of the model evidences of the
.. math::
BF_{ij} \equiv \frac{p(d\vert\mathcal{M}_{i})}{p(d\vert\mathcal{M}_{j})} = \frac{\mathcal{Z}_{i}}{\mathcal{Z}_{j}}
Preconditioned Monte Carlo
--------------------------

The Preconditioned Monte Carlo (PMC) algorithm is a variant of the Persistent Sampling (PS) framework, which is a generalization
of the Sequential Monte Carlo (SMC) algorithm. The PMC algorithm is designed to sample from a sequence of probability distributions
:math:`\mathcal{P}_{t}(\theta)`, where the target distribution :math:`\mathcal{P}_{t}(\theta)` is defined by

.. math::
\mathcal{P}_{t}(\theta) \propto \mathcal{L}(\theta)^{\beta_{t}}\pi(\theta)
where :math:`\mathcal{L}(\theta)` is the likelihood function and :math:`\pi(\theta)` is the prior probability density. The effective
inverse temperature parameter :math:`\beta_{t}` is initialized to 0 and is gradually increased to 1. When :math:`\beta_{t}=0`, the target
distribution is the prior distribution, and when :math:`\beta_{t}=1`, the target distribution is the posterior distribution. The inverse
temperature parameter is increased in each iteration by a small step size :math:`\Delta\beta` until it reaches 1. The :math:`\Delta\beta`
is computed adaptively in each iteration to ensure PMC maintains a constant number of effective particles. In each iteration, the PMC
algorithm samples from the target distribution :math:`\mathcal{P}_{t}(\theta)` using a set of particles by applying a sequence of three steps:

1. **Reweighting**: The particles are reweighted to target the distribution :math:`\mathcal{P}_{t}(\theta)`.
2. **Resampling**: The particles are resampled according to their weights to ensure that the effective number of particles is constant.
3. **Mutation**: The particles are mutated by applying a number of MCMC.

The PMC algorithm terminates when the inverse temperature parameter reaches 1. The samples obtained from the PMC algorithm can be used to
approximate the posterior distribution of the parameters :math:`\theta` given the data :math:`d` and the model :math:`\mathcal{M}`. The PMC
algorithm is particularly useful for sampling from high-dimensional and multimodal posterior distributions. Furthemore, the PMC algorithm
offers an estimate of the logarithm of the model evidence :math:`\log\mathcal{Z}` which can be used for Bayesian model comparison.

The high sampling efficiency and robustness of the PMC algorithm is derived by three key features:

1. **Persistent Sampling**: The PMC algorithm maintains a set of particles throughout the entire run of the algorithm. This allows the PMC
algorithm to reuse the particles from previous iterations to sample from the target distribution in the current iteration. This is particularly
useful when the target distribution changes smoothly from one iteration to the next.
2. **Normalizing Flow Preconditioning**: The PMC algorithm uses a normalizing flow to precondition each target distribution :math:`\mathcal{P}_{t}(\theta)`.
The normalizing flow is a sequence of invertible transformations that maps a simple distribution to the target distribution. The normalizing
flow is trained to approximate the target distribution using a set of particles. Sampling in the target distribution is then performed by
sampling from the simple distribution and applying the inverse of the normalizing flow. The normalizing flow preconditioning allows the PMC
algorithm to sample from complex and multimodal target distributions.
3. **t-preconditioned Crank-Nicolson**: The PMC algorithm uses a t-preconditioned Crank-Nicolson integrator to evolve the particles in the target
distribution. The t-preconditioned Crank-Nicolson algorithm is an MCMC method that scales well with the dimensionality of the target distribution.
For targets that are close to Gaussian, the t-preconditioned Crank-Nicolson algorithm is particularly efficient and can scale to very high dimensions.
For non-Gaussian targets (e.g., multimodal distributions), the t-preconditioned Crank-Nicolson algorithm can be combined with the normalizing flow
preconditioning to sample from the target distribution efficiently even in high dimensions.

Unlike traditional samplers that rely on Random-walk Metropolis, Slice Sampling, Rejection Sampling, Importance Sampling, or Independence Metropolis, PMC
can scale to high-dimensions without desolving into random-walk behavior.
94 changes: 94 additions & 0 deletions docs/source/blobs.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Blobs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes it is useful to store additional metadata or derived parameters during the run so that the user does not re-compute them afterwards. This can be achieved easily in ``pocoMC`` using the *blobs* framework, inspired by ``zeus`` and ``emcee`` samplers.\n",
"\n",
"Any additional derived parameters can be returned by the log-likelihood function. The dtypes of these derived parameters should be defined using the ``blobs_dtype`` argument of the ``Sampler`` class.\n",
"\n",
"For instance, for a Gaussian likelihood (with zero mean and unit variance in 5D) where we want to store the median value of the parameters and the number of positive parameters, we would do something like this:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Iter: 25it [01:11, 2.84s/it, calls=11264, beta=1, logZ=-8.23, ESS=3.84e+3, acc=0.857, steps=2, logP=-15.1, eff=0.93]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Median: [ 0.28968189 -2.48432868 1.40069292]\n",
"Number of positive parameters: [3 1 4]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"import numpy as np\n",
"from scipy.stats import norm\n",
"import pocomc as pc\n",
"\n",
"prior = pc.Prior(5*[norm(0,5)])\n",
"\n",
"def log_likelihood(x):\n",
" return -0.5 * np.dot(x,x), np.median(x), np.sum(x>0, dtype=int)\n",
"\n",
"sampler = pc.Sampler(prior,\n",
" log_likelihood,\n",
" blobs_dtype=[('median', float), ('n_positive', int)],\n",
" )\n",
"\n",
"sampler.run()\n",
"\n",
"samples, weights, logl, logp, blobs = sampler.posterior(return_blobs=True)\n",
"\n",
"print(\"Median: \", blobs[\"median\"][:3])\n",
"print(\"Number of positive parameters: \", blobs[\"n_positive\"][:3])\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "dev-env",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
87 changes: 87 additions & 0 deletions docs/source/checkpoint.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Checkpointing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A useful option, especially for long runs, is to be able to store the state of ``pocoMC`` in a file and also the to use\n",
"that file in order to later continue the same run. This can help avoid disastrous situations in which a run is interrupted\n",
"or terminated prematurely (e.g. due to time limitation in computing clusters or possible crashes).\n",
"\n",
"Fortunately, ``pocoMC`` offers both options to save and load a previous state of the sampler."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save\n",
"\n",
"In order to save the state of the sampler during the run, one has to specify how often to save the state in a file. This is\n",
"done using the ``save_every`` argument in the ``run`` method. The default is ``save_every=None`` which means that no state\n",
"is saved during the run. If instead we want to store the state of ``pocoMC`` every e.g. ``3`` iterations, we would do\n",
"something like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sampler.run(save_every = 3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The default directory in which the state files are saved is a folder named ``states`` in the current directory. One can change\n",
"this using the ``output_dir`` argument when initialising the sampler (e.g. ``output_dir = \"new_run\"``). By default, the state\n",
"files follow the naming convention ``pmc_{i}.state`` where ``i`` is the iteration index. For instance, if ``save_every=3`` was \n",
"specified then the ``output_dir`` directory will include the files ``pmc_3.state``, ``pmc_6.state``, etc. One can also change\n",
"the label from ``pmc`` to anything else by using the ``output_label`` argument when initialising the sampler (e.g. \n",
"``output_label=\"grav_waves\"``)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Loading a previous state of the sampler and resuming the run from that point requires to provide the path to the specific state\n",
"file to the ``run`` method using the ``resume_state_path`` argument. For instance, if we want to continue the run from the \n",
"``pmc_3.state`` which is in the ``states`` directory, we would do:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sampler.run(resume_state_path = \"states/pmc_3.state\")"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
2 changes: 1 addition & 1 deletion docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@
#'collapse_navigation': True,
#'navigation_depth': 4,
"announcement": (
"⚠️ The new release 1.0.0 includes major performance and quality-of-life updates. Please check the new syntax and features! ⚠️"
"⚠️ The new release 1.1.0 includes major performance and quality-of-life updates. Please check the new syntax and features! ⚠️"
),
'sidebar_hide_name': True,
}
Expand Down
Loading

0 comments on commit 77fa21e

Please sign in to comment.