Merge pull request #42 from minaskar/dev

Dev
minaskar · May 31, 2024 · 77fa21e · 77fa21e
2 parents 062e9a6 + ca0d697
commit 77fa21e
Show file tree

Hide file tree

Showing 34 changed files with 2,077 additions and 1,159 deletions.
diff --git a/README.md b/README.md
@@ -10,7 +10,12 @@
 
 ## Brief introduction
 
-``pocoMC`` utilises a *Normalising Flow* in order to precondition the target distribution by removing any correlations between its parameters. The code then generates posterior samples, that can be used for parameter estimation, using a powerful adaptive *Sequential Monte Carlo* algorithm manifesting a sampling effiency that can be orders of magnitude higher than without precondition. Furthermore, ``pocoMC`` also provides an unbiased estimate of the *model evidence* that can be used for the task of *Bayesian model comparison*.
+``pocoMC`` is a Python package for fast Bayesian posterior and model evidence estimation. It leverages 
+the Preconditioned Monte Carlo (PMC) algorithm, offering significant speed improvements over 
+traditional methods like MCMC and Nested Sampling. Ideal for large-scale scientific problems 
+with expensive likelihood evaluations, non-linear correlations, and multimodality, ``pocoMC`` 
+provides efficient and scalable posterior sampling and model evidence estimation. Widely used 
+in cosmology and astronomy, pocoMC is user-friendly, flexible, and actively maintained.
 
 ## Documentation
 

diff --git a/docs/source/advanced.rst b/docs/source/advanced.rst
diff --git a/docs/source/api/tools.rst b/docs/source/api/tools.rst
@@ -12,6 +12,10 @@ Compute effective sample size
 
 .. autofunction:: pocomc.tools.compute_ess
 
+.. autofunction:: pocomc.tools.effective_sample_size
+
+.. autofunction:: pocomc.tools.unique_sample_size
+
 
 Progress bar
 ------------

diff --git a/docs/source/background.rst b/docs/source/background.rst
@@ -64,3 +64,48 @@ Bayes factor on the other hand is simply the ratio of the model evidences of the
 .. math::
     BF_{ij} \equiv \frac{p(d\vert\mathcal{M}_{i})}{p(d\vert\mathcal{M}_{j})} = \frac{\mathcal{Z}_{i}}{\mathcal{Z}_{j}}
 
+
+Preconditioned Monte Carlo
+--------------------------
+
+The Preconditioned Monte Carlo (PMC) algorithm is a variant of the Persistent Sampling (PS) framework, which is a generalization
+of the Sequential Monte Carlo (SMC) algorithm. The PMC algorithm is designed to sample from a sequence of probability distributions
+:math:`\mathcal{P}_{t}(\theta)`, where the target distribution :math:`\mathcal{P}_{t}(\theta)` is defined by
+
+.. math::
+    \mathcal{P}_{t}(\theta) \propto \mathcal{L}(\theta)^{\beta_{t}}\pi(\theta)
+
+where :math:`\mathcal{L}(\theta)` is the likelihood function and :math:`\pi(\theta)` is the prior probability density. The effective
+inverse temperature parameter :math:`\beta_{t}` is initialized to 0 and is gradually increased to 1. When :math:`\beta_{t}=0`, the target
+distribution is the prior distribution, and when :math:`\beta_{t}=1`, the target distribution is the posterior distribution. The inverse
+temperature parameter is increased in each iteration by a small step size :math:`\Delta\beta` until it reaches 1. The :math:`\Delta\beta`
+is computed adaptively in each iteration to ensure PMC maintains a constant number of effective particles. In each iteration, the PMC 
+algorithm samples from the target distribution :math:`\mathcal{P}_{t}(\theta)` using a set of particles by applying a sequence of three steps:
+
+1. **Reweighting**: The particles are reweighted to target the distribution :math:`\mathcal{P}_{t}(\theta)`.
+2. **Resampling**: The particles are resampled according to their weights to ensure that the effective number of particles is constant.
+3. **Mutation**: The particles are mutated by applying a number of MCMC.
+
+The PMC algorithm terminates when the inverse temperature parameter reaches 1. The samples obtained from the PMC algorithm can be used to
+approximate the posterior distribution of the parameters :math:`\theta` given the data :math:`d` and the model :math:`\mathcal{M}`. The PMC
+algorithm is particularly useful for sampling from high-dimensional and multimodal posterior distributions. Furthemore, the PMC algorithm
+offers an estimate of the logarithm of the model evidence :math:`\log\mathcal{Z}` which can be used for Bayesian model comparison.
+
+The high sampling efficiency and robustness of the PMC algorithm is derived by three key features:
+
+1. **Persistent Sampling**: The PMC algorithm maintains a set of particles throughout the entire run of the algorithm. This allows the PMC
+   algorithm to reuse the particles from previous iterations to sample from the target distribution in the current iteration. This is particularly
+   useful when the target distribution changes smoothly from one iteration to the next.
+2. **Normalizing Flow Preconditioning**: The PMC algorithm uses a normalizing flow to precondition each target distribution :math:`\mathcal{P}_{t}(\theta)`.
+   The normalizing flow is a sequence of invertible transformations that maps a simple distribution to the target distribution. The normalizing
+   flow is trained to approximate the target distribution using a set of particles. Sampling in the target distribution is then performed by
+   sampling from the simple distribution and applying the inverse of the normalizing flow. The normalizing flow preconditioning allows the PMC
+   algorithm to sample from complex and multimodal target distributions.
+3. **t-preconditioned Crank-Nicolson**: The PMC algorithm uses a t-preconditioned Crank-Nicolson integrator to evolve the particles in the target
+   distribution. The t-preconditioned Crank-Nicolson algorithm is an MCMC method that scales well with the dimensionality of the target distribution.
+   For targets that are close to Gaussian, the t-preconditioned Crank-Nicolson algorithm is particularly efficient and can scale to very high dimensions.
+   For non-Gaussian targets (e.g., multimodal distributions), the t-preconditioned Crank-Nicolson algorithm can be combined with the normalizing flow
+   preconditioning to sample from the target distribution efficiently even in high dimensions.
+
+Unlike traditional samplers that rely on Random-walk Metropolis, Slice Sampling, Rejection Sampling, Importance Sampling, or Independence Metropolis, PMC
+can scale to high-dimensions without desolving into random-walk behavior.
diff --git a/docs/source/blobs.ipynb b/docs/source/blobs.ipynb
@@ -0,0 +1,94 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Blobs"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Sometimes it is useful to store additional metadata or derived parameters during the run so that the user does not re-compute them afterwards. This can be achieved easily in ``pocoMC`` using the *blobs* framework, inspired by ``zeus`` and ``emcee`` samplers.\n",
+    "\n",
+    "Any additional derived parameters can be returned by the log-likelihood function. The dtypes of these derived parameters should be defined using the ``blobs_dtype`` argument of the ``Sampler`` class.\n",
+    "\n",
+    "For instance, for a Gaussian likelihood (with zero mean and unit variance in 5D) where we want to store the median value of the parameters and the number of positive parameters, we would do something like this:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Iter: 25it [01:11,  2.84s/it, calls=11264, beta=1, logZ=-8.23, ESS=3.84e+3, acc=0.857, steps=2, logP=-15.1, eff=0.93]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Median:  [ 0.28968189 -2.48432868  1.40069292]\n",
+      "Number of positive parameters:  [3 1 4]\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "from scipy.stats import norm\n",
+    "import pocomc as pc\n",
+    "\n",
+    "prior = pc.Prior(5*[norm(0,5)])\n",
+    "\n",
+    "def log_likelihood(x):\n",
+    "    return -0.5 * np.dot(x,x), np.median(x), np.sum(x>0, dtype=int)\n",
+    "\n",
+    "sampler = pc.Sampler(prior,\n",
+    "                    log_likelihood,\n",
+    "                    blobs_dtype=[('median', float), ('n_positive', int)],\n",
+    "                    )\n",
+    "\n",
+    "sampler.run()\n",
+    "\n",
+    "samples, weights, logl, logp, blobs = sampler.posterior(return_blobs=True)\n",
+    "\n",
+    "print(\"Median: \", blobs[\"median\"][:3])\n",
+    "print(\"Number of positive parameters: \", blobs[\"n_positive\"][:3])\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "dev-env",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/docs/source/checkpoint.ipynb b/docs/source/checkpoint.ipynb
@@ -0,0 +1,87 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Checkpointing"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A useful option, especially for long runs, is to be able to store the state of ``pocoMC`` in a file and also the to use\n",
+    "that file in order to later continue the same run. This can help avoid disastrous situations in which a run is interrupted\n",
+    "or terminated prematurely (e.g. due to time limitation in computing clusters or possible crashes).\n",
+    "\n",
+    "Fortunately, ``pocoMC`` offers both options to save and load a previous state of the sampler."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Save\n",
+    "\n",
+    "In order to save the state of the sampler during the run, one has to specify how often to save the state in a file. This is\n",
+    "done using the ``save_every`` argument in the ``run`` method. The default is ``save_every=None`` which means that no state\n",
+    "is saved during the run. If instead we want to store the state of ``pocoMC`` every e.g. ``3`` iterations, we would do\n",
+    "something like:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sampler.run(save_every = 3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The default directory in which the state files are saved is a folder named ``states`` in the current directory. One can change\n",
+    "this using the ``output_dir`` argument when initialising the sampler (e.g. ``output_dir = \"new_run\"``). By default, the state\n",
+    "files follow the naming convention ``pmc_{i}.state`` where ``i`` is the iteration index. For instance, if ``save_every=3`` was \n",
+    "specified then the ``output_dir`` directory will include the files ``pmc_3.state``, ``pmc_6.state``, etc. One can also change\n",
+    "the label from ``pmc`` to anything else by using the ``output_label`` argument when initialising the sampler (e.g. \n",
+    "``output_label=\"grav_waves\"``)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Loading a previous state of the sampler and resuming the run from that point requires to provide the path to the specific state\n",
+    "file to the ``run`` method using the ``resume_state_path`` argument. For instance, if we want to continue the run from the \n",
+    "``pmc_3.state`` which is in the ``states`` directory, we would do:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sampler.run(resume_state_path = \"states/pmc_3.state\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -75,7 +75,7 @@
     #'collapse_navigation': True, 
     #'navigation_depth': 4,
     "announcement": (
-        "⚠️ The new release 1.0.0 includes major performance and quality-of-life updates. Please check the new syntax and features! ⚠️"
+        "⚠️ The new release 1.1.0 includes major performance and quality-of-life updates. Please check the new syntax and features! ⚠️"
     ),
     'sidebar_hide_name': True,
 }