Skip to content

Commit

Permalink
Merge pull request #265 from elfi-dev/dev
Browse files Browse the repository at this point in the history
Release 0.7.1
  • Loading branch information
vuolleko authored Apr 12, 2018
2 parents 1476806 + 79c7e9c commit b5807b1
Show file tree
Hide file tree
Showing 15 changed files with 188 additions and 55 deletions.
7 changes: 4 additions & 3 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,10 @@ matrix:
- mkdir -p /Users/travis/.matplotlib
- "echo 'backend: TkAgg' > /Users/travis/.matplotlib/matplotlibrc"
- brew update
- brew install python3
- virtualenv env -p python3
- source env/bin/activate
- brew upgrade python
- pip3 install virtualenv
- virtualenv py3env -p python3
- source py3env/bin/activate

cache: pip

Expand Down
10 changes: 9 additions & 1 deletion CHANGELOG.rst
Original file line number Diff line number Diff line change
@@ -1,9 +1,17 @@
Changelog
=========

0.7.1 (2018-04-11)
------------------
- Implemented model selection (elfi.compare_models). See API documentation.
- Fix threshold=0 in rejection sampling
- Set default batch_size to 1 in ParameterInference base class

0.7 (2017-11-30)
----------------

- Added new example: the stochastic Lotka-Volterra model
- Fix methods.bo.utils.minimize to be strictly within bounds
- Implemented the Two Stage Procedure, a method of summary-statistics diagnostics
- Added the MaxVar acquisition method
- Added the RandMaxVar acquisition method
- Added the ExpIntVar acquisition method
Expand Down
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
**Version 0.7 released!** See the CHANGELOG and [notebooks](https://github.com/elfi-dev/notebooks).
**Version 0.7.1 released!** See the CHANGELOG and [notebooks](https://github.com/elfi-dev/notebooks).

**NOTE:** For the time being NetworkX 2 is incompatible with ELFI.

Expand Down Expand Up @@ -28,6 +28,8 @@ Other notable included algorithms and methods:
- Bayesian Optimization
- [No-U-Turn-Sampler](http://jmlr.org/papers/volume15/hoffman14a/hoffman14a.pdf), a Hamiltonian Monte Carlo MCMC sampler

ELFI also integrates tools for visualization, model comparison, diagnostics and post-processing.

See examples under [notebooks](https://github.com/elfi-dev/notebooks) to get started. Full
documentation can be found at http://elfi.readthedocs.io/. Limited user-support may be
asked from elfi-support.at.hiit.fi, but the
Expand Down
7 changes: 7 additions & 0 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -265,6 +265,13 @@ Inference API classes
:members:
:inherited-members:

**Model selection**

.. currentmodule:: .

.. autofunction:: elfi.compare_models


Other
.....

Expand Down
28 changes: 28 additions & 0 deletions docs/faq.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,31 @@ produces outputs from the interval (1, 3).*
their definitions. There the uniform distribution uses the location/scale definition, so
the first argument defines the starting point of the interval and the second its length.

.. _vectorization:

*Q: What is vectorization in ELFI?*

**A**: Looping is relatively inefficient in Python, and so whenever possible, you should *vectorize*
your operations_. This means that repetitive computations are performed on a batch of data using
precompiled libraries (typically NumPy_), which effectively runs the loops in faster, compiled C-code.
ELFI supports vectorized operations, and due to the potentially huge saving in CPU-time it is
recommended to vectorize all user-code whenever possible.

.. _operations: good-to-know.html#operations
.. _NumPy: http://www.numpy.org/

For example, imagine you have a simulator that depends on a scalar parameter and produces a vector of 5
values. When this is used in ELFI with ``batch_size`` set to 1000, ELFI draws 1000 values from the
parameter's prior distribution and gives this *vector* to the simulator. Ideally, the simulator should
efficiently process all 1000 parameter cases in one go and output an array of shape (1000, 5). When using
vectorized operations in ELFI, the length (i.e. the first dimension) of all output arrays should equal
``batch_size``. Note that because of this the evaluation of summary statistics, distances etc. should
bypass the first dimension (e.g. with NumPy functions using ``axis=1`` in this case).

See ``elfi.examples`` for tips on how to vectorize simulators and work with ELFI. In case you are
unable to vectorize your simulator, you can use `elfi.tools.vectorize`_ to mimic
vectorized behaviour, though without the performance benefits. Finally, for simplicity vectorization
is not assumed (``batch_size=1`` by default).

.. _`elfi.tools.vectorize`: api.html#elfi.tools.vectorize

2 changes: 2 additions & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@ ELFI also has the following non LFI methods:

.. _No-U-Turn-Sampler: http://jmlr.org/papers/volume15/hoffman14a/hoffman14a.pdf

Additionally, ELFI integrates tools for visualization, model comparison, diagnostics and post-processing.


.. toctree::
:maxdepth: 1
Expand Down
66 changes: 33 additions & 33 deletions docs/usage/tutorial.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,9 @@ settings.
import numpy as np
import scipy.stats
import matplotlib
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.INFO)
logging.basicConfig(level=logging.INFO) # sometimes this is required to enable logging inside Jupyter
%matplotlib inline
%precision 2
Expand Down Expand Up @@ -251,7 +250,7 @@ a DAG.



.. note:: You will need the Graphviz_ software as well as the graphviz `Python package`_ (https://pypi.python.org/pypi/graphviz) for drawing this. The software is already installed in many unix-like OS.
.. note:: You will need the Graphviz_ software as well as the graphviz `Python package`_ (https://pypi.python.org/pypi/graphviz) for drawing this.

.. _Graphviz: http://www.graphviz.org
.. _`Python package`: https://pypi.python.org/pypi/graphviz
Expand Down Expand Up @@ -396,8 +395,8 @@ time is spent in drawing.

.. parsed-literal::
CPU times: user 2.28 s, sys: 165 ms, total: 2.45 s
Wall time: 2.45 s
CPU times: user 1.6 s, sys: 166 ms, total: 1.77 s
Wall time: 1.76 s
The ``sample`` method returns a ``Sample`` object, which contains
Expand Down Expand Up @@ -452,8 +451,8 @@ as long as it takes to generate the requested number of samples.
.. parsed-literal::
CPU times: user 222 ms, sys: 40.3 ms, total: 263 ms
Wall time: 261 ms
CPU times: user 198 ms, sys: 35.5 ms, total: 233 ms
Wall time: 231 ms
Method: Rejection
Number of samples: 1000
Number of simulations: 40000
Expand Down Expand Up @@ -497,9 +496,9 @@ been reached or a maximum of one second of time has been used.
Method: Rejection
Number of samples: 1000
Number of simulations: 190000
Threshold: 0.0855
Sample means: t1: 0.561, t2: 0.218
Number of simulations: 180000
Threshold: 0.088
Sample means: t1: 0.561, t2: 0.221
Expand Down Expand Up @@ -547,8 +546,8 @@ in our model:
.. parsed-literal::
CPU times: user 5.26 s, sys: 37.1 ms, total: 5.3 s
Wall time: 5.3 s
CPU times: user 5.01 s, sys: 60.9 ms, total: 5.07 s
Wall time: 5.09 s
Expand All @@ -558,8 +557,8 @@ in our model:
Method: Rejection
Number of samples: 1000
Number of simulations: 1000000
Threshold: 0.036
Sample means: t1: 0.561, t2: 0.227
Threshold: 0.0363
Sample means: t1: 0.554, t2: 0.216
Expand All @@ -580,8 +579,8 @@ anything. Let's do that.
.. parsed-literal::
CPU times: user 636 ms, sys: 1.35 ms, total: 638 ms
Wall time: 638 ms
CPU times: user 423 ms, sys: 3.35 ms, total: 426 ms
Wall time: 429 ms
Expand All @@ -591,8 +590,8 @@ anything. Let's do that.
Method: Rejection
Number of samples: 1000
Number of simulations: 1000000
Threshold: 0.0452
Sample means: t1: 0.56, t2: 0.228
Threshold: 0.0457
Sample means: t1: 0.55, t2: 0.216
Expand All @@ -610,8 +609,8 @@ simulations and only have to simulate the new ones:
.. parsed-literal::
CPU times: user 1.72 s, sys: 10.6 ms, total: 1.73 s
Wall time: 1.73 s
CPU times: user 1.44 s, sys: 17.9 ms, total: 1.46 s
Wall time: 1.47 s
Expand All @@ -621,8 +620,8 @@ simulations and only have to simulate the new ones:
Method: Rejection
Number of samples: 1000
Number of simulations: 1200000
Threshold: 0.0417
Sample means: t1: 0.561, t2: 0.225
Threshold: 0.0415
Sample means: t1: 0.55, t2: 0.215
Expand All @@ -640,8 +639,8 @@ standard numpy .npy files:
.. parsed-literal::
CPU times: user 25.8 ms, sys: 3.27 ms, total: 29 ms
Wall time: 28.5 ms
CPU times: user 28.7 ms, sys: 4.5 ms, total: 33.2 ms
Wall time: 33.4 ms
This stores the simulated data in binary ``npy`` format under
Expand All @@ -658,7 +657,7 @@ This stores the simulated data in binary ``npy`` format under
.. parsed-literal::
Files in pools/arraypool_3521077242 are ['d.npy', 't1.npy', 't2.npy', 'Y.npy']
Files in pools/arraypool_3375867934 are ['d.npy', 't1.npy', 't2.npy', 'Y.npy']
Now lets load all the parameters ``t1`` that were generated with numpy:
Expand All @@ -672,7 +671,7 @@ Now lets load all the parameters ``t1`` that were generated with numpy:
.. parsed-literal::
array([ 0.79, -0.01, -1.47, ..., 0.98, 0.18, 0.5 ])
array([ 0.36, 0.47, -1.66, ..., 0.09, 0.45, 0.2 ])
Expand All @@ -687,7 +686,7 @@ We can also close (or save) the whole pool if we wish to continue later:
.. parsed-literal::
arraypool_3521077242
arraypool_3375867934
And open it up later to continue where we were left. We can open it
Expand Down Expand Up @@ -718,12 +717,12 @@ You can delete the files with:
os.listdir(arraypool.path)
except FileNotFoundError:
print("The directry is removed")
print("The directory is removed")
.. parsed-literal::
The directry is removed
The directory is removed
Visualizing the results
Expand Down Expand Up @@ -820,8 +819,9 @@ sampler:
smc = elfi.SMC(d, batch_size=10000, seed=seed)
For sampling, one has to define the number of output samples, the number
of populations and a *schedule* i.e. a list of quantiles to use for each
population. In essence, a population is just refined rejection sampling.
of populations and a *schedule* i.e. a list of thresholds to use for
each population. In essence, a population is just refined rejection
sampling.

.. code:: ipython3
Expand All @@ -839,8 +839,8 @@ population. In essence, a population is just refined rejection sampling.
.. parsed-literal::
CPU times: user 1.72 s, sys: 154 ms, total: 1.87 s
Wall time: 1.56 s
CPU times: user 1.6 s, sys: 156 ms, total: 1.75 s
Wall time: 1.38 s
We can have summaries and plots of the results just like above:
Expand Down
3 changes: 2 additions & 1 deletion elfi/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
import elfi.model.tools as tools
from elfi.client import get_client, set_client
from elfi.methods.diagnostics import TwoStageSelection
from elfi.methods.model_selection import *
from elfi.methods.parameter_inference import *
from elfi.methods.post_processing import adjust_posterior
from elfi.model.elfi_model import *
Expand All @@ -24,4 +25,4 @@
__email__ = '[email protected]'

# make sure __version_ is on the last non-empty line (read by setup.py)
__version__ = '0.7'
__version__ = '0.7.1'
59 changes: 59 additions & 0 deletions elfi/methods/model_selection.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
"""This module contains methods for model comparison and selection."""

import numpy as np


def compare_models(sample_objs, model_priors=None):
"""Find posterior probabilities for different models.
The algorithm requires elfi.Sample objects from prerun inference methods. For example the
output from elfi.Rejection.sample is valid. The portion of samples for each model in the top
discrepancies are adjusted by each models acceptance ratio and prior probability.
The discrepancies (including summary statistics) must be comparable so that it is
meaningful to sort them!
Parameters
----------
sample_objs : list of elfi.Sample
Resulting Sample objects from prerun inference models. The objects must include
a valid `discrepancies` attribute.
model_priors : array_like, optional
Prior probability of each model. Defaults to 1 / n_models.
Returns
-------
np.array
Posterior probabilities for the considered models.
"""
n_models = len(sample_objs)
n_min = min([s.n_samples for s in sample_objs])

# concatenate discrepancy vectors
try:
discrepancies = np.concatenate([s.discrepancies for s in sample_objs])
except ValueError:
raise ValueError("All Sample objects must include valid discrepancies.")

# sort and take the smallest n_min
inds = np.argsort(discrepancies)[:n_min]

# calculate the portions of accepted samples for each model in the top discrepancies
p_models = np.empty(n_models)
up_bound = 0
for i in range(n_models):
low_bound = up_bound
up_bound += sample_objs[i].n_samples
p_models[i] = np.logical_and(inds >= low_bound, inds < up_bound).sum()

# adjust by the number of simulations run
p_models[i] /= sample_objs[i].n_sim

# adjust by the prior model probability
if model_priors is not None:
p_models[i] *= model_priors[i]

p_models = p_models / p_models.sum()

return p_models
Loading

0 comments on commit b5807b1

Please sign in to comment.