Memory issue #5

Open
fonnesbeck opened this issue Feb 26, 2015 · 2 comments
@fonnesbeck
Member

Moved from pymc-devs/pymc#543

Connecting a single Stochastic variable to a large number of other Stochastic variables takes a lot of memory. E.g.

import pymc

def create_model(i, a):
    b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
    return locals()

a = pymc.Uniform('a', lower=0., upper=100., value=.1)
l = [create_model(i, a) for i in range(10000)]
model = pymc.Model(l)

while having twice as many unconnected variables is fine:

def create_model(i):
    a = pymc.Uniform('a_%i' % i, lower=0., upper=100., value=.1)
    b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
    return locals()

l = [create_model(i) for i in range(10000)]
model = pymc.Model(l)
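
For reference, a minimal harness matching the profiler output in the next comment (a sketch; the main() wrapper and the @profile decorator that memory_profiler injects are assumptions reconstructed from that output):

import pymc

@profile  # made available when run as: python -m memory_profiler connected.py
def main():
    def create_model(i, a):
        b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
        return locals()

    a = pymc.Uniform('a', lower=0., upper=100., value=.1)
    l = [create_model(i, a) for i in range(1000)]
    model = pymc.Model(l)

main()

Note that the runs below use range(1000) rather than the range(10000) in the example above.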
@fonnesbeck
Member Author

Doing a little digging using the memory profiler, first for the "connected" model:

$ python -m memory_profiler connected.py
Filename: connected.py

Line #    Mem usage    Increment   Line Contents
================================================
    10  179.062 MiB    0.000 MiB       l = [create_model(i, a) for i in range(1000)]


Filename: connected.py

Line #    Mem usage    Increment   Line Contents
================================================
     3   82.871 MiB    0.000 MiB   @profile
     4                             def main():
     5   82.871 MiB    0.000 MiB       def create_model(i, a):
     6                                     b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     7                                     return locals()
     8                             
     9   82.949 MiB    0.078 MiB       a = pymc.Uniform('a', lower=0., upper=100., value=.1)
    10  179.062 MiB   96.113 MiB       l = [create_model(i, a) for i in range(1000)]
    11  247.961 MiB   68.898 MiB       model = pymc.Model(l)


Filename: connected.py

Line #    Mem usage    Increment   Line Contents
================================================
     5  178.930 MiB    0.000 MiB       def create_model(i, a):
     6  179.062 MiB    0.133 MiB           b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     7  179.062 MiB    0.000 MiB           return locals()

and for the "unconnected" model:

$ python -m memory_profiler unconnected.py
Filename: unconnected.py

Line #    Mem usage    Increment   Line Contents
================================================
     3   82.832 MiB    0.000 MiB   @profile
     4                             def main():
     5   82.832 MiB    0.000 MiB       def create_model(i):
     6                                     a = pymc.Uniform('a_%i' % i, lower=0., upper=100., value=.1)
     7                                     b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     8                                     return locals()
     9                             
    10  108.156 MiB   25.324 MiB       l = [create_model(i) for i in range(1000)]
    11  115.336 MiB    7.180 MiB       model = pymc.Model(l)


Filename: unconnected.py

Line #    Mem usage    Increment   Line Contents
================================================
    10  108.156 MiB    0.000 MiB       l = [create_model(i) for i in range(1000)]


Filename: unconnected.py

Line #    Mem usage    Increment   Line Contents
================================================
     5  108.129 MiB    0.000 MiB       def create_model(i):
     6  108.141 MiB    0.012 MiB           a = pymc.Uniform('a_%i' % i, lower=0., upper=100., value=.1)
     7  108.156 MiB    0.016 MiB           b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     8  108.156 MiB    0.000 MiB           return locals()

I have also confirmed that the connected model is not somehow creating new PyMC objects (at least as far as I can tell), and that the size of the individual variables in each model is identical, via sys.getsizeof(model.variables.pop()).

So, this is still a mystery. Need to do deeper profiling, I suppose.
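
One caveat: sys.getsizeof is shallow, so identical per-variable sizes don't rule out a large structure hanging off a single node. In the connected model every b_i registers itself as a child of a, so the children set of a (and whatever each child keeps alive) is one place to dig. A minimal sketch (the children attribute is standard on PyMC 2 stochastics; the rest is just an assumption about where the memory might hide):

import sys
import pymc

a = pymc.Uniform('a', lower=0., upper=100., value=.1)
bs = [pymc.Normal('b_%i' % i, mu=a, tau=1.) for i in range(1000)]

# sys.getsizeof(a) is shallow: it ignores everything a merely references
print(sys.getsizeof(a))

# a accumulates one child per connected variable...
print(len(a.children))  # -> 1000

# ...so sum the shallow sizes of the children set and its members
total = sys.getsizeof(a.children) + sum(sys.getsizeof(c) for c in a.children)
print('children set + members: %d bytes' % total)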

@abojchevski

Any update on this memory issue?

The following relatively simple (stochastic block) model has tremendous memory usage even for a small graph (e.g. N = 100 nodes).

Or am I doing something wrong in the model definition?

import numpy as np
import pymc as pm

# generate random block matrix
n = 50
A11 = np.random.rand(n, n) > 0.3
A12 = np.random.rand(n, n) > 0.9
A21 = np.random.rand(n, n) > 0.9
A22 = np.random.rand(n, n) > 0.3

A_obs = np.bmat([[A11, A12], [A21, A22]])

N = A_obs.shape[0]
K = 2

# define model
pi = pm.Dirichlet('pi', theta=0.5 * np.ones(K))
eta = pm.Container([[pm.Beta('b_{}{}'.format(i, j), alpha=1, beta=1) for i in range(K)] for j in range(K)])

q = pm.Container([pm.Categorical('q_{}'.format(i), p=pi) for i in range(N)])

# one observed Bernoulli node (plus one Lambda) per (i, j) pair -- N*N of each in total
A = pm.Container([[pm.Bernoulli('A_{}_{}'.format(i, j),
                                p=pm.Lambda('A_lambda_{}_{}'.format(i, j),
                                            lambda qi=q[i], qj=q[j], eta=eta: eta[qi][qj]),
                                value=A_obs[i, j], observed=True) for i in range(N)] for j in range(N)])

# sample
mcmc = pm.MCMC([A, q, pi, eta])
trace = mcmc.sample(200)

print(np.array(q.value))
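
If the per-node overhead is the culprit, one possible workaround (a sketch, untested against this issue; it assumes pymc 2's vector-valued stochastics and replaces the Container of scalar Betas with a single K x K Beta array) is to collapse the N x N Containers into a few array-valued nodes:

import numpy as np
import pymc as pm

pi = pm.Dirichlet('pi', theta=0.5 * np.ones(K))
eta = pm.Beta('eta', alpha=1, beta=1, size=(K, K))  # K x K block probabilities
q = pm.Categorical('q', p=pi, size=N)               # all memberships in one node

# deterministic N x N edge-probability matrix: P[i, j] = eta[q[i], q[j]]
P = pm.Lambda('P', lambda q=q, eta=eta: eta[np.ix_(q, q)])

# a single observed Bernoulli node for the whole adjacency matrix
A = pm.Bernoulli('A', p=P, value=np.asarray(A_obs), observed=True)

mcmc = pm.MCMC([A, q, pi, eta])

This builds a handful of nodes instead of roughly N*N + N, which should shrink both the graph bookkeeping and the number of Lambda closures.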
