-
-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement some predictive model transformations #141
base: main
Are you sure you want to change the base?
Conversation
pymc_experimental/model_transform.py
Outdated
forecast_timeseries_rewrite = in2out(forecast_timeseries_node_rewrite, ignore_newtrees=True) | ||
|
||
|
||
def forecast_timeseries(model: Model, forecast_steps: int) -> Model: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could maybe request a replace_dims={"old_dim": "new_dim"}
. Right now any dims of deterministics/variables below the timeseries that corresponded to to the initial timeseries steps will be carried over (incorrectly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the GP predictions in mind with this example, but this was a bit simpler as all the graph information is available in the model. For more context see https://gist.github.com/ricardoV94/c16693680d55294eb73feed7c9997421
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, still have your gist up in a tab.. I'm thinking that a really nice way to conceptualize GPs, or AR models like this, would just be as functions.
with pm.Model() as model:
X = pm.MutableData("X", X_)
gp = pm.gp.Latent(cov_func=cov_func)
f = gp(X) # formerly gp.prior(...)
...
pm.sample()
with model:
pm.set_data(X_new)
# or maybe pm.set_data({"X": X_new}, replace_dims={"X": new_coords})
Is all that's needed is to tuck your function here into pm.set_data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to create a new model and not mess with the original one. In these cases the predictive model is different than the prior model, which is not such an unreasonable thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, it'd be nice from a user perspective though if there was one way to do this regardless of whether its a linear model, AR, or GP.
My bad, obv you cant return a new model somehow within pm.set_data
. What if the syntax switched a bit, new_model = pm.set_data(X_new)
(or a new function, pm.update_data
) and could handle the diff cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API I have in mind for GPs would be:
with predict_gp(gp1=x1, gp2=x2, ...) as pred_m:
pm.sample_posterior_predictive()
That feels much more intuitive.
Procedurally set_data
doesn't seem like good analogy for GPs. We don't want to override the initial x, we want to expand them (and apply a specific multivariate normal to that expansion). If it was called pm.expand_data
it would be closer, but again I don't think it's that intuitive.
I think sample
is actually a good counter example, that code is really hard to understand and maintain. Users have no idea how to do something less standard like resuming sampling which libraries like blackjax and numpyro do much better.
Also a more practical if silly reason is that we can't create a distinct model inside the context of another. The variables would be added to the parent. And you can't call set_data
outside of a model.
But instead of me arguing against it, do you see a problem with this approach? Do you think it would be hard to use, or to find or both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the gp1=x, gp2=x2, ...
part work? is that setting X for each gp object in the model?
The details of predicting with a GP could be abstracted away though. It'd be nice if a user could just think of GPs as random functions f(x)
that take some x
and return a distribution. So you sample initially with some x
, then you can plug in new x
and get the new GP at x. Which is true, the stuff about conditionals and non-parametric and all that is kind of an implementation detail, and there to make working with GPs more efficient. It's just as correct to fit the whole thing with NUTS, a GP over x (where data is observed) and x_new (where it isnt) up front. It's too slow though.
One potential problem (don't know if it necessarily is), is how this would interact with additive GPs, example here. Could you apply this to predict a component of an additive GP? (I think its def possible, predict_gp
might need some args for this, just wanted to keep it in mind as a potential complication)
It also kind of looks like you're building higher level model components, so censoring, gps, timeseries, which all use this function chaining for uncensoring/prediction/forecasting whatever is natural. Even though you could do it with pm.set_data
, you could additional build a predict_lm
function that works the same more function chaining way too.
If you had a linear model as the mean of a GP, or equiv.,
with predict_gp(...) as pred_m:
pm.set_data({"X": x_new})
pm.sample_posterior_predictive()
(or maybe, this one?)
with predict_gp(predict_lm(...)) as pred_m:
pm.sample_posterior_predictive()
Anyway, please don't let this discussion be a drag! I feel there's a few threads here and theres a lot to think about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One potential problem (don't know if it necessarily is), is how this would interact with additive GPs, example here. Could you apply this to predict a component of an additive GP? (I think its def possible, predict_gp might need some args for this, just wanted to keep it in mind as a potential complication)
I have to read a bit about that to understand it.
If you had a linear model as the mean of a GP, or equiv., , would you do
Either would be possible! The GLM
part usually doesn't need any graph transformation (that's why it already works). Although you could use it replace ConstantData predictors by a new one if you wanted to. This way a PyMC model can sample with constant predictors, which may allow more efficient graphs, than when we use shared variables where it has to assume those could change values/shape any time.
But I would probably go for the first (not implement a predict_lm
transform). The predict_gp
, creates a new model with the prediction component inserted right after where the original gp
was (conditioned on the new_X). After that you can still set any old MutableData
variables you had in the original model as you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, please don't let this discussion be a drag! I feel there's a few threads here and theres a lot to think about.
This was the point of me opening the PR, so not a drag at all. The idea is to experiment and see if something works well enough. Maybe I should add a gp_predict
transform in this PR to see how it looks like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay definitely, I'm curious how it looks.
The predict_gp, creates a new model with the prediction component inserted right after where the original gp was (conditioned on the new_X). After that you can still set any old MutableData variables you had in the original model as you want.
Would mutating X
affect the original GP though? Like, if you have
f = gp.prior("f", X)
...
pm.set_data({"X": x_new})
you might expect something to happen to f
?
Also, I'd be curious to check the graph after .prior
has been replaced by .conditional
to see if there is an X
as input somewhere still hanging around.
15b78db
to
96fbc30
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being late to the party here. I'm catching up with all of your discussion but I haven't read all of the stuff you wrote about using this for GPs.
To be honest, I think that you started with two rather difficult and specific applications. I suggest that you do a couple of things differently:
- Make
model_transform
a directory (subpackage) - Have model transformations split within according to some logic, instead of a monolithic script
- Add a transformation called
observe
and one calledunobserve
. The idead behind these two is to move a random variable between a freeRV and an observedRV. - Add a transformation called
impute
that takes any random variable and replaces it with aTensorConstant
of the appropriate size and dtype. This, along withobserve
will serve as the basis for do-calculus and would open the door of practically all of causal inference to us.
pymc_experimental/model_transform.py
Outdated
# FIXME: This special logic shouldn't be needed for ObservedRVs | ||
# but PyMC does not allow one to not resample observed. | ||
# We hack around by conditioning on the value variable directly, | ||
# even though that should not be part of the generative graph... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. This special logic IS needed for ObservedRVs because forecasting is different than posterior predictive.
Posterior predictive does something like this:
draw other potential values of the
Forecasting does this:
So it samples from a different conditional distribution than the posterior predictive. That's why we do need to manually add the past y_{1}^{obs}, ..., y_{n}^{obs}
values explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I think of it observed variables are just FreeRVs that had constant values during sampling. If I can condition on FreeRVs traced values I should also be able to condition on ObservedRV traced values (in the constant/observed data group) without having to hack across the value variable representation on the forward graph.
Similarly, if I had a Deterministic based on an ObservedRV (as happens with imputation), why can't I now make predictions based on those traced values without forcing sampling from the likelihood?
The difference between forecasting vs posterior predictive (+ forecasting) would be controllable by including or excluding the ObservedRV in the var_names
kwarg in posterior predictive, which would behave exactly the same as with FreeRVs
Any preference for a TensorConstant instead of SharedVariable? The latter can be modified without needing to reconstruct the graph |
96fbc30
to
53fcdf9
Compare
Following up here with the conversation I had with @ricardoV94 on discourse so it doesn't get lost -- Another nice transformation would be one that takes in an MvN distribution and marginalizes over observed data, returning a new conditional distribution over the set complement to the observations. It would look something like this: from pytensor.tensor.random.basic import multivariate_normal, MvNormalRV
from pytensor.tensor.nlinalg import matrix_dot
import pytensor.tensor as pt
import pymc as pm
def marginalize_over_observations(dist, observations):
if not isinstance(dist.owner.op, MvNormalRV):
raise NotImplementedError
# Is this always true?
*_, mu, cov = dist.owner.inputs
observations = pt.as_tensor_variable(observations)
is_nan = pt.isnan(observations)
missing_idx = pt.nonzero(is_nan, return_matrix=False)[0]
obs_idx = pt.nonzero(pt.neq(is_nan, np.array(True)), return_matrix=False)[0]
n_missing = missing_idx.shape[0]
n_obs = obs_idx.shape[0]
n = n_missing + n_obs
# Z is a map from R_n to R_observed
# K is a map from R_n to R_missing
Z = pt.set_subtensor(pt.zeros((n_obs, n))[pt.arange(n_obs), obs_idx], 1)
K = pt.set_subtensor(pt.zeros((n_missing, n))[pt.arange(n_missing), missing_idx], 1)
resid = observations[obs_idx] - Z @ mu
# TODO: Benchmark this vs indexing on a large problem, gains from BLAS?
cov_uu = matrix_dot(K, cov, K.T)
cov_uo = matrix_dot(K, cov, Z.T)
cov_oo = matrix_dot(Z, cov, Z.T)
# This could be sped up if we got chol instead of cov, because we could to solve_triangular twice to invert
# Typical MvN parameterization is in terms of chol, so there's a lot of waste going back and forth
# over and over?
cov_oo_inv = pt.linalg.solve(cov_oo, pt.eye(n_obs), assume_a="pos", check_finite=False)
beta = cov_uo @ cov_oo_inv
conditional_mu = Z @ mu + beta @ resid
conditional_cov = cov_uu - matrix_dot(beta, cov_oo, beta.T)
return multivariate_normal(mean=conditional_mu,
cov=conditional_cov,
dtype=dist.dtype) Simple usage:
This could then be applied to a model with 1) A list of MvN Rvs to re-write, and 2) a list of observations to condition them on. I will have a look at how the other functions in this PR work and see if I can have a try, but I actually don't know how to PR into a PR :> |
@jessegrabowski you can push directly to my branch if you have the git permissions. Otherwise you can fork my branch and open a PR with that as a base. If that doesn't work just open a PR directly on main because I will have the permissions to work on it anyway. About the Cholesky thing we added some rewrites recently in PyTensor that work as long as the right "tags" are on the variables (e.g. "lower_triangular". You can check it in: pymc-devs/pymc@ee1657b Would those cover the case here? If not, can we add the rewrite in PyTensor? |
About the observations could we just request the new ones and combine with the old observed values from the model/trace |
The rewrite handles my quip about going back and forth, since it looks like now the canonical MvN will be parameterized by the cholesky factorized covariance matrix? I was surprised to see that if you give One last thing I just thought of: this "conditionalization" has implications for missing data imputation in MvN likelihoods -- the missing values should be filled with these conditional distributions. I know you just did a PR that brought back multivariate imputation, but I didn't carefully look to see what goes into the missing values now |
The default parametrization of the MvNormal is still the covariance. What we did was to add rewrites to remove useless operations when you tag that some variables where already lower triangular (which the outputs of LKJ are tagged with). You should still write the code expecting a full covariance. The question is if the rewrites that we have right now would go from the full covariance graph to the optimized one as long as the tags are provided? If not, we could open an issue on Pytensor to do it. Maybe @dehorsley would be interested in also implementing those? |
About the imputation, yes in that case you shouldn't need anything else but it can be pretty inneficient to sample the variables with NUTS so it's nice if we can get them afterwards using the conditioning algebra. CC @bwengals as this is pretty much the same thing with GPs |
Companionship to #111
Exploring some possibilities and also understanding the limitations/ design issues.
This is just a draft to guide design decisions for now.