
Relaxing constant sigma assumption #7

Open
kevinykuo opened this issue Feb 5, 2019 · 3 comments

@kevinykuo

If we wanted to make this bit

# for the toy example, assume y* ~ N(mu, sigma) with fixed sigma
sigma_star <- tf$constant(noise_sd, dtype = tf$float32)
list(mu = mu_star, sigma = sigma_star)

more general, what would be the correct way to do it? Would we try to estimate it from the n_draws draws of each of the y* predictions?

@kasparmartens
Owner

You could make noise_sd a parameter and try to learn it (for numerical stability, you probably want to lower- and upper-bound it).
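A minimal sketch of what that could look like, reusing mu_star from the snippet above (assumptions: TF1-style variables to match the rest of the code; the sigma_min/sigma_max bounds and the sigmoid squashing are illustrative choices, not from this repo):

# hypothetical bounds, chosen for illustration
sigma_min <- 0.01
sigma_max <- 1.0
# unconstrained scalar parameter, learned jointly with the network weights
raw_sigma <- tf$get_variable("raw_sigma", shape = shape(), dtype = tf$float32,
                             initializer = tf$zeros_initializer())
# squash into (sigma_min, sigma_max) so the learned noise stays lower- and upper-bounded
sigma_star <- sigma_min + (sigma_max - sigma_min) * tf$sigmoid(raw_sigma)
list(mu = mu_star, sigma = sigma_star)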

@kevinykuo
Author

Thanks for the reply! Do you mean e.g. outputting another quantity connected to hidden?

hidden <- input %>%
  tf$layers$dense(dim_g_hidden, tf$nn$relu, name = "decoder_layer1", reuse = tf$AUTO_REUSE)

# mu will be of the shape [N_star, n_draws]
mu_star <- hidden %>%
  tf$layers$dense(1L, name = "decoder_layer2", reuse = tf$AUTO_REUSE) %>%
  tf$squeeze(axis = 2L) %>%
  tf$transpose()

That seems straightforward, but I wasn't sure how to justify it, since the decoder looked like it should only predict the target y's, and we would need to obtain the variance elsewhere. But I guess the samples of z are part of the input, so that randomness is accounted for.

@kasparmartens
Owner

It depends on what kind of noise model you want to assume. The most natural one would probably be constant noise. E.g. in the GP regression model, the typical choice for p(y|f, x) would be a Normal distribution with mean f(x) and variance \sigma^2, i.e. the latter would not depend on the input x. In this case, \sigma^2 would be a single variable (not parameterised by a network).

If we are interested in scenarios where the noise level varies with x, then we could indeed consider parameterising \sigma^2 along the lines you described, e.g. with a second output head, as sketched below.
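For illustration, a hedged sketch of such an input-dependent sigma, extending the decoder snippet above (assumptions: the extra "decoder_sigma" head, the softplus link, and the 0.01 floor are illustrative choices, not from this repo):

# sigma will also be of shape [N_star, n_draws], one value per input
sigma_star <- hidden %>%
  tf$layers$dense(1L, name = "decoder_sigma", reuse = tf$AUTO_REUSE) %>%
  tf$squeeze(axis = 2L) %>%
  tf$transpose() %>%
  tf$nn$softplus()
# small floor keeps the variance away from zero for numerical stability
sigma_star <- sigma_star + 0.01
list(mu = mu_star, sigma = sigma_star)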
