Hi,

There may be a mistake in the reported bits per dimension for your CIFAR10 experiments. In particular, it seems that they are computed by this code, which accounts for either the normalization (dividing by 256) or the added "sigma noise", but not both. However, it seems from the config files that both cfg['data']['dequantize_uniform'] and cfg['data']['sigma_noise'] are turned on at once. Please correct me if I am wrong.

Once again, thanks for a very well done code base.
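To make the concern concrete, here is the kind of either/or conversion I have in mind (a minimal, hypothetical sketch, not the actual code from this repository; only the two config keys above come from the config files, while everything else, including names, scaling, and the exact corrections, is assumed):

```python
import numpy as np

def nll_to_bpd(nll_nats, num_dims, cfg):
    # Hypothetical sketch (NOT the repository's code) of an either/or BPD
    # conversion. nll_nats is the negative log-likelihood of one image in
    # nats, summed over dimensions, with pixels assumed scaled to [0, 1).
    bpd = nll_nats / (num_dims * np.log(2.0))
    if cfg['data']['dequantize_uniform']:
        # Uniform dequantization of width 1/256: add log2(256) = 8 bits per
        # dimension so the number refers to the original 0..255 pixel scale.
        bpd += 8.0
    # Otherwise (sigma noise only), the likelihood of the noise-perturbed
    # data would be reported as-is; the actual code may handle this differently.
    return bpd
```

The question is then simply which branch is meant to apply when both cfg['data']['dequantize_uniform'] and cfg['data']['sigma_noise'] are set.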
Very good point, thank you! I don't think we originally intended to use both at the same time (ablations showed no difference whether dequantization was on or off). The situation is not very clear-cut, but it seems that the if/else is correct (I will, however, add an explanatory comment in the code).
I should also say that there are no real implications for the BPD numbers in the paper, as all the models are trained in the same way (so they will all be offset by the same constant), and we do not compare to numbers from other works.
Dequantization here does not mean dividing by 256 (that has no effect on the BPD of discrete values); it means replacing every discrete delta peak with a uniform distribution of width 1/256.
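To spell that out (assuming the usual convention of rescaling $y = x/256$ and adding $\log_2 256 = 8$ back in the BPD; the exact scaling in the code may differ): if $p$ is a model density on the original $[0,256)^D$ scale and $\tilde p(y) = 256^D\, p(256\,y)$ is the corresponding density after dividing by 256, then

$$-\frac{1}{D}\log_2 \tilde p(y) + 8 \;=\; -\frac{1}{D}\log_2 p(256\,y),$$

so the division by 256 only shifts the log-likelihood by a constant that the BPD conversion adds right back. Uniform dequantization, in contrast, changes the data distribution itself: each discrete probability $P(x)$ becomes a constant density of height $256^D\, P(x)$ on the cube of side $1/256$ around $x/256$.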
From a theory standpoint, the uniform dequantization affects the original data distribution p(X), while the "sigma noise" determines the noise distribution p(Epsilon), which in turn leads to the distribution of X + Epsilon.
One could argue (see Appendix A.4) that we should only include the Epsilon noise in the BPD, regardless of the dequantization.
On the other hand, this gives unusual values that do not match the convention, used in most other papers, of including the dequantization.
If we want to include both the Epsilon noise and the dequantization, computing the constant is not trivial, because the two effects are convolved. In that case, by including only one of them, we get a bound on the true BPD (the same argument as in Theis et al., arxiv.org/abs/1511.01844, although they miss the fact that the difference between the bound and the true value can be computed when the dequantization is non-overlapping, as it is in most other papers' BPD computations).
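For reference, the bound in that argument (written on the integer pixel scale with width-1 uniform dequantization noise $u \sim \mathcal{U}[0,1)^D$, i.e. non-overlapping; equivalently width 1/256 after rescaling) is just Jensen's inequality:

$$\mathbb{E}_u\!\left[\log p(x+u)\right] \;\le\; \log \int_{[0,1)^D} p(x+u)\, du \;=\; \log P(x),$$

where $P(x)$ is the discrete model obtained by integrating the density over the unit cube at $x$. The dequantized (continuous) BPD therefore upper-bounds the discrete BPD, with equality exactly when $p$ is constant on each cube.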
So, in conclusion, it is not easy to say, but I think the current code is the best solution (also for people who run the code in the future to reproduce the numbers from the paper).