How to reproduce this visualization from the VQVAE 2 paper #8

theAdamColton · 2023-02-13T15:13:08Z

The original paper includes this cool graphic, which I believe shows decoded representations from the different levels of the network. But I don't understand how, in practice, you could obtain a decoded image using only the top-level representation from the FFHQ encoder. In the three-level FFHQ model, the final decoder is applied to a concatenation of the upscaled middle-level codes and the twice-upscaled top-level codes, and expects 192 input channels.

Is there a way, only using information from the top level encoded quantized representation, to get an image out of the network?

vvvm23 · 2023-02-13T22:09:31Z

This is something I never quite understood about the original paper either, and I haven't explored it myself, so I can't really answer this question. It could be something as naive as passing a zero tensor in place of the lower-level codes in the final decoder, but only the original authors know.
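To make the zero-tensor idea concrete, here is a minimal sketch of how the final decoder's input could be assembled from the top-level codes alone. It is only a guess at one possible approach, not the paper's method. It uses NumPy just to show the shapes, and several things are assumptions: that the 192 inputs the final decoder expects are channels, split into three 64-channel slots ordered bottom, middle, top; that the bottom level has 4x the spatial resolution of the top; and that upsampling is nearest-neighbour. The names `upsample_nearest` and `top_only_decoder_input` are hypothetical and do not come from this repo.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (batch, channels, height, width) array."""
    return x.repeat(factor, axis=2).repeat(factor, axis=3)


def top_only_decoder_input(top_q, channels_per_level=64):
    """Build a 3-slot decoder input in which only the top-level slot is filled.

    Assumption: the final decoder takes [bottom, middle, top] concatenated
    along the channel axis, all at the bottom level's resolution.
    """
    batch, _, height, width = top_q.shape
    # Two 2x upsampling steps, so 4x overall, bring the top codes to bottom resolution.
    top_up = upsample_nearest(top_q, 4)
    # Zeros stand in for the bottom and (upsampled) middle codes.
    zeros = np.zeros(
        (batch, channels_per_level, height * 4, width * 4), dtype=top_q.dtype
    )
    return np.concatenate([zeros, zeros, top_up], axis=1)


# Example with a small stand-in for the quantized top-level codes.
top_q = np.ones((1, 64, 8, 8), dtype=np.float32)
decoder_input = top_only_decoder_input(top_q)
print(decoder_input.shape)
```

The resulting array has the channel count the final decoder expects and could be passed to it in place of the usual concatenation. Whether a decoder trained on real lower-level codes produces anything meaningful from zeros is untested here.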

Thanks for bringing this to my attention. I am currently working on a refactor of this repo (see #5), so I might investigate this once that is done. There are actually quite a few things in the paper that are unclear, and we may never know for sure how they were done.

theAdamColton closed this as completed Jun 2, 2023
