Interested in reproducing MNIST - training seems to focus on NLL only #8

adam2392 · 2024-10-15T20:05:49Z

Hi,

I was interested in trying to train a FIF on MNIST down to a latent dimensionality of 128, or even 12 to compare the compression.
I am using an encoder/decoder, similar to Table 11: https://arxiv.org/pdf/2306.01843. However, I am using a slightly different network, where the hidden dimensionality of the decoder and encoder are the same. If interested, I have the gist.

During training of the model with beta=100, I am observing that the NLL consistently decreases, while the reconstruction loss stagnates. For example, here is a snippet of my training log:

Epoch 446:   0%|          | 0/54 [00:00<?, ?it/s]         
train_loss: -311.244 | recon_loss: 0.920 | nll_loss: -403.208 | surrogate_loss: 0.677
Epoch 447:   0%|          | 0/54 [00:00<?, ?it/s]         
train_loss: -312.497 | recon_loss: 0.922 | nll_loss: -404.648 | surrogate_loss: 1.268
Epoch 448:   0%|          | 0/54 [00:00<?, ?it/s]         
train_loss: -312.804 | recon_loss: 0.920 | nll_loss: -404.820 | surrogate_loss: 0.932
Epoch 449:   0%|          | 0/54 [00:00<?, ?it/s]         
train_loss: -313.825 | recon_loss: 0.921 | nll_loss: -405.916 | surrogate_loss: 1.183
Epoch 449: 100%|██████████| 54/54 [00:04<00:00, 11.23it/s]    val_loss: -313.076
Epoch 450:   0%|          | 0/54 [00:00<?, ?it/s]         t/s]
train_loss: -313.134 | recon_loss: 0.921 | nll_loss: -405.244 | surrogate_loss: 0.111
Epoch 451:   0%|          | 0/54 [00:00<?, ?it/s]         
train_loss: -313.543 | recon_loss: 0.925 | nll_loss: -406.025 | surrogate_loss: 0.120
Epoch 452:   0%|          | 0/54 [00:00<?, ?it/s]         
train_loss: -314.921 | recon_loss: 0.921 | nll_loss: -407.062 | surrogate_loss: 0.177
Epoch 453:   0%|          | 0/54 [00:00<?, ?it/s]         
train_loss: -316.010 | recon_loss: 0.923 | nll_loss: -408.296 | surrogate_loss: 0.582
Epoch 454:   0%|          | 0/54 [00:00<?, ?it/s]         
train_loss: -317.485 | recon_loss: 0.921 | nll_loss: -409.606 | surrogate_loss: 1.434
Epoch 454: 100%|██████████| 54/54 [00:04<00:00, 11.77it/s]    val_loss: -317.109
Epoch 455:   0%|          | 0/54 [00:00<?, ?it/s]         t/s]
train_loss: -317.156 | recon_loss: 0.920 | nll_loss: -409.162 | surrogate_loss: 0.394

As a result, my overall training loss decreases, but my reconstruction loss looks like it plateaus around 0.920. It seems this problem is alluded to in the paper, but it suggests that higher betas will work. I am just wondering if there is any intuition on how to fix this problem?

The training seems to just focus on minimizing negative log likelihood, and so the sampled images look fine, but not great. The output of a sampled image:

@torch.no_grad()
    def sample(self, num_samples=16, **params):
        """
        Sample a batch of images from the flow.
        """
        # sample latent space and reshape to (batches, 1, embed_dim)
        v = self.latent.sample(num_samples, **params)
        v = v.reshape(num_samples, 1, -1)
        return self.decoder(v)

The text was updated successfully, but these errors were encountered:

wj-gxy · 2024-10-30T01:54:52Z

Hi, based on the example in the repository, it seems that the latent dimension is set?
May I ask how the dimension of latent is set for the image dataset you are doing?

adam2392 · 2024-10-30T02:00:35Z

Hi thanks for the reply!

Hi, based on the example in the repository, it seems that the latent dimension is set? May I ask how the dimension of latent is set for the image dataset you are doing?

Yes the latent dimension is set arbitrarily to e.g. 12, 64, or 128. And the image dataset I am using is the standard MNIST 1x28x28. Is this what your question meant? I tried 64 and 128 because those are mentioned in the paper. The latent prior is just a standard Gaussian.

In terms of the hidden dimensions within each ResidualBlock, I just set it according to table 11.

FYI: It could be that my encoder/decoder block are not expressive enough(?), which I am actively exploring, but I did try to replicate the blocks exactly as done in the paper in the Table 11.

fdraxler · 2024-10-30T08:38:21Z

Hi, did you try with the MNIST setting that we provide in https://github.com/vislearn/FFF/blob/main/configs/fif/mnist.yaml? IIRC correctly this yields useful generations on MNIST.

Overall, the reconstruction loss looks relatively high at 0.9, but this might be due to the bottleneck. Maybe you can confirm with a non-FIF autoencoder what reconstruction error to expect and otherwise increase the weight for the reconstruction loss (don‘t hesitate to change the order of magnitude).

wj-gxy · 2024-10-30T10:38:06Z

Thank you very much for your reply, yes I am in the same boat, but I am utilizing a speech dataset and I need to map the speech to a standard Gaussian distribution with defined dimensions, however, at the moment it is difficult. Of course, it could be that my encoder and decoder settings are not good enough. If possible, would it be possible to have a look at your code for training MNIST, would be appreciated.

wj-gxy · 2024-10-31T08:13:07Z

Train mnist with: python -m lightning_trainable.launcher.fit configs/fif/mnist.yaml --name '{data_set[name]}'. Error reported: shapes do not match for intermediate reconstruction 0: torch.size([512,16]) vs torch..size([512,784]). Troubleshooting didn't find the problem.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Interested in reproducing MNIST - training seems to focus on NLL only #8

Interested in reproducing MNIST - training seems to focus on NLL only #8

adam2392 commented Oct 15, 2024

wj-gxy commented Oct 30, 2024

adam2392 commented Oct 30, 2024

fdraxler commented Oct 30, 2024

wj-gxy commented Oct 30, 2024

wj-gxy commented Oct 31, 2024

Interested in reproducing MNIST - training seems to focus on NLL only #8

Interested in reproducing MNIST - training seems to focus on NLL only #8

Comments

adam2392 commented Oct 15, 2024

wj-gxy commented Oct 30, 2024

adam2392 commented Oct 30, 2024

fdraxler commented Oct 30, 2024

wj-gxy commented Oct 30, 2024

wj-gxy commented Oct 31, 2024