Cannot reproduce the FID/IS result of ImageNet? #14

Open
JohnDreamer opened this issue May 26, 2022 · 4 comments

Comments

@JohnDreamer

Hi, great work! I have trained the model with configs/imagenet.yaml on 8 GPUs for 100 epochs, but the FID is only 20, which is far from the 11.89 reported in your paper. I evaluated the model as follows: (1) preprocess all the training images: resize the shorter edge of each image to 256 and center crop it to 256*256; (2) sample 50 images per class, for a total of 50 * 1000 = 50K images; (3) use torch-fidelity to compute FID between the 50K sampled images and the preprocessed training images. Did I make a mistake somewhere? Are 100 epochs enough for training? Could you please share some details about the ImageNet training, such as the number of epochs? I would greatly appreciate a reply. Thanks!
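Step (1) above can be sketched as follows. This is a minimal illustration, assuming Pillow is installed and bicubic interpolation is acceptable; the function names are hypothetical and are not taken from the VQ-Diffusion repository, whose own preprocessors may differ.

```python
def shorter_edge_resize_dims(width, height, target=256):
    """Scale (width, height) so the shorter edge equals `target`, keeping aspect ratio."""
    scale = target / min(width, height)
    # max() guards against rounding ever pushing an edge below the target.
    return max(target, round(width * scale)), max(target, round(height * scale))


def center_crop_box(width, height, target=256):
    """Return the (left, top, right, bottom) box of a centered target x target crop."""
    left = (width - target) // 2
    top = (height - target) // 2
    return left, top, left + target, top + target


def preprocess_image(src_path, dst_path, target=256):
    """Resize the shorter edge to `target`, center crop, and save the result."""
    from PIL import Image  # assumption: Pillow is available

    with Image.open(src_path) as img:
        img = img.convert("RGB")
        new_w, new_h = shorter_edge_resize_dims(img.width, img.height, target)
        img = img.resize((new_w, new_h), Image.BICUBIC)  # interpolation is an assumption
        img = img.crop(center_crop_box(new_w, new_h, target))
        img.save(dst_path)
```

For a 500x375 image, the resize gives 341x256 and the crop box is (42, 0, 298, 256). Option (a) discussed later in the thread (a plain resize to 256*256) would instead distort the aspect ratio, so the two reference sets are not interchangeable for FID.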

@cientgu
Owner

cientgu commented Jun 2, 2022

Sorry for the late reply. There are two important things to note. First, the truncation rate used during sampling is extremely important; we searched for the best truncation rate and chose 0.869 for our FID evaluation. Second, we followed previous works such as GLIDE and VQGAN and calculated FID between the 50k generated images and all the training images. Personally, I don't think it's a good evaluation metric; however, it does achieve a lower FID score.
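To make the role of the truncation rate concrete, here is a small sketch of cumulative-probability truncation: keep the most likely tokens until their total probability reaches the rate, zero out the rest, and renormalize. This is only an illustration of the general idea under that assumption; `truncate_probs` is a hypothetical helper and is not the repository's implementation.

```python
def truncate_probs(probs, truncation_rate=0.86):
    """Keep the top tokens whose cumulative probability reaches truncation_rate.

    The token that crosses the threshold is kept, everything after it is
    zeroed, and the surviving probabilities are renormalized to sum to 1.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= truncation_rate:
            break
    total = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / total
    return out


# Example: the least likely token is dropped at a rate of 0.86.
print(truncate_probs([0.5, 0.3, 0.15, 0.05], truncation_rate=0.86))
```

Under this reading, a lower rate discards more of the low-probability tail, which trades sample diversity for fidelity; that is why the rate is worth sweeping when tuning for FID.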

@JohnDreamer
Author

Thanks for the reply. (1) I used the default truncation rate, 0.86. Would that cause a large difference in the result compared with 0.869? (2) The training script you provide sets the number of training epochs to 100; is that enough? And how many GPUs did you use for training on ImageNet? (3) How did you preprocess the training set for calculating FID: a. only resize each image to 256*256; or b. resize the shorter edge of the image to 256 and center crop it to 256*256? (4) How did you sample the test images? Did you sample the same number of images (50) for each class?
I really want to find the key factor behind the difference in results. I would greatly appreciate a reply. Thanks!

@cientgu
Owner

cientgu commented Jun 12, 2022

(1) I think it will make only a slight difference. (2) We only trained it for 100 epochs; I'm not sure whether more epochs would improve the performance. (3) We should have used "ImageNetTransformerPreprocessor" in "https://github.com/cientgu/VQ-Diffusion/blob/main/image_synthesis/data/utils/image_preprocessor.py"; however, we used "DalleTransformerPreprocessor". This seems to be our mistake, sorry about that, and we currently don't know how much it affects the results. (4) Yes, you are right.
Besides, our follow-up work "Improved VQ-Diffusion" greatly improves the performance on ImageNet, and we have released the pretrained model; maybe it can help you.
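For readers trying to reproduce the evaluation discussed in this thread, the metric step could look roughly like the sketch below. It assumes the torch-fidelity package is installed, that generated images may sit in per-class subfolders (hence `samples_find_deep=True`), and that a GPU is available; `count_images` and `evaluate` are hypothetical helpers, not part of either library or the repository.

```python
import os

IMAGE_EXTS = (".png", ".jpg", ".jpeg")


def count_images(root):
    """Count image files under `root`, including subdirectories."""
    total = 0
    for _, _, files in os.walk(root):
        total += sum(name.lower().endswith(IMAGE_EXTS) for name in files)
    return total


def evaluate(generated_dir, reference_dir, expected_generated=50_000):
    """Check the sample count, then compute FID and IS with torch-fidelity."""
    found = count_images(generated_dir)
    if found != expected_generated:
        raise ValueError(f"expected {expected_generated} generated images, found {found}")

    import torch_fidelity  # assumption: torch-fidelity is installed

    # input1 is the generated set; input2 is the preprocessed training set.
    return torch_fidelity.calculate_metrics(
        input1=generated_dir,
        input2=reference_dir,
        cuda=True,
        isc=True,
        fid=True,
        samples_find_deep=True,
    )
```

The count check is there because a missing or overwritten class folder silently changes the statistics; the returned dictionary is expected to contain the FID and Inception Score entries.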

@guyuchao

I ran into the same problem. I tested the pre-trained checkpoint from "Improved VQ-Diffusion" and only achieved an FID of 20.4958. I used the script 'inference_VQ_Diffusion.py' from microsoft/VQ-diffusion:

VQ_Diffusion_model = VQ_Diffusion(config='OUTPUT/pretrained_model/config_imagenet.yaml', path='OUTPUT/pretrained_model/imagenet_pretrained.pth')
VQ_Diffusion_model.inference_generate_sample_with_class(407, truncation_rate=0.86, save_root="RESULT", batch_size=4)
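The call above generates samples for a single class (407). A full 50K evaluation set needs every class, which could be sketched as below. The method name and keyword arguments are copied from the snippet above; the assumptions are that each call produces `batch_size` images for the given class and that 50 images fit in one batch on the available GPU. `sample_all_classes` is a hypothetical helper.

```python
def sample_all_classes(model, num_classes=1000, per_class=50,
                       truncation_rate=0.86, save_root="RESULT"):
    """Generate `per_class` images for every class id in [0, num_classes)."""
    for class_id in range(num_classes):
        # Assumption: one call yields `batch_size` images for this class.
        model.inference_generate_sample_with_class(
            class_id,
            truncation_rate=truncation_rate,
            save_root=save_root,
            batch_size=per_class,
        )
    return num_classes * per_class
```

If 50 images per call do not fit in memory, the class would have to be split across several smaller calls, taking care that later calls do not overwrite files written by earlier ones.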
