Cannot reproduce the FID/IS result of ImageNet? #14
Sorry for the late reply. There are two important things to note. First, the truncation rate is extremely important in the sampling process: we searched for the best truncation rate and selected 0.869 for our model when testing FID. Second, following previous works such as GLIDE and VQGAN, we calculated FID between 50k generated images and all of the training images. Personally I don't think this is a good evaluation setting, but it does achieve a lower FID score.
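To make the role of the truncation rate concrete, here is a minimal sketch that assumes it behaves like top-p (nucleus) filtering over the predicted codebook-token probabilities: the most likely tokens are kept until their cumulative probability reaches the rate, and the rest are discarded. The function name `truncate_probs` is hypothetical, and this is not the repository's code; an actual implementation would likely operate on batched log-probability tensors rather than a Python list.

```python
def truncate_probs(probs, truncation_rate):
    """Top-p style truncation (assumed behaviour, not the repo's code).

    Keeps the most likely tokens until their cumulative probability
    reaches `truncation_rate`, zeroes the rest, and renormalizes.
    """
    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for i in order:
        keep.add(i)  # the most likely token is always kept
        cumulative += probs[i]
        if cumulative >= truncation_rate:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.15, 0.05]  # toy distribution over 4 tokens
    print(truncate_probs(p, 0.86))
    print(truncate_probs(p, 0.869))
```

On this toy distribution, 0.86 and 0.869 keep exactly the same tokens; the two rates only produce different results when the cumulative mass of the sorted tokens falls between them, which is consistent with expecting only a slight difference between the two settings.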
Thanks for the reply. (1) The truncation rate I use is the default setting, 0.86. Will that cause a big difference compared with 0.869? (2) The training script you provide trains for 100 epochs; is that enough? And how many GPUs did you use to train on ImageNet? (3) How did you process the training set for calculating FID: a. only resize each image to 256×256, or b. resize the short edge of each image to 256 and center-crop it to 256×256? Which one do you use? (4) How do you sample the test images? Do you sample the same number of images (50) for each class?
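Option (b) in question (3) can be sketched as pure geometry: scale the image so its short edge becomes 256 while keeping the aspect ratio, then take the centered 256×256 box. Option (a) would instead stretch every image to 256×256 and distort its aspect ratio. The helper below is hypothetical and only computes sizes and the crop box; with PIL you could apply it as `img.resize(new_size).crop(box)`. In torchvision the rough equivalent is `transforms.Resize(256)` followed by `transforms.CenterCrop(256)`, though crop offsets may differ by a pixel because of rounding. This does not claim to match what either VQ-Diffusion preprocessor does.

```python
def short_edge_resize_center_crop_box(width, height, size=256):
    """Return ((new_w, new_h), (left, top, right, bottom)).

    (new_w, new_h): image size after resizing the short edge to `size`
    while preserving the aspect ratio.
    The box: a centered size x size crop, in PIL's crop() convention.
    """
    scale = size / min(width, height)
    # max() guards against float error pushing the short edge to size - 1.
    new_w = max(size, round(width * scale))
    new_h = max(size, round(height * scale))
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)


if __name__ == "__main__":
    # A landscape 500x375 image: short edge 375 -> 256, width 500 -> 341.
    print(short_edge_resize_center_crop_box(500, 375))
```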
(1) I think it will make only a slight difference. (2) We only trained it for 100 epochs; I'm not sure whether more epochs would improve the performance. (3) We should have used "ImageNetTransformerPreprocessor" in "https://github.com/cientgu/VQ-Diffusion/blob/main/image_synthesis/data/utils/image_preprocessor.py", but we used "DalleTransformerPreprocessor". This seems to be our mistake, sorry about that, and we currently don't know how much it affects the results. (4) Yes, you are right.
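The class-balanced sampling plan confirmed in (4), 50 images for each of the 1000 ImageNet classes, can be sketched as a list of conditioning labels split into batches. Both helper names are hypothetical, and the actual per-batch generation call is left out on purpose because it depends on the repository's inference API.

```python
def class_balanced_labels(num_classes=1000, per_class=50):
    """One label per image to generate: 50 x 1000 = 50K labels, grouped by class."""
    return [c for c in range(num_classes) for _ in range(per_class)]


def batches(labels, batch_size):
    """Yield consecutive chunks of labels; the last chunk may be smaller."""
    for start in range(0, len(labels), batch_size):
        yield labels[start:start + batch_size]


if __name__ == "__main__":
    labels = class_balanced_labels()
    print(len(labels))  # total number of images to generate
    for batch in batches(labels, 64):
        pass  # generate len(batch) images conditioned on these class labels
```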
I met the same problem. I tested the pre-trained checkpoint from "Improved VQ-Diffusion" and only achieved an FID of 20.4958. I used the script provided in 'inference_VQ_Diffusion.py' in microsoft/VQ-diffusion: `VQ_Diffusion_model = VQ_Diffusion(config='OUTPUT/pretrained_model/config_imagenet.yaml', path='OUTPUT/pretrained_model/imagenet_pretrained.pth')`
Hi, this is great work! Recently I trained the model with configs/imagenet.yaml on 8 GPUs for 100 epochs, but the FID I get is 20, far from the 11.89 reported in your paper. I evaluate the model with the following steps: (1) process all the training images: resize the short edge of each image to 256 and center-crop it to 256×256; (2) sample the images, 50 per class, for a total of 50 × 1000 = 50K images; (3) use torch-fidelity to calculate the FID between the 50K sampled images and the processed training images. Did I make any mistake? Are 100 epochs enough for training? Could you please provide some details about training on ImageNet, such as the number of epochs? I would really appreciate a reply. Thanks!
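For reference, step (3) computes the standard FID between the real (r) and generated (g) image sets, where $\mu$ and $\Sigma$ are the mean and covariance of Inception-v3 features extracted from each set:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Because $\mu_r$ and $\Sigma_r$ are estimated from whichever reference set is used, comparing against all of the training images (as described in the reply about GLIDE and VQGAN) rather than a smaller or differently preprocessed set changes the first argument of the metric, so scores computed under different reference sets and preprocessing are not directly comparable.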