
MAE performance does not match the paper's ImageNet results #313

Closed
HuangChiEn opened this issue Dec 17, 2022 · 22 comments

@HuangChiEn

HuangChiEn commented Dec 17, 2022

First, thanks for releasing such an amazing framework, which covers almost all of the SOTA SSL methods.

Would you mind looking into why the performance of MAE does not match the paper's ImageNet results? The paper reports 82.1% top-1 accuracy for 100-epoch pretraining of ViT-Base with batch size 4096 on ImageNet (with 100 epochs of fine-tuning).

They mention that the results come from running the official code for 100 / 300 / 1600 epochs. For the 300-epoch and 1600-epoch runs, the accuracy also matches other papers, so we believe the 100-epoch result is verified as well.

On the other hand, we used this version of solo-learn to run the MAE pretraining; the pretraining configuration as well as the training curves can be found in the link.

We kept exactly the same configuration, but the resulting top-1 accuracy is only 77.4%, about 4% lower than the aforementioned 82.1%, which I think is well beyond random-seed effects and acceptable experimental variance.

In addition, the fine-tuning configuration as well as its training curves can be found in the link.

All of the above fine-tuning configurations match the official implementation's scripts (except the number of epochs).
Any suggestion is appreciated!

@vturrisi
Owner

Hey,

Thanks for letting us know. I'm a bit busy until probably next year, but I'll try to check before then. In the meantime, please check whether the parameters we use match the original paper; we might have missed some.

I'll try to check it myself as soon as I can.

@HuangChiEn
Author

Thanks for your help ~
We're also trying to figure out this issue; I'll keep it open here.

@HuangChiEn
Author

HuangChiEn commented Dec 27, 2022

> Hey,
>
> Thanks for letting us know. I'm a bit busy until probably next year, but I'll try to check before then. In the meantime, please check whether the parameters we use match the original paper; we might have missed some.
>
> I'll try to check it myself as soon as I can.


New update:

Hello, over these past few weeks we found that start_lr (warmup_start_lr in my config) should be set to exactly 0 in the fine-tuning stage; the default is a small number instead of zero. This modification increases top-1 accuracy by about 2% (from 77.4%). So the reproduced MAE in solo-learn currently achieves 79.6% top-1 accuracy on ImageNet for 100-epoch pretraining (100-epoch fine-tuning).
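(For context, here is a minimal sketch of a warmup-cosine schedule, illustrative rather than solo-learn's exact implementation, showing why this matters: warmup_start_lr is the LR at step 0, so a nonzero default lifts the whole warmup ramp relative to the official MAE schedule, which warms up from 0.)

```python
import math

# Illustrative warmup + cosine LR schedule (not solo-learn's exact code).
# warmup_start_lr is the LR at step 0; a nonzero value lifts the whole
# linear warmup ramp, which appears to be what cost ~2% top-1 here.
def warmup_cosine_lr(step, total_steps, warmup_steps,
                     base_lr, warmup_start_lr=0.0, final_lr=0.0):
    if step < warmup_steps:
        # linear warmup: warmup_start_lr -> base_lr
        return warmup_start_lr + (base_lr - warmup_start_lr) * step / warmup_steps
    # cosine decay: base_lr -> final_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))
```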


The wandb links (the other run was deleted) are provided for pretraining and fine-tuning, respectively.

However, the official code's reported accuracy is 82.1%, so some hyperparameters still need to be tuned.
Any suggestion would be appreciated!

@vturrisi
Owner

Glad to hear that. I still haven't found the time to look into it.

One experiment that might be very interesting (resources permitting) is to see how much a model pretrained with the official code differs from a model pretrained with solo-learn. I would suggest pretraining a model with the official code and then running our fine-tuning; that way we can tell whether the problem is in the pretraining or the fine-tuning.

@DonkeyShot21
Collaborator

DonkeyShot21 commented Dec 27, 2022

Another thing to consider is that MAE is very sensitive to the JPEG decoding library you use. For instance, if you are using Pillow-SIMD you can expect a ~1% loss in accuracy w.r.t. normal Pillow.
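(A quick way to check which build is installed, since Pillow-SIMD ships as a drop-in replacement under the same PIL namespace; its version strings typically carry a `.postN` suffix:)

```python
import PIL

# Pillow-SIMD installs under the same "PIL" name as vanilla Pillow;
# its versions typically end in ".postN" (e.g. "9.0.0.post1"), while
# vanilla Pillow versions do not.
print(PIL.__version__)
```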

@HuangChiEn
Author

HuangChiEn commented Dec 28, 2022

> Glad to hear that. I still haven't found the time to look into it.
>
> One experiment that might be very interesting (resources permitting) is to see how much a model pretrained with the official code differs from a model pretrained with solo-learn. I would suggest pretraining a model with the official code and then running our fine-tuning; that way we can tell whether the problem is in the pretraining or the fine-tuning.


TL;DR (optional reading)
Thanks for your reply. Over the past few weeks our team has also run some interesting tests on the official code.

One of our members re-ran the 100-epoch pretraining with the official code, based on the official MAE 1600-epoch config, and we only got 80.62%. He also found a paper reporting 81.2% for 100-epoch pretraining. So we believe the reported 82.1% top-1 accuracy must involve some extra parameter tuning.

On the other hand, the gap between 80.62% and 81.2% could be attributed to the random seed (variance: ~0.6%), so we consider 80.62% an acceptable reproduction. The remaining ~1% accuracy gap may be the last issue we need to think about.


Thanks for your help. If you find free time to look into this issue, we'd still appreciate it, and we'll keep the issue open.
If we figure out any part of it, we'll also share information on the ImageNet 100-epoch config and close this issue ~

@HuangChiEn
Author

HuangChiEn commented Dec 28, 2022

> Pillow-SIMD
@DonkeyShot21
Thanks for participating in this issue; we appreciate it!!


To speed up data augmentation, we applied DALI, which indeed decodes JPEGs with its own special routines.

So, do you suggest we disable DALI and use the image_folder setup for both the pretraining and fine-tuning stages to get the ~1% accuracy increase?
(This would also make sense, since the official code did not use DALI to speed up image loading.)
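(For context, a minimal sketch of what a DALI decode path looks like; the nvJPEG-based "mixed" decoder can produce slightly different pixels than Pillow's CPU decode, which is the sensitivity mentioned above. Illustrative only, not solo-learn's actual loader:)

```python
from nvidia.dali import pipeline_def, fn

# Illustrative DALI pipeline sketch (not solo-learn's actual loader).
# The "mixed" decoder runs nvJPEG on the GPU, and its output can differ
# slightly from Pillow's CPU JPEG decode -- apparently enough to matter
# for MAE pretraining.
@pipeline_def
def decode_pipeline(data_dir):
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
    images = fn.decoders.image(jpegs, device="mixed")
    return images, labels

# Instantiation needs batch/thread/device arguments, e.g.:
# pipe = decode_pipeline("/data/imgnet/train", batch_size=256,
#                        num_threads=8, device_id=0)
```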

@zeyuyun1

zeyuyun1 commented Jan 2, 2023

@HuangChiEn Hi, I wonder if you also tried to benchmark MAE on CIFAR-10. I ran the training script using the config file in the repo and got 83% top-1 evaluation accuracy. Does that look right?

@HuangChiEn
Author

HuangChiEn commented Jan 3, 2023

> @HuangChiEn Hi, I wonder if you also tried to benchmark MAE on CIFAR-10. I ran the training script using the config file in the repo and got 83% top-1 evaluation accuracy. Does that look right?

Sorry, but we don't have any experience pretraining/fine-tuning MAE on CIFAR-10.

However, I think 83% is a bit lower than expected if you both pretrained and fine-tuned. From a practical point of view, a small supervised model can easily surpass this accuracy, so it may not be a typical target of SSL research.

Besides, I believe the solo-learn contributors have already provided a well-tuned config for CIFAR-10/CIFAR-100 here.
Did you run the experiment with that configuration?
If you used your own, could you also share it on wandb?

@zeyuyun1

zeyuyun1 commented Jan 3, 2023

Yes. This is the config I used to pretrain MAE. This is the run on wandb: https://wandb.ai/chobitstian/solo-learn. It's the first one, named "mae-cifar10"; please ignore all the other runs.

@HuangChiEn
Author

> Yes. This is the config I used to pretrain MAE. This is the run on wandb: https://wandb.ai/chobitstian/solo-learn. It's the first one, named "mae-cifar10"; please ignore all the other runs.

I quickly scanned your pretraining config; may I suggest you follow the configuration given by solo-learn, which I believe is well tested.

For example, the warmup epochs are mismatched: you set 10, but the config uses 40.

Also, note that this version of the solo-learn config is more straightforward.


Since few benchmarks directly pretrain on CIFAR-10 and then fine-tune, I can't judge this accuracy either. However, I believe it should be near the MoCo v3 performance (93.10/99.80); at the least, DeepCluster V2 gives a lower bound (88.85/99.58).

@zeyuyun1

zeyuyun1 commented Jan 5, 2023

Uh, that's interesting. The current config (the one you mentioned before) doesn't mention warmup epochs.

I think it might also be caused by DDP? When you simply increase the number of GPUs, the effective batch size becomes larger, but I don't think the current code adjusts the learning rate for that. So I'll try a single GPU and the old config file you suggested. I'll let you know the result.

Thanks for the help!
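(For reference, MAE uses the linear LR scaling rule lr = base_lr × effective_batch_size / 256, and the effective batch size grows with the number of GPUs and any gradient accumulation. A minimal sketch, with illustrative names rather than solo-learn's actual config keys:)

```python
# Linear LR scaling rule from the MAE paper: lr = base_lr * eff_batch / 256.
# Names are illustrative, not solo-learn's actual config keys.
def scaled_lr(base_lr, batch_size_per_gpu, num_gpus, accumulate_grad_batches=1):
    eff_batch = batch_size_per_gpu * num_gpus * accumulate_grad_batches
    return base_lr * eff_batch / 256

# e.g. the ImageNet config below: 512 per GPU x 2 GPUs x 4 accumulation
# steps -> eff_batch = 4096, so base_lr 1.5e-4 becomes an actual LR of 2.4e-3.
print(scaled_lr(1.5e-4, 512, 2, 4))  # 0.0024
```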

@HuangChiEn
Author

HuangChiEn commented Jan 5, 2023

Good morning. Thanks to DonkeyShot21's suggestion, we're glad to have found a suitable configuration for MAE in both pretraining and fine-tuning. The final accuracy reaches 81.6% top-1 and 95.5% top-5 (yes, solo-learn can work slightly better than the official code with the same config).

The following wandb links provide the detailed configurations for MAE pretraining with 100 epochs on ImageNet:
pretraining
fine-tuning

@DonkeyShot21 @vturrisi If you don't mind, this configuration could also be adopted in solo-learn, and the ImageNet accuracy recorded for issue #153.


I also provide the configurations in easy_configer format.

Pretraining 100 epochs on ImageNet (MAE):

```ini
# Note: follows the EasyCV cfg for the 100ep run, since the 400ep cfg also follows the 1600ep cfg with only the epoch count modified.
# 1600ep : https://github.com/alibaba/EasyCV/blob/master/configs/selfsup/mae/mae_vit_base_patch16_8xb64_1600e.py
seed = 42@int

[data_cfg]
    dataset = imagenet@str
    # Note: an .h5 dataset may be faster to process.
    train_data_path = {'path':'/data/imgnet/train'}@Path   # don't forget to register the Path class of cfger
    val_data_path = {'path':'/data/imgnet/val'}@Path
    data_fraction = -1.0@float
    data_format = image_folder@str
    num_workers = 8@int

[model_cfg]
    method = mae@str
    backbone = vit_base@str
    decoder_embed_dim = 512@int
    decoder_depth = 8@int
    decoder_num_heads = 16@int
    mask_ratio = 0.75@float

[train_cfg] 
    batch_size = 512@int    # 512 per GPU; effective batch_size up to 4096 (original paper)
    max_epochs = 100@int
    precision = 16@int

[optmz_cfg]
    optimizer = adamw@str
    adamw_beta1 = 0.9@float
    adamw_beta2 = 0.95@float
    lr = 1.5e-4@float            
    classifier_lr = 1.5e-4@float
    weight_decay = 0.05@float
    scheduler = warmup_cosine@str
    warmup_epochs = 40@int
    warmup_start_lr = 0.00000@float

[gpu_cfg]
    devices = 0, 1@str
    accelerator = gpu@str
    strategy = ddp@str
    accumulate_grad_batches = 4@int
    dali_device = gpu@str

[wandb_cfg]
    name = wodali-mae-vitb-pt100ep-baseline@str
    entity = josef@str
    project = MixSim@str

[trfs_cfg]
    num_crops_per_aug = [1]@list
    brightness = [0]@list
    contrast = [0]@list    
    saturation = [0]@list
    hue = [0]@list
    gray_scale_prob = [0]@list
    gaussian_prob = [0]@list      
    solarization_prob = [0]@list   
    min_scale = [0.2]@list
    
[store_true] 
    wandb = True@bool
    sync_batchnorm = True@bool
    save_checkpoint = True@bool
    norm_pix_loss = True@bool

    # other flag for store_true
    debug_augmentations = True@bool   # transformation debug..
    no_labels = False@bool             # for custom data only..
    auto_resume = False@bool
    auto_umap = False@bool
```

Fine-tuning 100 epochs:

```ini
seed = 42@int

[data_cfg]
    dataset = imagenet@str
    train_data_path = {'path':'/data/imgnet/train'}@Path   # don't forget to register the Path class of cfger
    val_data_path = {'path':'/data/imgnet/val'}@Path
    data_fraction = -1.0@float
    data_format = image_folder@str   
    num_workers = 12@int

[model_cfg]
    method = mae@str
    backbone = vit_base@str
    
[train_cfg]
    batch_size = 256@int   # effective batch_size: 1024
    max_epochs = 100@int
    precision = 16@int

[finetune_method]
    pretrain_method = vit@str
    # fixup-cfg : 0.75 
    layer_decay = 0.65@float     
    label_smoothing = 0.1@float
    mixup = 0.8@float
    cutmix = 1.0@float
    drop_path = 0.1@float

[optmz_cfg]
    # 5e-4 cfg from the torch implementation: https://github.com/facebookresearch/mae/blob/main/FINETUNE.md
    lr = 5e-4@float               
    weight_decay = 0.05@float
    optimizer = adamw@str
    adamw_beta1 = 0.9@float
    adamw_beta2 = 0.999@float
    scheduler = warmup_cosine@str
    warmup_epochs = 5@int
    warmup_start_lr = 0.00000@float

[gpu_cfg]
    devices = 0, 1, 4, 5@str
    accelerator = gpu@str
    strategy = ddp@str
    dali_device = gpu@str

[wandb_cfg]
    name = test_mae-vitb-ft100ep-baseline-pt100ep@str
    entity = josef@str
    project = MixSim@str

[ckpt_cfg]
    pretrained_feature_extractor = /workspace/scripts/trained_models/mae/2vafj22o/wodali-mae-vitb-pt100ep-baseline-2vafj22o-ep=99.ckpt@str
    checkpoint_dir = {'path':'/workspace/scripts/trained_models'}@Path  # you can customize the ckpt path
    checkpoint_frequency = 10@int  # (every how many epochs)

[store_true]
    finetune = True@bool
    wandb = True@bool
    save_checkpoint = True@bool
    auto_resume = False@bool
    sync_batchnorm = True@bool
```
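(For reference, `layer_decay` in the fine-tuning config refers to layer-wise LR decay as in the BEiT/MAE fine-tuning recipes: each transformer block's LR is the base LR scaled by `layer_decay` raised to its distance from the head. A minimal sketch of the idea, not solo-learn's actual helper:)

```python
# Layer-wise LR decay (BEiT/MAE fine-tuning recipe), sketch only:
# deeper blocks get larger LRs; the patch embedding gets the smallest.
def layerwise_lrs(base_lr, layer_decay, num_layers):
    # index 0 = patch embedding, index num_layers = classification head
    return [base_lr * layer_decay ** (num_layers - i) for i in range(num_layers + 1)]

# ViT-Base (12 blocks) with layer_decay=0.65: the patch embedding trains
# at 0.65**12, i.e. about 0.57% of the head's LR.
lrs = layerwise_lrs(5e-4, 0.65, 12)
```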

@zeyuyun1
I'll close this issue in 2 days; you can open a new issue for your question if you still find that the accuracy doesn't match on the CIFAR-10 dataset.

@zeyuyun1

zeyuyun1 commented Jan 5, 2023

Sorry, I'm a little confused by your comment. You said "the final accuracy reaches 81.6% top-1 and 95.5% top-5"; is this for CIFAR-10? 81.6% top-1 would be pretty low, right?

@HuangChiEn
Author

Oh, haha ~ the 81.6% top-1 / 95.5% top-5 is actually ImageNet accuracy.

Yeah, I think CIFAR-10 should be higher than that ~

@vturrisi
Owner

vturrisi commented Jan 5, 2023

@HuangChiEn thanks for providing that! So the issue was fixed by using the default image folder instead of DALI? I'll convert your config to our configuration format and open a PR. Can you also provide the pretrained and fine-tuned checkpoints?

@HuangChiEn
Author

> @HuangChiEn thanks for providing that! So the issue was fixed by using the default image folder instead of DALI? I'll convert your config to our configuration format and open a PR. Can you also provide the pretrained and fine-tuned checkpoints?

Yes, I think we only modified the following parts:

  1. We disabled DALI by setting image_folder mode in both the pretraining and fine-tuning stages. (Personally, I think going without DALI matters more for pretraining; in fine-tuning, DALI may only decrease accuracy slightly.)

  2. We fixed warmup_start_lr to exactly zero (instead of 3e-5) in the fine-tuning stage. We also set warmup_start_lr to zero in the pretraining stage to align with the official MAE code.

However, I think the default setting (3e-5) in the pretraining stage may also increase accuracy. (I ran that experiment and saw slightly higher accuracy during pretraining, but I interrupted it after a few epochs without running the whole 100.)

About the checkpoints, we can provide them, but it may take a while to prepare.
I'll paste the Google Drive links here and then close the issue ~

@vturrisi
Owner

vturrisi commented Jan 5, 2023

Sure. I'll update the config files. Thanks for the help. Let me know when you have the checkpoints and I'll add the results/checkpoints to the readme.

@HuangChiEn
Author

HuangChiEn commented Jan 7, 2023

> Sure. I'll update the config files. Thanks for the help. Let me know when you have the checkpoints and I'll add the results/checkpoints to the readme.

Morning, the resulting checkpoints can be found at the following Google Drive links:
🔗 pretraining
🔗 fine-tuning

I'll leave this issue open for 2 days; if you encounter any problem downloading, can't find the ckpt, etc., please tag me ~

@vturrisi
Owner

vturrisi commented Jan 7, 2023

@HuangChiEn Added the checkpoints to our zoo and added the results in #321.

@HuangChiEn
Author

> @HuangChiEn Added the checkpoints to our zoo and added the results in #321.

Everything looks good ~
Issue closed.

@vturrisi
Owner

vturrisi commented Jan 8, 2023

@HuangChiEn Thanks again for debugging it for us and providing the checkpoints/results :)
