
The inconsistent inference results #34

Open
kasteric opened this issue Sep 11, 2024 · 5 comments

@kasteric

Hi, I found that for the same IML-ViT checkpoint, the inference results on CASIA v1 computed through this IMDLBenCo framework are much lower (~12%) than those computed within the original IML-ViT code base (~70%).

@SunnyHaze
Contributor

SunnyHaze commented Sep 13, 2024

Thanks for your attention to our project, and sorry for the delay.

Sorry for the misleading results. Could you please attach the corresponding log for each experiment so we can analyze and locate the issue?

@kasteric
Author

I located the issue: the data augmentations are inconsistent. The trained checkpoints used resize with padding, while the evaluation code in IMDLBenCo uses resize without padding, so the data distributions are not aligned. On my custom dataset, I found that resize without padding works better than resize with padding. Did you observe similar results?

@SunnyHaze
Contributor

Hi,
If you use the demo_test_iml_vit.sh generated by the command benco init model_zoo, the images will be resized with padding, controlled by the if_padding parameter.

base_dir="./eval_dir"
mkdir -p ${base_dir}
CUDA_VISIBLE_DEVICES=1,2,3 \
torchrun \
--standalone \
--nnodes=1 \
--nproc_per_node=3 \
./test.py \
--model IML_ViT \
--edge_mask_width 7 \
--world_size 1 \
--test_data_json "./test_datasets.json" \
--checkpoint_path "/mnt/data0/xiaochen/workspace/IMDLBench_dev/output_dir" \
--test_batch_size 1 \
--image_size 1024 \
--if_padding \
--output_dir ${base_dir}/ \
--log_dir ${base_dir}/ \
2> ${base_dir}/error.log 1>${base_dir}/logs.log

I am not sure how you trained IML-ViT with resize without padding, since the original code design only supports 1024x1024 input. Normally we don't apply a traditional resize; we keep the raw resolution and pad the image to 1024x1024. Could you please describe your detailed implementation here for discussion? Thank you very much.
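For reference, the pad-to-1024 behavior described above (keep the raw resolution, zero-pad up to the target size) can be sketched in a few lines of numpy. `pad_to_square` is a hypothetical helper for illustration, not the project's actual transform:

```python
import numpy as np

def pad_to_square(img: np.ndarray, target: int = 1024) -> np.ndarray:
    """Zero-pad an HxWxC image to target x target without resizing.

    Hypothetical helper illustrating the pad-to-1024 idea; not the
    project's actual implementation.
    """
    h, w = img.shape[:2]
    assert h <= target and w <= target, "image larger than target size"
    out = np.zeros((target, target) + img.shape[2:], dtype=img.dtype)
    out[:h, :w] = img  # original content stays at native resolution, top-left
    return out

img = np.ones((512, 768, 3), dtype=np.uint8)
padded = pad_to_square(img)
print(padded.shape)  # (1024, 1024, 3)
```

Unlike a plain resize, this keeps every original pixel untouched, which is why the mask and image distributions stay aligned with the raw data.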

@kasteric
Author

kasteric commented Sep 16, 2024

Oh, I was not using demo_test_iml_vit.sh for testing; I used demo_train_iml_vit.sh for evaluation, placing the evaluation code before the training code in train.py. In the generated demo_train_iml_vit.sh, the configs look like this:

base_dir="./output_dir_imlvit_orig"
mkdir -p ${base_dir}

CUDA_VISIBLE_DEVICES=1 \
torchrun  \
    --standalone    \
    --nnodes=1     \
    --nproc_per_node=1 \
../train.py \
    --model IML_ViT \
    --edge_lambda 20 \
    --vit_pretrain_path ../mae_pretrain_vit_base.pth \
    --world_size 1 \
    --batch_size 3 \
    --data_path  /<casia_v2> \
    --epochs 200 \
    --lr 1e-4 \
    --image_size 1024 \
    --if_resizing \
    --min_lr 5e-7 \
    --weight_decay 0.05 \
    --edge_mask_width 7 \
    --test_data_path /<casia_v1> \
    --warmup_epochs 2 \
    --output_dir ${base_dir}/ \
    --log_dir ${base_dir}/ \
    --accum_iter 8 \
    --seed 42 \
    --test_period 4 \
    --resume /<resumed.pth>

where if_resizing is set to True, so the data transform is a resize without padding, as in the code below:

self.post_transform = None
if is_padding:
    self.post_transform = get_albu_transforms(type_="pad", output_size=output_size)
if is_resizing:
    self.post_transform = get_albu_transforms(type_="resize", output_size=output_size)

After I manually set the albu_transform type to "pad", the results are consistent. I concluded that it is not that "pad" is better than "resize", but that your checkpoints were trained in "pad" mode.

On my custom dataset, however, I found "resize" mode yields better results by 1 or 2 percentage points.
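The train/test mismatch is easy to demonstrate concretely: the two transforms produce outputs of the same shape but with very different pixel statistics, so a model trained on one sees out-of-distribution inputs from the other. A minimal numpy sketch (`nn_resize` and `zero_pad` are illustrative stand-ins, not IMDLBenCo's actual transforms):

```python
import numpy as np

def nn_resize(img: np.ndarray, target: int = 1024) -> np.ndarray:
    # Nearest-neighbor stand-in for the "resize" transform (real code uses
    # an interpolating resize from albumentations/OpenCV).
    h, w = img.shape[:2]
    rows = np.arange(target) * h // target
    cols = np.arange(target) * w // target
    return img[rows][:, cols]

def zero_pad(img: np.ndarray, target: int = 1024) -> np.ndarray:
    # The "pad" transform: keep the raw pixels, zero-pad to target x target.
    out = np.zeros((target, target) + img.shape[2:], dtype=img.dtype)
    out[:img.shape[0], :img.shape[1]] = img
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 255, size=(512, 768, 3), dtype=np.uint8)

resized, padded = nn_resize(img), zero_pad(img)
print(resized.shape == padded.shape)    # True: identical model input shape
print(np.array_equal(resized, padded))  # False: very different content
```

Evaluating a "pad"-trained checkpoint on "resize" inputs (or vice versa) is therefore a distribution shift, which explains the ~12% vs ~70% gap observed above.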

@SunnyHaze
Contributor

Thank you for your feedback.

I see your point. Generally, a deep neural network fits a distribution as a function, so keeping the training distribution similar to the testing distribution is essential. This matches your conclusion: "it was not because 'pad' is better than 'resize', but your checkpoints were trained based on 'pad' mode."

Further, there are several possible explanations for the performance on your custom dataset, such as:

  1. The aspect ratio of your images is close to 1:1, i.e., the resizing operation does not distort the image.
  2. The resolution of your images is relatively larger than CASIAv2's.
  3. Other factors that would need a case study on those datasets.
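Point 1 can be made concrete: when resizing to a square, the per-axis scale factors stay equal (no geometric distortion) only if the aspect ratio is already 1:1. A small sketch with a hypothetical helper, not project code:

```python
def square_resize_scales(h: int, w: int, target: int = 1024):
    """Per-axis scale factors when resizing an h x w image to target x target.

    Equal factors mean no distortion; unequal factors stretch tampered
    regions anisotropically. (Illustrative helper, not project code.)
    """
    return target / h, target / w

print(square_resize_scales(768, 768))   # equal factors: no distortion
print(square_resize_scales(384, 1024))  # unequal: strong vertical stretch
```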

Thanks again for your attention to our project. If you find the issue solved, please close it. You are also welcome to discuss any further concerns or problems you meet.
