
The inconsistent inference results #34

Open
kasteric opened this issue Sep 11, 2024 · 5 comments

@kasteric

Hi, I found that for the same IML-ViT checkpoint, the inference results on CASIA v1 computed through this IMDLBenCo framework are much lower (~12%) than those computed within the original IML-ViT code base (~70%).

@SunnyHaze
Contributor

SunnyHaze commented Sep 13, 2024

Thanks for your attention to our project, and sorry for the delay.

Sorry for the misleading results. Could you please attach the corresponding log for each experiment so we can analyze and locate the issue?

@kasteric
Author

I located the issue: the data augmentations are inconsistent. The trained checkpoints used resize with padding, while the evaluation code in IMDLBenCo uses resize without padding, so the data distributions are not aligned. On my custom dataset, I found that resize without padding works better than resize with padding. Did you observe similar results?

@SunnyHaze
Contributor

Hi,
If you use the demo_test_iml_vit.sh generated by the command benco init model_zoo, the images will be resized with padding, controlled by the if_padding parameter.

base_dir="./eval_dir"
mkdir -p ${base_dir}
CUDA_VISIBLE_DEVICES=1,2,3 \
torchrun \
--standalone \
--nnodes=1 \
--nproc_per_node=3 \
./test.py \
--model IML_ViT \
--edge_mask_width 7 \
--world_size 1 \
--test_data_json "./test_datasets.json" \
--checkpoint_path "/mnt/data0/xiaochen/workspace/IMDLBench_dev/output_dir" \
--test_batch_size 1 \
--image_size 1024 \
--if_padding \
--output_dir ${base_dir}/ \
--log_dir ${base_dir}/ \
2> ${base_dir}/error.log 1>${base_dir}/logs.log

I am not sure how you trained IML-ViT with resize without padding, since the original code design only supports 1024x1024 input. Normally we don't apply a traditional resize; we keep the raw resolution and pad the image to 1024x1024. Could you please describe your detailed implementation here for discussion? Thank you very much.
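For reference, the pad-to-1024 behavior described above (keep the raw resolution, zero-pad up to the target size) can be sketched in a few lines of numpy. `pad_to_square` is a hypothetical helper for illustration, not the project's actual transform:

```python
import numpy as np

def pad_to_square(img: np.ndarray, target: int = 1024) -> np.ndarray:
    """Zero-pad an HxWxC image to target x target without resizing.

    Hypothetical helper illustrating the pad-to-1024 idea; not the
    project's actual implementation.
    """
    h, w = img.shape[:2]
    assert h <= target and w <= target, "image larger than target size"
    out = np.zeros((target, target) + img.shape[2:], dtype=img.dtype)
    out[:h, :w] = img  # original content stays at native resolution, top-left
    return out

img = np.ones((512, 768, 3), dtype=np.uint8)
padded = pad_to_square(img)
print(padded.shape)  # (1024, 1024, 3)
```

Unlike a plain resize, this keeps every original pixel untouched, which is why the mask and image distributions stay aligned with the raw data.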

@kasteric
Author

kasteric commented Sep 16, 2024

Oh, I was not using demo_test_iml_vit.sh for testing; I used demo_train_iml_vit.sh for evaluation, placing the evaluation code before the training code in train.py. In the generated demo_train_iml_vit.sh, the configs look like this:

base_dir="./output_dir_imlvit_orig"
mkdir -p ${base_dir}

CUDA_VISIBLE_DEVICES=1 \
torchrun  \
    --standalone    \
    --nnodes=1     \
    --nproc_per_node=1 \
../train.py \
    --model IML_ViT \
    --edge_lambda 20 \
    --vit_pretrain_path ../mae_pretrain_vit_base.pth \
    --world_size 1 \
    --batch_size 3 \
    --data_path  /<casia_v2> \
    --epochs 200 \
    --lr 1e-4 \
    --image_size 1024 \
    --if_resizing \
    --min_lr 5e-7 \
    --weight_decay 0.05 \
    --edge_mask_width 7 \
    --test_data_path /<casia_v1> \
    --warmup_epochs 2 \
    --output_dir ${base_dir}/ \
    --log_dir ${base_dir}/ \
    --accum_iter 8 \
    --seed 42 \
    --test_period 4 \
    --resume /<resumed.pth>

where if_resizing is set to True, so the data transform is a resize without padding, as in the code below:

self.post_transform = None
if is_padding:
    self.post_transform = get_albu_transforms(type_="pad", output_size=output_size)
if is_resizing:
    self.post_transform = get_albu_transforms(type_="resize", output_size=output_size)

After I manually set the albu_transform type to "pad", the results are consistent. I concluded that it is not that "pad" is better than "resize", but that your checkpoints were trained in "pad" mode.

On my custom dataset, however, I found "resize" mode yields better results by 1 or 2 percentage points.
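The train/test mismatch is easy to demonstrate concretely: the two transforms produce outputs of the same shape but with very different pixel statistics, so a model trained on one sees out-of-distribution inputs from the other. A minimal numpy sketch (`nn_resize` and `zero_pad` are illustrative stand-ins, not IMDLBenCo's actual transforms):

```python
import numpy as np

def nn_resize(img: np.ndarray, target: int = 1024) -> np.ndarray:
    # Nearest-neighbor stand-in for the "resize" transform (real code uses
    # an interpolating resize from albumentations/OpenCV).
    h, w = img.shape[:2]
    rows = np.arange(target) * h // target
    cols = np.arange(target) * w // target
    return img[rows][:, cols]

def zero_pad(img: np.ndarray, target: int = 1024) -> np.ndarray:
    # The "pad" transform: keep the raw pixels, zero-pad to target x target.
    out = np.zeros((target, target) + img.shape[2:], dtype=img.dtype)
    out[:img.shape[0], :img.shape[1]] = img
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 255, size=(512, 768, 3), dtype=np.uint8)

resized, padded = nn_resize(img), zero_pad(img)
print(resized.shape == padded.shape)    # True: identical model input shape
print(np.array_equal(resized, padded))  # False: very different content
```

Evaluating a "pad"-trained checkpoint on "resize" inputs (or vice versa) is therefore a distribution shift, which explains the ~12% vs ~70% gap observed above.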

@SunnyHaze
Contributor

Thank you for your feedback.

I see your point. Generally, a deep neural network fits a distribution as a function, so keeping the training distribution similar to the testing distribution is essential. This matches your conclusion: "it was not because 'pad' is better than 'resize', but your checkpoints were trained based on 'pad' mode."

Further, there are several possible explanations for the performance on your custom dataset, such as:

  1. The aspect ratio of your images is close to 1:1, i.e., the resizing operation does not distort the image.
  2. The resolution of your images is relatively larger than CASIAv2's.
  3. Other factors that would need a case study on those datasets.
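Point 1 can be made concrete: when resizing to a square, the per-axis scale factors stay equal (no geometric distortion) only if the aspect ratio is already 1:1. A small sketch with a hypothetical helper, not project code:

```python
def square_resize_scales(h: int, w: int, target: int = 1024):
    """Per-axis scale factors when resizing an h x w image to target x target.

    Equal factors mean no distortion; unequal factors stretch tampered
    regions anisotropically. (Illustrative helper, not project code.)
    """
    return target / h, target / w

print(square_resize_scales(768, 768))   # equal factors: no distortion
print(square_resize_scales(384, 1024))  # unequal: strong vertical stretch
```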

Thanks again for your attention to our project. If you find the issue solved, please close it. You are also welcome to discuss any further concerns or problems you meet.
