Error when doing inference using augmentation #10
Hi @ramdhan1989, sorry for replying late. You can modify the code line at Line 121 of yolo.py from this:
to this:
The error is raised because in DRENet we also return the degraded reconstruction image in def forward_once().

It seems that you want to leverage multi-scale inference by setting augment=True. However, I'm afraid the current C3ResAtnMHSA structure may not support different input sizes (because of the fixed-size positional encoding). Thus, if you want to use multi-scale inference, you may either consider replacing C3ResAtnMHSA, or changing the current C3ResAtnMHSA structure. For the structure change, maybe you can modify the current fixed-size positional encoding into an adaptive one (maybe by bilinear interpolation?). You can have a try.
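For reference, a minimal sketch of the kind of change this refers to, assuming a YOLOv5-style augmented-inference loop in yolo.py, that DRENet's forward_once() returns a (detections, degraded_image) tuple, and that the helpers scale_img and _descale_pred follow the YOLOv5 code base this repo builds on (the exact indexing is an assumption, not the repo's literal diff):

```python
import torch
from utils.torch_utils import scale_img  # assumed YOLOv5 helper


def _forward_augment_sketch(self, x):
    # Sketch only: multi-scale / flip inference where forward_once() is
    # assumed to return (detection_outputs, degraded_image) in DRENet.
    img_size = x.shape[-2:]                            # height, width
    scales, flips = [1, 0.83, 0.67], [None, 3, None]   # scale factors / flip dims
    y = []
    for si, fi in zip(scales, flips):
        xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max()))
        yi = self.forward_once(xi)[0][0]               # keep detections, drop the reconstruction
        yi = self._descale_pred(yi, fi, si, img_size)
        y.append(yi)
    return torch.cat(y, 1), None                       # concatenated augmented predictions
```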
Noted, thank you.
Hi @WindVChen, I am interested in modifying the code to accommodate different image sizes. In my opinion, it would be beneficial for performance to apply inference with augmentation, and also to run inference on the original image size to capture larger objects, in addition to inference on sliced images. Would you mind guiding me on how to start the modification? Do I need to change only the part below?
Thanks, Ramdhan
PROBLEM
Actually, you only need to change the following part:
More specifically, we can find from the previous issues that the errors (due to input resolutions) usually come from this part:
And that is because self.rel_h and self.rel_w are of fixed size, determined by the settings in DRE.yaml.
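To illustrate why the size is fixed, here is a rough sketch (not the repo's exact code) of how such a BoTNet-style learnable positional encoding is typically created once at build time, with the height/width baked in from the yaml config; the class name and parameter shapes below are assumptions chosen to match the 192x16x16 size quoted later in this thread:

```python
import torch
import torch.nn as nn


class MHSAPositionSketch(nn.Module):
    # Rough sketch of a fixed-size positional encoding; the real
    # BottleneckResAtnMHSA in common.py may differ in details.
    def __init__(self, n_dims=192, height=16, width=16):
        super().__init__()
        # Learnable encodings created once; their sum broadcasts to
        # (1, n_dims, height, width), e.g. 1 x 192 x 16 x 16 here.
        self.rel_h = nn.Parameter(torch.randn(1, n_dims, height, 1))
        self.rel_w = nn.Parameter(torch.randn(1, n_dims, 1, width))

    def position_map(self):
        return self.rel_h + self.rel_w  # fixed 16 x 16 grid regardless of input size
```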
SOLUTION
Since we have found the problem above, a straightforward solution is to make the following line resolution-adaptive:
My suggestion is to add a code line that interpolates it to the needed size. Since I am not sure whether this (somewhat brute-force) solution will produce good results, I would be very glad if you could share your experimental results with me on whether it is effective.
Hi, I have printed the vector size at every step in the BottleneckResAtnMHSA and C3ResAtnMHSA classes inside common.py and got the summary below:
You can consider interpolating the output of the following code line:
From the table you provided, after adding up rel_h and rel_w, the result's size should be 192x16x16. Then, to achieve flexibility, you can interpolate it to match the size of content_content. For example, for a (1024, 1024) content_content, you should interpolate the 192x16x16 tensor to 192x32x32.
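A possible way to implement that, as a sketch under the assumption that rel_h + rel_w is a (1, C, 16, 16) tensor and that the positional term is later flattened to match content_content (the function name and the flattening step are my own, not taken from common.py):

```python
import torch
import torch.nn.functional as F


def adaptive_content_position(rel_h, rel_w, feat_h, feat_w):
    # Build the positional map at its trained size, then resize it to the
    # feature map actually passing through, e.g. 192x16x16 -> 192x32x32.
    pos = rel_h + rel_w                                        # (1, C, 16, 16)
    pos = F.interpolate(pos, size=(feat_h, feat_w),
                        mode='bilinear', align_corners=False)  # (1, C, feat_h, feat_w)
    return pos.flatten(2)                                      # (1, C, feat_h * feat_w)
```

Inside forward(), feat_h and feat_w would be read from the incoming tensor (e.g. n_batch, C, H, W = x.shape) rather than hard-coded, so the flattened length always matches content_content.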
It is still not working; the error occurred after several loops in the forward procedure of BottleneckResAtnMHSA. Here I tried to add print statements.
The output is as follows:
The error is: RuntimeError: The size of tensor a (1024) must match the size of tensor b (256) at non-singleton dimension 1. Following your suggestion, I interpolated the result of self.rel_h + self.rel_w, but it still produces an error from the first loop.
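For what it's worth, the numbers in that error line up with the fixed 16x16 grid versus a 32x32 feature map (an assumption based on the sizes quoted in this thread, not on the DRENet code itself):

```python
# 256 positions come from the fixed positional encoding, 1024 from the larger
# feature map; the two meet at dimension 1, hence the RuntimeError above.
fixed_positions = 16 * 16    # 256
actual_positions = 32 * 32   # 1024
print(fixed_positions, actual_positions)  # 256 1024 -> shapes cannot broadcast
```

This suggests the interpolation has to target the H*W of the current feature map and happen before the view/flatten into content_position.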
Hi, I made a modification and it now runs successfully with different image sizes.
However, when I set augment=True, I got the error below:
Do you have an idea how to solve this?
Modifications:
And the error seems to be the same as here?
Noted, it is working now. Based on my experiment, using TTA for inference didn't improve the performance in my case. In this link you mentioned the possibility of training the model with different image sizes. I think there is a potential error coming from the dataloader with different image sizes, isn't there?
You're right. The statement "train the model using different sizes of image" may just mean that we don't need to recalculate the parameters in DRENet.yaml for input images with sizes other than 512, if the program can adapt to different input sizes. Actually, I'm curious whether such an adaptive operation will hurt the performance. For example, how is the result on
Hi, I got an error when doing inference with augment=True. The error is shown as follows. Please advise.
Thanks