Hi author, your code is great, but when I use your SMT module for training I always run out of memory at the attn = (q @ k.transpose(-2, -1)) * self.scale statement when computing the attention, and even setting the batch size to 1 is not enough. Could you give me some ideas on how to modify it, please? I'm only using the stage 3 structure.
Hi, here are two simple things you can try (a rough sketch of both is below):
(1) reduce the number of channels (e.g. 256 -> 128)
(2) reduce the number of blocks (e.g. 12 -> 6)
Also, you should check whether the feature-map resolution at stage 3 is too high for your own task.
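A minimal sketch of these two knobs, assuming a hypothetical stage 3 built from generic attention blocks with dim and depth arguments (the real SMT config names may differ):

```python
import torch
import torch.nn as nn

class AttnBlock(nn.Module):
    """Plain multi-head self-attention block over flattened tokens (generic, not the exact SMT block)."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):            # x: (B, N, C)
        h = self.norm(x)
        out, _ = self.attn(h, h, h)  # attention map memory grows with N*N
        return x + out

def build_stage3(dim=256, depth=12):
    # (1) fewer channels: dim 256 -> 128
    # (2) fewer blocks:   depth 12 -> 6
    return nn.Sequential(*[AttnBlock(dim) for _ in range(depth)])

stage3_small = build_stage3(dim=128, depth=6)
```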
Thank you for your answer, it is indeed a resolution problem. My input image resolution is 128×128, so when computing the attention N = H*W = 16384, which is far too big. I would like to ask why your attention calculation has to reshape the input x from (B, C, H, W) to (B, N, C)? This takes up so much memory when computing the attention.
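For reference, a back-of-the-envelope sketch (assuming fp32 and a generic ViT-style attention with 8 heads, not the exact SMT code) of why the (B, N, C) flattening becomes expensive at this resolution:

```python
import torch

B, C, H, W = 1, 256, 128, 128
num_heads = 8
N = H * W                              # 16384 tokens after flattening

x = torch.randn(B, C, H, W)
tokens = x.flatten(2).transpose(1, 2)  # (B, N, C): the reshape asked about

# q @ k^T produces an attention map of shape (B, num_heads, N, N)
attn_elems = B * num_heads * N * N
print(f"attention map: {attn_elems * 4 / 1024**3:.1f} GiB in fp32")
# -> ~8 GiB for a single 128x128 image, before softmax and backward buffers
```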