Some error occurred while running torchrun.py #19
Comments
I think this is because your PyTorch version is not v1.2. I ran into this error too with PyTorch v1.6.
Yes, my PyTorch version is v1.7. Is there a way to make the code work with newer PyTorch versions?
I have not solved this problem yet. A barely satisfactory workaround is to manually pad the convolution results.
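(One way to realize this manual padding, as a minimal sketch: it assumes the residual block adds a pipe output `out` to a shortcut `s`, as in the shapes reported below, and the helper name `match_spatial` is hypothetical.)

import torch.nn.functional as F

def match_spatial(out, s):
    # Hypothetical helper: pad (or crop, via negative padding) the
    # right/bottom of `out` so its spatial size matches the shortcut
    # `s` before the residual addition.
    dh = s.shape[-2] - out.shape[-2]
    dw = s.shape[-1] - out.shape[-1]
    return F.pad(out, (0, dw, 0, dh))

# e.g. out of shape (15, 32, 129, 129) and s of shape (15, 32, 128, 128)
# -> match_spatial(out, s) has shape (15, 32, 128, 128)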
I got the same problem (Line 97 in 9b0a5dc). Different versions of torch produce different feature-map sizes from the same input, but another cause may be the code at Line 27 in 9b0a5dc: the feature map comes out as 255x255 instead of 256x256. For this repo it's fine to install torch 1.2.0 and torchvision 0.4.0 without any problem; for newer versions of torch, changing the Conv2d settings does work.
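(For reference, the spatial output size of nn.Conv2d is floor((H + 2*padding - kernel_size) / stride) + 1, so an even kernel with symmetric padding cannot preserve the size at stride 1, which is where the 255x255 comes from. A minimal sketch of the arithmetic:)

import torch
import torch.nn as nn

x = torch.randn(1, 3, 256, 256)

# Even kernel, symmetric padding: floor((256 + 2*1 - 4)/1) + 1 = 255.
print(nn.Conv2d(3, 16, kernel_size=4, stride=1, padding=1)(x).shape)       # 255 x 255

# Odd kernel with padding=(k - 1)//2: floor((256 + 2*1 - 3)/1) + 1 = 256.
print(nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)(x).shape)       # 256 x 256

# PyTorch >= 1.9 also accepts padding='same' (stride 1 only), which pads
# asymmetrically so an even kernel preserves the size as well.
print(nn.Conv2d(3, 16, kernel_size=4, stride=1, padding='same')(x).shape)  # 256 x 256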
Thank you so much for your help!!
Hello, I ran into the same problem. How did you modify the Conv2d settings?
@HeBangYan, @seven-sent, @super3kl Hi, after many hours (and tears) I have found a working setting; with it the entire architecture runs on PyTorch 1.11 (March 2022):

class InitPRN2(nn.Module):
    def __init__(self):
        super(InitPRN2, self).__init__()
        self.feature_size = 16
        feature_size = self.feature_size
        self.layer0 = Conv2d_BN_AC(in_channels=3, out_channels=feature_size, kernel_size=4, stride=1,
                                   padding='same')  # 256 x 256 x 16; works with {kernel_size: 3, padding: 1} or {kernel_size: 4, padding: 'same'}
        self.encoder = nn.Sequential(
            PRNResBlock(in_channels=feature_size, out_channels=feature_size * 2, kernel_size=4, stride=2, with_conv_shortcut=True),        # 128 x 128 x 32
            PRNResBlock(in_channels=feature_size * 2, out_channels=feature_size * 2, kernel_size=4, stride=1, with_conv_shortcut=False),   # 128 x 128 x 32
            PRNResBlock(in_channels=feature_size * 2, out_channels=feature_size * 4, kernel_size=4, stride=2, with_conv_shortcut=True),    # 64 x 64 x 64
            PRNResBlock(in_channels=feature_size * 4, out_channels=feature_size * 4, kernel_size=4, stride=1, with_conv_shortcut=False),   # 64 x 64 x 64
            PRNResBlock(in_channels=feature_size * 4, out_channels=feature_size * 8, kernel_size=4, stride=2, with_conv_shortcut=True),    # 32 x 32 x 128
            PRNResBlock(in_channels=feature_size * 8, out_channels=feature_size * 8, kernel_size=4, stride=1, with_conv_shortcut=False),   # 32 x 32 x 128
            PRNResBlock(in_channels=feature_size * 8, out_channels=feature_size * 16, kernel_size=4, stride=2, with_conv_shortcut=True),   # 16 x 16 x 256
            PRNResBlock(in_channels=feature_size * 16, out_channels=feature_size * 16, kernel_size=4, stride=1, with_conv_shortcut=False), # 16 x 16 x 256
            PRNResBlock(in_channels=feature_size * 16, out_channels=feature_size * 32, kernel_size=4, stride=2, with_conv_shortcut=True),  # 8 x 8 x 512
            PRNResBlock(in_channels=feature_size * 32, out_channels=feature_size * 32, kernel_size=4, stride=1, with_conv_shortcut=False), # 8 x 8 x 512
        )
        self.decoder = nn.Sequential(
            ConvTranspose2d_BN_AC(in_channels=feature_size * 32, out_channels=feature_size * 32, kernel_size=4, stride=1),  # 8 x 8 x 512
            ConvTranspose2d_BN_AC(in_channels=feature_size * 32, out_channels=feature_size * 16, kernel_size=4, stride=2),  # 16 x 16 x 256
            ConvTranspose2d_BN_AC(in_channels=feature_size * 16, out_channels=feature_size * 16, kernel_size=4, stride=1),  # 16 x 16 x 256
            ConvTranspose2d_BN_AC(in_channels=feature_size * 16, out_channels=feature_size * 16, kernel_size=4, stride=1),  # 16 x 16 x 256
            ConvTranspose2d_BN_AC(in_channels=feature_size * 16, out_channels=feature_size * 8, kernel_size=4, stride=2),   # 32 x 32 x 128
            ConvTranspose2d_BN_AC(in_channels=feature_size * 8, out_channels=feature_size * 8, kernel_size=4, stride=1),    # 32 x 32 x 128
            ConvTranspose2d_BN_AC(in_channels=feature_size * 8, out_channels=feature_size * 8, kernel_size=4, stride=1),    # 32 x 32 x 128
            ConvTranspose2d_BN_AC(in_channels=feature_size * 8, out_channels=feature_size * 4, kernel_size=4, stride=2),    # 64 x 64 x 64
            ConvTranspose2d_BN_AC(in_channels=feature_size * 4, out_channels=feature_size * 4, kernel_size=4, stride=1),    # 64 x 64 x 64
            ConvTranspose2d_BN_AC(in_channels=feature_size * 4, out_channels=feature_size * 4, kernel_size=4, stride=1),    # 64 x 64 x 64
            ConvTranspose2d_BN_AC(in_channels=feature_size * 4, out_channels=feature_size * 2, kernel_size=4, stride=2),    # 128 x 128 x 32
            ConvTranspose2d_BN_AC(in_channels=feature_size * 2, out_channels=feature_size * 2, kernel_size=4, stride=1),    # 128 x 128 x 32
            ConvTranspose2d_BN_AC(in_channels=feature_size * 2, out_channels=feature_size * 1, kernel_size=4, stride=2),    # 256 x 256 x 16
            ConvTranspose2d_BN_AC(in_channels=feature_size * 1, out_channels=feature_size * 1, kernel_size=4, stride=1),    # 256 x 256 x 16
            ConvTranspose2d_BN_AC(in_channels=feature_size * 1, out_channels=3, kernel_size=4, stride=1),                   # 256 x 256 x 3
            ConvTranspose2d_BN_AC(in_channels=3, out_channels=3, kernel_size=4, stride=1),                                  # 256 x 256 x 3
            ConvTranspose2d_BN_AC(in_channels=3, out_channels=3, kernel_size=4, stride=1, activation=nn.Tanh())             # 256 x 256 x 3
        )
        self.loss = InitLoss()

The last thing you have to modify is in torchmodule.py; this is the new PRNResBlock class:

class PRNResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, with_conv_shortcut=False,
                 activation=nn.ReLU(inplace=True)):
        super(PRNResBlock, self).__init__()
        if kernel_size % 2 == 1:  # odd kernel: symmetric padding preserves the spatial size
            self.pipe = nn.Sequential(
                Conv2d_BN_AC(in_channels=in_channels, out_channels=out_channels // 2, stride=1, kernel_size=1),
                Conv2d_BN_AC(in_channels=out_channels // 2, out_channels=out_channels // 2, stride=stride,
                             kernel_size=kernel_size, padding=(kernel_size - 1) // 2),
                nn.Conv2d(in_channels=out_channels // 2, out_channels=out_channels, stride=1, kernel_size=1,
                          bias=False))
        else:  # even kernel
            if stride == 1:
                self.pipe = nn.Sequential(
                    Conv2d_BN_AC(in_channels=in_channels, out_channels=out_channels // 2, stride=1, kernel_size=1),
                    Conv2d_BN_AC(in_channels=out_channels // 2, out_channels=out_channels // 2, stride=stride,
                                 kernel_size=kernel_size, padding=kernel_size - 1, padding_mode='circular'),
                    nn.Conv2d(in_channels=out_channels // 2, out_channels=out_channels, stride=1,
                              kernel_size=kernel_size, bias=False))
            elif stride == 2:
                self.pipe = nn.Sequential(
                    Conv2d_BN_AC(in_channels=in_channels, out_channels=out_channels // 2, stride=1, kernel_size=1),
                    Conv2d_BN_AC(in_channels=out_channels // 2, out_channels=out_channels // 2, stride=stride,
                                 kernel_size=kernel_size, padding=kernel_size - 1, padding_mode='circular'),
                    nn.Conv2d(in_channels=out_channels // 2, out_channels=out_channels, stride=1,
                              kernel_size=kernel_size - 1, bias=False))
            else:
                raise ValueError('Unsupported stride: {}'.format(stride))
        self.shortcut = nn.Sequential()
        if with_conv_shortcut:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels=in_channels, out_channels=out_channels, stride=stride, kernel_size=1, bias=False),
            )
        self.BN_AC = nn.Sequential(
            nn.BatchNorm2d(out_channels, eps=0.001, momentum=0.5),
            activation
        )

As you can see, for an even kernel you have to use a different kernel_size in the final nn.Conv2d depending on the stride in order to keep the correct output size. You have to retrain the CNN; you cannot use the saved model provided by @reshow.
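(A quick shape check of the even-kernel scheme above, as a minimal sketch: bare nn.Conv2d layers stand in for the repo's Conv2d_BN_AC wrapper, since BN and activation do not change spatial sizes, and `even_kernel_pipe` is a hypothetical name.)

import torch
import torch.nn as nn

def even_kernel_pipe(cin, cout, k=4, stride=1):
    # Stand-in for the pipe above with an even kernel k:
    # stride 1 keeps the full kernel in the tail conv, stride 2 uses k - 1
    # so the output lands exactly on H and H/2 respectively.
    tail_kernel = k if stride == 1 else k - 1
    return nn.Sequential(
        nn.Conv2d(cin, cout // 2, kernel_size=1, stride=1),
        nn.Conv2d(cout // 2, cout // 2, kernel_size=k, stride=stride,
                  padding=k - 1, padding_mode='circular'),
        nn.Conv2d(cout // 2, cout, kernel_size=tail_kernel, stride=1),
    )

x = torch.randn(1, 16, 128, 128)
print(even_kernel_pipe(16, 32, stride=1)(x).shape)  # (1, 32, 128, 128)
print(even_kernel_pipe(16, 32, stride=2)(x).shape)  # (1, 32, 64, 64)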
@maltempoLuca Thank you for sharing the workaround. I was able to train the model from scratch using your modifications, but the resulting model produces terrible output. Do you have your retrained model available to share? I'd like to try it and see how it performs.
Thank you for sharing your project.
However, I have some trouble running the training code.
The output shapes are:
s.shape: torch.Size([15, 32, 128, 128])
out.shape: torch.Size([15, 32, 129, 129])
I did not change any code.
How can I solve it?