about training #8
Comments
Hi @cl886699, I'm afraid I cannot help without seeing the code. Per the manuscript, you should be getting MCC ~ 98% after 60 epochs on the LEVIRCD dataset. Assuming you haven't changed anything in the models from this repository, please post your training code here and I'll take a look. Cheers,
I did not use this code. I am not familiar with mxnet, so I wrote the model and the training code from scratch in TensorFlow.
I can take a look at the code if you want to post it here. If there is an obvious error that I can spot quickly, it may help.
It would be nice if you could help. I uploaded it to GitHub.
Hi @cl886699, your code is really nice, my compliments! I did a thorough walk through your code; unfortunately I don't see anything obvious.

High-risk areas for errors: the way you calculate the loss (but it looks fine); the channel transition, which is perhaps the most likely source of error (mxnet: channels axis = 1, TF: channels axis = 3) and which, for the most part, you have done correctly (I don't know if anything was missed there); and normalization, which should be group normalization everywhere (check whether you missed a batch norm somewhere). The data: are you sure all data are OK? Maybe there is a bug in the train/validation set?

For the record, I've also trained this model on 2 x RTX 3090 with really nice results, although training took a week. You should be seeing nice results after epoch 50. I used learning rate 1.e-3 in the ceecnet paper, but when I trained on 2 GPUs (much smaller batch size) I used 1.e-4, because with 1.e-3 training was unstable. Can you please post some training plots and examples of inference, both on the overfitted datum and on more general data from LEVIRCD?

Some other comments that may prove helpful. Padding (see here), in TF terminology, is "SAME" everywhere, meaning input shape == output shape. This is implemented manually in mxnet; I never used anything other than SAME. Perhaps the trickiest part is the PSP Pooling, for which I used a hack so that it is hybridizable in mxnet (static graph). When I wrote this repo, there was a problem with hybridizing the PSP Pooling operator, but you can translate this version from mxnet 2.0, which is much easier to understand and much easier to implement. Check whether there are any errors here.

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import HybridBlock
from mxprosthesis.nn.layers.conv2Dnormed import *

class PSP_Pooling(gluon.HybridBlock):
    def __init__(self, nfilters, depth=4, norm_type = 'BatchNorm', norm_groups=None, mob=False, **kwards):
        gluon.HybridBlock.__init__(self,**kwards)
        self.depth = depth
        self.convs = gluon.nn.HybridSequential()
        for _ in range(depth):
            self.convs.add(Conv2DNormed(nfilters//self.depth,kernel_size=(1,1),padding=(0,0),norm_type=norm_type, norm_groups=norm_groups))
        self.conv_norm_final = Conv2DNormed(channels = nfilters,
                                            kernel_size=(1,1),
                                            padding=(0,0),
                                            norm_type=norm_type,
                                            norm_groups=norm_groups)

    def forward(self,input):
        _, _, h, w = input.shape
        p = [input]
        for i in range(self.depth):
            hnew = h // (2**i)
            wnew = w // (2**i)
            kernel = (hnew,wnew)
            x = mx.npx.pooling(input,kernel=kernel, stride=kernel, pool_type='max')
            x = self.convs[i](x)
            #x = mx.nd.UpSampling(x.as_nd_ndarray(),sample_type='nearest',scale=hnew)
            x = mx.contrib.ndarray.BilinearResize2D(x.as_nd_ndarray(),height=h,width=w)
            x = x.as_np_ndarray()
            p += [x]
        out = mx.np.concatenate(p,axis=1)
        out = self.conv_norm_final(out)
        return out

or this version from pytorch (channel axis = 1):

import torch
from trchprosthesis.nn.layers.conv2Dnormed import *
#from typing import List
import math

class PSP_Pooling(torch.nn.Module):
    def __init__(self, nfilters, depth=4, norm_type = 'BatchNorm', norm_groups=None):
        super(PSP_Pooling,self).__init__()
        self.depth = depth
        convs = []
        for _ in range(depth):
            convs.append(Conv2DNormed(in_channels = nfilters, out_channels = nfilters//self.depth,kernel_size=(1,1),padding=(0,0),norm_type=norm_type, num_groups=norm_groups))
        self.convs = torch.nn.ModuleList(convs)
        self.conv_norm_final = Conv2DNormed(in_channels = nfilters*2,
                                            out_channels = nfilters,
                                            kernel_size=(1,1), # there is no point for 3x3 here.
                                            padding=(0,0),
                                            norm_type=norm_type,
                                            num_groups=norm_groups)

    def forward(self,input:torch.Tensor)->torch.Tensor:
        _, _, h, w = input.shape
        p = [input]
        for i, conv in enumerate(self.convs):
            scale = 2**i
            hnew = math.ceil(h/scale)
            wnew = math.ceil(w/scale)
            kernel = (hnew,wnew)
            # Do pooling
            x = torch.nn.functional.max_pool2d(input, kernel_size=kernel, stride=kernel)
            x = conv(x) # this fixes number of channels
            # Now upscale to original size -- THIS IS A SLOW FUNCTION!!!
            x = torch.nn.functional.interpolate(x, scale_factor=float(hnew), mode='nearest')
            p += [x]
        out = torch.cat(p,dim=1)
        out = self.conv_norm_final(out)
        return out

Sorry I couldn't be of much help here :(
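For the channel-axis point raised in this comment, here is a minimal NumPy sketch of what changes when porting from the mxnet/PyTorch layout (NCHW, channels on axis 1) to the TensorFlow default (NHWC, channels on axis 3). The helper name `nchw_to_nhwc` and the toy shapes are illustrative assumptions, not code from this repository or from the TensorFlow port.

```python
import numpy as np

def nchw_to_nhwc(x):
    """Move the channel axis from position 1 (mxnet/PyTorch) to position 3 (TF default)."""
    return np.transpose(x, (0, 2, 3, 1))

# Toy feature maps: batch=2, height=16, width=16, with 8 and 4 channels (illustrative values).
a_nchw = np.random.rand(2, 8, 16, 16).astype(np.float32)
b_nchw = np.random.rand(2, 4, 16, 16).astype(np.float32)

# In the PSP pooling code above, the branches are concatenated along axis=1 (channels).
cat_nchw = np.concatenate([a_nchw, b_nchw], axis=1)  # shape (2, 12, 16, 16)

# The equivalent NHWC operation has to concatenate along the last axis instead.
cat_nhwc = np.concatenate(
    [nchw_to_nhwc(a_nchw), nchw_to_nhwc(b_nchw)], axis=-1
)  # shape (2, 16, 16, 12)

# Both layouts hold the same numbers once converted.
same = np.array_equal(nchw_to_nhwc(cat_nchw), cat_nhwc)
```

Any operation that takes an axis argument (concatenation, softmax, normalization, the reductions inside the loss) is worth checking in the same way.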
Please post the training plots (validation loss and training loss vs. epoch). Does the problem appear from the beginning, or is there a sharp decline in the loss at some point? edit: segmentation loss only
Are these validation losses? I need to see both the training and the validation segmentation loss to understand the behavior. The distance loss looks bad, consistent with the visual result (I guess), which indicates a bug somewhere in this layer. Check that the distance transform is correctly scaled to [0,1]. The way I created the distance transform: during chopping I scale it to [0,100] so that it can be stored as uint8 (for storage compression reasons), then scale it back to [0,1] in the dataset class when converting each datum to float32.

Also, the learning rate does not look constant. I used a constant learning rate --> train till no improvement --> reduce the learning rate by a factor of 10 and increase the ftdepth of the loss by 10 --> restart training (clear the optimizer states) --> wait till no improvement --> repeat, etc. However, your results indicate a bug somewhere, so I don't think this is the issue. You should be getting really nice performance without learning rate reduction. Thank you for posting these, looking forward to the new plots.
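The storage trick described above can be sketched as a round trip in NumPy. The function names `encode_distance_uint8` and `decode_distance_float32` are hypothetical, and normalising by the per-image maximum is an assumption; the thread does not say how the [0,1] range is obtained before the scaling to [0,100].

```python
import numpy as np

def encode_distance_uint8(dist):
    """Normalise a non-negative distance map to [0, 1], then store it as uint8 in [0, 100]."""
    dist = np.asarray(dist, dtype=np.float32)
    peak = dist.max()
    # Assumption: normalise by the maximum of this map; an all-zero map stays zero.
    unit = dist / peak if peak > 0 else np.zeros_like(dist)
    return np.rint(unit * 100.0).astype(np.uint8)

def decode_distance_float32(stored):
    """Scale the stored uint8 map back to float32 in [0, 1], as done when loading a datum."""
    return stored.astype(np.float32) / 100.0

# Toy distance map (illustrative values, not taken from LEVIRCD).
dist = np.array([[0.0, 1.0, 2.0],
                 [0.0, 2.0, 4.0],
                 [0.0, 1.0, 2.0]])

stored = encode_distance_uint8(dist)        # what would be written to disk
restored = decode_distance_float32(stored)  # what the loss should receive
```

A quick sanity check on a port is that the distance label reaching the loss never exceeds 1; values up to 100 would mean the rescaling step was skipped.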
Maybe it's not trained enough; now I can see some shape in the predictions.
The training was interrupted by accident. Once I have trained enough, I'll post the results.
Happy to help till you resolve this. You should be able to see nice results on your 8 GPUs (16GB each?) after about 24h, basically beyond epoch 60 on LEVIRCD, and around epoch 100 you reach the first convergence stage. I trained this model on 24x4 = 96 P100 GPUs for ~4 days, but initial convergence appears after about 14-16 hours. You should wait for at least 24h before visualizing. Again, thank you very much for your interest in our work.
Hi @cl886699, I got an email notification from GitHub about comments here, but I cannot see them anymore. I assume you deleted them because you resolved it? Based on the email, although the image resolution is small, this looks like successful training, but you need to train for far longer to get competitive performance. I don't understand why you are getting all zeros as you say; please post the training / validation segmentation loss vs. number of epochs again so that we can compare.
Hi, you mention:
which suggests you interrupted training, then changed the labels, and restarted? I've never tried that, so I don't know whether it affects the result, given that you may be restarting with a lower learning rate, as in finetuning. With regard to the 1hot representation, both are good and can work. The 2nd is better for projects like field boundaries; from memory, I used the first in this repository and definitely the 2nd for field boundary projects (e.g. https://www.mdpi.com/2072-4292/13/11/2197 - check also repo: https://github.com/waldnerf/decode ) edit: I just saw that in the 2nd case you subtract the distance of the object from 1, but that is not what I meant above. For the fields, we calculated the distance for each object only, not for the background (which was set to zero everywhere). Better to go with the 1st approach, which is more sensible. I am really intrigued by what you mention, that training tends to zero from some point on. One thing you can try is to start from scratch with FracTAL layer depth ftdepth=0, which is numerically more stable than 5 or 10. Sometimes a high ftdepth doesn't play well with some learning rates. The step in the total loss that I see at about 6k (assuming this is training loss?) may indicate overfitting or something breaking. Check whether you get zeros after that point or before it.
Hi,
I think, unless I missed something in the description, that this is consistent with the fact that you changed the distance labels after some initial optimization, so the algorithm had to find a new path from a bad starting point (all outputs zero) in order to learn again with the new labels. The softmax for distance and segmentation is justified because the segmentation and distance transform labels are mutually exclusive spatially (they do not overlap). Boundaries, on the other hand, use crisp-sigmoid because they are common. The distance transform in the latest images you posted suggests a bug somewhere in the loss/distance estimation. Check that you are not getting an error from broadcasting somewhere (an axis has dimension 1 and broadcasting happens implicitly). The change segmentation masks and boundaries look OK, in the sense that the algorithm is learning and improving. They are not bad for a first result. You should also visualize the images to better understand the errors of the algorithm (or of the labels!). I think that once you identify the bug in the distance transform, all masks should be fine and work as nicely as in our paper. I've used this model for change detection and semantic segmentation several times with great results, so I am fairly confident you can make it work :).
It is I who thank you for your interest in our work and your effort. I apologize for the incompleteness of the code repository and for not providing a TF implementation. Let me know how it goes once you nail it.
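The implicit-broadcasting pitfall mentioned above can be illustrated with a small NumPy sketch; TensorFlow follows the same NumPy-style broadcasting rules. The function `checked_sq_error` is a hypothetical helper and the squared error is only a stand-in, not the loss used in this repository.

```python
import numpy as np

def checked_sq_error(pred, label):
    """Mean squared error that refuses to broadcast: the shapes must match exactly."""
    if pred.shape != label.shape:
        raise ValueError(f"shape mismatch: pred {pred.shape} vs label {label.shape}")
    return float(np.mean((pred - label) ** 2))

# Toy single-image case: the prediction keeps a trailing channel axis, the label does not.
pred = np.zeros((4, 4, 1), dtype=np.float32)
label = np.ones((4, 4), dtype=np.float32)

# Without a check, (4, 4, 1) against (4, 4) broadcasts to (4, 4, 4) and raises no error.
silent = pred - label

# Dropping the singleton axis makes the shapes agree, so the check passes.
loss = checked_sq_error(pred[..., 0], label)
```

Asserting equal shapes right before each loss term (segmentation, boundary, distance) is a cheap way to rule this class of bug in or out.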
Hi, boss, I trained on the LEVIRCD dataset. When I overfit just one image, all losses decline quickly and I get pretty good results. But when I train on the full dataset, after 60 epochs the boundary loss and the distance loss do not decline, and the segmentation loss fluctuates;
the segmentation prediction tends to be all zeros. I have 8 GPUs with a batch size of 2 per GPU. I tried learning rates from 1e-3 down to 1e-7, but all gave bad results. Are there any suggestions to keep the segmentation prediction from tending to all zeros?