Issue with the forward loop in models.py #1
Hi Pranay, Thank you a lot for looking into it and for all the useful feedback so far. About your question on l.139: loss = self.CE(pred, labels.squeeze()). Here pred is the raw output of the classifier, unbounded (before softmax), and labels is the ground-truth binary human rating. Correct me if I am wrong, but this follows the training regime you apply in your paper? Thanks!
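For reference, a minimal sketch of that call as described above (the name self.CE and the two-class output come from this thread; the shapes are illustrative assumptions, not the repository code):

```python
import torch
import torch.nn as nn

CE = nn.CrossEntropyLoss()              # applies log-softmax internally, so it expects raw logits

pred = torch.randn(8, 2)                # [batch, classes]: unbounded classifier outputs (no softmax)
labels = torch.randint(0, 2, (8, 1))    # [batch, 1]: binary human ratings

loss = CE(pred, labels.squeeze())       # squeeze the labels to the flat [batch] shape the loss expects
print(loss)
```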
I updated the code and tried to clarify in the readme some parameters I added. pretrain.py should allow loading one of 3 models that were trained on different subsets, with a final test loss getting close to ~0.55; although that is borderline with respect to the target loss you recommended, I did not manage to get below it so far! If you spot some mistakes or weird behavior, or if there are issues with running the code, please let me know and I will fix it. Thank you for trying it out, maybe we will end up with a well-working PyTorch version!
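As a rough illustration of loading such a checkpoint (the placeholder model, file name, and save format below are assumptions; pretrain.py defines the actual ones):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the network defined in models.py
model = nn.Sequential(nn.Linear(16, 2))

# pretrain.py defines the real checkpoint names; this path is a made-up example
path = "dataset_combined_linear.pth"
torch.save(model.state_dict(), path)          # how such a checkpoint could be produced

state = torch.load(path, map_location="cpu")  # may instead be a dict wrapping the state_dict
model.load_state_dict(state)
model.eval()
```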
Hi Adrien, Thanks,
Hi Pranay, Thank you for making the effort to have a look at the PyTorch code. I modified models.py, but with the input shapes as I defined them, it should be equivalent to use squeeze(1) or squeeze(-1). I guess there can be some confusion from my somewhat unusual choice to shape labels as [batch, label] instead of a flat tensor [batch]? It is kind of a habit: no matter the number of features/labels, I always keep the first dimension for the batch size and the additional dimensions for the sample sizes. In pretrain.py I actually make a forward "check" for a dummy batch of size 1, but I input a fake label of shape [1, 1], and it seems to forward correctly. Or are you still having issues with it? About the code you ran with the pretrained models I provide, does it seem to indicate the training is decently successful? I did not implement any of the further evaluations you develop in the paper, so I cannot compare the result quality with respect to your results. And of course, I am happy to be linked in your official repository! Thanks,
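A quick standalone check of the shape point (illustrative tensors, not the repository code):

```python
import torch

labels = torch.randint(0, 2, (4, 1))        # the [batch, label] convention described above

# With that shape, squeezing dim 1 or the last dim gives the same flat [batch] tensor
assert torch.equal(labels.squeeze(1), labels.squeeze(-1))

# Dummy forward "check" case: batch size 1 with a fake [1, 1] label
fake_label = torch.zeros(1, 1, dtype=torch.long)
print(fake_label.squeeze(1).shape)          # torch.Size([1])
print(fake_label.squeeze().shape)           # torch.Size([]) -- a no-argument squeeze also drops the batch dim
```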
Hi Adrien, Thanks,
Hi Pranay, Thank you for pointing this out. Equation (1) in the paper only divides by the time dimension; it indeed doesn't make sense that I divided by the batch size too, since that dimension is not reduced, and the CE loss averages over the batch afterwards. I am correcting models.py line 71 according to your suggestion, so that the average is also taken over the channel dimension. I am launching a new training run with this correction to see how it goes. Thanks,
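For illustration, a sketch of averaging the distance over the time and channel dimensions while leaving the batch dimension intact (the [batch, channel, time] layout here is an assumption, not necessarily the one used in models.py):

```python
import torch

# per-element absolute differences between two deep feature maps
diff = torch.randn(8, 64, 100).abs()   # assumed layout: [batch, channel, time]

# average over channel and time only; the batch dimension is kept,
# since the CE loss averages over the batch afterwards
dist = diff.mean(dim=(1, 2))           # shape [batch]
print(dist.shape)                      # torch.Size([8])
```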
Hi Pranay, I updated pretrain.py and replaced the pretrained models with two that were trained with the correct distance averaging (time and channel dimensions). The models are 'dataset_combined_linear' and 'dataset_combined_linear_tshrink'; they were both trained on the combined+linear subsets, and the second one had the tanhshrink activation on the distance. Test losses are respectively 0.569 and 0.564; I put the performance details in a comment in pretrain.py. The tanhshrink activation tends to push the distance for similar audio closer to 0 and gives a larger average ratio between label-0 and label-1 pairs. This maybe indicates that it is useful to have this saturation close to 0 and an increasing gradient as the distance grows; not sure though, I included both, but that was my "intuition" about adding an activation to the distance. If you cannot load them, please let me know about any troubles. And if you have time to give them a try, I'm happy to hear about it. I am still running a few trainings to see if I can get them performing better on the test CE loss, which is still imperfect. Thanks,
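The tanhshrink variant mentioned above would look roughly like this (a sketch with made-up distance values, not the repository code): tanhshrink(x) = x - tanh(x) stays close to 0 for small distances and grows roughly linearly for larger ones.

```python
import torch
import torch.nn as nn

tshrink = nn.Tanhshrink()                 # tanhshrink(x) = x - tanh(x)

dist = torch.tensor([0.1, 0.5, 2.0, 5.0])
print(tshrink(dist))
# small distances are pushed towards 0, while larger distances keep an increasing gradient
```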
Hi Adrien,
I was actually just now going through your codebase and found something that I wanted to confirm.
In models.py, line 139, you take the CE loss of the output of the Classification network and the actual label. Is that correct?
In my model, I take the softmax of the outputs of the Classification network and then compute the CE loss with the actual labels.
I just wanted to make sure that you follow that regime. Does that make sense?
Thanks!
Pranay
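For context on the two conventions discussed here (a general PyTorch note with illustrative tensors, not a claim about either codebase): PyTorch's cross-entropy applies log-softmax internally, so it is meant to receive raw logits; an explicit softmax step corresponds to the log-softmax + NLL formulation instead.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 2)                   # raw classifier outputs
labels = torch.randint(0, 2, (8,))

# Convention 1: raw logits straight into cross-entropy (log-softmax is applied internally)
loss_a = F.cross_entropy(logits, labels)

# Convention 2: take log-softmax explicitly, then the negative log-likelihood
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss_a, loss_b))        # True: the two formulations are equivalent
```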