
val_loss #42

Open
wangdazi opened this issue Mar 23, 2019 · 8 comments

Comments

@wangdazi

Hello, Mr. He:
I split off one-tenth of your training set as a cross-validation set. During training, the error on this validation set grows from the very beginning. Shouldn't it decline gradually?

@HenryNebula

Could you please share your setting for num_neg, the parameter used in the training phase for negative sampling? I have found that it has a strong effect on the validation loss.
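
For context, num_neg controls how many unobserved items are sampled per positive interaction during training, roughly like this (a simplified sketch, not the repo's exact code; the names are placeholders):

```python
import numpy as np

def get_train_instances(train_positives, num_items, num_neg):
    """Build training instances from a set of observed (user, item) pairs.
    Each positive gets label 1; `num_neg` randomly sampled unobserved
    items per positive get label 0. (Sketch only; names are placeholders.)"""
    user_input, item_input, labels = [], [], []
    for (u, i) in train_positives:
        # positive instance
        user_input.append(u)
        item_input.append(i)
        labels.append(1)
        # negative instances: sample items the user has not interacted with
        for _ in range(num_neg):
            j = np.random.randint(num_items)
            while (u, j) in train_positives:
                j = np.random.randint(num_items)
            user_input.append(u)
            item_input.append(j)
            labels.append(0)
    return user_input, item_input, labels
```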

@wangdazi
Author

wangdazi commented Apr 1, 2019 via email

@HenryNebula

So I suppose the error you mentioned here refers to the binary cross-entropy loss on the validation set? If so, I think it would be more appropriate to use ranking metrics such as HR or NDCG instead. The reason is that the number of negative samples per positive differs between the training set and the validation set (1 vs. 4 and 1 vs. 99 in the data file *.neg.dat), so the raw loss may behave strangely.
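
In case it helps, HR@k and NDCG@k for the usual leave-one-out protocol can be computed roughly like this (a sketch assuming you rank the one held-out positive against its 99 sampled negatives per user; the function and variable names are placeholders):

```python
import math

def hr_ndcg_at_k(ranked_items, positive_item, k=10):
    """HR@k and NDCG@k for one user, where `ranked_items` is the list of
    100 candidate items (1 positive + 99 negatives) sorted by predicted
    score, best first."""
    topk = ranked_items[:k]
    if positive_item not in topk:
        return 0.0, 0.0
    rank = topk.index(positive_item)        # 0-based rank of the positive
    return 1.0, 1.0 / math.log2(rank + 2)   # NDCG with a single relevant item
```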

@wangdazi
Author

wangdazi commented Apr 1, 2019 via email

@HenryNebula

So you pair another 4 negative samples with the positive one sampled from the original training set to make a new validation set? If so, I think you could check whether those negative samples overlap with any positive training samples. If all the negative samples are chosen correctly, I don't have a better idea about this issue at the moment.
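
Something like this quick check would do (just a sketch; the argument names are placeholders for however you store the splits):

```python
def count_leaked_negatives(val_negatives, train_positives):
    """Count validation 'negatives' that are actually observed positives
    in the training set. Both arguments are iterables of (user, item) pairs."""
    positives = set(train_positives)
    return sum(1 for pair in val_negatives if pair in positives)
```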

@wangdazi
Author

wangdazi commented Apr 1, 2019 via email

@HenryNebula

You're welcome :)

@amithadiraju1694

Hey @HenryNebula,

My question is related in some ways to the validation loss, so I felt I should comment here rather than create a new issue. The question is somewhat trivial. I've read the paper and run the models successfully, but I'm a bit confused about one part.

The use of a sigmoid activation with binary cross-entropy is confusing me. To me this looks more or less like a regression problem, i.e., when applied to the MovieLens dataset to predict movie ratings from previous interactions, an 'mse' loss with 'relu' or even linear activations would make sense, so how come a 'sigmoid' function is used as the activation on the last layer?

Wouldn't the sigmoid function always produce outputs in [0, 1]? Even after any number of hyper-parameter tuning steps and other regularization techniques, a sigmoid output technically never exceeds 1.0, right? I looked at other implementations of this paper and found pretty much the same thing: a sigmoid function at the end.

I'm not trying to contradict your idea or the original authors' here; I'm just trying to figure out how a sigmoid activation makes sense for this problem, or whether there is something fundamental I'm missing. Let me know what you think.
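
To make my question concrete, this is the kind of output layer I'm referring to, a minimal GMF-style sketch in Keras (placeholder sizes, not the repo's exact code):

```python
from keras.layers import Input, Embedding, Flatten, Multiply, Dense
from keras.models import Model

num_users, num_items, latent_dim = 1000, 1700, 8  # placeholder sizes

user_in = Input(shape=(1,), dtype='int32')
item_in = Input(shape=(1,), dtype='int32')
user_vec = Flatten()(Embedding(num_users, latent_dim)(user_in))
item_vec = Flatten()(Embedding(num_items, latent_dim)(item_in))

# Element-wise product of the two embeddings, then a sigmoid output
# trained with binary cross-entropy: the label is 1 for an observed
# interaction and 0 for a sampled negative, so the model predicts a
# probability of interaction rather than a rating.
score = Multiply()([user_vec, item_vec])
prediction = Dense(1, activation='sigmoid')(score)

model = Model(inputs=[user_in, item_in], outputs=prediction)
model.compile(optimizer='adam', loss='binary_crossentropy')
```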
