val_loss #42
Could you please share your setting for num_neg, which is used for negative sampling in the training phase? I found this parameter has a strong effect on the validation loss.
I set num_neg=4, the same as yours, for negative sampling during the training phase.
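For anyone landing here later, negative sampling with num_neg during training works roughly like the sketch below. This is a simplified, hypothetical helper rather than the repository's exact code; it assumes the positive interactions are stored as a dict mapping each user to the set of items they interacted with.

```python
import numpy as np

def sample_train_instances(positives, num_items, num_neg=4, rng=None):
    """For every observed (user, item) pair, emit one positive instance and
    `num_neg` randomly drawn negatives (items the user never interacted with)."""
    rng = rng or np.random.default_rng()
    user_input, item_input, labels = [], [], []
    for user, pos_items in positives.items():
        for item in pos_items:
            user_input.append(user)
            item_input.append(item)
            labels.append(1)
            for _ in range(num_neg):
                j = int(rng.integers(num_items))
                while j in pos_items:            # resample if we hit a positive
                    j = int(rng.integers(num_items))
                user_input.append(user)
                item_input.append(j)
                labels.append(0)
    return user_input, item_input, labels
```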
So I suppose the error you mentioned here refers to the binary cross-entropy loss on the validation set? If so, I think it would be more appropriate to use a ranking metric such as HR or NDCG. The reason is that the number of negative samples differs between the training set and the validation set (1 vs. 4 and 1 vs. 99 in the data file *.neg.dat), so the loss may behave strangely.
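As a reference, here is a minimal sketch of those two ranking metrics, assuming the usual leave-one-out protocol where each held-out positive is ranked against 99 sampled negatives (the item ids in the example are made up):

```python
import math

def hr_and_ndcg(ranked_items, positive_item, k=10):
    """HR@K is 1 if the held-out positive appears in the top-K list;
    NDCG@K additionally rewards placing it near the top."""
    topk = list(ranked_items)[:k]
    if positive_item in topk:
        rank = topk.index(positive_item)              # 0-based position
        return 1.0, math.log(2) / math.log(rank + 2)
    return 0.0, 0.0

# e.g. the held-out positive (item 42) ranked third among the candidates
hr, ndcg = hr_and_ndcg([7, 13, 42, 5, 99], positive_item=42)
```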
Yes, the binary cross-entropy loss is computed on the validation set, but I split off part of the training set to use as the validation set. The ratio of positive to negative samples there is 1:4.
So you pair another 4 negative samples with each positive one sampled from the original training set to make a new validation set? If so, I think you could check whether those negative samples overlap any positive training samples. If all negative samples are correctly chosen, I don't have a better idea about this issue at present.
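One way to run that sanity check, sketched with toy (user, item) pairs; the helper name and the data here are made up:

```python
def leaked_negatives(train_positive_pairs, val_negative_pairs):
    """Return the validation 'negatives' that are actually positives in training.
    Both arguments are iterables of (user, item) tuples."""
    positive_set = set(train_positive_pairs)
    return [pair for pair in val_negative_pairs if pair in positive_set]

# toy example: (user, item) pairs
train_positive_pairs = [(0, 5), (0, 9), (1, 3)]
val_negative_pairs = [(0, 7), (0, 9), (1, 2)]   # (0, 9) is mislabeled
print(leaked_negatives(train_positive_pairs, val_negative_pairs))  # [(0, 9)]
```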
OK, thank you very much!
You're welcome :)
Hey @HenryNebula, my question is related to the validation loss in some ways, so I felt I should comment here rather than create a new issue. The question is somewhat trivial: I've read the paper and run the models successfully, but one part still confuses me, namely the use of a sigmoid activation together with binary cross-entropy.

To me this is more or less a regression problem, i.e., when applied to the MovieLens dataset we are trying to predict movie ratings based on previous interactions, so an 'mse' loss with a 'relu' or even a linear activation makes sense. But how come a 'sigmoid' function is used as the activation on the last layer? Wouldn't the sigmoid always produce outputs in [0, 1]? Even with any number of hyper-parameter tuning steps and other regularization techniques, a sigmoid output technically never crosses 1.0, right?

I looked at other implementations of this paper and found pretty much the same thing: a sigmoid at the end. I'm not trying to contradict your or the original authors' idea here; I'm just trying to figure out how a sigmoid activation makes sense for this problem, or whether I'm missing something fundamental. Let me know what you think.
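For context, the prediction head being asked about looks roughly like the sketch below: a simplified GMF-style model written in modern Keras, not the repository's original code, with made-up embedding sizes. The target being fit is the binary implicit-feedback label (1 = observed interaction, 0 = sampled negative) rather than the raw rating, which is why a sigmoid output in [0, 1] paired with binary cross-entropy is consistent with the paper's setup.

```python
from tensorflow import keras
from tensorflow.keras import layers

num_users, num_items, latent_dim = 1000, 1700, 8      # illustrative sizes only

user_in = keras.Input(shape=(1,), dtype="int32")
item_in = keras.Input(shape=(1,), dtype="int32")
u = layers.Flatten()(layers.Embedding(num_users, latent_dim)(user_in))
i = layers.Flatten()(layers.Embedding(num_items, latent_dim)(item_in))
x = layers.Multiply()([u, i])                          # element-wise GMF interaction
prob = layers.Dense(1, activation="sigmoid")(x)        # P(user interacts with item)

model = keras.Model([user_in, item_in], prob)
model.compile(optimizer="adam", loss="binary_crossentropy")
```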
Hello, Mr. He:
I set aside one-tenth of the data from your training set as a cross-validation set. As training progresses, the validation error grows from the very beginning. Shouldn't it gradually decline instead?
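A split of that kind could look something like the sketch below (a hypothetical helper; it assumes the already negative-sampled training instances are parallel lists of users, items, and 0/1 labels):

```python
import numpy as np

def holdout_split(user_input, item_input, labels, val_fraction=0.1, seed=0):
    """Randomly hold out a fraction of the training instances for validation."""
    idx = np.random.default_rng(seed).permutation(len(labels))
    n_val = int(len(labels) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    pick = lambda seq, ids: [seq[i] for i in ids]
    train = tuple(pick(s, train_idx) for s in (user_input, item_input, labels))
    val = tuple(pick(s, val_idx) for s in (user_input, item_input, labels))
    return train, val
```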