
val_loss #42

Open
wangdazi opened this issue Mar 23, 2019 · 8 comments

Comments

@wangdazi

Hello, Mr. He:
I split off one-tenth of your training set as a cross-validation set. During training, the error on this validation set grows from the very beginning. Shouldn't it decline gradually?

@HenryNebula

Could you please share your setting for num_neg, the parameter used in the training phase for negative sampling? I have found that it has a strong effect on the validation loss.
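
For context, num_neg controls how many unobserved items are sampled per positive interaction during training, roughly like this (a simplified sketch, not the repo's exact code; the names are placeholders):

```python
import numpy as np

def get_train_instances(train_positives, num_items, num_neg):
    """Build training instances from a set of observed (user, item) pairs.
    Each positive gets label 1; `num_neg` randomly sampled unobserved
    items per positive get label 0. (Sketch only; names are placeholders.)"""
    user_input, item_input, labels = [], [], []
    for (u, i) in train_positives:
        # positive instance
        user_input.append(u)
        item_input.append(i)
        labels.append(1)
        # negative instances: sample items the user has not interacted with
        for _ in range(num_neg):
            j = np.random.randint(num_items)
            while (u, j) in train_positives:
                j = np.random.randint(num_items)
            user_input.append(u)
            item_input.append(j)
            labels.append(0)
    return user_input, item_input, labels
```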

@wangdazi
Author

wangdazi commented Apr 1, 2019 via email

@HenryNebula

So I suppose the error you mentioned here refers to the binary cross-entropy loss on the validation set? If so, I think it would be more appropriate to use ranking metrics such as HR or NDCG instead. The reason is that the number of negative samples per positive differs between the training set and the validation set (1 vs. 4 and 1 vs. 99 in the data file *.neg.dat), so the raw loss may behave strangely.
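
In case it helps, HR@k and NDCG@k for the usual leave-one-out protocol can be computed roughly like this (a sketch assuming you rank the one held-out positive against its 99 sampled negatives per user; the function and variable names are placeholders):

```python
import math

def hr_ndcg_at_k(ranked_items, positive_item, k=10):
    """HR@k and NDCG@k for one user, where `ranked_items` is the list of
    100 candidate items (1 positive + 99 negatives) sorted by predicted
    score, best first."""
    topk = ranked_items[:k]
    if positive_item not in topk:
        return 0.0, 0.0
    rank = topk.index(positive_item)        # 0-based rank of the positive
    return 1.0, 1.0 / math.log2(rank + 2)   # NDCG with a single relevant item
```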

@wangdazi
Author

wangdazi commented Apr 1, 2019 via email

@HenryNebula

So you pair another 4 negative samples with the positive one sampled from the original training set to make a new validation set? If so, I think you could check whether those negative samples overlap with any positive training samples. If all the negative samples are chosen correctly, I don't have a better idea about this issue at the moment.
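
Something like this quick check would do (just a sketch; the argument names are placeholders for however you store the splits):

```python
def count_leaked_negatives(val_negatives, train_positives):
    """Count validation 'negatives' that are actually observed positives
    in the training set. Both arguments are iterables of (user, item) pairs."""
    positives = set(train_positives)
    return sum(1 for pair in val_negatives if pair in positives)
```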

@wangdazi
Author

wangdazi commented Apr 1, 2019 via email

@HenryNebula

You're welcome :)

@amithadiraju1694

Hey @HenryNebula,

My question is related in some ways to the validation loss, so I felt I should comment here rather than create a new issue. The question is somewhat trivial. I've read the paper and run the models successfully, but I'm a bit confused about one part.

The use of a sigmoid activation with binary cross-entropy is confusing me. To me this looks more or less like a regression problem, i.e., when applied to the MovieLens dataset to predict movie ratings from previous interactions, an 'mse' loss with 'relu' or even linear activations would make sense, so how come a 'sigmoid' function is used as the activation on the last layer?

Wouldn't the sigmoid function always produce outputs in [0, 1]? Even after any number of hyper-parameter tuning steps and other regularization techniques, a sigmoid output technically never exceeds 1.0, right? I looked at other implementations of this paper and found pretty much the same thing: a sigmoid function at the end.

I'm not trying to contradict your idea or the original authors' here; I'm just trying to figure out how a sigmoid activation makes sense for this problem, or whether there is something fundamental I'm missing. Let me know what you think.
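
To make my question concrete, this is the kind of output layer I'm referring to, a minimal GMF-style sketch in Keras (placeholder sizes, not the repo's exact code):

```python
from keras.layers import Input, Embedding, Flatten, Multiply, Dense
from keras.models import Model

num_users, num_items, latent_dim = 1000, 1700, 8  # placeholder sizes

user_in = Input(shape=(1,), dtype='int32')
item_in = Input(shape=(1,), dtype='int32')
user_vec = Flatten()(Embedding(num_users, latent_dim)(user_in))
item_vec = Flatten()(Embedding(num_items, latent_dim)(item_in))

# Element-wise product of the two embeddings, then a sigmoid output
# trained with binary cross-entropy: the label is 1 for an observed
# interaction and 0 for a sampled negative, so the model predicts a
# probability of interaction rather than a rating.
score = Multiply()([user_vec, item_vec])
prediction = Dense(1, activation='sigmoid')(score)

model = Model(inputs=[user_in, item_in], outputs=prediction)
model.compile(optimizer='adam', loss='binary_crossentropy')
```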
