Two differences from the original implementation #17
Hi @wangxin0716 and @ryh95, as you have pointed out, the original paper mentions freezing the word embeddings. I had overlooked this, but have now rectified the mistake and incorporated it via a commit which adds the option of freezing the word embeddings during training. This results in a slight improvement to the metrics, and the Pearson's coefficient is now within a small margin of the value reported in the original paper.
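For illustration, here is a minimal sketch of how word embeddings can be frozen in PyTorch; the tensor sizes and module names below are placeholders, not the repository's actual code:

```python
import torch
import torch.nn as nn

# Placeholder for the pre-trained embedding matrix (vocab_size x embedding_dim).
pretrained = torch.randn(100, 300)

# One way to freeze embeddings: build the layer from pre-trained weights with freeze=True.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)

# Equivalently, on an already-constructed layer:
embedding.weight.requires_grad = False

# Only parameters that still require gradients are handed to the optimizer,
# so the embedding matrix is never updated during training.
model = nn.Sequential(embedding, nn.Linear(300, 50))
optimizer = torch.optim.Adagrad(
    [p for p in model.parameters() if p.requires_grad],
    lr=0.025, weight_decay=1e-4,
)
```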
BTW, @wangxin0716, I also tried the change you suggested, i.e. averaging the loss over the mini-batch before calling backward().
I ran with the parameters --lr 0.025 --wd 0.0001 --optim adagrad --batchsize 25 --freeze_embed; however, the result is 0.857, about 0.01 less than what it is supposed to be. What could possibly have caused this?
Thanks for the code. It was very helpful in understanding the paper. I ran the code with the following configuration:

and got the best result at the 5th epoch: Epoch 5, Test Loss: 0.10324564972114664, Pearson: 0.8587949275970459, MSE: 0.2709934413433075, which is less than what is claimed. Could you please suggest what I could be doing wrong? Is anyone else facing the same issue? Thanks.
I got the same result as you, ~0.846 Pearson score. After checking the original implementation, I found two differences.

1. You call .backward() for each sample in the mini-batch, and then perform one step update with self.optimizer.step(). Since backward() accumulates gradients automatically, it seems you need to average both the losses and the gradients over the mini-batch. So I think the line marked with the arrow above should be changed so that the loss is averaged over the mini-batch (i.e. divided by the batch size) before backward() is called (see the sketch below).

2. Furthermore, I did some simple calculations. The number of embedding parameters is more than 700000, versus 286505 for the other model parameters. Considering that the training set has only 4500 examples, it is too small to fine-tune the embeddings.

After making these two modifications, I can get a 0.854 Pearson score and 0.274 MSE with Adagrad (learning_rate=0.05).