
GRU cells #20

Open · wants to merge 11 commits into master
Conversation

guillitte
I added the possibility to use GRU cells.
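For readers skimming the thread: the PR implements the GRU recurrence in Torch/Lua. A minimal NumPy sketch of a single GRU time step is below, purely for illustration; the function name `gru_step` and the weight layout (update, reset, candidate blocks stacked side by side) are assumptions for this sketch, not the patch's actual code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, prev_h, Wx, Wh, b):
    """One GRU time step (illustrative layout, not the patch's exact code).

    x:      input vector, shape (D,)
    prev_h: previous hidden state, shape (H,)
    Wx:     input-to-hidden weights, shape (D, 3H), blocks [update | reset | candidate]
    Wh:     hidden-to-hidden weights, shape (H, 3H), same block order
    b:      bias, shape (3H,)
    """
    H = prev_h.shape[0]
    gates = x @ Wx + b
    # update and reset gates see prev_h directly
    gates[:2 * H] += prev_h @ Wh[:, :2 * H]
    u = sigmoid(gates[:H])          # update gate
    r = sigmoid(gates[H:2 * H])     # reset gate
    # candidate sees the reset-gated hidden state
    hc = np.tanh(gates[2 * H:] + (r * prev_h) @ Wh[:, 2 * H:])
    # blend: next_h = (1 - u) * prev_h + u * hc (the convention this patch uses)
    return (1.0 - u) * prev_h + u * hc

rng = np.random.default_rng(0)
D, H = 4, 5
h = gru_step(rng.standard_normal(D), np.zeros(H),
             0.1 * rng.standard_normal((D, 3 * H)),
             0.1 * rng.standard_normal((H, 3 * H)),
             np.zeros(3 * H))
```

With a zero initial state the output reduces to `u * tanh(...)`, so every component stays strictly inside (-1, 1).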

@jcjohnson
Owner

Wow, this looks amazing - thanks a bunch! There's even a unit test! I want to look through it in a bit more detail before merging, and I probably won't have time to do so today.

@guillitte
Author

Thanks. It could certainly be optimized further, but at least it seems to work fine.

@JoostvDoorn

JoostvDoorn commented May 4, 2016

Any update on this?

@guillitte
Author

For those interested, I also added a gridgru module, adapted from http://arxiv.org/abs/1507.01526, in the Dev branch.

@guillitte
Author

Running a small benchmark of 1000 iterations on tiny Shakespeare (epoch 3.8), I got the following results:

LSTM :

{"i":1000,"val_loss_history":[1.6292053406889],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"lstm","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/lstm","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}

GRU :

{"i":1000,"val_loss_history":[1.4681989658963],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gru","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/gru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}

GRIDGRU :

{"i":1000,"val_loss_history":[1.4313773946329],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gridgru","lr_decay_every":5,"print_every":1,"wordvec_size":800,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/gridgru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}

NB: for GRIDGRU, wordvec_size is the size of the network along the depth dimension, so it should be about the same as rnn_size.

cur_gates[{{}, {2 * H + 1, 3 * H}}]:addmm(next_h, Wh[{{}, {2 * H + 1, 3 * H}}]) -- hc += Wh * (r . prev_h)
local hc = cur_gates[{{}, {2 * H + 1, 3 * H}}]:tanh() -- hidden candidate: hc = tanh(Wx * x + Wh * (r . prev_h) + b)
next_h:addcmul(prev_h, -1, u, prev_h)
next_h:addcmul(u, hc) -- next_h = (1 - u) . prev_h + u . hc


A small note: the original paper (http://arxiv.org/pdf/1406.1078v3.pdf) has it the other way around; see Equation 7.

@guillitte (Author)


True. As always, there are many small variations of the same algorithm.
For the definition of the GRU, I used the code in Karpathy's char-rnn and didn't check the original article.
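The two conventions differ only in which branch the update gate weights, so they are mirror images under relabeling the gate. A tiny scalar sketch (illustrative only, helper names made up here):

```python
def blend_char_rnn(u, prev_h, hc):
    # convention used in this patch (following char-rnn):
    # next_h = (1 - u) * prev_h + u * hc
    return (1 - u) * prev_h + u * hc

def blend_paper_eq7(z, prev_h, hc):
    # Cho et al. 2014, Eq. 7: h_t = z * h_{t-1} + (1 - z) * h~_t
    return z * prev_h + (1 - z) * hc

# equivalent up to relabeling the gate: u corresponds to 1 - z
u, prev_h, hc = 0.3, 0.8, -0.5
diff = abs(blend_char_rnn(u, prev_h, hc) - blend_paper_eq7(1 - u, prev_h, hc))
```

Since the gate is learned, the network simply learns the complementary gate values; the trained models are equivalent.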

@AlekzNet

AlekzNet commented Nov 6, 2016

@guillitte I wonder how fair this comparison is. GRIDGRU has about twice as many parameters as LSTM, and 2.5 times as many as GRU. A 3x800 GRIDGRU has roughly the same number of parameters as, say, a 3x1070 LSTM or a 3x1250 GRU. So, in this comparison, GRU wins hands down.
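To make the parameter comparison concrete, here is a back-of-the-envelope count in Python (illustrative only: standard per-layer formulas with biases, ignoring the embedding and output layers, and not necessarily torch-rnn's exact layout). GRIDGRU is omitted because its count depends on the specific layer definition in the Dev branch.

```python
def lstm_params(D, H):
    # 4 gates (i, f, o, g), each with input (D) and recurrent (H) weights plus bias
    return 4 * H * (D + H) + 4 * H

def gru_params(D, H):
    # 3 blocks (update, reset, candidate)
    return 3 * H * (D + H) + 3 * H

def stack_params(per_layer, D, H, layers):
    # first layer sees the word vector, deeper layers see the hidden state
    return per_layer(D, H) + (layers - 1) * per_layer(H, H)

lstm_total = stack_params(lstm_params, 64, 800, 3)  # 3x800 LSTM, wordvec_size 64
gru_total = stack_params(gru_params, 64, 800, 3)    # 3x800 GRU, wordvec_size 64
print(lstm_total, gru_total)
```

At equal width the LSTM stack is about 4/3 the size of the GRU stack, which is why matching parameter budgets requires widening the GRU as suggested above.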

@binary-person

This has been open for a while; would one of the contributors mind merging it?

@JoostvDoorn

@scheng123 An equivalent implementation has also been merged into https://github.com/torch/rnn/ under the name SeqGRU.

5 participants