This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

Does CTranslate support distill-tiny model defined in Paper? #44

Open
zdemillard opened this issue Nov 30, 2018 · 4 comments

Comments

@zdemillard

Hello, we have trained a bidirectional RNN encoder-decoder (default OpenNMT-lua settings) and successfully released the model and tested it using this repository. However, working through the paper (http://aclweb.org/anthology/W18-2715), we tried to replicate the distill-tiny model, which uses a 2-layer GRU encoder and a 1-layer decoder. With that configuration, the released model translates nothing on the GPU (--cuda), and on the CPU we get the following error:

Intel MKL ERROR: Parameter 10 was incorrect on entry to SGEMM .

The model translates accurately with the Lua code, so we know the model itself is not the issue; something must be incompatible when we release it for CTranslate. Here is the full configuration used to train:

th train.lua -data data/demo-train.t7 \
	-save_model distill_tiny_model_unlemmatized_50k_gru \
	-gpuid 1    \
	-max_batch_size 512\
	-save_every 5000 \
	-src_vocab_size 50000 \
	-tgt_vocab_size 50000  \
	-src_words_min_frequency 5 \
	-tgt_words_min_frequency 5 \
	-rnn_type GRU \
	-rnn_size 512 \
	-optim adam \
	-learning_rate 0.0002  \
	-enc_layers 2 \
	-dec_layers 1 \
	-bridge dense \
	-continue true \
	-log_file log.txt

Does CTranslate support GRU as an rnn_type, and does it support dense as an option for -bridge?
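For what it's worth, MKL "parameter incorrect" errors on entry to SGEMM usually mean a matrix dimension or leading-dimension argument is inconsistent, which fits the theory that the exported weights don't match what CTranslate expects. A minimal illustration of that kind of shape mismatch (Python with NumPy standing in for the BLAS call; the shapes are made up for illustration, not taken from the actual model):

```python
import numpy as np

# Hypothetical shapes for illustration only: a GEMM call multiplies
# A (m x k) by B (k x n), and fails when the inner dimensions disagree,
# as they would if a released weight matrix has an unexpected size.
A = np.ones((4, 512), dtype=np.float32)    # e.g. batch x rnn_size
B = np.ones((256, 512), dtype=np.float32)  # weight with the wrong inner dim

try:
    A @ B  # inner dimensions 512 and 256 do not match
except ValueError as e:
    print("shape mismatch:", e)
```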

@guillaumekln
Collaborator

It does support GRU but not the "dense" bridge. You should use the "last" bridge from this branch instead:

https://github.com/OpenNMT/OpenNMT/tree/thin_dec

@zdemillard
Author

Thank you for the response. Does the "last" bridge support a different number of encoder/decoder layers (e.g. 2-layer encoder and 1-layer decoder)? I know "copy" doesn't work when the number of layers is different.

@guillaumekln
Collaborator

Does the "last" bridge support a different number of encoder/decoder layers (e.g. 2-layer encoder and 1-layer decoder)?

Yes. If the decoder has N layers, it will only copy the last N layers of the encoder (assuming the encoder has at least N layers).
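Schematically (a Python sketch of the selection logic described above, not the actual OpenNMT code; the names are made up):

```python
def last_bridge(encoder_states, num_decoder_layers):
    """Schematic 'last' bridge: initialize the decoder with the final
    hidden states of the last N encoder layers."""
    assert len(encoder_states) >= num_decoder_layers
    return encoder_states[-num_decoder_layers:]

# 2-layer encoder, 1-layer decoder: the decoder is initialized
# from the second (top) encoder layer only.
states = ["enc_layer_1_state", "enc_layer_2_state"]
print(last_bridge(states, 1))  # ['enc_layer_2_state']
```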

@zdemillard
Author

Great, we will use that then. Thank you!
