This is our Tensor2Tensor implementation of Quaternion Transformers. The paper will be presented at ACL 2019 in Florence.

Dependencies:
- Tensorflow 1.12.0
- Tensor2Tensor 1.12.0
- Python 2.7
- The usage of this repository follows the original Tensor2Tensor repository (e.g., `t2t-datagen` and `t2t-trainer`, followed by `t2t-decoder`). It helps to gain some familiarity with T2T before attempting to run our code; a sketch of the full pipeline is shown after this list.
- Setting `--t2t_usr_dir=./QuaternionTransformers` will allow T2T to register Quaternion Transformers. To verify, run `t2t-trainer --registry_help` and check that the Quaternion Transformer is listed.
- You should be able to set `MODEL=quaternion_transformer` and use the base or big settings as per normal.
- Be sure to set `--hparams="self_attention_type="quaternion_dot_product""` to activate Quaternion Attention.
- By default, Quaternion FFNs are activated for the position-wise FFN layers. To revert and not use Quaternion FFNs on the position-wise FFN, set `--hparams="ffn_layer="raw_dense_relu_dense""`. A combined example command is sketched below.
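For reference, an end-to-end run might look like the sketch below. It is only illustrative: the problem name (`translate_ende_wmt32k`), hparams set (`transformer_base`), directories, and decode file names are placeholder assumptions, so substitute your own. The quaternion-specific parts are the `--t2t_usr_dir`, `--model`, and `--hparams` flags described above.

```bash
# Minimal sketch of the T2T pipeline with Quaternion Transformers.
# Problem name, directories, and file names are illustrative placeholders.
USR_DIR=./QuaternionTransformers
PROBLEM=translate_ende_wmt32k      # example T2T problem; use your own
MODEL=quaternion_transformer
HPARAMS=transformer_base           # or transformer_big
DATA_DIR=$HOME/t2t_data
TMP_DIR=$HOME/t2t_tmp
OUTPUT_DIR=$HOME/t2t_train/quaternion_base

# 1) Generate data for the chosen problem.
t2t-datagen \
  --t2t_usr_dir=$USR_DIR \
  --problem=$PROBLEM \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR

# 2) Train with Quaternion Attention enabled (Quaternion FFNs are on by default).
t2t-trainer \
  --t2t_usr_dir=$USR_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams="self_attention_type=quaternion_dot_product" \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR

# 3) Decode from the trained checkpoint.
t2t-decoder \
  --t2t_usr_dir=$USR_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams="self_attention_type=quaternion_dot_product" \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --decode_from_file=source.txt \
  --decode_to_file=translation.txt
```

To revert to standard position-wise FFNs, add `ffn_layer=raw_dense_relu_dense` to the comma-separated `--hparams` string.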
If you find our work useful, please consider citing our paper:
@article{tay2019lightweight,
  title={Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks},
  author={Tay, Yi and Zhang, Aston and Tuan, Luu Anh and Rao, Jinfeng and Zhang, Shuai and Wang, Shuohang and Fu, Jie and Hui, Siu Cheung},
  journal={arXiv preprint arXiv:1906.04393},
  year={2019}
}