Denoising autoencoding based pretraining like BERT, with its ability to model bidirectional contexts, achieves better performance than pretraining approaches based on autoregressive language modelling.
However, because it corrupts the input with masks, BERT suffers from a pretrain-finetune discrepancy: the artificial [MASK] tokens it sees during pretraining never occur in downstream data at finetuning time.
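As a rough illustration of where the discrepancy comes from, here is a minimal sketch of BERT-style input corruption (the `mask_tokens` helper is hypothetical; real BERT selects ~15% of tokens and sometimes keeps the original or substitutes a random token instead of always using [MASK]):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Simplified BERT-style corruption: replace a random subset with [MASK]."""
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            corrupted.append(mask_token)  # artificial symbol, seen only during pretraining
            targets.append(tok)           # the model must reconstruct the original token
        else:
            corrupted.append(tok)
            targets.append(None)          # position is not predicted
    return corrupted, targets

# Pretraining inputs look like ["New", "[MASK]", "is", "a", "city"],
# but finetuning inputs never contain "[MASK]"; hence the discrepancy.
```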
XLNet is a generalised autoregressive pretraining method that:
- Enables learning bidirectional contexts by maximising the expected likelihood over all permutations of the factorization order (the objective is written out after this list)
- Overcomes the limitations of BERT thanks to its autoregressive formulation, which does not rely on corrupting the input
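In equation form (following the XLNet paper), with $\mathcal{Z}_T$ the set of all permutations of the length-$T$ index sequence $[1, 2, \ldots, T]$, the permutation language modelling objective is

$$\max_{\theta}\;\; \mathbb{E}_{\mathbf{z}\sim\mathcal{Z}_T}\left[\sum_{t=1}^{T}\log p_{\theta}\!\left(x_{z_t}\mid \mathbf{x}_{\mathbf{z}_{<t}}\right)\right]$$

where $z_t$ and $\mathbf{z}_{<t}$ are the $t$-th element and the first $t-1$ elements of a permutation $\mathbf{z}$. The parameters $\theta$ are shared across all factorization orders, so in expectation every position learns to condition on tokens from both sides.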
XLNet incorporates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining, most notably the segment recurrence mechanism and the relative positional encoding scheme.
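As a rough, single-head sketch of the segment recurrence idea (a PyTorch-style illustration, not XLNet's actual implementation; it omits relative positional encodings, multi-head attention, and the causal mask, and all tensor names are hypothetical):

```python
import torch

def attend_with_memory(h_curr, mem, w_q, w_k, w_v):
    # Keys/values cover the cached previous segment plus the current one;
    # queries come only from the current segment, so new tokens can attend
    # back into context that was computed in the previous step.
    context = torch.cat([mem.detach(), h_curr], dim=0)   # stop gradient through the cache
    q = h_curr @ w_q                                      # (cur_len, d)
    k = context @ w_k                                     # (mem_len + cur_len, d)
    v = context @ w_v
    attn = torch.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v                                       # updated states for the current segment

# Hypothetical usage: reuse the previous segment's hidden states as memory.
d = 8
h_prev, h_curr = torch.randn(4, d), torch.randn(4, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
out = attend_with_memory(h_curr, mem=h_prev, w_q=w_q, w_k=w_k, w_v=w_v)
```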
The two most successful pretraining objectives are:
- Autoregressive (AR) language modelling
- Autoencoding (AE)
AR language modelling estimates the probability distribution of a text corpus with an autoregressive model, i.e. by factorizing the likelihood into a product of conditionals. Because an AR language model is trained to encode only a uni-directional context, it is not effective at modelling deep bidirectional contexts, which many downstream NLP tasks require.
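For a sequence $\mathbf{x} = (x_1, \ldots, x_T)$, the forward AR factorization and the corresponding maximum-likelihood pretraining objective are

$$p(\mathbf{x}) = \prod_{t=1}^{T} p(x_t \mid \mathbf{x}_{<t}), \qquad \max_{\theta}\; \sum_{t=1}^{T} \log p_{\theta}(x_t \mid \mathbf{x}_{<t})$$

A backward factorization $\prod_{t} p(x_t \mid \mathbf{x}_{>t})$ is equally possible, but either way each conditional only ever sees context from one side.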
Autoencoding based pretraining does not perform explicit density estimation; instead it reconstructs the original tokens from a corrupted input. BERT is the canonical example: a fraction of the input tokens is replaced with [MASK] and the model is trained to recover them.
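Writing $\hat{\mathbf{x}}$ for the corrupted (masked) input, $\bar{\mathbf{x}}$ for the set of masked-out tokens, and $m_t = 1$ when position $t$ is masked, the reconstruction objective can be written as in the XLNet paper:

$$\max_{\theta}\;\; \log p_{\theta}(\bar{\mathbf{x}} \mid \hat{\mathbf{x}}) \approx \sum_{t=1}^{T} m_t \log p_{\theta}(x_t \mid \hat{\mathbf{x}})$$

The $\approx$ makes the independence assumption explicit: the masked tokens are reconstructed independently of each other given $\hat{\mathbf{x}}$, one of the limitations the permutation objective above avoids.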