In sequence-to-sequence problems such as neural machine translation, the initial proposals were based on RNNs in an encoder-decoder architecture. These architectures have a major limitation when working with long sequences: their ability to retain information from the first elements is lost as new elements are incorporated into the sequence.
To deal with this limitation, a new concept was introduced: the attention mechanism.
The Transformer model extracts features for each word using a self-attention mechanism that figures out how important all the other words in the sentence are with respect to that word. No recurrent units are needed to obtain these features; they are just weighted sums and activations, so the computation is highly parallelizable and efficient.
Research Paper: Attention Is All You Need
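To make the "weighted sums and activations" idea concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. The projection matrices, dimensions, and helper names are illustrative assumptions, not the exact implementation from the paper; multi-head attention, masking, and learned parameters are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of embeddings.

    X:             (seq_len, d_model) -- one embedding per word
    W_q, W_k, W_v: (d_model, d_k)     -- projection matrices (illustrative)
    """
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_k = K.shape[-1]
    # Each word scores every other word: (seq_len, seq_len)
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    # Output is a weighted sum of values -- no recurrence involved.
    return weights @ V

# Toy usage: 5 "words", model dimension 8, projection dimension 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (5, 4)
```

Because every output row depends only on matrix products over the whole sequence, all positions can be computed at once, which is what makes the approach parallelizable compared with step-by-step RNN processing.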