Recurrent Neural Networks
Recurrent neural networks are great at encapsulating sequence or time-series data, because the connections between nodes form a directed graph along the sequence. A great resource that explains LSTMs better than I could here is found here. Check it out.
There has been a great study on the optimal hyperparameters for LSTM networks in sequence labeling tasks here. Section 7 compares different features and additions and the impact they have on the network. I will summarize them here; a tagging-scheme illustration and a configuration sketch follow the list:
Word embeddings improved accuracy across the board.
Adam and Nadam proved to be the best optimizers, followed by RMSProp.
A CRF output layer instead of softmax proved to be better.
Variational Dropout performed significantly better than naive or no dropout.
Gradient clipping did not help at all; however, gradient normalization with a threshold of T = 1 proved to significantly increase accuracy.
The BIO and IOBES tagging schemes performed consistently better than the IOB tagging scheme (see the illustration after this list).
If the number of recurrent units was kept constant, two stacked BiLSTM layers resulted in the best performance.
The optimal mini-batch size appears to depend on the task: for POS tagging and event recognition a size of 1 was optimal, for chunking a size of 8, and for NER and Entity Recognition a size of 31.
Character-based representations were not that helpful in many of the tested configurations and could not improve the performance of the network.
The number of recurrent units, as long as it is not far too large or far too small, has only a minor effect on the results. About 100 units for each LSTM network appears to be a good rule of thumb for the tested tasks.
Theano and TensorFlow performed equally in terms of test performance.