"Deep Learning" - Yann LeCun, Yoshua Bengio, Geoffrey Hinton (Nature, 2015)
- Conventional ML: careful feature engineering + domain knowledge
- DL: learn features + representation from data
- Conventional ML needs a good hand-designed feature extractor to generalize well, while DL automatically learns representations that are sensitive to relevant details yet invariant to irrelevant variations (e.g., pose, illumination)
- Revival of DL (c. 2006): unsupervised layer-wise pre-training followed by fine-tuning with backprop generalizes well even with limited labelled data (see the sketch below)
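A minimal sketch of the two-stage recipe, assuming PyTorch and toy random data; the layer sizes, optimizer, and training lengths are my own choices, not from the paper. An encoder is first pre-trained without labels as an autoencoder, then reused and fine-tuned end-to-end with backprop on a supervised objective.

```python
# Sketch only: unsupervised pre-training + supervised fine-tuning on toy data.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 64)            # hypothetical unlabelled inputs
y = torch.randint(0, 10, (256,))    # hypothetical labels for fine-tuning

# Stage 1: pre-train the encoder as an autoencoder (reconstruction loss, no labels).
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
decoder = nn.Linear(32, 64)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(X)), X)
    loss.backward()
    opt.step()

# Stage 2: reuse the pre-trained encoder and fine-tune the whole model with backprop.
model = nn.Sequential(encoder, nn.Linear(32, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    opt.step()
```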
- CNNs: stacked convolutional and pooling layers forming a feature hierarchy (see the sketch after this group)
- Convolution: detect local conjunctions of features from the previous layer
- Pooling: merge semantically similar features into one
- Natural signals are compositional hierarchies (edge --> motif --> part --> object)
- Pooling makes the representation invariant to small shifts and distortions in position/appearance
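A toy sketch of the conv/pool hierarchy, assuming PyTorch; the filter counts, kernel sizes, and 28x28 input are hypothetical, not from the paper. Each convolution detects local conjunctions of features from the layer below, and each pooling step merges nearby responses, giving tolerance to small shifts.

```python
# Sketch only: two conv -> pool stages followed by a classifier.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # local feature detectors (edges)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # merge nearby responses (28x28 -> 14x14)
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # conjunctions of lower-level features (motifs)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # classifier over the high-level features
)

x = torch.randn(8, 1, 28, 28)  # hypothetical batch of 28x28 grayscale images
print(cnn(x).shape)            # torch.Size([8, 10])
```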
- Deep nets make use of distributed representations
- Enable generalization to new combinations of learned feature values beyond those seen during training
- Composing many layers of representation gives a similar combinatorial advantage
- RNNs: process sequential inputs one element at a time, maintaining a state vector that encodes the history of past inputs
- Backpropagated gradients shrink or grow at every time step, so they suffer from vanishing/exploding gradients
- Even longer-term memory: LSTM (memory cell gated by input, forget, and output gates; sketch below), neural Turing machine (explicit tape-like memory)
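A sketch of the standard LSTM cell update, assuming PyTorch; the dimensions and the packing of all gate pre-activations into one linear layer are my own choices. The memory cell c is overwritten only through the gates, which is what lets information (and gradients) persist over many time steps.

```python
# Sketch only: an LSTM cell written out so the gates are explicit.
import torch
import torch.nn as nn

class LSTMCellSketch(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # One linear map produces all four gate pre-activations at once.
        self.linear = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, h, c):
        gates = self.linear(torch.cat([x, h], dim=-1))
        i, f, g, o = gates.chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # input/forget/output gates
        g = torch.tanh(g)                                               # candidate cell content
        c = f * c + i * g          # forget part of the old memory, write gated new content
        h = o * torch.tanh(c)      # expose a gated view of the memory as the state vector
        return h, c

cell = LSTMCellSketch(input_size=8, hidden_size=16)
h = c = torch.zeros(4, 16)                 # batch of 4
for t in range(10):                        # unroll over a toy sequence
    h, c = cell(torch.randn(4, 8), h, c)
print(h.shape)  # torch.Size([4, 16])
```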
- Future of DL: unsupervised learning expected to become more important; combining representation learning with complex reasoning