ALGORITHMS DEVELOPMENT
Note: imitation, guided and meta-learning are active research areas that aim to find policies that generalize well and are robust to task shifts; they blend model-free and model-based RL methods, representation techniques and implementation options, so the research topics outlined here may be closely interrelated.
Idea: Learn state space embedding with relevant features:
- beta-VAE autoencoder -- learning disentangled generative factors (DARLA).
- structuring convolution encoder to promote relevant features
Pros: can learn efficient general state representations → find generalizable policies
Cons: can learn irrelevant features; usually trained with squared error, whereas our specific need is finding local price minima and maxima (+something else?);
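As a rough illustration of the objective discussed above, here is a minimal sketch of a per-sample beta-VAE loss: a squared-error reconstruction term plus a beta-weighted KL term for a diagonal Gaussian posterior. The function name, the plain-list inputs and the default `beta=4.0` are assumptions for illustration only; they are not taken from btgym or from the DARLA code.

```python
import math


def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Per-sample beta-VAE objective (hypothetical helper, illustration only).

    x, x_recon: lists of floats (input and its reconstruction).
    mu, log_var: lists of floats parameterizing the diagonal Gaussian q(z|x).
    beta: weight of the KL term; the default is an assumed, illustrative value.
    Returns (total, reconstruction_term, kl_term).
    """
    # Squared-error reconstruction term (the "sq. error" mentioned in the notes).
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))
    return recon + beta * kl, recon, kl


total, recon, kl = beta_vae_loss([1.0, 2.0], [0.0, 2.0], [1.0], [0.0])
print(total, recon, kl)
```

Setting `beta=1.0` recovers the standard VAE objective; larger values trade reconstruction quality for a more constrained latent code.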
Maybe:
- construct additional domain-specific terms for the encoder training loss to promote relevant features;
- construct the convolution encoder such that learnt features represent local price min/max;
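One possible reading of the "domain-specific terms" idea, sketched under stated assumptions: label local price minima and maxima in a window, then add a weighted auxiliary term to the encoder loss that rewards an extra encoder output for predicting those labels. All names here (`local_extrema_labels`, `extrema_aux_loss`, `encoder_loss`), the window rule and the `aux_weight` value are hypothetical and are not part of btgym.

```python
def local_extrema_labels(prices, window=2):
    """Label each step: +1 local maximum, -1 local minimum, 0 otherwise.

    A point counts as an extremum if it is strictly above (below) every
    neighbour within `window` steps on each side. Points closer than
    `window` to either edge are left as 0. `prices` must be a list.
    """
    labels = [0] * len(prices)
    for i in range(window, len(prices) - window):
        neighbours = prices[i - window:i] + prices[i + 1:i + window + 1]
        if prices[i] > max(neighbours):
            labels[i] = 1
        elif prices[i] < min(neighbours):
            labels[i] = -1
    return labels


def extrema_aux_loss(predicted, labels):
    """Mean squared error between an auxiliary output and the extrema labels."""
    return sum((p - y) ** 2 for p, y in zip(predicted, labels)) / len(labels)


def encoder_loss(recon_loss, predicted, labels, aux_weight=0.5):
    """Reconstruction loss plus the weighted domain-specific term (assumed form)."""
    return recon_loss + aux_weight * extrema_aux_loss(predicted, labels)


prices = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0]
labels = local_extrema_labels(prices)
print(labels)
print(encoder_loss(1.0, [0.0] * len(labels), labels))
```

Whether a regression target like this or a classification head is the better choice is left open here, in line with the "Maybe" status of the idea.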
Links:
DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
beta-VAE paper, and its SCAN extension
and related Deep Variational Information Bottleneck paper;
Deep Spatial Autoencoders for Visuomotor Learning
Chelsea Finn CS294-112 lecture video - excellent topic intro
Aux: make an applet to visualize learnt features and add it to TensorBoard images: [simple solution from Keras blog]
Links:
CS 294-112 spring'17 Lecture 6 slides and video
MAML
Idea: MAML with asynchronous setup (i.e. A3C)
Pros: finding generalizable policy;
Cons: active research area; the generic MAML algorithm may not scale to our domain; may need implementation tricks such as those in One-Shot Visual Imitation Learning via Meta-Learning.
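To make the MAML idea concrete, here is a toy sketch, assuming a one-parameter model `y = w * x` and tasks of the form `y = a * x`, where the gradient through the single inner adaptation step can be written analytically. It is a single-process, supervised illustration only: it does not cover the asynchronous (A3C-style) setup or an RL loss, and all names and hyperparameters are hypothetical.

```python
import random


def loss_and_grad(w, xs, a):
    """MSE of model y = w*x against task y = a*x; returns (loss, dloss/dw, mean x^2)."""
    m2 = sum(x * x for x in xs) / len(xs)
    return (w - a) ** 2 * m2, 2.0 * (w - a) * m2, m2


def maml_meta_gradient(w, task_slope, train_xs, test_xs, inner_lr):
    """Exact MAML meta-gradient for one task with one inner gradient step."""
    _, g_train, m2_train = loss_and_grad(w, train_xs, task_slope)
    w_adapted = w - inner_lr * g_train  # inner (adaptation) step on train data
    test_loss, g_test, _ = loss_and_grad(w_adapted, test_xs, task_slope)
    # Chain rule through the inner step: d(w_adapted)/dw = 1 - 2 * inner_lr * m2_train.
    return g_test * (1.0 - 2.0 * inner_lr * m2_train), test_loss


def meta_train(task_slopes, steps=200, inner_lr=0.1, outer_lr=0.05, seed=0):
    """Average meta-gradients over tasks and update the shared initialization w."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for a in task_slopes:
            train_xs = [rng.uniform(-1.0, 1.0) for _ in range(10)]
            test_xs = [rng.uniform(-1.0, 1.0) for _ in range(10)]
            g, _ = maml_meta_gradient(w, a, train_xs, test_xs, inner_lr)
            meta_grad += g / len(task_slopes)
        w -= outer_lr * meta_grad  # outer (meta) step
    return w


print(meta_train([1.0, 3.0]))
```

In an asynchronous variant, each worker would presumably compute the per-task meta-gradient and apply it to shared parameters; that part is not shown here.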
Links:
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (MAML)
Meta-SGD: Learning to Learn Quickly for Few-Shot Learning
Learning To Reinforcement Learn (RL^2 simple framework idea)
Learning to Learn: Meta-Critic Networks for Sample Efficient Learning
Meta-Learning with Temporal Convolutions
Learning to Generalize: Meta-Learning for Domain Generalization
Guided policy search + meta-learning:
Idea: fit local policies (on a single episode or several episodes of data) and use them as experts demonstrating correct actions. Use a direct action-imitation loss. It can serve as a meta-learning loss (~MAML) on target (~trial test) data;
Pros: speed-up learning process, cut off irrelevant policy space regions
Cons: needs computation time to fit local models; Unclear: is a direct action-imitation loss better than just testing the model on target data (as in the original MAML formulation)? How should the local model be parameterized?
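A direct action-imitation loss could look like the following minimal sketch: the mean negative log-likelihood of the expert's (local policy's) actions under the learner's softmax policy. It assumes a discrete action set and per-step policy logits; the function names and the four-action example are assumptions for illustration, not btgym API.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imitation_loss(policy_logits, expert_actions):
    """Mean negative log-likelihood of expert actions under the policy.

    policy_logits: one list of logits per time step.
    expert_actions: one integer action index per time step, taken from
    the fitted local policy acting as the expert.
    """
    nll = 0.0
    for logits, action in zip(policy_logits, expert_actions):
        nll -= math.log(softmax(logits)[action])
    return nll / len(expert_actions)


# Two steps, four discrete actions (an assumed action-space size).
logits = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
print(imitation_loss(logits, [0, 2]))  # policy agrees with the expert
print(imitation_loss(logits, [1, 3]))  # policy disagrees: higher loss
```

Used as a meta-learning loss, this term would be evaluated on the target (trial test) data after adaptation, in place of or alongside the usual RL objective.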
Links:
Overcoming Exploration in Reinforcement Learning with Demonstrations
End-to-End Training of Deep Visuomotor Policies -- links to optimal control theory, notation shortlist
Idea: fit different algorithm implementations to btgym domain;
ACKTR -- can the K-FAC optimizer handle LSTM layers?
Etc.
Pros: may perform better;
Links:
Asynchronous Methods for Deep Reinforcement Learning - a modern classic from DeepMind
Reinforcement Learning with Unsupervised Auxiliary Tasks