Challenges of meta-RL
- designing a distribution of interrelated tasks
- learning a representation shared across tasks
- adapting quickly to new tasks (formalized below)
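These challenges are commonly wrapped into a single bi-level objective (a standard formulation, not specific to any one paper below): meta-train parameters $\theta$ over a task distribution $p(\mathcal{T})$ so that the *adapted* policy earns high return,

$$\max_\theta \; \mathbb{E}_{\mathcal{T} \sim p(\mathcal{T})} \big[ \mathbb{E}_{\tau \sim \pi_{\theta'_\mathcal{T}}} [R(\tau)] \big], \qquad \theta'_\mathcal{T} = U(\theta, \mathcal{T}),$$

where the fast-adaptation operator $U$ is what separates the families below: a recurrent update ($RL^2$), a few gradient steps (MAML, Reptile, ProMP), or posterior inference over a latent task variable (PEARL).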
Papers
environment
Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning
- source: CoRL 2019 (PMLR v100)
- method: None
- environment:
- object manipulation
- paper link: http://proceedings.mlr.press/v100/yu20a/yu20a.pdf
- code: https://github.com/rlworkgroup/metaworld
- interpretation:
model-based meta-RL
Learning to reinforcement learn
- source: CogSci 2017
- method: deep meta-RL
- environment:
- bandit problem
- Two-step task
- Harlow experiment
- 3D navigation (DeepMind Lab)
- paper link: https://arxiv.org/pdf/1611.05763.pdf
- code:
- interpretation:
RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning
- source: ICLR 2017
- method: $RL^2$ (see the sketch after this entry)
- environment:
- multi-armed bandit problem
- tabular MDP
- 3D navigation (ViZDoom)
- paper link: https://arxiv.org/pdf/1611.02779.pdf
- code:
- interpretation:
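A minimal sketch of the mechanism shared by this paper and "Learning to reinforcement learn" above, assuming a toy tanh RNN in place of the papers' LSTM/GRU: the policy's per-step input concatenates the observation with the previous action, reward, and termination flag, and the hidden state persists across episodes within a trial (one sampled task), so fast adaptation happens in the activations while slow meta-learning would update the weights. The training loop is omitted and all names are illustrative.

```python
import numpy as np

class TanhRNNPolicy:
    """Toy stand-in for the recurrent policy; the papers use an LSTM/GRU."""
    def __init__(self, obs_dim, n_actions, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + n_actions + 2   # obs ++ onehot(a_prev) ++ r_prev ++ done
        self.W_in = rng.normal(0.0, 0.1, (hidden, in_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden, hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_actions, hidden))
        self.n_actions, self.hidden = n_actions, hidden

    def initial_state(self):
        return np.zeros(self.hidden)

    def step(self, h, obs, a_prev, r_prev, done):
        x = np.concatenate([obs, np.eye(self.n_actions)[a_prev], [r_prev, float(done)]])
        h = np.tanh(self.W_in @ x + self.W_h @ h)
        logits = self.W_out @ h
        p = np.exp(logits - logits.max())
        return h, p / p.sum()

rng = np.random.default_rng(1)
policy = TanhRNNPolicy(obs_dim=1, n_actions=2)
arm_means = rng.uniform(size=2)            # one sampled task: a 2-armed bandit
h = policy.initial_state()                 # reset ONLY at the trial boundary
a_prev, r_prev, done = 0, 0.0, False
for t in range(100):                       # many pulls of the SAME bandit
    h, p = policy.step(h, np.zeros(1), a_prev, r_prev, done)
    a_prev = int(rng.choice(2, p=p))
    r_prev = float(rng.random() < arm_means[a_prev])
```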
Prefrontal cortex as a meta-reinforcement learning system
- source: Nature Neuroscience 2018
- method: None
- environment:
- Harlow learning task
- paper link: https://www.nature.com/articles/s41593-018-0147-8
- code:
- interpretation:
A Simple Neural Attentive Meta-Learner
- source: ICLR 2018
- method: SNAIL (simple neural attentive learner; see the sketch after this entry)
- environment:
- multi-armed bandits
- tabular MDPs
- navigation (ViZDoom)
- robotic locomotion (MuJoCo)
- paper link: https://openreview.net/pdf?id=B1DmUzWAW
- code: https://github.com/eambutu/snail-pytorch
- interpretation:
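A rough sketch of SNAIL's two building blocks, assuming toy NumPy tensors: dilated causal (temporal) convolutions aggregate experience over the trial, and causally masked attention retrieves specific past timesteps; both concatenate their output onto their input, as in the paper. Shapes, kernel size, and initializations here are illustrative, not the paper's.

```python
import numpy as np

def causal_conv(x, w, dilation):
    """Kernel-size-2 dilated causal conv. x: (T, D_in), w: (2, D_in, D_out)."""
    shifted = np.vstack([np.zeros((dilation, x.shape[1])), x])[: x.shape[0]]
    return shifted @ w[0] + x @ w[1]

def tc_dense_block(x, w_f, w_g, dilation):
    """Gated activation; new features are concatenated onto the input."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = np.tanh(causal_conv(x, w_f, dilation)) * sigmoid(causal_conv(x, w_g, dilation))
    return np.concatenate([x, h], axis=1)

def attention_block(x, Wq, Wk, Wv):
    """Single-head causally masked attention; output is concatenated too."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(k.shape[1])
    scores = np.where(np.tril(np.ones(scores.shape, dtype=bool)), scores, -1e9)
    a = np.exp(scores - scores.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    return np.concatenate([x, a @ v], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))                      # 16 timesteps of 8-dim features
x = tc_dense_block(x, 0.1 * rng.normal(size=(2, 8, 4)),
                   0.1 * rng.normal(size=(2, 8, 4)), dilation=1)              # -> (16, 12)
x = attention_block(x, *(0.1 * rng.normal(size=(12, 4)) for _ in range(3)))   # -> (16, 16)
```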
PixelSNAIL: An Improved Autoregressive Generative Model
- source: ICML 2018
- method: PixelSNAIL
- environment:
- paper link: https://arxiv.org/pdf/1712.09763v1.pdf
- code:
- interpretation: not a meta-RL paper; included because it applies the SNAIL-style temporal-convolution-plus-attention block to autoregressive image modeling
Concurrent Meta Reinforcement Learning
- source: arXiv:1903.02710 preprint 2019
- method: CMRL
- environment:
- N-Monty-Hall
- N-Color-Choice
- N-Reacher (Reacher-v2 from gym)
- paper link: https://arxiv.org/pdf/1903.02710v1.pdf
- code:
- interpretation:
Reinforcement Learning, Fast and Slow
- source: Trends in Cognitive Sciences 2019
- method:
- environment:
- paper link: https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(19)30061-0
- code:
- interpretation:
Improving Generalization in Meta Reinforcement Learning using Learned Objectives
- source: ICLR 2020
- method:
- environment:
- paper link: https://arxiv.org/pdf/1910.04098.pdf
- code: https://github.com/louiskirsch/metagenrl
- interpretation:
Discovering Reinforcement Learning Algorithms
- source: arXiv:2007.08794 preprint 2020
- method:
- environment:
- paper link: https://arxiv.org/pdf/2007.08794
- code:
- interpretation:
Model-based Adversarial Meta-Reinforcement Learning
- source: NeurIPS 2020
- method: AdMRL
- environment:
- paper link: https://arxiv.org/pdf/2006.08875v2.pdf
- code: https://github.com/LinZichuan/AdMRL
- interpretation:
optimization-based meta-RL
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
- source: ICML 2017
- method: MAML-RL (see the sketch after this entry)
- environment:
- 2D navigation (rllab)
- locomotion (rllab)
- paper link: https://arxiv.org/pdf/1703.03400.pdf
- code:
- interpretation:
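A self-contained sketch of the MAML bi-level update on a toy quadratic loss that stands in for the policy-gradient surrogates of MAML-RL (where both inner and outer losses are estimated from rollouts). The analytic Jacobian is specific to this toy loss; all constants are illustrative.

```python
import numpy as np

def loss_grad(theta, target):
    """Gradient of the toy per-task loss L(theta) = ||theta - target||^2."""
    return 2.0 * (theta - target)

def maml_step(theta, task_targets, inner_lr=0.1, outer_lr=0.05):
    meta_grad = np.zeros_like(theta)
    for target in task_targets:                                # batch of tasks
        adapted = theta - inner_lr * loss_grad(theta, target)  # inner step
        # Differentiate THROUGH the inner step: for this quadratic loss the
        # Jacobian of the adaptation map is (1 - 2 * inner_lr) * I.
        meta_grad += (1.0 - 2.0 * inner_lr) * loss_grad(adapted, target)
    return theta - outer_lr * meta_grad / len(task_targets)

theta = np.zeros(2)
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
for _ in range(500):
    theta = maml_step(theta, tasks)
# theta is now an initialization from which ONE inner step solves each task well.
```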
On First-Order Meta-Learning Algorithms
- source: arXiv:1803.02999 preprint 2018
- method: Reptile (see the sketch after this entry)
- environment:
- few-shot image classification
- mini-ImageNet
- Omniglot
- paper link: https://arxiv.org/pdf/1803.02999.pdf
- code:
- interpretation:
- https://openai.com/blog/reptile/
- https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html#reptile
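For contrast with MAML above, a sketch of the Reptile outer update on the same kind of toy quadratic task loss: run $k$ inner SGD steps on one sampled task, then nudge the initialization toward the adapted weights. Constants are illustrative.

```python
import numpy as np

def sgd_adapt(theta, target, inner_lr=0.05, k=5):
    """k inner SGD steps on one task's loss ||phi - target||^2."""
    phi = theta.copy()
    for _ in range(k):
        phi -= inner_lr * 2.0 * (phi - target)
    return phi

theta = np.zeros(2)
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
eps, rng = 0.1, np.random.default_rng(0)
for _ in range(2000):
    target = tasks[rng.integers(len(tasks))]   # sample one task
    phi = sgd_adapt(theta, target)
    theta += eps * (phi - theta)               # move toward the adapted weights
# No second-order terms anywhere: that is the whole trick.
```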
Meta-Reinforcement Learning of Structured Exploration Strategies
- source: NeurIPS 2018
- method: MAESN (model-agnostic exploration with structured noise)
- environment:
- robotic locomotion (rllab)
- object manipulation
- paper link: https://arxiv.org/pdf/1802.07245.pdf
- code: https://github.com/russellmendonca/maesn_suite
- interpretation:
Some Considerations on Learning to Explore via Meta-Reinforcement Learning
- source: ICLR 2018
- method:
- E-MAML (optimization-based; contrasted in the note after this entry)
- E-$RL^2$ (model-based)
- environment:
- paper link: https://arxiv.org/pdf/1803.01118v2.pdf
- code:
- interpretation:
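A compressed restatement of the paper's contrast (my paraphrase): plain MAML optimizes the post-update return while treating the pre-update rollouts as given, whereas E-MAML optimizes

$$\max_\theta \; \mathbb{E}_{\tau \sim \pi_\theta} \big[ \mathbb{E}_{\tau' \sim \pi_{U(\theta, \tau)}} [R(\tau')] \big],$$

whose gradient carries an extra score-function term through the pre-update sampling distribution $\pi_\theta$, explicitly crediting exploratory pre-update behavior for post-update return. E-$RL^2$ ports the same idea to $RL^2$ by excluding the exploratory episodes' rewards from the meta-objective.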
ProMP: Proximal Meta-Policy Search
- source: ICLR 2019
- method: ProMP
- environment:
- locomotion (gym & MuJoCo)
- HalfCheetahFwdBack
- AntRandDir
- HopperRandParams
- WalkerFwdBack
- HumanoidRandDir
- WalkerRandParams
- paper link: https://openreview.net/pdf?id=SkxXCi0qFX
- code: https://github.com/jonasrothfuss/promp
- interpretation:
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
- source: ICML 2019
- method: PEARL (probabilistic embeddings for actor-critic RL; see the sketch after this entry)
- environment:
- robotic locomotion (MuJoCo; mujoco200, mujoco131 for some tasks)
- paper link: https://arxiv.org/pdf/1903.08254.pdf
- code: https://github.com/katerakelly/oyster
- interpretation:
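A sketch of PEARL's permutation-invariant context aggregation, with a stub in place of the learned encoder: each context transition contributes an independent Gaussian factor over the latent task variable $z$, the posterior is their product, and a sampled $z$ conditions the actor and critic. The numbers below are made up.

```python
import numpy as np

def product_of_gaussians(mus, variances):
    """Posterior over z from independent per-transition factors N(mu_i, var_i)."""
    precisions = 1.0 / variances
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (precisions * mus).sum(axis=0)
    return mu, var

# Stub encoder outputs for 4 context transitions, latent dim 2; in the paper a
# learned network maps each (s, a, r, s') to its own (mu_i, var_i).
mus = np.array([[0.9, 0.1], [1.1, -0.1], [1.0, 0.0], [0.8, 0.2]])
variances = np.ones_like(mus)
mu, var = product_of_gaussians(mus, variances)
z = mu + np.sqrt(var) * np.random.default_rng(0).standard_normal(2)
# The sampled z conditions the actor and critic, pi(a | s, z) and Q(s, a, z);
# more context tightens the posterior, which drives posterior-sampling-style
# exploration at test time.
```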
Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning
- source: ICLR 2019
- method:
- Model-Based Meta-Reinforcement Learning (train time)
- Online Model Adaptation (test time; see the sketch after this entry)
- environment:
- Mujoco
- half cheetah: disabled joint, sloped terrain, pier
- Ant: crippled leg
- real world robot
- paper link: https://arxiv.org/pdf/1803.11347v6.pdf
- code:
- interpretation:
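A toy sketch of the test-time half of this recipe: keep the $M$ most recent transitions, re-adapt the dynamics model from its meta-learned initialization on that window, and hand the adapted model to a planner. The linear model, the random-action "planner", and the toy environment are all illustrative stand-ins.

```python
import numpy as np
from collections import deque

M, lr = 16, 1e-2
obs_dim, act_dim = 3, 1
theta = np.zeros((obs_dim, obs_dim + act_dim))   # meta-learned init (toy: zeros)
window = deque(maxlen=M)                         # the M most recent transitions
rng = np.random.default_rng(0)
s = np.zeros(obs_dim)

def predict(params, s, a):
    """Toy linear dynamics model: s' ~ params @ [s; a]."""
    return params @ np.concatenate([s, a])

for t in range(100):
    # Re-adapt from the meta-learned init on the recent window (one SGD pass).
    phi = theta.copy()
    for (s_i, a_i, s_next_i) in window:
        err = predict(phi, s_i, a_i) - s_next_i
        phi -= lr * np.outer(err, np.concatenate([s_i, a_i]))
    a = rng.uniform(-1.0, 1.0, act_dim)          # planner stub; a real planner
                                                 # would roll phi forward to pick a
    s_next = 0.9 * s + 0.1 * np.tile(a, obs_dim)[:obs_dim]   # toy "true" dynamics
    window.append((s.copy(), a, s_next))
    s = s_next
```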
Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning
- source: CVPR 2019
- method: SAVN (self-adaptive visual navigation)
- environment:
- 3D navigation (AI2-THOR)
- paper link: https://arxiv.org/pdf/1812.00971v2.pdf
- code:
- interpretation:
Meta-Q-Learning
- source: ICLR 2020
- method:
- environment:
- paper link: https://arxiv.org/pdf/1910.00125.pdf
- code:
- interpretation:
Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
- source: ICML 2021
- method: DREAM
- environment:
- paper link: https://arxiv.org/pdf/2008.02790v2.pdf
- code:
- interpretation:
Meta Learning via Learned Loss
- source: ICPR 2021
- method: $ML^3$
- environment:
- paper link: https://arxiv.org/pdf/1906.05374.pdf
- code:
- interpretation:
Uncategorized
Alchemy: A structured task distribution for meta-reinforcement learning
- source: arXiv:2102.02926 preprint 2021
- method:
- environment:
- paper link: https://arxiv.org/pdf/2102.02926v1.pdf
- code:
- interpretation:
Learning Robust State Abstractions for Hidden-Parameter Block MDPs
- source: ICLR 2021
- method:
- environment:
- paper link: https://arxiv.org/pdf/2007.07206v4.pdf
- code:
- interpretation:
Meta reinforcement learning as task inference
- source: arXiv:1905.06424 preprint 2019
- method:
- environment:
- paper link: https://arxiv.org/pdf/1905.06424v2.pdf
- code:
- interpretation:
MELD: Meta-Reinforcement Learning from Images via Latent State Models
- source: CoRL 2020
- method:
- environment:
- paper link: https://arxiv.org/pdf/2010.13957v2.pdf
- code:
- interpretation:
Meta Reinforcement Learning with Task Embedding and Shared Policy
- source: IJCAI 2019
- method:
- environment:
- paper link: https://arxiv.org/pdf/1905.06527v3.pdf
- code: https://github.com/llan-ml/tesp
- interpretation:
Fast Adaptive Task Offloading in Edge Computing based on Meta Reinforcement Learning
- source: IEEE TPDS 2020
- method:
- environment:
- paper link: https://arxiv.org/pdf/2008.02033v5.pdf
- code: https://github.com/linkpark/metarl-offloading
- interpretation:
Learning Associative Inference Using Fast Weight Memory
- source: ICLR 2021
- method:
- environment:
- paper link: https://arxiv.org/pdf/2011.07831v2.pdf
- code: https://github.com/ischlag/Fast-Weight-Memory-public
- interpretation:
Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning
- source: EMNLP 2020
- method:
- environment:
- paper link: https://arxiv.org/pdf/2010.15877v1.pdf
- code: https://github.com/DevinJake/MRL-CQA
- interpretation:
Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies
- source: ICLR 2020
- method:
- environment:
- paper link: https://arxiv.org/pdf/2001.00248v2.pdf
- code: https://github.com/srsohn/msgi
- interpretation:
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning
- source: NeurIPS 2019
- method:
- environment:
- paper link: http://papers.nips.cc/paper/9026-loaded-dice-trading-off-bias-and-variance-in-any-order-score-function-gradient-estimators-for-reinforcement-learning.pdf
- code: https://github.com/oxwhirl/loaded-dice
- interpretation:
Causal Reasoning from Meta-reinforcement Learning
- source: ICLR 2019
- method:
- environment:
- paper link: https://arxiv.org/pdf/1901.08162v1.pdf
- code: https://github.com/kantneel/causal-metarl
- interpretation:
Introducing Neuromodulation in Deep Neural Networks to Learn Adaptive Behaviours
- source: arXiv:1812.09113 preprint 2019
- method:
- environment:
- paper link: https://arxiv.org/pdf/1812.09113v3.pdf
- code: https://github.com/nvecoven/nmd_net
- interpretation:
Policy Gradient RL Algorithms as Directed Acyclic Graphs
- source: arXiv:2012.07763 preprint 2020
- method:
- environment:
- paper link: https://arxiv.org/pdf/2012.07763v2.pdf
- code: https://github.com/jjgarau/DAGPolicyGradient
- interpretation:
Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems
- source: GECCO 2020
- method:
- environment:
- paper link: https://arxiv.org/pdf/2004.12846v2.pdf
- code: https://github.com/dlpbc/penn-a
- interpretation:
Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads
- source: RA-L 2021
- method:
- environment:
- paper link: https://arxiv.org/pdf/2004.11345v2.pdf
- code: https://github.com/suneelbelkhale/model-based-meta-rl-for-flight
- interpretation:
Hierarchical Meta Reinforcement Learning for Multi-Task Environments
- source: ICLR 2021
- method:
- environment:
- paper link: https://openreview.net/pdf?id=u9ax42K7ND
- code: https://github.com/MeSH-ICLR/MEtaSoftHierarchy
- interpretation:
Modeling and Optimization Trade-off in Meta-learning
- source: NeurIPS 2020
- method:
- environment:
- paper link: https://arxiv.org/pdf/2010.12916v2.pdf
- code: https://github.com/intel-isl/MetaLearningTradeoffs
- interpretation:
Meta-Learning of Structured Task Distributions in Humans and Machines
- source: ICLR 2021
- method:
- environment:
- paper link: https://arxiv.org/pdf/2010.02317v3.pdf
- code: https://github.com/sreejank/Compositional_MetaRL
- interpretation:
Offline Meta Learning of Exploration
- source: arXiv:2008.02598 preprint 2021
- method:
- environment:
- paper link: https://arxiv.org/pdf/2008.02598v3.pdf
- code: https://github.com/Rondorf/BOReL
- interpretation:
Meta-Reinforcement Learning for Reliable Communication in THz/VLC Wireless VR Networks
- source: ICC 2021
- method:
- environment:
- paper link: https://arxiv.org/pdf/2102.12277v1.pdf
- code: https://github.com/wyy0206/THzVR
- interpretation: