awesome-offline-rl
This is a collection of research and review papers for offline reinforcement learning (offline RL). Feel free to star and fork.
Maintainers:
- Haruka Kiyohara (Tokyo Institute of Technology)
- Yuta Saito (Hanjuku-kaso Co., Ltd.)
We are looking for more contributors and maintainers! Please feel free to open pull requests.
format:
- [title](paper link) [links]
- author1, author2, and author3. arXiv/conferences/journals, year.
For any question, feel free to contact: [email protected]
Papers
Review Papers
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
- Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. arXiv, 2020.
Offline RL: Theory/Methods
- Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
- Lanqing Li, Yuanhao Huang, and Dijun Luo. arXiv, 2021.
- Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
- Luofeng Liao, Zuyue Fu, Zhuoran Yang, Mladen Kolar, and Zhaoran Wang. arXiv, 2021.
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
- Guy Tennenholtz, Nir Baram, and Shie Mannor. arXiv, 2021.
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning
- DiJia Su, Jason D. Lee, John M. Mulvey, and H. Vincent Poor. arXiv, 2021.
- Continuous Doubly Constrained Batch Reinforcement Learning
- Rasool Fakoor, Jonas Mueller, Pratik Chaudhari, and Alexander J. Smola. arXiv, 2021.
- COMBO: Conservative Offline Model-Based Policy Optimization
- Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, and Chelsea Finn. arXiv, 2021.
- Representation Matters: Offline Pretraining for Sequential Decision Making
- Mengjiao Yang and Ofir Nachum. arXiv, 2021.
- Q-Value Weighted Regression: Reinforcement Learning with Limited Data
- Piotr Kozakowski, Łukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, and Katarzyna Kańska. arXiv, 2021.
- PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators
- Anish Agarwal, Abdullah Alomar, Varkey Alumootil, Devavrat Shah, Dennis Shen, Zhi Xu, and Cindy Yang. arXiv, 2021.
- Risk-Averse Offline Reinforcement Learning
- Núria Armengol Urpí, Sebastian Curi, and Andreas Krause. arXiv, 2021.
- Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency
- Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, and Tengyang Xie. arXiv, 2021.
- Fast Rates for the Regret of Offline Reinforcement Learning
- Yichun Hu, Nathan Kallus, and Masatoshi Uehara. arXiv, 2021.
- Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
- Ming Yin, Yu Bai, and Yu-Xiang Wang. arXiv, 2021.
- Identifying Decision Points for Safe and Interpretable Reinforcement Learning in Hypotension Treatment
- Kristine Zhang, Yuanheng Wang, Jianzhun Du, Brian Chu, Leo Anthony Celi, Ryan Kindle, and Finale Doshi-Velez. arXiv, 2021.
- Batch Reinforcement Learning Through Continuation Method
- Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, and Minmin Chen. ICLR, 2021.
- Model-Based Visual Planning with Self-Supervised Functional Distances
- Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, and Sergey Levine. ICLR, 2021.
- Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, and Shixiang Gu. ICLR, 2021.
- Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization
- Lanqing Li, Rui Yang, and Dijun Luo. ICLR, 2021.
- DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
- Aayam Kumar Shrestha, Stefan Lee, Prasad Tadepalli, and Alan Fern. ICLR, 2021.
- What are the Statistical Limits of Offline RL with Linear Function Approximation?
- Ruosong Wang, Dean Foster, and Sham M. Kakade. ICLR, 2021.
- Reset-Free Lifelong Learning with Skill-Space Planning [website]
- Kevin Lu, Aditya Grover, Pieter Abbeel, and Igor Mordatch. ICLR, 2021.
- Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework
- Chuheng Zhang, Yuanying Cai, Longbo Huang, and Jian Li. AAAI, 2021.
- Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents
- Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, and Tamer Başar. IEEE T AUTOMATIC CONTROL, 2021.
- Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
- Andrea Zanette. arXiv, 2020.
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
- Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, and Mengdi Wang. arXiv, 2020.
- A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting
- Philip Amortila, Nan Jiang, and Tengyang Xie. arXiv, 2020.
- Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
- Samuele Tosatto, João Carvalho, and Jan Peters. arXiv, 2020.
- Batch Value-function Approximation with Only Realizability
- Tengyang Xie and Nan Jiang. arXiv, 2020.
- DRIFT: Deep Reinforcement Learning for Functional Software Testing
- Luke Harries, Rebekah Storan Clarke, Timothy Chapman, Swamy V. P. L. N. Nallamalli, Levent Ozgur, Shuktika Jain, Alex Leung, Steve Lim, Aaron Dietrich, José Miguel Hernández-Lobato, Tom Ellis, Cheng Zhang, and Kamil Ciosek. arXiv, 2020.
- Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains
- James Bannon, Brad Windsor, Wenbo Song, and Tao Li. arXiv, 2020.
- Goal-conditioned Batch Reinforcement Learning for Rotation Invariant Locomotion [code]
- Aditi Mavalankar. arXiv, 2020.
- Semi-Supervised Reward Learning for Offline Reinforcement Learning
- Ksenia Konyushkova, Konrad Zolna, Yusuf Aytar, Alexander Novikov, Scott Reed, Serkan Cabi, and Nando de Freitas. arXiv, 2020.
- Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation
- Chaochao Lu, Biwei Huang, Ke Wang, José Miguel Hernández-Lobato, Kun Zhang, and Bernhard Schölkopf. arXiv, 2020.
- Offline Reinforcement Learning from Images with Latent Space Models [website]
- Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, and Chelsea Finn. arXiv, 2020.
- POPO: Pessimistic Offline Policy Optimization
- Qiang He and Xinwen Hou. arXiv, 2020.
- Is Pessimism Provably Efficient for Offline RL?
- Ying Jin, Zhuoran Yang, and Zhaoran Wang. arXiv, 2020.
- Reinforcement Learning with Videos: Combining Offline Observations with Interaction
- Karl Schmeckpeper, Oleh Rybkin, Kostas Daniilidis, Sergey Levine, and Chelsea Finn. arXiv, 2020.
- Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones [website]
- Brijen Thananjeyan, Ashwin Balakrishna, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, and Ken Goldberg. arXiv, 2020.
- Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
- Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, and Sergey Levine. arXiv, 2020.
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [website]
- Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, and Ofir Nachum. arXiv, 2020.
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
- Annie S. Chen, HyunJi Nam, Suraj Nair, and Chelsea Finn. arXiv, 2020.
- Learning Dexterous Manipulation from Suboptimal Experts [website]
- Rae Jeong, Jost Tobias Springenberg, Jackie Kay, Daniel Zheng, Yuxiang Zhou, Alexandre Galashov, Nicolas Heess, and Francesco Nori. arXiv, 2020.
- The Reinforcement Learning-Based Multi-Agent Cooperative Approach for the Adaptive Speed Regulation on a Metallurgical Pickling Line
- Anna Bogomolova, Kseniia Kingsep, and Boris Voskresenskii. arXiv, 2020.
- Offline Meta-Reinforcement Learning with Advantage Weighting
- Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, and Chelsea Finn. arXiv, 2020.
- Model-Based Offline Planning [video]
- Arthur Argenson and Gabriel Dulac-Arnold. arXiv, 2020.
- Overcoming Model Bias for Robust Offline Deep Reinforcement Learning
- Phillip Swazinna, Steffen Udluft, and Thomas Runkler. arXiv, 2020.
- Offline Meta Learning of Exploration
- Ron Dorfman, Idan Shenfeld, and Aviv Tamar. arXiv, 2020.
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
- Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, and Shixiang Shane Gu. arXiv, 2020.
- Hyperparameter Selection for Offline Reinforcement Learning
- Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, and Nando de Freitas. arXiv, 2020.
- Interpretable Control by Reinforcement Learning
- Daniel Hein, Steffen Limmer, and Thomas A. Runkler. arXiv, 2020.
- Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning [code]
- Nathan Kallus and Masatoshi Uehara. arXiv, 2020.
- Accelerating Online Reinforcement Learning with Offline Datasets [website]
- Ashvin Nair, Murtaza Dalal, Abhishek Gupta, and Sergey Levine. arXiv, 2020.
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
- Aviral Kumar, Abhishek Gupta, and Sergey Levine. arXiv, 2020.
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
- Nathan Kallus and Masatoshi Uehara. NeurIPS, 2020.
- Critic Regularized Regression
- Ziyu Wang, Alexander Novikov, Konrad Zolna, Josh S. Merel, Jost Tobias Springenberg, Scott E. Reed, Bobak Shahriari, Noah Siegel, Caglar Gulcehre, Nicolas Heess, and Nando de Freitas. NeurIPS, 2020.
- Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration
- Yao Liu, Adith Swaminathan, Alekh Agarwal, and Emma Brunskill. NeurIPS, 2020.
- Conservative Q-Learning for Offline Reinforcement Learning [website] [code]
- Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. NeurIPS, 2020.
- BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
- Xinyue Chen, Zijian Zhou, Zheng Wang, Che Wang, Yanqiu Wu, and Keith Ross. NeurIPS, 2020.
- MOPO: Model-based Offline Policy Optimization [code]
- Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Y. Zou, Sergey Levine, Chelsea Finn, and Tengyu Ma. NeurIPS, 2020.
- MOReL: Model-Based Offline Reinforcement Learning
- Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, and Thorsten Joachims. NeurIPS, 2020.
- Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
- Aaron Sonabend, Junwei Lu, Leo Anthony Celi, Tianxi Cai, and Peter Szolovits. NeurIPS, 2020.
- Multi-task Batch Reinforcement Learning with Metric Learning
- Jiachen Li, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Henrik Christensen, and Hao Su. NeurIPS, 2020.
- Counterfactual Data Augmentation using Locally Factored Dynamics [code]
- Silviu Pitis, Elliot Creager, and Animesh Garg. NeurIPS, 2020.
- On Reward-Free Reinforcement Learning with Linear Function Approximation
- Ruosong Wang, Simon S. Du, Lin Yang, and Russ R. Salakhutdinov. NeurIPS, 2020.
- Constrained Policy Improvement for Safe and Efficient Reinforcement Learning
- Elad Sarafian, Aviv Tamar, and Sarit Kraus. IJCAI, 2020.
- BRPO: Batch Residual Policy Optimization [code]
- Sungryull Sohn, Yinlam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, and Craig Boutilier. IJCAI, 2020.
- From Importance Sampling to Doubly Robust Policy Gradient
- Jiawei Huang and Nan Jiang. ICML, 2020.
- Batch Stationary Distribution Estimation
- Junfeng Wen, Bo Dai, Lihong Li, and Dale Schuurmans. ICML, 2020.
- GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
- Shangtong Zhang, Bo Liu, and Shimon Whiteson. ICML, 2020.
- GenDICE: Generalized Offline Estimation of Stationary Values
- Ruiyi Zhang, Bo Dai, Lihong Li, and Dale Schuurmans. ICLR, 2020.
- Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning
- Noah Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, and Martin Riedmiller. ICLR, 2020.
- Accelerating Reinforcement Learning with Learned Skill Priors
- Karl Pertsch, Youngwoon Lee, and Joseph J. Lim. CoRL, 2020.
- Scaling data-driven robotics with reward sketching and batch reinforcement learning [website]
- Serkan Cabi, Sergio Gómez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott Reed, Rae Jeong, Konrad Zolna, Yusuf Aytar, David Budden, Mel Vecerik, Oleg Sushkov, David Barker, Jonathan Scholz, Misha Denil, Nando de Freitas, and Ziyu Wang. RSS, 2020.
- Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping
- Cristian Bodnar, Adrian Li, Karol Hausman, Peter Pastor, and Mrinal Kalakrishnan. RSS, 2020.
- Defining Admissible Rewards for High Confidence Policy Evaluation in Batch Reinforcement Learning
- Niranjani Prasad, Barbara E Engelhardt, and Finale Doshi-Velez. CHIL, 2020.
- Learning When-to-Treat Policies
- Xinkun Nie, Emma Brunskill, and Stefan Wager. JASA, 2020.
- Batch-Constrained Reinforcement Learning for Dynamic Distribution Network Reconfiguration
- Yuanqi Gao, Wei Wang, Jie Shi, and Nanpeng Yu. IEEE T SMART GRID, 2020.
- Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
- Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, and Rosalind Picard. arXiv, 2019.
- Behavior Regularized Offline Reinforcement Learning
- Yifan Wu, George Tucker, and Ofir Nachum. arXiv, 2019.
- Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift
- Riashat Islam, Komal K. Teru, Deepak Sharma, and Joelle Pineau. arXiv, 2019.
- Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
- Xue Bin Peng, Aviral Kumar, Grace Zhang, and Sergey Levine. arXiv, 2019.
- AlgaeDICE: Policy Gradient from Arbitrary Experience
- Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, and Dale Schuurmans. arXiv, 2019.
- Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction [website] [code]
- Aviral Kumar, Justin Fu, George Tucker, and Sergey Levine. NeurIPS, 2019.
- Off-Policy Deep Reinforcement Learning without Exploration
- Scott Fujimoto, David Meger, and Doina Precup. ICML, 2019.
- Safe Policy Improvement with Baseline Bootstrapping
- Romain Laroche, Paul Trichelair, and Rémi Tachet des Combes. ICML, 2019.
- Information-Theoretic Considerations in Batch Reinforcement Learning
- Jinglin Chen and Nan Jiang. ICML, 2019.
- Batch Recurrent Q-Learning for Backchannel Generation Towards Engaging Agents
- Nusrah Hussain, Engin Erzin, T. Metin Sezgin, and Yucel Yemez. ACII, 2019.
- Safe Policy Improvement with Soft Baseline Bootstrapping
- Kimia Nadjahi, Romain Laroche, and Rémi Tachet des Combes. ECML, 2019.
- Importance Weighted Transfer of Samples in Reinforcement Learning
- Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, and Marcello Restelli. ICML, 2018.
- Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation [website]
- Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, and Sergey Levine. CoRL, 2018.
- Off-Policy Policy Gradient with State Distribution Correction
- Yao Liu, Adith Swaminathan, Alekh Agarwal, and Emma Brunskill. UAI, 2018.
- Deep Exploration via Bootstrapped DQN
- Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy. NeurIPS, 2016.
- Safe Policy Improvement by Minimizing Robust Baseline Regret
- Mohammad Ghavamzadeh, Marek Petrik, and Yinlam Chow. NeurIPS, 2016.
- Residential Demand Response Applications Using Batch Reinforcement Learning
- Frederik Ruelens, Bert Claessens, Stijn Vandael, Bart De Schutter, Robert Babuska, and Ronnie Belmans. arXiv, 2015.
- Structural Return Maximization for Reinforcement Learning
- Joshua Joseph, Javier Velez, and Nicholas Roy. arXiv, 2014.
- Simultaneous Perturbation Algorithms for Batch Off-Policy Search
- Raphael Fonteneau and L.A. Prashanth. CDC, 2014.
- Guided Policy Search
- Sergey Levine and Vladlen Koltun. ICML, 2013.
- Off-Policy Actor-Critic
- Thomas Degris, Martha White, and Richard S. Sutton. ICML, 2012.
- PAC-Bayesian Policy Evaluation for Reinforcement Learning
- Mahdi Milani Fard, Joelle Pineau, and Csaba Szepesvari. UAI, 2011.
- Tree-Based Batch Mode Reinforcement Learning
- Damien Ernst, Pierre Geurts, and Louis Wehenkel. JMLR, 2005.
- Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method
- Martin Riedmiller. ECML, 2005.
- Off-Policy Temporal-Difference Learning with Function Approximation
- Doina Precup, Richard S. Sutton, and Sanjoy Dasgupta. ICML, 2001.
Offline RL: Benchmarks/Experiments/Applications
- DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning
- Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, and Yu Zheng. arXiv, 2021.
- Personalization for Web-based Services using Offline Reinforcement Learning
- Pavlos Athanasios Apostolopoulos, Zehui Wang, Hanson Wang, Chad Zhou, Kittipat Virochsiri, Norm Zhou, and Igor L. Markov. arXiv, 2021.
- NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning [website] [code]
- Rongjun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, and Yang Yu. arXiv, 2021.
- Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation
- Diksha Garg, Priyanka Gupta, Pankaj Malhotra, Lovekesh Vig, and Gautam Shroff. arXiv, 2020.
- An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare
- Taylor W. Killian, Haoran Zhang, Jayakumar Subramanian, Mehdi Fatemi, and Marzyeh Ghassemi. arXiv, 2020.
- Learning from Human Feedback: Challenges for Real-World Reinforcement Learning in NLP
- Julia Kreutzer, Stefan Riezler, and Carolin Lawrence. arXiv, 2020.
- Remote Electrical Tilt Optimization via Safe Reinforcement Learning
- Filippo Vannella, Grigorios Iakovidis, Ezeddin Al Hakim, Erik Aumayr, and Saman Feghhi. arXiv, 2020.
- Offline Reinforcement Learning Hands-On
- Louis Monier, Jakub Kmec, Alexandre Laterre, Thomas Pierrot, Valentin Courgeau, Olivier Sigaud, and Karim Beguir. arXiv, 2020.
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [code] [website]
- Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. arXiv, 2020.
- RL Unplugged: Benchmarks for Offline Reinforcement Learning [code] [dataset]
- Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, and Nando de Freitas. arXiv, 2020.
- An Optimistic Perspective on Offline Reinforcement Learning [website]
- Rishabh Agarwal, Dale Schuurmans, and Mohammad Norouzi. ICML, 2020.
- Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
- Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, and Adish Singla. ICML, 2020.
- Off-policy Learning in Two-stage Recommender Systems
- Jiaqi Ma, Zhe Zhao, Xinyang Yi, Ji Yang, Minmin Chen, Jiaxi Tang, Lichan Hong, and Ed H Chi. WWW, 2020.
- Human-centric Dialog Training via Offline Reinforcement Learning
- Natasha Jaques, Judy Hanwen Shen, Asma Ghandeharioun, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Shane Gu, and Rosalind Picard. EMNLP, 2020.
- Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning
- Nasrin Sadeghianpourhamami, Johannes Deleu, and Chris Develder. IEEE T SMART GRID, 2020.
- Optimal Tap Setting of Voltage Regulation Transformers Using Batch Reinforcement Learning
- Hanchen Xu, Alejandro D. Domínguez-García, and Peter W. Sauer. IEEE T POWER SYSTEMS, 2020.
- Benchmarking Batch Deep Reinforcement Learning Algorithms
- Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, and Joelle Pineau. arXiv, 2019.
- Top-K Off-Policy Correction for a REINFORCE Recommender System
- Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed Chi. WSDM, 2019.
- A Clustering-Based Reinforcement Learning Approach for Tailored Personalization of E-Health Interventions
- Ali el Hassouni, Mark Hoogendoorn, Martijn van Otterlo, A. E. Eiben, Vesa Muhonen, and Eduardo Barbaro. arXiv, 2018.
- Generating Interpretable Fuzzy Controllers using Particle Swarm Optimization and Genetic Programming
- Daniel Hein, Steffen Udluft, and Thomas A. Runkler. GECCO, 2018.
- End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient
- Li Zhou, Kevin Small, Oleg Rokhlenko, and Charles Elkan. arXiv, 2017.
- Batch Reinforcement Learning on the Industrial Benchmark: First Experiences
- Daniel Hein, Steffen Udluft, Michel Tokic, Alexander Hentschel, Thomas A. Runkler, and Volkmar Sterzing. IJCNN, 2017.
- Policy Networks with Two-Stage Training for Dialogue Systems
- Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, and Kaheer Suleman. SIGDial, 2016.
- Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
- Arthur Guez, Robert D. Vincent, Massimo Avoli, and Joelle Pineau. IAAI, 2008.
Off-Policy Evaluation: Theory/Methods
Contextual Bandits
- Piecewise-Stationary Off-Policy Optimization
- Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, and Amr Ahmed. AISTATS, 2021.
- Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting [video]
- Ilja Kuzborskij, Claire Vernade, András György, and Csaba Szepesvári. AISTATS, 2021.
- High-Confidence Off-Policy (or Counterfactual) Variance Estimation
- Yash Chandak, Shiv Shankar, and Philip S. Thomas. AAAI, 2021.
- Learning from eXtreme Bandit Feedback
- Romain Lopez, Inderjit Dhillon, and Michael I. Jordan. AAAI, 2021.
- Off-Policy Evaluation of Slate Policies under Bayes Risk
- Nikos Vlassis, Fernando Amat Gil, and Ashok Chandrashekar. arXiv, 2021.
- Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks
- Minshuo Chen, Hao Liu, Wenjing Liao, and Tuo Zhao. arXiv, 2020.
- Bandit Overfitting in Offline Policy Learning
- David Brandfonbrener, William F. Whitney, Rajesh Ranganath, and Joan Bruna. arXiv, 2020.
- Counterfactual Learning of Continuous Stochastic Policies
- Houssam Zenati, Alberto Bietti, Matthieu Martin, Eustache Diemert, and Julien Mairal. arXiv, 2020.
- Optimal Off-Policy Evaluation from Multiple Logging Policies [code]
- Nathan Kallus, Yuta Saito, and Masatoshi Uehara. arXiv, 2020.
- A Practical Guide of Off-Policy Evaluation for Bandit Problems
- Masahiro Kato, Kenshi Abe, Kaito Ariu, and Shota Yasui. arXiv, 2020.
- Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
- Masatoshi Uehara, Masahiro Kato, and Shota Yasui. NeurIPS, 2020.
- Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions
- James McInerney, Brian Brost, Praveen Chandar, Rishabh Mehrotra, and Ben Carterette. KDD, 2020.
- Off-policy Bandits with Deficient Support
- Noveen Sachdeva, Yi Su, and Thorsten Joachims. KDD, 2020.
- Doubly robust off-policy evaluation with shrinkage
- Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dudik. ICML, 2020.
- Adaptive Estimator Selection for Off-Policy Evaluation
- Yi Su, Pavithra Srinath, and Akshay Krishnamurthy. ICML, 2020.
- Off-policy Bandit and Reinforcement Learning
- Yusuke Narita, Shota Yasui, and Kohei Yata. arXiv, 2020.
- Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits
- Nian Si, Fan Zhang, Zhengyuan Zhou, and Jose Blanchet. ICML, 2020.
- Efficient Policy Learning from Surrogate-Loss Classification Reductions [code]
- Andrew Bennett and Nathan Kallus. ICML, 2020.
- More Efficient Policy Learning via Optimal Retargeting
- Nathan Kallus. JASA, 2020.
- Semi-Parametric Efficient Policy Learning with Continuous Actions
- Victor Chernozhukov, Mert Demirer, Greg Lewis, and Vasilis Syrgkanis. NeurIPS, 2019.
- Balanced Off-Policy Evaluation in General Action Spaces
- Arjun Sondhi, David Arbour, and Drew Dimmery. AISTATS, 2019.
- Policy Evaluation with Latent Confounders via Optimal Balance
- Andrew Bennett and Nathan Kallus. NeurIPS, 2019.
- Focused Context Balancing for Robust Offline Policy Evaluation
- Hao Zou, Kun Kuang, Boqi Chen, Peixuan Chen, and Peng Cui. KDD, 2019.
- On the Design of Estimators for Bandit Off-Policy Evaluation
- Nikos Vlassis, Aurelien Bibaut, Maria Dimakopoulou, and Tony Jebara. ICML, 2019.
- CAB: Continuous Adaptive Blending for Policy Evaluation and Learning
- Yi Su, Lequn Wang, Michele Santacatterina, and Thorsten Joachims. ICML, 2019.
- Efficient Counterfactual Learning from Bandit Feedback
- Yusuke Narita, Shota Yasui, and Kohei Yata. AAAI, 2019.
- Policy Evaluation and Optimization with Continuous Treatments
- Nathan Kallus and Angela Zhou. AISTATS, 2019.
- Offline Evaluation of Ranking Policies with Click Models
- Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, and Zheng Wen. KDD, 2018.
- Effective Evaluation using Logged Bandit Feedback from Multiple Loggers
- Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. KDD, 2018.
- Deep Learning with Logged Bandit Feedback
- Thorsten Joachims, Adith Swaminathan, and Maarten de Rijke. ICLR, 2018.
- Off-policy Evaluation for Slate Recommendation
- Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, and Imed Zitouni. NeurIPS, 2017.
- Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
- Yu-Xiang Wang, Alekh Agarwal, and Miroslav Dudik. ICML, 2017.
- Data-Efficient Policy Evaluation Through Behavior Policy Search
- Josiah P. Hanna, Philip S. Thomas, Peter Stone, and Scott Niekum. ICML, 2017.
- The Self-Normalized Estimator for Counterfactual Learning
- Adith Swaminathan and Thorsten Joachims. NeurIPS, 2015.
- Doubly Robust Policy Evaluation and Optimization
- Miroslav Dudík, Dumitru Erhan, John Langford, and Lihong Li. ICML, 2011.
- Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
- Lihong Li, Wei Chu, John Langford, and Xuanhui Wang. WSDM, 2011.
Reinforcement Learning
- Sequential causal inference in a single world of connected units
- Aurelien Bibaut, Maya Petersen, Nikos Vlassis, Maria Dimakopoulou, and Mark van der Laan. arXiv, 2021.
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
- Andrew Bennett, Nathan Kallus, Lihong Li, and Ali Mousavi. AISTATS, 2021.
- Bootstrapping Statistical Inference for Off-Policy Evaluation
- Botao Hao, Xiang (Jack) Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, and Mengdi Wang. arXiv, 2021.
- Average-Reward Off-Policy Policy Evaluation with Function Approximation
- Shangtong Zhang, Yi Wan, Richard S. Sutton, and Shimon Whiteson. arXiv, 2021.
- Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
- Ming Yin, Yu Bai, and Yu-Xiang Wang. arXiv, 2020.
- Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies
- Jinlin Lai, Lixin Zou, and Jiaxing Song. arXiv, 2020.
- Kernel Methods for Policy Evaluation: Treatment Effects, Mediation Analysis, and Off-Policy Planning
- Rahul Singh, Liyuan Xu, and Arthur Gretton. arXiv, 2020.
- Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
- Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, and Emma Brunskill. NeurIPS, 2020.
- CoinDICE: Off-Policy Confidence Interval Estimation
- Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvari, and Dale Schuurmans. NeurIPS, 2020.
- Off-Policy Interval Estimation with Lipschitz Value Iteration
- Ziyang Tang, Yihao Feng, Na Zhang, Jian Peng, and Qiang Liu. NeurIPS, 2020.
- Off-Policy Evaluation via the Regularized Lagrangian
- Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, and Dale Schuurmans. NeurIPS, 2020.
- Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
- Nan Jiang and Jiawei Huang. NeurIPS, 2020.
- Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation
- Ilya Kostrikov and Ofir Nachum. arXiv, 2020.
- Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control [video]
- Bingqing Chen, Ming Jin, Zhe Wang, Tianzhen Hong, and Mario Bergés. RLEM, 2020.
- Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies
- Xinyun Chen, Lu Wang, Yizhe Hang, Heng Ge, and Hongyuan Zha. ICLR, 2020.
- Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
- Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, and Qiang Liu. ICLR, 2020.
- Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
- Ali Mousavi, Lihong Li, Qiang Liu, and Denny Zhou. ICLR, 2020.
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
- Yaqi Duan, Zeyu Jia, and Mengdi Wang. ICML, 2020.
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
- Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Celi, Emma Brunskill, and Finale Doshi-Velez. ICML, 2020.
- Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
- Nathan Kallus and Masatoshi Uehara. ICML, 2020.
- Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
- Yao Liu, Pierre-Luc Bacon, and Emma Brunskill. ICML, 2020.
- Minimax Weight and Q-Function Learning for Off-Policy Evaluation
- Masatoshi Uehara, Jiawei Huang, and Nan Jiang. ICML, 2020.
- Accountable Off-Policy Evaluation With Kernel Bellman Statistics
- Yihao Feng, Tongzheng Ren, Ziyang Tang, and Qiang Liu. ICML, 2020.
- Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
- Ming Yin and Yu-Xiang Wang. ICML, 2020.
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
- Nathan Kallus and Masatoshi Uehara. arXiv, 2019.
- Off-Policy Evaluation in Partially Observable Environments
- Guy Tennenholtz, Uri Shalit, and Shie Mannor. AAAI, 2019.
- Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
- Nathan Kallus and Masatoshi Uehara. NeurIPS, 2019.
- Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
- Tengyang Xie, Yifei Ma, and Yu-Xiang Wang. NeurIPS, 2019.
- Off-Policy Evaluation via Off-Policy Classification
- Alexander Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz, and Sergey Levine. NeurIPS, 2019.
- Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy
- Yuan Xie, Boyi Liu, Qiang Liu, Zhaoran Wang, Yuan Zhou, and Jian Peng. ICLR, 2019.
- More Efficient Off-Policy Evaluation through Regularized Targeted Learning
- Aurelien Bibaut, Ivana Malenica, Nikos Vlassis, and Mark van der Laan. ICML, 2019.
- Combining parametric and nonparametric models for off-policy evaluation
- Omer Gottesman, Yao Liu, Scott Sussex, Emma Brunskill, and Finale Doshi-Velez. ICML, 2019.
- Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
- Michael Oberst and David Sontag. ICML, 2019.
- Importance Sampling Policy Evaluation with an Estimated Behavior Policy
- Josiah Hanna, Scott Niekum, and Peter Stone. ICML, 2019.
- When People Change their Mind: Off-Policy Evaluation in Non-Stationary Recommendation Environments
- Rolf Jagerman, Ilya Markov, and Maarten de Rijke. WSDM, 2019.
- Representation Balancing MDPs for Off-policy Policy Evaluation
- Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo A. Faisal, Finale Doshi-Velez, and Emma Brunskill. NeurIPS, 2018.
- Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
- Qiang Liu, Lihong Li, Ziyang Tang, and Dengyong Zhou. NeurIPS, 2018.
- Confounding-Robust Policy Improvement
- Nathan Kallus and Angela Zhou. NeurIPS, 2018.
- Balanced Policy Evaluation and Learning
- Nathan Kallus. NeurIPS, 2018.
- More Robust Doubly Robust Off-policy Evaluation
- Mehrdad Farajtabar, Yinlam Chow, and Mohammad Ghavamzadeh. ICML, 2018.
- Importance Sampling for Fair Policy Selection
- Shayan Doroudi, Philip Thomas, and Emma Brunskill. UAI, 2017.
- Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing
- Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh, Ishan Durugkar, and Emma Brunskill. AAAI, 2017.
- Consistent On-Line Off-Policy Evaluation
- Assaf Hallak and Shie Mannor. ICML, 2017.
- Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation
- Josiah P. Hanna, Peter Stone, and Scott Niekum. AAMAS, 2016.
- Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
- Nan Jiang and Lihong Li. ICML, 2016.
- Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
- Philip Thomas and Emma Brunskill. ICML, 2016.
- High Confidence Off-Policy Evaluation
- Philip S. Thomas, Georgios Theocharous, and Mohammad Ghavamzadeh. AAAI, 2015.
- Eligibility Traces for Off-Policy Policy Evaluation
- Doina Precup, Richard S. Sutton, and Satinder P. Singh. ICML, 2000.
Off-Policy Evaluation: Benchmarks/Experiments/Applications
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
- Haoming Jiang, Bo Dai, Mengjiao Yang, Wei Wei, and Tuo Zhao. arXiv, 2021.
- Benchmarks for Deep Off-Policy Evaluation [code]
- Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, and Thomas Paine. ICLR, 2021.
- Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation [software] [public dataset]
- Yuta Saito, Shunsuke Aihara, Megumi Matsutani, and Yusuke Narita. arXiv, 2020.
- Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling
- Randell Cotta, Dan Jiang, Mingyang Hu, and Peizhou Liao. WSDM, 2019.
- Offline Evaluation to Make Decisions About Playlist Recommendation
- Alois Gruson, Praveen Chandar, Christophe Charbuillet, James McInerney, Samantha Hansen, Damien Tardieu, and Ben Carterette. WSDM, 2019.
- Evaluating Reinforcement Learning Algorithms in Observational Health Settings
- Omer Gottesman, Fredrik Johansson, Joshua Meier, Jack Dent, Donghun Lee, Srivatsan Srinivasan, Linying Zhang, Yi Ding, David Wihl, Xuefeng Peng, Jiayu Yao, Isaac Lage, Christopher Mosch, Li-wei H. Lehman, Matthieu Komorowski, Aldo Faisal, Leo Anthony Celi, David Sontag, and Finale Doshi-Velez. arXiv, 2018.
- Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems
- Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, and Fernando Diaz. CIKM, 2018.
- Offline A/B testing for Recommender Systems
- Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, and Simon Dollé. WSDM, 2018.
- Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback
- Ben Carterette and Praveen Chandar. SIGIR, 2018.
Open Source Software / Implementations
- Open Bandit Pipeline: a research framework for bandit algorithms and off-policy evaluation [paper] [documentation] [public dataset]
- Yuta Saito, Shunsuke Aihara, Megumi Matsutani, and Yusuke Narita.
- d3rlpy: A data-driven deep reinforcement learning library as an out-of-the-box tool [website] [documentation]
- Takuma Seno.
- MINERVA: An out-of-the-box GUI tool for data-driven deep reinforcement learning [website] [documentation]
- Takuma Seno.
- Benchmarks for Deep Off-Policy Evaluation [paper]
- Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, and Thomas Paine.
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [paper] [website]
- Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine.
- RL Unplugged: Benchmarks for Offline Reinforcement Learning [paper] [dataset]
- Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, and Nando de Freitas.
- NeoRL: Near Real-World Benchmarks for Offline Reinforcement Learning [paper] [website]
- Rongjun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, and Yang Yu.
- RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising [paper]
- David Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile, and Alexandros Karatzoglou.
- MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces [paper] [documentation]
- Marlesson R. O. Santana, Luckeciano C. Melo, Fernando H. F. Camargo, Bruno Brandão, Anderson Soares, Renan M. Oliveira, and Sandor Caetano.
Related Workshops
- Reinforcement Learning Day 2021
- Offline Reinforcement Learning Workshop (NeurIPS 2020)
- Reinforcement Learning from Batch Data and Simulation
- Virtual Conference on Reinforcement Learning for Real Life (RL4RealLife 2020)
- Safety and Robustness in Decision Making (NeurIPS 2019)
Tutorials/Talks/Lectures
- Offline RL
- Nando de Freitas. NeurIPS2020 OfflineRL Workshop.
- Data Scalability for Robot Learning
- Chelsea Finn. RI Seminar2020.
- Learning a Multi-Agent Simulator from Offline Demonstrations
- Brandyn White. NeurIPS2020 OfflineRL Workshop.
- Towards Reliable Validation and Evaluation for Offline RL
- Nan Jiang. NeurIPS2020 OfflineRL Workshop.
- Batch RL Models Built for Validation
- Finale Doshi-Velez. NeurIPS2020 OfflineRL Workshop.
- Offline Reinforcement Learning: From Algorithms to Practical Challenges
- Aviral Kumar and Sergey Levine. NeurIPS2020.
- Statistically Efficient Offline Reinforcement Learning
- Nathan Kallus. ARL Seminar2020.
- Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning
- Yu-Xiang Wang. RL Theory Seminar2020.
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
- Mengdi Wang. RL Theory Seminar2020.
- Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
- Andrea Zanette. RL Theory Seminar2020.
- Beyond the Training Distribution: Embodiment, Adaptation, and Symmetry
- Chelsea Finn. EI Seminar2020.
- Combining Statistical methods with Human Input for Evaluation and Optimization in Batch Settings
- Finale Doshi-Velez. NeurIPS2019 Workshop on Safety and Robustness in Decision Making.
- Efficiently Breaking the Curse of Horizon with Double Reinforcement Learning
- Nathan Kallus. NeurIPS2019 Workshop on Safety and Robustness in Decision Making.
- Scaling Probabilistically Safe Learning to Robotics
- Scott Niekum. NeurIPS2019 Workshop on Safety and Robustness in Decision Making.