Deep Reinforcement Learning Algorithms
This repository contains implementations of various deep reinforcement learning algorithms completed as part of the Spring 2017 offering of CS 294-112, UC Berkeley's Deep Reinforcement Learning course.
Disclaimer: The code contained in this repository may or may not relate to coursework in future offerings of CS 294-112. The implementations here are provided for educational purposes only; if you are a student in the course, I highly suggest attempting the problems yourself.
The dependencies of the algorithms include:
HW1: Imitation Learning and DAgger on MuJoCo
I implemented behavior cloning on multiple MuJoCo environments. Expert policies produce rollouts that are used as training data for a feedforward neural network. In addition to plain behavior cloning, I also implemented the DAgger algorithm, which performs significantly better than behavior cloning alone. Finally, I varied the number of rollouts used to train the agent and observed that, as expected, using more rollouts as training data produces better results.
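To illustrate how DAgger differs from plain behavior cloning, here is a minimal sketch and not the code from this repository. It assumes a toy 1-D system in place of MuJoCo, a hypothetical linear `expert_action`, and a least-squares fit standing in for neural network training. All function names are made up for illustration.

```python
import numpy as np

def expert_action(state):
    # Hypothetical expert: push the state toward the origin.
    return -0.5 * state

def rollout(policy, rng, horizon=20):
    """Run `policy` in a toy 1-D system s' = s + a and return the visited states."""
    state = rng.uniform(-1.0, 1.0, size=1)
    states = []
    for _ in range(horizon):
        states.append(state.copy())
        state = state + policy(state)
    return np.array(states)  # shape (horizon, 1)

def fit_linear_policy(states, actions):
    # Least-squares fit of a = s @ w; a stand-in for training a network.
    w, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return lambda s: s @ w

def dagger(n_iters=5, seed=0):
    rng = np.random.default_rng(seed)
    # Iteration 0 is plain behavior cloning: train on expert rollouts only.
    states = rollout(expert_action, rng)
    actions = expert_action(states)
    policy = fit_linear_policy(states, actions)
    for _ in range(n_iters):
        # Roll out the learner, then have the expert relabel the states it visited.
        new_states = rollout(policy, rng)
        states = np.concatenate([states, new_states])
        actions = np.concatenate([actions, expert_action(new_states)])
        # Retrain on the aggregated dataset.
        policy = fit_linear_policy(states, actions)
    return policy, len(states)
```

The key difference from behavior cloning is the data source: the states come from the learner's own rollouts while the labels still come from the expert, so the training set covers the states the learner actually reaches.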
HW2: Policy Iteration and Value Iteration for Markov Decision Processes (MDPs)
This is a fairly straightforward implementation of Policy Iteration and Value Iteration on a simple gridworld environment.
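As a rough illustration of value iteration, the sketch below runs it on a small deterministic gridworld. The grid size, goal location, reward of -1 per step, and discount factor are assumptions chosen for the example; they are not taken from the homework environment.

```python
import numpy as np

def value_iteration(n_rows=4, n_cols=4, goal=(3, 3), gamma=0.9, tol=1e-8):
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    values = np.zeros((n_rows, n_cols))
    policy = np.zeros((n_rows, n_cols), dtype=int)

    def step(r, c, move):
        # Deterministic move; bumping into a wall leaves the agent in place.
        nr = min(max(r + move[0], 0), n_rows - 1)
        nc = min(max(c + move[1], 0), n_cols - 1)
        return nr, nc

    while True:
        new_values = np.zeros_like(values)
        for r in range(n_rows):
            for c in range(n_cols):
                if (r, c) == goal:
                    continue  # terminal state keeps value 0
                # Bellman optimality backup with reward -1 per step.
                q = [-1.0 + gamma * values[step(r, c, m)] for m in moves]
                new_values[r, c] = max(q)
                policy[r, c] = int(np.argmax(q))
        # Stop once a full sweep no longer changes the values.
        if np.max(np.abs(new_values - values)) < tol:
            return new_values, policy
        values = new_values
```

Calling `value_iteration()` returns the converged value table and a greedy policy stored as indices into `moves`. Policy iteration would instead alternate between fully evaluating a fixed policy and improving it greedily.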
HW3: Deep Q-Networks on Atari Games
HW4: Policy Gradients
The code for this project has not been released yet, but my writeup can be found here.