Deep Reinforcement Learning Algorithms
This repository contains implementations of various deep reinforcement learning algorithms completed as part of the Spring 2017 offering of CS 294-112, UC Berkeley's Deep Reinforcement Learning course.
Disclaimer: The code contained in this repository may or may not relate to coursework in future offerings of CS 294-112. The implementations here are provided for educational purposes only; if you are a student in the course, I highly suggest attempting the problems yourself.
Dependencies
The implementations depend on:
- TensorFlow
- Keras
- NumPy
- OpenAI Gym
- MuJoCo (paid library, but a free student license is available)
HW1: Imitation Learning and DAgger on MuJoCo
I implemented behavior cloning on multiple MuJoCo environments. Expert policies produce rollouts that are used as training data for a feedforward neural network. In addition to plain behavior cloning, I also implemented the DAgger algorithm, which performs significantly better. Finally, I varied the number of rollouts used to train the agent and observed that, as expected, more rollouts in the training data produce better results.
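The DAgger loop described above can be sketched on a toy problem. This is only an illustration, not the code in this repository: the 1-D dynamics, the hypothetical `expert_policy`, and the least-squares `fit_linear_policy` (standing in for the feedforward network) are all assumptions chosen so the example runs without MuJoCo.

```python
import random

def expert_policy(state):
    # Hypothetical expert: pushes the state toward zero.
    return -0.5 * state

def rollout(policy, steps=20, seed=0):
    # Toy 1-D dynamics (an assumption): next_state = state + action + noise.
    rng = random.Random(seed)
    state = rng.uniform(-1.0, 1.0)
    states = []
    for _ in range(steps):
        states.append(state)
        state = state + policy(state) + rng.gauss(0.0, 0.01)
    return states

def fit_linear_policy(states, actions):
    # Least-squares fit of action = w * state (stand-in for the neural network).
    numerator = sum(s * a for s, a in zip(states, actions))
    denominator = sum(s * s for s in states)
    w = numerator / denominator
    return (lambda s: w * s), w

def dagger(iterations=5):
    # Iteration 0 is plain behavior cloning on an expert rollout.
    states = rollout(expert_policy, seed=0)
    actions = [expert_policy(s) for s in states]
    policy, w = fit_linear_policy(states, actions)
    for i in range(1, iterations):
        # Run the learner, then ask the expert to label the states it visited.
        new_states = rollout(policy, seed=i)
        states += new_states
        actions += [expert_policy(s) for s in new_states]
        # Retrain on the aggregated dataset.
        policy, w = fit_linear_policy(states, actions)
    return w

learned_weight = dagger()
```

The key difference from behavior cloning is that later rollouts come from the learner rather than the expert, so the training set covers the states the learner actually reaches.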
HW2: Policy Iteration and Value Iteration for Markov Decision Processes (MDPs)
This is a fairly straightforward implementation of Policy Iteration and Value Iteration on a simple gridworld environment.
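As a rough sketch of Value Iteration, the example below uses a four-cell corridor rather than the gridworld from the assignment. The layout, the reward of 1 for entering the last cell, and the names `value_iteration` and `step` are assumptions made for illustration.

```python
def value_iteration(n_states=4, gamma=0.9, tol=1e-8):
    goal = n_states - 1  # the last cell is terminal
    actions = (-1, +1)   # move left or right

    def step(state, action):
        # Deterministic move, clipped at the corridor walls.
        next_state = min(max(state + action, 0), goal)
        reward = 1.0 if next_state == goal else 0.0
        return next_state, reward

    values = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(goal):  # the terminal state keeps value 0
            # Bellman optimality backup: best one-step lookahead value.
            best = max(r + gamma * values[n]
                       for n, r in (step(s, a) for a in actions))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    def lookahead(s, a):
        n, r = step(s, a)
        return r + gamma * values[n]

    # Greedy policy with respect to the converged values.
    policy = [max(actions, key=lambda a: lookahead(s, a)) for s in range(goal)]
    return values, policy

values, policy = value_iteration()
```

Policy Iteration differs in that it alternates a full policy evaluation step with a greedy improvement step, instead of folding the maximization into every backup.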
HW3: Deep Q-Networks on Atari Games
I implemented the DQN algorithm on the Pong Atari environment in OpenAI Gym. Training on pixel observations gives better results than training on RAM data alone.
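The core of DQN is the bootstrapped regression target for the Q-network. The sketch below computes that target for a small batch using plain Python lists instead of TensorFlow tensors; the function name `dqn_targets` and the example numbers are hypothetical, and the repository's actual implementation may differ.

```python
def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute y = r + gamma * max_a Q_target(s', a), with no bootstrap at episode end."""
    targets = []
    for reward, next_q, done in zip(rewards, next_q_values, dones):
        # next_q holds the target network's Q-values for every action in s'.
        bootstrap = 0.0 if done else gamma * max(next_q)
        targets.append(reward + bootstrap)
    return targets

# Two transitions: the second one ends its episode.
targets = dqn_targets(
    rewards=[0.0, 1.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.9,
)
```

In standard DQN, `next_q_values` come from a periodically updated copy of the network (the target network), and the transitions are sampled from a replay buffer.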
HW4: Policy Gradients
I extended the existing discrete-action Policy Gradients algorithm to continuous action spaces and ran it on the Pendulum environment in OpenAI Gym. In addition, I used a neural network to learn the value function.
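Two pieces usually change or get added when moving policy gradients to continuous actions with a learned value function. The sketch below assumes a Gaussian policy and assumes the value function is used as a baseline; neither detail is stated above, and the helper names are hypothetical.

```python
import math

def gaussian_log_prob(action, mean, log_std):
    # Log-density of a 1-D Gaussian policy; replaces the categorical
    # log-probability used for discrete actions.
    std = math.exp(log_std)
    z = (action - mean) / std
    return -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)

def discounted_returns(rewards, gamma):
    # Reward-to-go: G_t = r_t + gamma * G_{t+1}, computed backwards.
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def advantages(rewards, value_estimates, gamma=0.99):
    # Advantage estimate: empirical return minus the predicted value (baseline).
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, value_estimates)]

returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
adv = advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5)
log_prob = gaussian_log_prob(action=0.0, mean=0.0, log_std=0.0)
```

The policy gradient loss is then the negative mean of `log_prob * advantage` over the sampled timesteps, while the value network is regressed onto the discounted returns.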
Final Project
The code for this project has not been released yet, but my writeup can be found here.