Decentralized Reinforcement Learning
This is the code accompanying the paper Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions by Michael Chang, Sid Kaushik, Matt Weinberg, Tom Griffiths, and Sergey Levine, accepted to the International Conference on Machine Learning, 2020.
Check out the accompanying blog post.
Set the PYTHONPATH:
Create a conda environment with python version 3.6.
pip install -r requirements.txt. This should also install
For the TwoRooms environment, comment out
if self.step_count >= self.max_steps: done = True
in gym_minigrid/minigrid.py in your gym-minigrid installation. Handling time-outs on the algorithm side rather than the environment side lets us treat the environment as an infinite-horizon problem. Otherwise, we would have to include the time-step in the state to preserve the Markov property.
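As an illustration of handling time-outs on the algorithm side, here is a minimal sketch. It is not the code used in this repository: `ToyEnv`, `collect_episode`, and the `max_steps` argument are hypothetical names chosen for the example, and the environment is a stand-in that only reports true termination.

```python
class ToyEnv:
    """Hypothetical stand-in environment that never signals a time-out itself."""

    def __init__(self, goal=1000):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += action
        done = self.pos >= self.goal  # true termination only, no step limit
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}


def collect_episode(env, policy, max_steps):
    """Roll out one episode, cutting it off on the algorithm side."""
    state = env.reset()
    transitions = []
    timed_out = False
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done, _info = env.step(action)
        # `done` is stored exactly as the environment reported it.
        transitions.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    else:
        # The loop ran out of steps without the environment terminating.
        timed_out = True
    return transitions, timed_out


if __name__ == "__main__":
    episode, timed_out = collect_episode(ToyEnv(), lambda s: 1, max_steps=5)
    print(len(episode), timed_out)
```

Because the cutoff lives in the rollout loop, a transition at the step limit keeps `done = False`, so a learner that bootstraps from the next state can still do so and the time-step does not need to be part of the state.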
For GPU, set OMP_NUM_THREADS to 1:
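If you would rather set the variable from inside Python than in the shell, a minimal sketch follows. It assumes the variable should be set before any numerical library is imported, since OpenMP runtimes typically read it when they initialize; where exactly to place this in your own entry point is up to you.

```python
import os

# Set this before importing numerical libraries so that their OpenMP
# runtime sees the value when it initializes.
os.environ["OMP_NUM_THREADS"] = "1"

print(os.environ["OMP_NUM_THREADS"])
```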
Run python runner.py --<experiment-name> to print out example commands for the environments in the paper. Add the
--for-real flag to run those commands. You can enable parallel data collection with the
--parallel_collect flag. You can also specify the GPU IDs. For example, in
runner.py, the methods that launch
duality do not use a GPU, while the others use GPU 0.
For the TwoRooms environment, you first need to pre-train the subpolicies. Then specify the experiment folders when training the society with the pre-trained primitives. Instructions are in
You can view the training curves in
<exp_folder>/<seed_folder>/group_0/<env-name>_train/quantitative and you can view visualizations (for environments that have image observations) in
The PPO update is based on this repo.