Code for the paper "Scalable Model-based Policy Optimization for Decentralized Networked Systems" in IROS 2022 arXiv
- DMPO (Our method)
- DPPO (Decentralized PPO)
- CPPO (Centralized PPO)
- IC3Net
- CACC Catchup
- CACC Slowdown
- Ring Attenuation
- Figure Eight
- ATSC Grid
- ATSC Monaco
- UAVFC
- SUMO installation. Version 1.11.0
The commit number of SUMO, available at https://github.com/eclipse/sumo used to run the results is 2147d155b1. To install SUMO, you are recommended to refer to https://sumo.dlr.de/docs/Installing/Linux_Build.html to install the specific version via repository checkout. Note that the latest version of SUMO is not compatible with Flow environments. In brief, after you checkout to that version, run the following command to build the SUMO binaries.
sudo apt-get install cmake python g++ libxerces-c-dev libfox-1.6-dev libgdal-dev libproj-dev libgl2ps-dev swig
cd <sumo_dir> # please insert the correct directory name here
export SUMO_HOME="$PWD"
mkdir build/cmake-build && cd build/cmake-build
cmake ../..
make -j$(nproc)
After building, you need to manually ad the bin folder into your path:
export PATH=$PATH:$SUMO_HOME/bin
- Setting up the environment.
It's recommended to set up the environment via Anaconda. The environment specification is in environment.yml. After installing the required packages, run
export PYTHONPATH="$SUMO_HOME/tools:$PYTHONPATH"
in terminal to include the SUMO python packages.
- Setting up WandB.
Our code uses WandB as logger. Before running our code, you should log in to WandB locally. Please refer to https://docs.wandb.ai/quickstart for more detail.
python launcher.py --env ENV --algo ALGO --name NAME --para PARA
ENV
specifies which environment to run in, including eight
, ring
, catchup
, slowdown
, Grid
, Monaco
, UAV_9d
.
ALGO
specifies the algorithm to use, including IC3Net
, CPPO
, DPPO
, DMPO
.
NAME
is the additional name for the logger, which is set to ''
as default.
PARA
is the hyperparameter json string. The default parameters are loaded from config folder, and this would override specific parameters.
This is the network structure of 5*5 intersections in ATSC-Grid. The highlighted areas of the red frame are shown in the following videos and they are also the key area for displaying execution result in DPPO and DMPO. Through the key indicator "Insertion-backlogged vehicles" in the numerical panel on the left, we can observe that "Insertion-backlogged vehicles" will gradually increase with the load of the traffic flow, and then gradually decrease with the effective decision of the traffic light. The maximum value of this indicator is 1486 vehs in DPPO and 1033 vehs in DMPO, which indicates that DMPO can reduce the backlog of vehicles at intersections. Therefore, compared with DPPO, DMPO is more effective in solving traffic jams. Through the three intersections we focused on in the video, we can also observe that DMPO can make traffic jams last for less time.
Execution.result.of.DPPO.in.ATSC-Grid.mp4
Execution.result.of.DMPO.in.ATSC-Grid.mp4
This is a more challenging scenario with a heterogeneous network structure with diverse action and observation spaces: ATSC-Monaco traffic network with 28 intersections of real traffic network in Monaco city. The highlighted areas of the red frame are shown in the following videos and they are also the key area for displaying execution result in DPPO and DMPO. Through the key indicator "arrived vehicles" in the numerical panel on the left, we can observe that " arrived vehicles " will gradually increase with the load of the traffic flow. The maximum value of this indicator is 563 vehs in DPPO and 752 vehs in DMPO, which indicates that DMPO can reduce the backlog of vehicles at intersections and allow more vehicles to arrive at their destinations. Therefore, compared with DPPO, DMPO is more effective in solving traffic jams. Through the three intersections we focused on in the video, we can also observe that DMPO can make traffic jams last for less time.
Execution.result.of.DPPO.in.ATSC-Monaco.mp4
Execution.result.of.DMPO.in.ATSC-Monaco.mp4
This is a 5×5 formation grid of UAV swarm. The objective of UAVFC is to reach the destination, avoid forest and maintain formation. We tested the performance of different algorithms under the same scenario. From the following videos, we can observe that the swarm of UAVs all approached their destination, avoided the trees and maintained the formation in DMPO, but in this process there are some UAVs out of the formation in CPPO.