Proximal Policy Optimization with Parametrized Quantum Policies or Values

BAQIS-Quantum/PPO-Q

PPO-Q: Proximal Policy Optimization with Parametrized Quantum Policies or Values

Setup

# Ensure that the Python version is 3.10
pip install --editable ./third_party/torchquantum
pip install quarkstudio==7.0.5
# Quote the requirement so shells such as zsh do not expand the brackets
pip install "gymnasium[box2d]==0.29.1"
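
After installation, a quick sanity check can confirm the interpreter version and that the main packages are importable. The following is a minimal sketch and not part of the repository: the function name `check_environment` is hypothetical, and it assumes the packages installed above are importable as `torchquantum` and `gymnasium`.

```python
import importlib.util
import sys


def check_environment(required_python=(3, 10), modules=("torchquantum", "gymnasium")):
    """Report whether the Python version matches and which modules can be found."""
    report = {
        "python_ok": sys.version_info[:2] == tuple(required_python),
        "python_version": "{}.{}".format(*sys.version_info[:2]),
    }
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Any entry reported as `False` suggests that the corresponding setup step should be repeated.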

Usage

The repository provides a Python script and accompanying configuration files for training hybrid quantum-classical models in a range of reinforcement learning environments.

python main.py <config_file_name>
  • Replace <config_file_name> with the configuration file for the desired environment from the ./config directory, or create a custom configuration of your own.

Description of Configuration Parameters

| Parameter | Description | Example Value |
| --- | --- | --- |
| env_name | Name of the reinforcement learning environment. | LunarLander-v2 |
| n_steps | Number of steps per environment per update. | 1024 |
| mini_batch_size | Size of the mini-batch. | 64 |
| max_train_steps | Maximum number of training steps. | 1,750,000 |
| lr_a | Learning rate for the actor network. | 0.003 |
| lr_c | Learning rate for the critic network. | 0.0003 |
| gamma | Discount factor. | 0.999 |
| lamda | GAE parameter. | 0.98 |
| epsilon | PPO clip parameter. | 0.2 |
| K_epochs | Number of PPO epochs. | 4 |
| entropy_coef | Entropy coefficient. | 0.01 |
| num_envs | Number of environments to run in parallel. | 16 |
| n_blocks | Number of blocks in the quantum reinforcement learning network. | 1 |
| n_wires | Number of qubits in the quantum circuit. | 4 |
| use_quafu | Whether to use Quafu quantum hardware. | True |
| key | Token required for accessing Quafu cloud quantum hardware. | ' ' |
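
To make the roles of gamma, lamda, epsilon, n_steps, num_envs and mini_batch_size more concrete, the sketch below implements the textbook forms of Generalized Advantage Estimation and the PPO clipped surrogate in plain Python. It is an illustration only, not the repository's code: the function names are hypothetical, and the assumption that each update collects n_steps × num_envs transitions and splits them into mini-batches is inferred from the parameter descriptions rather than confirmed by the source.

```python
def gae_advantages(rewards, values, last_value, dones, gamma=0.999, lamda=0.98):
    """Generalized Advantage Estimation for one environment's rollout.

    values[t] is the critic estimate at step t, last_value bootstraps the
    state after the final step, and dones[t] marks an episode end at step t.
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Discounted sum of residuals with decay gamma * lamda
        running = delta + gamma * lamda * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO clipped objective for a single sample (a quantity to maximize)."""
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def rollout_sizes(n_steps=1024, num_envs=16, mini_batch_size=64):
    """Transitions per update and mini-batches per epoch (assumed layout)."""
    batch = n_steps * num_envs
    return batch, batch // mini_batch_size


if __name__ == "__main__":
    print(rollout_sizes())
    print(gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0, [False, True], gamma=0.5, lamda=1.0))
```

Under these assumptions, the example values in the table give 16,384 transitions per update, processed as 256 mini-batches in each of the K_epochs passes.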

Training results can be visualized using TensorBoard:

tensorboard --logdir=./runs

Results

PPO-Q has solved the benchmark reinforcement learning environments listed in the following table and shown in the figures.

| Environment | State Space Dimension | Action Space Dimension |
| --- | --- | --- |
| CartPole | 4 | 2 |
| MountainCar | 2 | 3 |
| Acrobot | 6 | 3 |
| LunarLander | 8 | 4 |
| MountainCar(C) | 2 | 1 |
| Pendulum | 3 | 1 |
| LunarLander(C) | 8 | 2 |
| BipedalWalker | 24 | 4 |

[Figures: CartPole, Acrobot, LunarLander, MountainCar(C), Pendulum, BipedalWalker]
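
The dimensions in the table can be read off a Gymnasium environment's spaces. The helper below is a hedged sketch, not repository code: it assumes that the action dimension means the number of actions for discrete spaces and the vector length for continuous ones, which is consistent with the values listed. The names `space_dim` and `env_dims` are hypothetical, as is the stand-in object used for the demonstration; `CartPole-v1` is a standard Gymnasium environment ID that does not appear in this README.

```python
from types import SimpleNamespace


def space_dim(space):
    """Dimension of a Gymnasium-style space.

    Discrete spaces expose `n` (number of actions); Box spaces expose `shape`.
    """
    if hasattr(space, "n"):
        return int(space.n)
    return int(space.shape[0])


def env_dims(env):
    """Return (state dimension, action dimension) for an environment."""
    return space_dim(env.observation_space), space_dim(env.action_space)


# Stand-in that mimics the attributes of a CartPole-like environment,
# so the helper can be tried without Gymnasium installed.
fake_cartpole = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(4,)),
    action_space=SimpleNamespace(n=2),
)

if __name__ == "__main__":
    print(env_dims(fake_cartpole))
    try:
        import gymnasium as gym
    except ImportError:
        gym = None
    if gym is not None:
        env = gym.make("CartPole-v1")
        print(env_dims(env))
        env.close()
```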

Citation

An arXiv preprint is coming soon.
