# Ensure that the Python version is 3.10
pip install --editable ./third_party/torchquantum
pip install quarkstudio==7.0.5
pip install "gymnasium[box2d]==0.29.1"
We offer a user-friendly Python script and accompanying configuration files to facilitate training hybrid quantum-classical models in diverse reinforcement learning environments.
python main.py <config_file_name>
- Replace `<config_file_name>` with the desired environment from the `./config` directory, or create a custom configuration of your own.
Parameter | Description | Example Value |
---|---|---|
env_name | Name of the reinforcement learning environment. | LunarLander-v2 |
n_steps | Number of steps per environment per update. | 1024 |
mini_batch_size | Size of the mini-batch. | 64 |
max_train_steps | Maximum number of training steps. | 1,750,000 |
lr_a | Learning rate for the actor network. | 0.003 |
lr_c | Learning rate for the critic network. | 0.0003 |
gamma | Discount factor. | 0.999 |
lamda | GAE parameter. | 0.98 |
epsilon | PPO clip parameter. | 0.2 |
K_epochs | Number of PPO epochs. | 4 |
entropy_coef | Entropy coefficient. | 0.01 |
num_envs | Number of environments to run in parallel. | 16 |
n_blocks | Number of blocks in the quantum reinforcement learning network. | 1 |
n_wires | Number of qubits in the quantum circuit. | 4 |
use_quafu | Specify whether to use Quafu quantum hardware. | True |
key | Token required for accessing Quafu cloud quantum hardware. | ' ' |
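
To make the interaction between some of these parameters concrete, here is a minimal sketch in plain Python. It is not the repository's code: the `PPOConfig` class and the `gae` and `clipped_surrogate` helpers are hypothetical names introduced for illustration. It assumes the textbook PPO formulation, in which mini-batches are drawn from the full rollout of `n_steps * num_envs` transitions. The default values are the example values from the table.

```python
from dataclasses import dataclass


@dataclass
class PPOConfig:
    """Hypothetical container; defaults are the example values from the table."""
    n_steps: int = 1024
    mini_batch_size: int = 64
    gamma: float = 0.999
    lamda: float = 0.98      # spelled as in the configuration files
    epsilon: float = 0.2
    K_epochs: int = 4
    num_envs: int = 16

    @property
    def rollout_size(self) -> int:
        # Transitions collected per update across all parallel environments.
        return self.n_steps * self.num_envs

    @property
    def mini_batches_per_epoch(self) -> int:
        # Assumes each epoch sweeps the whole rollout once.
        return self.rollout_size // self.mini_batch_size

    @property
    def gradient_steps_per_update(self) -> int:
        return self.K_epochs * self.mini_batches_per_epoch


def gae(rewards, values, last_value, gamma, lamda):
    """Generalized advantage estimation for one segment without episode ends."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lamda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, epsilon):
    """PPO clipped objective for a single sample (to be maximized)."""
    clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped * advantage)


cfg = PPOConfig()
print(cfg.rollout_size, cfg.mini_batches_per_epoch, cfg.gradient_steps_per_update)
```

With the example values, each update collects 16,384 transitions, which are split into 256 mini-batches per epoch, giving 1,024 gradient steps per update under the assumptions above.
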
Training results can be visualized using TensorBoard:
tensorboard --logdir=./runs
Benchmark reinforcement learning environments have been successfully solved using PPO-Q, as illustrated in the following table and figures.
Environment | State Space Dimension | Action Space Dimension |
---|---|---|
CartPole | 4 | 2 |
MountainCar | 2 | 3 |
Acrobot | 6 | 3 |
LunarLander | 8 | 4 |
MountainCar(C) | 2 | 1 |
Pendulum | 3 | 1 |
LunarLander(C) | 8 | 2 |
BipedalWalker | 24 | 4 |
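
The action column above appears to mix two kinds of numbers: for discrete environments it matches the number of available actions, while for continuous ones (marked `(C)`, plus Pendulum and BipedalWalker) it matches the length of the action vector. The sketch below shows one way such a value could be read from a space object. The helper `space_size` is a hypothetical name, and the stand-in objects only mimic the `n` and `shape` attributes that Gymnasium's `Discrete` and `Box` spaces expose, so the snippet runs without Gymnasium installed.

```python
from types import SimpleNamespace


def space_size(space):
    """Return the action count for a discrete space, else the vector length."""
    if hasattr(space, "n"):        # Discrete-like: number of choices
        return int(space.n)
    return int(space.shape[0])     # Box-like: length of the vector


# Stand-ins that mimic Gymnasium spaces (values taken from the table above).
cartpole_actions = SimpleNamespace(n=2, shape=())
bipedal_actions = SimpleNamespace(shape=(4,))
bipedal_observations = SimpleNamespace(shape=(24,))

print(space_size(cartpole_actions),
      space_size(bipedal_actions),
      space_size(bipedal_observations))
```

With Gymnasium installed, the same helper should work on `env.action_space` and `env.observation_space` of an environment created with `gymnasium.make`.
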
Figures (images not shown): CartPole, Acrobot, LunarLander, MountainCarC, Pendulum, BipedalWalker.
An arXiv preprint is coming soon!