RL‐based Goal Selection
To make use of reinforcement-learning-based goal selection, the following CLIPS Executive core files need to be loaded:
- `goal-rl.clp` instead of `goal.clp`, as its goal template features additional fact slots required for RL compatibility.
- `reinforcementlearning.clp` defines the CLIPS rules for RL-based goal selection.
- `reset-game.clp` handles resetting the environment during the training phase.

Furthermore, it is assumed that the `major-cleanup` version of the ROS2 CLIPS Executive is used.
To create a new RL agent, one can largely follow the tutorial on setting up a regular agent, but without adding a rule for selecting a goal, i.e. a rule that sets the mode of a goal from FORMULATED to SELECTED. Instead, a Python-based RL agent is used, which is given the current state of the environment and selects an executable action according to its policy.
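As an illustration, a goal-formulation rule could look as follows. This is only a sketch: the goal class `DELIVER`, its `target` parameter, and the domain-object type `target-location` are placeholders for whatever the custom domain defines. Note that no rule ever moves the goal from FORMULATED to SELECTED; that is left to the RL agent.

```clips
; Sketch only: DELIVER, the target parameter, and the object type
; target-location are placeholders for the custom domain.
(defrule goal-formulate-deliver
  (domain-object (name ?target) (type target-location))
  (not (goal (class DELIVER) (params target ?target)))
  =>
  (assert (goal (id (sym-cat DELIVER- (gensym*)))
                (class DELIVER)
                (mode FORMULATED)
                (params target ?target)))
)
```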
The RL agent employs invalid action masking, filtering out all goals that are not executable in the current state. For this, it must be ensured that each goal fact's `is-executable` slot is set accordingly. This can be done by adding a rule for each goal that checks whether the goal's preconditions in the domain are fulfilled and sets the slot to TRUE if that is the case. Once a new goal has been selected, all other goals are automatically set back to FALSE, so the custom executability rules can evaluate the new state again.
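A minimal sketch of such an executability rule is shown below. The goal class `DELIVER` and the checked precondition fact `(domain-fact (name target-free) ...)` are assumptions standing in for the goal's actual domain preconditions.

```clips
; Sketch only: the goal class and the checked domain-fact stand in for the
; goal's actual preconditions in the custom domain.
(defrule goal-deliver-executable
  ?g <- (goal (class DELIVER) (mode FORMULATED) (is-executable FALSE)
              (params target ?target))
  (domain-fact (name target-free) (param-values ?target))
  =>
  (modify ?g (is-executable TRUE))
)
```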
During training, the environment needs to be reset at the end of each episode. For this, a well-defined initial state must be saved, which is done by calling `(save-facts reset-save)` right after loading all initial domain facts and objects. The saved fact base is then loaded during each environment reset.
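For example, assuming the agent asserts a fact such as `domain-facts-loaded` (name hypothetical) once all initial facts and objects are in place, the snapshot could be taken by a rule like this:

```clips
; Sketch only: domain-facts-loaded stands for whatever fact signals that all
; initial domain facts and objects have been asserted.
(defrule rl-save-initial-state
  (domain-facts-loaded)
  =>
  (save-facts reset-save)
)
```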
To trigger the reset, the `rl-episode-end` fact must be asserted. It features a `success` slot indicating whether the episode reached a desired state (possible values: TRUE or FALSE).
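As an example, a rule could assert this fact once a domain-specific terminal condition holds; the condition `all-orders-delivered` used here is purely illustrative:

```clips
; Sketch only: all-orders-delivered is a placeholder for the domain-specific
; condition that marks a successfully finished episode.
(defrule rl-episode-finished-success
  (domain-fact (name all-orders-delivered))
  (not (rl-episode-end))
  =>
  (assert (rl-episode-end (success TRUE)))
)
```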
While the already existing `clips_gym` implements most of the necessary gym functions, a custom environment class is still needed to define the action space and to adjust the existing functions to the custom domain (e.g. adding special logging).
Since the underlying multi-robot maskable PPO agent takes a lot of parameters, a config file is needed to set the parameters of the CXRL node. Such a file is shown below; the individual entries are explained in the comments:
```yaml
cxrl_node/custom_agent_rl_node:
  ros__parameters:
    package_dir: "cxrl_ws/src/cx_reinforcement_learning/cx_reinforcement_learning" # Path to cx_reinforcement_learning package
    agent_name: "TestAgent" # Name with which the agent will be saved/loaded
    rl_mode: "TRAINING" # RL mode, either "TRAINING", "EVALUATION" or "EXECUTION"
    number_of_robots: 1
    training:
      retraining: false # if true, existing agent will be trained further; if false, a new agent will be generated
      max_episodes: 500 # maximum number of episodes per training session
      timesteps: 100000000 # maximum number of timesteps per training session
    env:
      entrypoint: <REFERENCE TO CUSTOM ENVIRONMENT CLASS>
    model:
      learning_rate: 0.0003
      gamma: 0.99
      gae_lambda: 0.95
      ent_coef: 0.0
      vf_coef: 0.5
      max_grad_norm: 0.5
      batch_size: 64
      n_steps: 50
      seed: 42
      verbose: 1
    time_based: false # For now keep at false
    n_time: 300
    deadzone: 5
    wait_for_all_robots: false # Wait for all robots to finish their goal before updating the policy
```
In this tutorial, it is assumed that the config is saved as `rl-config.yaml` in the `params` folder of the custom agent's package.
To start the CXRL node, one can simply add the following lines to the `launch_with_context` function in the launch file:
```python
custom_agent_dir = get_package_share_directory(<NAME OF CUSTOM AGENT PACKAGE>)
rl_config = os.path.join(custom_agent_dir, 'params', 'rl-config.yaml')

cxrl_node = Node(
    package='cx_reinforcement_learning',
    executable='cxrl_node',
    namespace='cxrl_node',
    name='custom_agent_rl_node',
    output='screen',
    emulate_tty=True,
    parameters=[rl_config],
)

return [cx_node, cxrl_node]  # Add "cxrl_node" to the already existing return list
```
(COMING SOON)