RL‐based Goal Selection
To make use of reinforcement-learning-based goal selection, the following CLIPS Executive core files need to be loaded:
- `goal-rl.clp` instead of `goal.clp`, as its goal template features additional fact slots required for RL compatibility.
- `reinforcementlearning.clp` defines the CLIPS rules for RL-based goal selection.
- `reset-game.clp` handles resetting the environment during the training phase.

Furthermore, it is assumed that the `major-cleanup` version of the ROS2 CLIPS Executive is used.
To create a new RL agent, one can largely follow the tutorial on setting up a regular agent, but without adding a rule for selecting a goal, i.e. a rule that sets the mode of a goal from FORMULATED to SELECTED. Instead, a Python-based RL agent is used, which is given the current state of the environment and selects an executable action according to its policy.
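As an illustration, a goal-formulation rule could look as follows. This is only a sketch: the goal class `DELIVER`, its `target` parameter, and the domain-object type `target-location` are placeholders for whatever the custom domain defines. Note that no rule ever moves the goal from FORMULATED to SELECTED; that is left to the RL agent.

```clips
; Sketch only: DELIVER, the target parameter, and the object type
; target-location are placeholders for the custom domain.
(defrule goal-formulate-deliver
  (domain-object (name ?target) (type target-location))
  (not (goal (class DELIVER) (params target ?target)))
  =>
  (assert (goal (id (sym-cat DELIVER- (gensym*)))
                (class DELIVER)
                (mode FORMULATED)
                (params target ?target)))
)
```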
The RL agent employs invalid action masking, filtering out all goals that are not executable in the current state. For this, it must be ensured that each goal fact's `is-executable` slot is set accordingly. This can be done by adding a rule for each goal that checks whether the goal's preconditions in the domain are fulfilled and sets the slot to TRUE if that is the case. Once a new goal has been selected, all other goals are automatically set back to FALSE, so the custom executability rules can evaluate the new state again.
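A minimal sketch of such an executability rule is shown below. The goal class `DELIVER` and the checked precondition fact `(domain-fact (name target-free) ...)` are assumptions standing in for the goal's actual domain preconditions.

```clips
; Sketch only: the goal class and the checked domain-fact stand in for the
; goal's actual preconditions in the custom domain.
(defrule goal-deliver-executable
  ?g <- (goal (class DELIVER) (mode FORMULATED) (is-executable FALSE)
              (params target ?target))
  (domain-fact (name target-free) (param-values ?target))
  =>
  (modify ?g (is-executable TRUE))
)
```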
During training, the environment needs to be reset at the end of each episode. For this, a well-defined initial state must be saved, which is done by calling `(save-facts reset-save)` right after loading all initial domain facts and objects. The saved fact base is then loaded during each environment reset.
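For example, assuming the agent asserts a fact such as `domain-facts-loaded` (name hypothetical) once all initial facts and objects are in place, the snapshot could be taken by a rule like this:

```clips
; Sketch only: domain-facts-loaded stands for whatever fact signals that all
; initial domain facts and objects have been asserted.
(defrule rl-save-initial-state
  (domain-facts-loaded)
  =>
  (save-facts reset-save)
)
```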
To trigger the reset, the `rl-episode-end` fact must be asserted. It features a `success` slot indicating whether the episode reached a desired state (possible values: TRUE or FALSE).
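As an example, a rule could assert this fact once a domain-specific terminal condition holds; the condition `all-orders-delivered` used here is purely illustrative:

```clips
; Sketch only: all-orders-delivered is a placeholder for the domain-specific
; condition that marks a successfully finished episode.
(defrule rl-episode-finished-success
  (domain-fact (name all-orders-delivered))
  (not (rl-episode-end))
  =>
  (assert (rl-episode-end (success TRUE)))
)
```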
While the already existing `clips_gym` implements most of the necessary gym functions, a custom environment class is still needed to define the action space and to adjust the existing functions to the custom domain (e.g. adding special logging).
Since the underlying multi-robot maskable PPO agent takes a lot of parameters, a config file is needed to set the parameters of the CXRL node. Such a file is shown below; the individual entries are explained in the comments:
```yaml
cxrl_node/custom_agent_rl_node:
  ros__parameters:
    package_dir: "cxrl_ws/src/cx_reinforcement_learning/cx_reinforcement_learning" # Path to cx_reinforcement_learning package
    agent_name: "TestAgent" # Name with which the agent will be saved/loaded
    rl_mode: "TRAINING" # RL mode, either "TRAINING", "EVALUATION" or "EXECUTION"
    number_of_robots: 1
    training:
      retraining: false # if true, existing agent will be trained further; if false, a new agent will be generated
      max_episodes: 500 # maximum number of episodes per training session
      timesteps: 100000000 # maximum number of timesteps per training session
    env:
      entrypoint: <REFERENCE TO CUSTOM ENVIRONMENT CLASS>
    model:
      learning_rate: 0.0003
      gamma: 0.99
      gae_lambda: 0.95
      ent_coef: 0.0
      vf_coef: 0.5
      max_grad_norm: 0.5
      batch_size: 64
      n_steps: 50
      seed: 42
      verbose: 1
    time_based: false # For now keep at false
    n_time: 300
    deadzone: 5
    wait_for_all_robots: false # Wait for all robots to finish their goal before updating the policy
```
In this tutorial, it is assumed that the config is saved as `rl-config.yaml` in the `params` folder of the custom agent's package.
To start the CXRL node, one can simply add the following lines to the `launch_with_context` function in the launch file:
```python
custom_agent_dir = get_package_share_directory(<NAME OF CUSTOM AGENT PACKAGE>)
rl_config = os.path.join(custom_agent_dir, 'params', 'rl-config.yaml')

cxrl_node = Node(
    package='cx_reinforcement_learning',
    executable='cxrl_node',
    namespace='cxrl_node',
    name='custom_agent_rl_node',
    output='screen',
    emulate_tty=True,
    parameters=[rl_config],
)

return [cx_node, cxrl_node]  # Add "cxrl_node" to the already existing return list
```
(COMING SOON)