Reward Function

Reward Function for Rotation Environment

The reward function for the rotation environment is implemented as the function reward_function in the file RotationEnvironment.py.

At each step, the agent’s reward is determined by this reward function.

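First, the function computes three angles, which are illustrated in the image below: ∆θB (goal_difference_before) and ∆θA (goal_difference_after), the angular distance between the valve and its goal before and after the step, and ∆θ (delta_changes), the change produced by the step.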
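As a rough illustration, the following Python sketch computes the three quantities. It is not the code in RotationEnvironment.py. The inputs theta_before, theta_after, and theta_goal are hypothetical names, and the sketch assumes they are unwrapped angles in radians (a multi-turn valve), so plain absolute differences are valid distances. It also assumes delta_changes is signed progress: positive when the step moved the valve closer to the goal and negative when it moved away.

```python
def angle_quantities(theta_before: float, theta_after: float, theta_goal: float) -> tuple[float, float, float]:
    """Return (goal_difference_after, goal_difference_before, delta_changes).

    Hypothetical helper. Assumes unwrapped angles in radians, so the
    absolute difference is the angular distance to the goal.
    """
    goal_difference_before = abs(theta_goal - theta_before)  # ∆θB: distance to goal before the step
    goal_difference_after = abs(theta_goal - theta_after)    # ∆θA: distance to goal after the step
    # Assumed sign convention: positive when the step brought the valve closer to the goal.
    delta_changes = goal_difference_before - goal_difference_after  # ∆θ
    return goal_difference_after, goal_difference_before, delta_changes


if __name__ == "__main__":
    # Valve moves from 0.0 to 0.5 rad toward a goal at 2.0 rad.
    print(angle_quantities(0.0, 0.5, 2.0))  # (1.5, 2.0, 0.5)
```

Under this convention, overshooting the goal by the same amount it was short (for example, 1.5 rad to 2.5 rad with the goal at 2.0 rad) yields delta_changes of 0, because the distance to the goal is unchanged.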

If delta_changes falls below a predefined noise tolerance, the valve has effectively not moved, and the agent receives a penalty of -1.

Otherwise, if goal_difference_after is within precision_tolerance, the task is complete and the agent earns a reward of +10.

In all other cases, the agent's reward is delta_changes divided by goal_difference_before. Dividing by goal_difference_before expresses the step's movement as a fraction of the distance that remained before the step, which encourages movement in the right direction. This scheme is designed to keep the agent motivated even when it starts far from the goal, preventing it from getting stuck and promoting progress toward task completion.
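The three cases can be sketched in Python as follows. This is a minimal sketch, not the reward_function in RotationEnvironment.py, so it uses the hypothetical name reward_sketch. The page does not give numeric values, so NOISE_TOLERANCE and PRECISION_TOLERANCE are placeholder assumptions. Comparing abs(delta_changes) against the noise tolerance, treating "within precision_tolerance" as `<=`, and the small floor on the denominator are also assumptions. The checks run in the order the text describes them.

```python
NOISE_TOLERANCE = 1e-3      # hypothetical value; not stated on this page
PRECISION_TOLERANCE = 0.05  # hypothetical value in radians; not stated on this page


def reward_sketch(goal_difference_after: float,
                  goal_difference_before: float,
                  delta_changes: float,
                  noise_tolerance: float = NOISE_TOLERANCE,
                  precision_tolerance: float = PRECISION_TOLERANCE) -> float:
    """Piecewise reward following the three cases described above (a sketch)."""
    # Case 1: negligible valve movement -> penalty of -1.
    # Using abs() assumes delta_changes is signed; this is an assumption.
    if abs(delta_changes) < noise_tolerance:
        return -1.0
    # Case 2: the valve ended within precision_tolerance of the goal -> +10.
    if goal_difference_after <= precision_tolerance:
        return 10.0
    # Case 3: movement as a fraction of the distance remaining before the step.
    # The max() floor is added in this sketch only to avoid division by zero.
    return delta_changes / max(goal_difference_before, 1e-8)


if __name__ == "__main__":
    print(reward_sketch(1.5, 2.0, 0.5))   # 0.25: closed a quarter of the gap
    print(reward_sketch(2.5, 2.0, -0.5))  # -0.25: moved away from the goal
    print(reward_sketch(0.01, 0.5, 0.49)) # 10.0: reached the goal
    print(reward_sketch(1.0, 1.0, 0.0))   # -1.0: no movement
```

Because the third case is a ratio, closing half of the remaining gap earns 0.5 whether the valve started 0.2 rad or 2 rad from the goal, so a start far from the goal does not by itself shrink the reward for a proportionally useful step.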
