Reward Function
The reward function for the rotation environment is found in the file RotationEnvironment.py, in the function reward_function.
At each step, the agent’s reward is determined by this reward function.
First, we calculate ∆θA (goal_difference_after), ∆θB (goal_difference_before), and ∆θ (delta_changes), where the angles they measure are illustrated in the image below.
If delta_changes falls below a predefined noise tolerance threshold, the agent receives a penalty of -1 because there was negligible valve movement.
Conversely, if goal_difference_after is within precision_tolerance, indicating task completion, the agent earns a substantial reward of +10.
For all other cases, the agent’s reward is delta_changes divided by goal_difference_before, encouraging movements in the right direction. Because the change is normalized by the distance that remained before the step, the same relative progress earns the same reward whether the valve is near the goal or far from it. This keeps the agent motivated even when it starts far from the goal, preventing it from becoming stuck and promoting progress toward successful task completion.
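The three cases above can be sketched as a small standalone function. This is only an illustration of the description, not the code in RotationEnvironment.py: the function name `sketch_reward`, the default threshold values, the use of the magnitude of delta_changes in the noise check, and the order in which the cases are tested are all assumptions. The three angle quantities are taken as inputs, since how they are computed is defined by the image rather than by the text.

```python
def sketch_reward(goal_difference_before, goal_difference_after, delta_changes,
                  noise_tolerance=3.0, precision_tolerance=10.0):
    """Hypothetical sketch of the three reward cases described above.

    The default tolerances are placeholder values, not taken from the project.
    Precondition (assumed): goal_difference_before is non-zero whenever the
    third case is reached, otherwise the division is undefined.
    """
    # Case 1: negligible valve movement. Comparing the magnitude is an
    # assumption; the description only says the value "falls below" the threshold.
    if abs(delta_changes) < noise_tolerance:
        return -1.0

    # Case 2: the valve ended within the precision tolerance of the goal.
    if goal_difference_after <= precision_tolerance:
        return 10.0

    # Case 3: progress relative to the distance that remained before the step.
    return delta_changes / goal_difference_before


if __name__ == "__main__":
    print(sketch_reward(90.0, 89.0, 1.0))    # negligible movement
    print(sketch_reward(30.0, 5.0, 25.0))    # goal reached
    print(sketch_reward(90.0, 60.0, 30.0))   # partial progress, far from goal
    print(sketch_reward(30.0, 20.0, 10.0))   # same relative progress, nearer the goal
```

Under these assumptions, the last two calls return the same value even though the absolute movement differs by a factor of three, which shows the effect of dividing by goal_difference_before.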