Commit

Remove validation frequency from example.
The tutorial incorrectly implied that the validation frequency controls
how often a preprocessor is evaluated. Timestep preprocessors are evaluated
at every step. Validation frequency refers only to validating the specs.
JeanElsner committed Dec 4, 2023
1 parent 0adab39 commit e1a2927
Showing 2 changed files with 3 additions and 9 deletions.
7 changes: 2 additions & 5 deletions doc/tutorial.rst
@@ -450,17 +450,14 @@ predefined timestep preprocessors to add a reward.
 observation['panda_tcp_pos'])
 return np.clip(1.0 - goal_distance, 0, 1)
-reward = rewards.ComputeReward(
-goal_reward,
-validation_frequency=timestep_preprocessor.ValidationFrequency.ALWAYS)
+reward = rewards.ComputeReward(goal_reward)
 panda_env.add_timestep_preprocessors([reward])
 ``ComputeReward`` is a timestep preprocessor that computes a reward based on a callable that takes
 an observation and returns a scalar which is added to the timestep. The callable ``goal_reward``
 computes a reward based on the distance between the robot's end-effector and the ball's pose
-observation which we added above. This reward is computed for every timestep. Alternatively rewards
-may also be computed only at the end of an epiode.
+observation which we added above.


Domain Randomization
5 changes: 1 addition & 4 deletions examples/rl_environment.py
@@ -92,10 +92,7 @@ def goal_reward(observation: spec_utils.ObservationValue):

 # ComputeReward is a timestep preprocessor that accepts a callable which computes
 # a scalar reward based on the observation and adds it to the timestep.
-# We configure the validation frequency so this reward is computed for every timestep.
-reward = rewards.ComputeReward(
-goal_reward,
-validation_frequency=timestep_preprocessor.ValidationFrequency.ALWAYS)
+reward = rewards.ComputeReward(goal_reward)

# Instantiate props
ball = Ball()
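
For readers who want to check the reward shaping outside the simulator, the following is a minimal standalone sketch of the `goal_reward` callable. It is not part of this commit. It assumes the observation is a plain dict of 3D positions, uses a hypothetical `ball_pos` key (the diff only shows `panda_tcp_pos`), and replaces `np.clip` with the standard library so it runs without NumPy. The list comprehension stands in for the environment applying the callable at every step.

```python
import math


def goal_reward(observation):
    """Return a reward in [0, 1] that grows as the end-effector nears the ball."""
    # 'ball_pos' is a hypothetical key used for illustration only;
    # 'panda_tcp_pos' is the key that appears in the diff.
    goal_distance = math.dist(observation['ball_pos'],
                              observation['panda_tcp_pos'])
    # Same shaping as np.clip(1.0 - goal_distance, 0, 1), without NumPy.
    return max(0.0, min(1.0, 1.0 - goal_distance))


# Apply the callable to every observation, mirroring a per-step evaluation.
observations = [
    {'ball_pos': (0.5, 0.0, 0.0), 'panda_tcp_pos': (0.5, 0.0, 0.0)},
    {'ball_pos': (0.5, 0.0, 0.0), 'panda_tcp_pos': (0.0, 0.0, 0.0)},
    {'ball_pos': (2.0, 0.0, 0.0), 'panda_tcp_pos': (0.0, 0.0, 0.0)},
]
step_rewards = [goal_reward(obs) for obs in observations]
```

The reward is 1 when the end-effector touches the ball, falls linearly with distance, and is clipped to 0 once the distance reaches 1 or more.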
