Remove validation frequency from example.

The tutorial incorrectly implied that the validation frequency controls how often the reward is evaluated. Timestep preprocessors are evaluated at every step regardless; the validation frequency only controls how often the specs are validated.
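
To illustrate the distinction, here is a minimal, library-independent sketch. It does not use the real `timestep_preprocessor` API: `ToyRewardPreprocessor`, its counters, the `run` helper, and the toy `ValidationFrequency` enum (including its `NEVER` and `ONCE` members) are hypothetical names made up for this illustration. It only shows the idea that processing happens on every step while the frequency setting affects spec checks alone.

```python
import enum


class ValidationFrequency(enum.Enum):
    """Toy stand-in (assumption): how often specs are checked, not how often we run."""
    NEVER = 0
    ONCE = 1
    ALWAYS = 2


class ToyRewardPreprocessor:
    """Hypothetical preprocessor that computes a reward for every timestep."""

    def __init__(self, reward_fn, validation_frequency=ValidationFrequency.ONCE):
        self._reward_fn = reward_fn
        self._validation_frequency = validation_frequency
        self.process_calls = 0
        self.validation_calls = 0

    def _should_validate(self):
        if self._validation_frequency is ValidationFrequency.ALWAYS:
            return True
        if self._validation_frequency is ValidationFrequency.ONCE:
            return self.validation_calls == 0
        return False

    def process(self, observation):
        # The reward is computed on every call, whatever the frequency is.
        self.process_calls += 1
        reward = self._reward_fn(observation)
        if self._should_validate():
            # Validation only checks that the output matches the expected spec.
            self.validation_calls += 1
            if not isinstance(reward, float):
                raise TypeError("reward does not match the scalar float spec")
        return reward


def run(frequency, steps=5):
    """Process `steps` toy observations and report the call counters."""
    pre = ToyRewardPreprocessor(lambda obs: 1.0 - obs, frequency)
    rewards = [pre.process(0.1 * i) for i in range(steps)]
    return len(rewards), pre.process_calls, pre.validation_calls
```

In this sketch, changing the frequency changes only `validation_calls`; the number of rewards produced is the same in every case, which is why the argument was unnecessary in the example.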
JeanElsner · Dec 4, 2023 · e1a2927
1 parent 0adab39
Showing 2 changed files with 3 additions and 9 deletions.
diff --git a/doc/tutorial.rst b/doc/tutorial.rst
@@ -450,17 +450,14 @@ predefined timestep preprocessors to add a reward.
                                     observation['panda_tcp_pos'])
      return np.clip(1.0 - goal_distance, 0, 1)
 
-   reward = rewards.ComputeReward(
-       goal_reward,
-       validation_frequency=timestep_preprocessor.ValidationFrequency.ALWAYS)
+   reward = rewards.ComputeReward(goal_reward)
 
    panda_env.add_timestep_preprocessors([reward])
 
 ``ComputeReward`` is a timestep preprocessor that computes a reward based on a callable that takes
 an observation and returns a scalar which is added to the timestep. The callable ``goal_reward``
 computes a reward based on the distance between the robot's end-effector and the ball's pose
-observation which we added above. This reward is computed for every timestep. Alternatively rewards
-may also be computed only at the end of an epiode.
+observation which we added above.
 
 
 Domain Randomization

diff --git a/examples/rl_environment.py b/examples/rl_environment.py
@@ -92,10 +92,7 @@ def goal_reward(observation: spec_utils.ObservationValue):
 
   # ComputeReward is a timestep preprocessor that accepts a callable which computes
   # a scalar reward based on the observation and adds it to the timestep.
-  # We configure the validation frequency so this reward is computed for every timestep.
-  reward = rewards.ComputeReward(
-      goal_reward,
-      validation_frequency=timestep_preprocessor.ValidationFrequency.ALWAYS)
+  reward = rewards.ComputeReward(goal_reward)
 
   # Instantiate props
   ball = Ball()