Performance Impact of Reward Calculators State Collection #45

dvanbrug · 2024-05-29T18:46:14Z

By default, the RewardCalculator base class collects the current true state of the environment and passes it to each reward calculator subclass's calculate_reward method.

cage-challenge-4/CybORG/Shared/RewardCalculator.py

Lines 39 to 45 in 313bf33

    def calculate_simulation_reward(self, env_controller):
        """Calculates the reward from the environment controller"""
        current_state = env_controller._filter_obs(env_controller.get_true_state(env_controller.INFO_DICT['True'])).data
        action = env_controller.action
        agent_observations = env_controller.observation
        done = env_controller.done
        return self.calculate_reward(current_state, action, agent_observations, done, env_controller.state)

However, neither BlueRewardMachine nor EmptyRewardCalculator uses this state in its calculate_reward method. Collecting it is also expensive: it accounts for 40-60% of the time spent stepping through the environment.
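A minimal way to reproduce a breakdown like this is to accumulate wall-clock time around the suspect call and around the whole step. The sketch below uses hypothetical stand-in functions (collect_state and step, with sleeps in place of real work). Against a real environment you would wrap env_controller.get_true_state and the environment's step method instead; that wiring is an untested assumption here. Python's built-in profiler (for example python -m cProfile -s cumtime your_script.py) gives a similar breakdown without code changes.

```python
import time
from collections import defaultdict
from functools import wraps


def timed(bucket, key):
    """Decorator that adds each call's wall-clock duration to bucket[key]."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                bucket[key] += time.perf_counter() - start
        return wrapper
    return deco


timings = defaultdict(float)


# Hypothetical stand-in for the expensive true-state collection.
@timed(timings, "state")
def collect_state():
    time.sleep(0.002)


# Hypothetical stand-in for one environment step that includes state collection.
@timed(timings, "step")
def step():
    collect_state()
    time.sleep(0.002)  # the rest of the step's work


for _ in range(20):
    step()

# Assumption (untested): on a real controller instance the same wrapper could be
# attached with env_controller.get_true_state = timed(timings, "state")(env_controller.get_true_state)
share = timings["state"] / timings["step"]
print(f"state collection: {share:.0%} of step time")
```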

If both RewardCalculator subclasses skip this state collection, environment performance improves dramatically. For example, 500 steps take 4s instead of 12s (8 ms per step instead of 24 ms).
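One way to let a subclass opt out of the expensive call is sketched below. It is a simplified model, not the CybORG code or the change in the linked pull request: BaseRewardCalculator, StateFreeRewardCalculator, FakeController and the _collect_state hook are all hypothetical names, and the real call chain (_filter_obs, INFO_DICT['True']) is collapsed into a single get_true_state() call. Subclasses whose calculate_reward ignores current_state override the hook to return None, so the true state is never built.

```python
class FakeController:
    """Hypothetical stand-in for CybORG's environment controller."""

    def __init__(self):
        self.action = {"Blue": "Sleep"}
        self.observation = {}
        self.done = False
        self.state = None
        self.true_state_calls = 0  # counts how often the expensive call runs

    def get_true_state(self):
        self.true_state_calls += 1
        return {"hosts": {}}  # placeholder for the real, expensive snapshot


class BaseRewardCalculator:
    """Simplified sketch of the default behaviour described in the issue."""

    def calculate_simulation_reward(self, env_controller):
        current_state = self._collect_state(env_controller)
        return self.calculate_reward(current_state, env_controller.action,
                                     env_controller.observation,
                                     env_controller.done, env_controller.state)

    def _collect_state(self, env_controller):
        # Default: always pay for the true-state snapshot.
        return env_controller.get_true_state()

    def calculate_reward(self, current_state, action, agent_observations, done, state):
        raise NotImplementedError


class StateFreeRewardCalculator(BaseRewardCalculator):
    """Hypothetical subclass whose reward never reads current_state."""

    def _collect_state(self, env_controller):
        # Skip the expensive call; calculate_reward ignores this argument anyway.
        return None

    def calculate_reward(self, current_state, action, agent_observations, done, state):
        return 0.0


ctrl = FakeController()
print(StateFreeRewardCalculator().calculate_simulation_reward(ctrl))  # 0.0
print(ctrl.true_state_calls)  # 0
```

Overriding calculate_simulation_reward directly in each subclass would work as well; a small hook keeps the argument plumbing in one place, so subclasses that do need the state keep the default behaviour unchanged.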


dvanbrug linked pull request #46, "Override unused state collection" (Open), on May 29, 2024; it will close this issue.

MitchellKiely self-assigned this May 30, 2024
