feat: add SPS for trainer #129
Conversation
Looks good to me!
@@ -511,7 +511,15 @@ def run_experiment(_config: DictConfig) -> float:
logger.log({"timestep": t}, t, eval_step, LogEvent.MISC)
if ep_completed:  # only log episode metrics if an episode was completed in the rollout.
    logger.log(episode_metrics, t, eval_step, LogEvent.ACT)
Isn't it a little ambiguous to call that ACTOR steps_per_second, given that it is capturing the "learn" function?
I'm not sure I fully understand. In the Anakin systems, the "Actors" generate the data within the "learn" function, so actor SPS is the amount of data produced divided by the elapsed time. The only real issue is that the elapsed time includes the learn step, so the actual actor steps per second would be faster if you only timed that specific chunk of code.
Unless, by "capturing the learn function", you mean this issue:
The only real issue is that the elapsed time includes the learn step so the actual actor steps per second would be faster if you only timed that specific chunk of code.
That was my point: the actor SPS includes the learning (which includes backprop + inference, e.g. within the _actor_loss_fn or the _critic_loss_fn). It makes me think that the actual "acting" time is mostly composed of this learning component rather than anything else.
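To make the distinction concrete, here is a toy calculation with made-up numbers (none of these values come from the repository) showing how much the reported actor SPS shrinks when the learn phase dominates the timed interval:

```python
# Hypothetical per-update numbers; real configs will differ.
num_envs, rollout_len = 64, 128   # env steps generated per update
act_time, learn_time = 0.5, 2.0   # seconds; learn (backprop) dominates

steps = num_envs * rollout_len    # 8192 actor steps per update

combined_sps = steps / (act_time + learn_time)  # what timing the whole update reports
actor_only_sps = steps / act_time               # throughput of the acting chunk alone

print(round(combined_sps), round(actor_only_sps))  # 3277 16384
```

With learning taking 4x as long as acting, the combined number understates the acting throughput by 5x, which is the ambiguity being discussed.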
Yeah, you are right. This is actually something we need to fix. It's going to be annoying since it'll need to be fixed in every single file. Can you make an issue, and then we can chat about it there?
I would be happy to help you fix it; we just need to work out how to do it for one file first.
Awesome, thanks so much.
What?
Add a steps per second metric to the trainer.
Why?
Useful to know how many optimiser steps are being performed per second.
How?
Multiply the number of updates per eval by the algorithm's number of epochs and the number of minibatches to get the number of optimiser steps, then divide by the elapsed time.
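A minimal sketch of the calculation described above, using hypothetical config values (the actual config names in the repository may differ):

```python
# Hypothetical config values for one eval interval.
num_updates_per_eval = 100   # outer update steps between evaluations
num_epochs = 4               # optimisation epochs per update
num_minibatches = 8          # minibatches per epoch
elapsed_s = 12.5             # wall-clock seconds for this eval interval

# Each minibatch in each epoch of each update is one optimiser step.
opt_steps = num_updates_per_eval * num_epochs * num_minibatches
trainer_sps = opt_steps / elapsed_s

print(opt_steps, trainer_sps)  # 3200 256.0
```

Unlike the actor metric discussed in the review thread, this counts optimiser steps rather than environment steps, so timing the whole interval is the intended behaviour here.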