-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add time units to throughput metrics and also secs_per_step metric. #3693
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific use case for secs_per_step?
'time/train': train_wct / self.divider, | ||
'time/val': self.total_eval_wct / self.divider, | ||
'time/total': (train_wct + self.total_eval_wct) / self.divider, | ||
f'time_{self.time_unit}/train': train_wct / self.divider, | ||
f'time_{self.time_unit}/val': self.total_eval_wct / self.divider, | ||
f'time_{self.time_unit}/total': (train_wct + self.total_eval_wct) / self.divider, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just rename time/train
-> time/train/hours
?
Changing log keys is a little annoying to end user... but duplicate logging is more annoying imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvpatel2000 i was a bit worried that would be a breaking change for any downstream pipelines that process metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, it's better to avoid overwhelming loggers
secs_per_step = 0 | ||
if len(self.history_wct) > 1: | ||
secs_per_step = self.history_wct[-1] - self.history_wct[-2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably average over history_wct as all metrics respect window_size
. Something like:
elapsed_wct = (self.history_wct[-1] - self.history_wct[0]) / (len(history-wct)-1)
What does this PR do?
The time units didn't have units, this makes the units explicit and adds a new secs_per_step metric.