-
Notifications
You must be signed in to change notification settings - Fork 334
[MLH Phase 2] Refactor Logging System #284
base: main
Are you sure you want to change the base?
[MLH Phase 2] Refactor Logging System #284
Conversation
979419f
to
bdf5fd1
Compare
bdf5fd1
to
e694347
Compare
e694347
to
fef448c
Compare
The TensorboardWriter can also log rich images and histograms
Co-authored-by: Grace Omotoso <[email protected]>
Implement TensorboardWriter
if key == "ep" | ||
else f"{stdout_data}{key}: {value}; " | ||
evt_stg.put_scalars( | ||
rolling_btime=rolling_btime, rolling_eta=rolling_eta_secs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I prefer keeping all times in human readable string. We may want to add a VisslEventStorage#put_string or change #put_scalars to #put_values and make it type agnostic.
@@ -126,6 +126,7 @@ config: | |||
# ----------------------------------------------------------------------------------- # | |||
TENSORBOARD_SETUP: | |||
# whether to use tensorboard for the visualization | |||
VISUALIZATION_SAMPLE_PERIOD: -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment what this is.
nit: move comment below USE_TENSORBOARD
@@ -52,7 +52,7 @@ class SSLClassyHookFunctions(Enum): | |||
on_end = auto() | |||
|
|||
|
|||
def default_hook_generator(cfg: AttrDict) -> List[ClassyHook]: | |||
def default_hook_generator(cfg: AttrDict, event_storage) -> List[ClassyHook]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Type hint for event_storage.
@@ -116,6 +116,13 @@ def standard_train_step(task): | |||
sample = next(task.data_iterator) | |||
|
|||
sample = construct_sample_for_model(sample, task) | |||
vis_period = task.config.HOOKS.TENSORBOARD_SETUP.VISUALIZATION_SAMPLE_PERIOD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we comment this code block?
storage = task.event_storage | ||
storage.put_images() | ||
name = f"Model input sample: iteration: {task.iteration_num}" | ||
for idx, vis_img in enumerate(sample["input"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: If batch size is large, this will upload a lot of images -- do we want to offer the option to only sample a small portion of the inputs?
several backends (json, tensorboard, etc) | ||
""" | ||
|
||
def __init__(self, start_iter=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add logic that ensures that we set the start_iter correctly upon loading a checkpoint, I don't see this logic anywherere?
@@ -86,6 +88,7 @@ def __init__( | |||
) | |||
logging.info("Setting up SSL Tensorboard Hook...") | |||
self.tb_writer = tb_writer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need tb_writer anymore in this class? Can we refactor ActivationStatisticsMonitor
to use event_storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confirming: none of these methods write to tensorboard anymore and that it's only handled in TensorBoardWriter#Write.
) | ||
logging.info(stdout_data.strip()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I believe we lose the logging to stdout here, which imo we should definitely keep -- super helpful for debugging.
We could create another EventStorageWriter called StdOutWriter and add to event_storage_writers.
] | ||
|
||
@property | ||
def event_storage(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just use self.event_storage internally and externally.
Add a scalar `value` to the `HistoryBuffer` associated with `name`. | ||
""" | ||
history = self._history[name] | ||
value = float(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Do we need this?
@iseessel I see some of the comments are unaddressed, do we need to work on these? Lmk if there is anything I could do! |
Summary: Pull Request resolved: fairinternal/ssl_scaling#284 Reviewed By: odelalleau Differential Revision: D42220017 Pulled By: QuentinDuval fbshipit-source-id: 742419aa859fdbe4bc80f1f9e9f4771fee0f41a2
The new logging system will be based around an event storage for each
SelfSupervisionTask
and events will be logged to their corresponding output device byVisslEventWriter
s like theJsonWriter
andTensorboardWriter