I'm opening this issue to keep track of the work needed to port the content of PR #996 to the main branch.
The idea is to split that PR (which is huge and based on a quite old version of the codebase) and, starting from the current state of the main branch, port its main elements in smaller PRs.
I'll keep this issue updated as I work on this.
Many changes are not strictly related to supporting distributed training but may benefit Avalanche in general.
I'm starting by porting the modernized object detection/segmentation dataset, strategies, and metrics. I'll also port the generalized batch collate functionality.
Changes in Distributed Training PR #996:

Legend:

Base elements
- `supports_distributed` plugin field (Add base elements to support distributed comms. Add supports_distributed plugin flag. #1370)
- `_distributed_check` strategy field and related `_check_distributed_training_compatibility()` check (#1370) (see the compatibility-check sketch after this list)
- `wrap_distributed_model` strategy lifecycle method. Called from ..._observation.py
- `_obtain_common_dataloader_parameters` strategy method (unrelated to distributed training) (#1370)
- `_obtain_common_dataloader_parameters` (unrelated to distributed training) (#1370)

Strategy and plugins
- 🔲 Strategy templates: wrap the various lifecycle methods to allow for seamless support of distributed training. Implementations should now be in `_backward()`, `_forward()`, ..., while wrapping happens in `backward`, `forward`. Wrapper methods should be final, but Python is not strict on this (flexibility). (See the wrapping-pattern sketch after this list.)

Models
- `avalanche_forward`: generalize using `is_multi_task_module` to take DDP wrapping into account (#1370) (see the DDP-unwrapping sketch after this list)

Detection

Data Loader

Loggers and metrics
- `evaluator` constructor parameter (`evaluator=default_evaluator()` -> `evaluator=default_evaluator`) (#1370) (see the factory sketch after this list)
- `evaluator` parameter value to use a factory (#1370)

Unit tests

Typing
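Below is a hypothetical sketch of what the `supports_distributed` flag and the `_check_distributed_training_compatibility()` check could look like. The warning behavior and the `SketchPlugin`/`_FakeStrategy` names are assumptions for illustration, not Avalanche's actual implementation.

```python
# Hypothetical sketch: plugins opt in to distributed training via a
# `supports_distributed` class attribute, and the strategy warns when an
# attached plugin has not been ported yet. Illustrative only.
import warnings


class SketchPlugin:
    # Unported plugins default to False and trigger a warning below.
    supports_distributed: bool = False


def _check_distributed_training_compatibility(strategy) -> bool:
    unsupported = [
        type(p).__name__
        for p in strategy.plugins
        if not getattr(p, "supports_distributed", False)
    ]
    if unsupported:
        warnings.warn(
            "These plugins may not support distributed training: "
            + ", ".join(unsupported)
        )
    return len(unsupported) == 0


class _FakeStrategy:
    def __init__(self, plugins):
        self.plugins = plugins


_check_distributed_training_compatibility(_FakeStrategy([SketchPlugin()]))
```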
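A minimal sketch of the wrapping pattern described under "Strategy and plugins": subclasses keep overriding the underscore-prefixed hooks, while the public methods act as conceptually final wrappers where distributed-specific handling can live. Everything besides the `forward`/`_forward` and `backward`/`_backward` naming is illustrative.

```python
# Sketch of the lifecycle-method wrapping pattern (illustrative names).
import torch


class SketchStrategyTemplate:
    def __init__(self, model: torch.nn.Module):
        self.model = model
        self.mb_x = None       # current minibatch input
        self.mb_output = None
        self.loss = None

    # --- public wrappers: conceptually final, not meant to be overridden ---
    def forward(self):
        # A distributed-aware template can add DDP handling here
        # (e.g., unwrapping the model) without touching subclass code.
        return self._forward()

    def backward(self):
        # Gradient-sync concerns (no_sync() contexts, etc.) would be
        # managed at this level in a distributed setting.
        self._backward()

    # --- implementation hooks: what subclasses actually override ---
    def _forward(self):
        self.mb_output = self.model(self.mb_x)
        return self.mb_output

    def _backward(self):
        self.loss.backward()
```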
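A hedged sketch of the `avalanche_forward`/`is_multi_task_module` item: when the model is wrapped in `DistributedDataParallel`, the multi-task check should look at the wrapped `.module`. The `MultiTaskModule` stand-in and the `unwrap_model` helper are assumptions for illustration; Avalanche's real helpers may differ.

```python
# Illustrative sketch: make the multi-task check DDP-aware by unwrapping
# DistributedDataParallel before the isinstance test, while still calling
# the wrapper so DDP's gradient hooks keep working.
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


class MultiTaskModule(nn.Module):
    """Stand-in for a multi-task base module (takes task labels in forward)."""

    def forward(self, x, task_labels):
        raise NotImplementedError


def unwrap_model(model: nn.Module) -> nn.Module:
    # Hypothetical helper: return the underlying module if DDP-wrapped.
    return model.module if isinstance(model, DistributedDataParallel) else model


def is_multi_task_module(model: nn.Module) -> bool:
    # Check the unwrapped model so DDP wrapping does not hide the type.
    return isinstance(unwrap_model(model), MultiTaskModule)


def avalanche_forward(model: nn.Module, x, task_labels):
    # Route the call based on the unwrapped type, but invoke `model` itself.
    if is_multi_task_module(model):
        return model(x, task_labels)
    return model(x)
```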
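A small sketch of why the `evaluator=default_evaluator()` -> `evaluator=default_evaluator` change matters: a default argument is evaluated once at import time, so every strategy would share the same evaluator instance, which can be problematic in general and especially across distributed processes; accepting a factory builds a fresh one per strategy. `EvaluationPlugin` and `SketchStrategy` here are simplified stand-ins, not Avalanche's actual classes.

```python
# Sketch of the evaluator-as-factory change (simplified stand-in classes).
from typing import Callable, Union


class EvaluationPlugin:
    def __init__(self):
        self.metrics = []


def default_evaluator() -> EvaluationPlugin:
    return EvaluationPlugin()


class SketchStrategy:
    def __init__(
        self,
        evaluator: Union[
            EvaluationPlugin, Callable[[], EvaluationPlugin]
        ] = default_evaluator,
    ):
        # Accept either an instance (backward compatible) or a factory.
        self.evaluator = evaluator() if callable(evaluator) else evaluator


# With the factory default, each strategy gets its own EvaluationPlugin.
s1, s2 = SketchStrategy(), SketchStrategy()
assert s1.evaluator is not s2.evaluator
```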