Summary:
Pull Request resolved: #1712
## Problem
TorchRec comm_ops implement pipelining logic for the forward and backward passes using Awaitables and custom autograd Functions.
Custom autograd Functions are not fully supported by dynamo and carry many limitations, so the pipelining logic is currently not traceable.
Legacy torch.distributed collectives are also not traceable by dynamo.
## Solution
1/ Adding a synchronous path (`NoWait()`) without pipelining logic for dynamo compilation; a sketch follows.
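A minimal sketch of the idea (the real `NoWait` lives in torchrec/distributed/comm_ops.py; this body is illustrative, not the PR's code): the collective runs eagerly, and the awaitable just hands the finished result back so call sites keep their `.wait()` shape.

```python
from typing import Generic, TypeVar

W = TypeVar("W")


class NoWait(Generic[W]):
    """Wraps an already-computed result so callers can still call .wait()."""

    def __init__(self, obj: W) -> None:
        # The collective has already run synchronously; nothing to overlap.
        self._obj = obj

    def wait(self) -> W:
        return self._obj
```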
2/ Using traceable functional_collectives instead of legacy collectives
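For illustration (not the exact PR diff), the contrast looks roughly like this: the legacy collective mutates a buffer in place and returns an opaque Work handle, while the functional collective is a pure op that returns a new tensor dynamo can trace.

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol


def legacy_all_reduce(t: torch.Tensor, pg: dist.ProcessGroup) -> torch.Tensor:
    # In-place mutation + opaque Work handle: not traceable by dynamo.
    dist.all_reduce(t, group=pg)
    return t


def traceable_all_reduce(t: torch.Tensor, pg: dist.ProcessGroup) -> torch.Tensor:
    # Pure functional collective: returns a new tensor, traceable by dynamo.
    return funcol.all_reduce(t, reduceOp="sum", group=pg)
```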
3/ functional_collectives do not have autograd formulas in PyTorch, as they are not differentiable.
Adding autograd formulas with a BC check in torchrec/distributed/comm_ops.py.
The dispatch happens below autograd, so dynamo sees these ops as leaves in the graph and does not trace through them.
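A simplified sketch of the pattern (the actual formulas and BC check are in torchrec/distributed/comm_ops.py): wrap the non-differentiable functional collective in a `torch.autograd.Function` whose backward issues the mathematically adjoint collective, shown here for all_gather, whose gradient is a summing reduce_scatter.

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol


class AllGatherTensor(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input: torch.Tensor, pg: dist.ProcessGroup) -> torch.Tensor:
        ctx.pg = pg
        # Forward: gather shards from all ranks along dim 0.
        return funcol.all_gather_tensor(input, gather_dim=0, group=pg)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Backward of all_gather is reduce_scatter with sum: each rank keeps
        # the summed gradient slice matching its own input shard.
        grad_input = funcol.reduce_scatter_tensor(
            grad_output, reduceOp="sum", scatter_dim=0, group=ctx.pg
        )
        return grad_input, None
```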
4/ dist.distributed_c10d._get_default_group() is not traceable right now => the test specifies the ProcessGroup explicitly.
Changed rank/world_size lookups from dist.get_rank()/dist.get_world_size() to pg.rank() and pg.size(), as those are traceable via PGVariable.
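A hypothetical illustration of both changes (the function and its names are made up for the example):

```python
import torch.distributed as dist


def shard_offset_and_size(pg: dist.ProcessGroup, total: int):
    # Read rank/world size off the explicitly passed ProcessGroup: these are
    # traced through PGVariable, unlike dist.get_rank()/dist.get_world_size()
    # on the implicit default group.
    world_size = pg.size()
    rank = pg.rank()
    per_rank = total // world_size
    return rank * per_rank, per_rank
```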
5/ Syntactic changes for dynamo:
Dynamo does not support collection generators => replaced with explicit for loops over range(), etc.
SymInts do not support divmod => replaced with // and %.
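Illustrative before/after of those rewrites (variable names are made up):

```python
def compute_splits(length: int, world_size: int):
    # Before (not dynamo-friendly):
    #   q, r = divmod(length, world_size)          # divmod on a SymInt
    #   splits = list(q + (1 if i < r else 0)
    #                 for i in range(world_size))  # generator expression
    # After: // and % instead of divmod, explicit loop instead of a generator.
    q = length // world_size
    r = length % world_size
    splits = []
    for i in range(world_size):
        splits.append(q + (1 if i < r else 0))
    return splits
```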
Reviewed By: joshuadeng
Differential Revision: D53707387
fbshipit-source-id: 6c4febf68471cb71da65973d1e4ff6e82eeb94d4