Policy copy decision tree in Collectors.

Weight Synchronization in Distributed Environments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In distributed and multiprocessed environments, ensuring that all instances of a policy are synchronized with the
latest trained weights is crucial for consistent performance. The API introduces a flexible and extensible
mechanism for updating policy weights across different devices and processes, accommodating various deployment scenarios.

Local and Remote Weight Updaters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The weight synchronization process is facilitated by two main components: :class:`~torchrl.collectors.LocalWeightUpdaterBase`
and :class:`~torchrl.collectors.RemoteWeightUpdaterBase`. These base classes provide a structured interface for
implementing custom weight update logic, allowing users to tailor the synchronization process to their specific needs.

- :class:`~torchrl.collectors.LocalWeightUpdaterBase`: This component is responsible for updating the policy weights on
  the local inference worker. It is particularly useful when training and inference occur on the same machine but on
  different devices. Users can extend this class to define how weights are fetched from a server and applied locally.
  It is also the extension point for collectors where the workers need to ask for weight updates (in contrast with
  situations where the server decides when to update the worker policies).
- :class:`~torchrl.collectors.RemoteWeightUpdaterBase`: This component handles the distribution of policy weights to
  remote inference workers. It is essential in distributed systems where multiple workers need to be kept in sync with
  the central policy. Users can extend this class to implement custom logic for synchronizing weights across a network of
  devices or processes.
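The pull-versus-push distinction above can be sketched in plain Python. This is an illustrative analogue only, with dictionaries standing in for model parameters; the class and method names below are invented for the sketch and are not the torchrl API:

```python
# Illustrative sketch (NOT the torchrl API): the two updater roles described
# above, with plain dictionaries standing in for model parameters.

class WeightServer:
    """Stand-in for the trainer process holding the latest weights."""
    def __init__(self):
        self.weights = {"layer.weight": 0.0}

class PullingLocalUpdater:
    """Pull model: the local inference worker asks the server for weights."""
    def __init__(self, server, policy_weights):
        self.server = server
        self.policy_weights = policy_weights

    def update_weights(self):
        # Fetch from the server and apply to the local policy.
        self.policy_weights.update(self.server.weights)

class PushingRemoteUpdater:
    """Push model: the server decides when to broadcast to remote workers."""
    def __init__(self, server, worker_weight_maps):
        self.server = server
        self.workers = worker_weight_maps

    def broadcast(self):
        for worker_weights in self.workers:
            worker_weights.update(self.server.weights)

server = WeightServer()
server.weights["layer.weight"] = 1.5  # a training step produced new weights

local_policy = {"layer.weight": 0.0}
PullingLocalUpdater(server, local_policy).update_weights()

workers = [{"layer.weight": 0.0} for _ in range(3)]
PushingRemoteUpdater(server, workers).broadcast()

print(local_policy["layer.weight"], [w["layer.weight"] for w in workers])
```

The same server state feeds both paths; only the initiator differs (worker-initiated pull versus server-initiated push).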

Extending the Updater Classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To accommodate diverse use cases, the API allows users to extend the updater classes with custom implementations.
This flexibility is particularly beneficial in scenarios involving complex network architectures or specialized hardware
setups. By implementing the abstract methods in these base classes, users can define how weights are retrieved,
transformed, and applied, ensuring seamless integration with their existing infrastructure.
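The retrieve/transform/apply split can be illustrated with a hypothetical pure-Python analogue. The method names here (``_get_server_weights``, ``_transform``, ``_apply_weights``) are assumptions made for the sketch, not the documented torchrl interface:

```python
import abc

class WeightUpdaterSketch(abc.ABC):
    """Hypothetical analogue of an updater base class: subclasses define how
    weights are retrieved, transformed, and applied (names are invented)."""

    @abc.abstractmethod
    def _get_server_weights(self):
        """Retrieve the latest weights from wherever training happens."""

    def _transform(self, weights):
        """Optional hook (e.g. dtype/device casting); identity by default."""
        return weights

    @abc.abstractmethod
    def _apply_weights(self, weights):
        """Write the weights into the inference policy."""

    def update(self):
        # Template method tying the three steps together.
        self._apply_weights(self._transform(self._get_server_weights()))

class RoundingUpdater(WeightUpdaterSketch):
    """Example: fetch from an in-memory store and round values on the way in
    (a stand-in for a precision cast)."""
    def __init__(self, store, policy):
        self.store, self.policy = store, policy

    def _get_server_weights(self):
        return self.store

    def _transform(self, weights):
        return {k: round(v, 2) for k, v in weights.items()}

    def _apply_weights(self, weights):
        self.policy.update(weights)

policy = {}
RoundingUpdater({"w": 0.123456}, policy).update()
print(policy)  # {'w': 0.12}
```

The template-method shape keeps the synchronization protocol fixed while leaving each step open to customization.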

Default Implementations
~~~~~~~~~~~~~~~~~~~~~~~

For common scenarios, the API provides default implementations of these updaters, such as
:class:`~torchrl.collectors.VanillaLocalWeightUpdater`, :class:`~torchrl.collectors.MultiProcessedRemoteWeightUpdate`,
:class:`~torchrl.collectors.RayRemoteWeightUpdater`, :class:`~torchrl.collectors.RPCRemoteWeightUpdater`, and
:class:`~torchrl.collectors.DistributedRemoteWeightUpdater`.
These implementations cover a range of typical deployment configurations, from single-device setups to large-scale
distributed systems.

Practical Considerations
~~~~~~~~~~~~~~~~~~~~~~~~

When designing a system that leverages this API, consider the following:

- Network latency: In distributed environments, network latency can impact the speed of weight updates. Ensure that your
  implementation accounts for potential delays and optimizes data transfer where possible.
- Consistency: Ensure that all workers receive the updated weights in a timely manner to maintain consistency across
  the system. This is particularly important in reinforcement learning scenarios, where stale weights can lead to
  suboptimal policy performance.
- Scalability: As your system grows, the weight synchronization mechanism should scale efficiently. Consider the
  overhead of broadcasting weights to a large number of workers and optimize the process to minimize bottlenecks.
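One common way to address both the consistency and the scalability points is to version the weights, so workers can detect staleness and skip redundant transfers. A minimal sketch, not part of torchrl:

```python
# Sketch of a cheap staleness guard (illustrative, not part of torchrl):
# each published weight snapshot carries a version number.

class VersionedWeightStore:
    def __init__(self):
        self.version = 0
        self.weights = {}

    def publish(self, weights):
        # The trainer bumps the version with every new snapshot.
        self.weights = dict(weights)
        self.version += 1

class Worker:
    def __init__(self):
        self.version = -1
        self.weights = {}

    def maybe_pull(self, store):
        # Only transfer when the server actually has something newer.
        if store.version != self.version:
            self.weights = dict(store.weights)
            self.version = store.version
            return True
        return False

store = VersionedWeightStore()
store.publish({"w": 1.0})

worker = Worker()
pulled_first = worker.maybe_pull(store)     # transfers: worker was stale
pulled_again = worker.maybe_pull(store)     # skipped: already up to date
staleness = store.version - worker.version  # 0 means the worker is current
print(pulled_first, pulled_again, staleness)
```

The version gap also gives the trainer a direct measure of how far behind any worker is, which can drive alerts or forced broadcasts.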

By leveraging the API, users can achieve robust and efficient weight synchronization across a variety of deployment
scenarios, ensuring that their policies remain up-to-date and performant.

.. currentmodule:: torchrl.collectors

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    LocalWeightUpdaterBase
    RemoteWeightUpdaterBase
    VanillaLocalWeightUpdater
    MultiProcessedRemoteWeightUpdate
    RayRemoteWeightUpdater
    DistributedRemoteWeightUpdater
    RPCRemoteWeightUpdater

Collectors and replay buffers interoperability
----------------------------------------------