Issues in ReplayBuffer
#201
Comments
Thanks @hyeok9855, this is great. My thoughts: I think there are many use cases where a buffer would be better off holding the full trajectory (for algorithmic reasons), but I also agree that often you only want to store the terminal state. Based on this, I think making a set of buffer types via subclassing makes sense; I am thinking about what the hierarchy could look like (one possible shape is sketched below). Agree re: merging the prioritized and normal buffers, but I could imagine more kinds of buffers in the future that should be their own class. Our "prioritized" buffer is really a specific kind of prioritized buffer that, unfortunately, won't work for everyone. Agree 100% on the device handling.
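For concreteness, here is one possible shape such a hierarchy could take. This is only a sketch with hypothetical class names (`BaseReplayBuffer`, `TrajectoryReplayBuffer`, `TerminalStateReplayBuffer`), not an agreed design, and the real `Trajectories`/`Transitions` containers would need their own extend/index logic:

```python
from abc import ABC, abstractmethod
import random


class BaseReplayBuffer(ABC):
    """Shared capacity bookkeeping; subclasses decide what they store."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._storage: list = []

    def _push(self, items: list) -> None:
        self._storage.extend(items)
        # Simple FIFO eviction once over capacity.
        self._storage = self._storage[-self.capacity:]

    @abstractmethod
    def add(self, obj) -> None:
        ...

    def sample(self, n: int) -> list:
        return random.sample(self._storage, n)


class TrajectoryReplayBuffer(BaseReplayBuffer):
    """Stores full trajectories, for losses that need them."""

    def add(self, trajectories) -> None:
        self._push(list(trajectories))


class TerminalStateReplayBuffer(BaseReplayBuffer):
    """Stores only terminal states (with rewards); training trajectories
    are re-sampled later with a backward policy."""

    def add(self, terminal_states) -> None:
        self._push(list(terminal_states))
```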
I'm also happy to help with this :)
Thanks. Re: this, I'm wondering whether we can simply reuse the TorchRL replay buffers: https://github.com/pytorch/rl/tree/main/torchrl/data/replay_buffers
Reusing the TorchRL ReplayBuffer is a very good idea. They decouple sampling from the replay buffer itself, which is a great design choice. I will look into the challenges of migrating to it in more depth.
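To make the decoupling concrete, here is a minimal sketch of how the TorchRL buffers compose a storage with a sampler. The class names come from the torchrl.data replay-buffer module, but the exact signatures should be double-checked against the installed TorchRL version:

```python
import torch
from torchrl.data import ReplayBuffer, ListStorage
from torchrl.data.replay_buffers.samplers import PrioritizedSampler

capacity = 1000

# Uniform buffer: storage and sampling strategy are separate objects.
uniform_buffer = ReplayBuffer(storage=ListStorage(capacity))

# Prioritized buffer: swap only the sampler, keep the same storage.
prioritized_buffer = ReplayBuffer(
    storage=ListStorage(capacity),
    sampler=PrioritizedSampler(max_capacity=capacity, alpha=0.7, beta=0.9),
)

uniform_buffer.extend([torch.randn(4) for _ in range(32)])  # add a batch of items
batch = uniform_buffer.sample(batch_size=8)                 # draw a training batch
```

If we migrated, the main work would presumably be providing storages and collate functions for our `Trajectories`/`Transitions` containers.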
Issue description (@hyeok9855):

Here I note several issues in the current `ReplayBuffer`, including the points raised by @younik in #193.

1. `ReplayBuffer` can take `Trajectories`, `Transitions` (which inherit from `Container`), and `tuple[States, States]` (which does not). This makes the overall code a bit messy. Possible solutions are as follows:
   a. Make `States` inherit from `Container`, and let the `ReplayBuffer` take `Container` objects.
   b. Make subclasses of `ReplayBuffer` depending on the objects they store.
2. In some use cases, the `ReplayBuffer` just stores a terminal state x (with its reward), as in Max's code, and a training trajectory is sampled with the backward policy. This is preferred because 1) it significantly reduces memory usage (vs. storing full trajectories), and 2) the backward sampling gives more diverse learning signals, which I believe helps training. We need to support this, but it should be considered together with issue 1 above. (A rough sketch follows after this list.)
3. Using `PrioritizedReplayBuffer` by default (instead of `ReplayBuffer`), with a proper optional parameter that enables the prioritization, would make the code more concise. (Sketched below.)
4. There may be device issues in `ReplayBuffer` when CUDA is enabled. We need to check. (One possible mitigation is sketched below.)

I hope to discuss these enough before tackling them!
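To illustrate point 2, here is a rough sketch of a terminal-state buffer that re-samples training trajectories with a backward policy. The `backward_sampler.sample_trajectories_from(states)` call is an assumed interface used only for illustration, not the current torchgfn API:

```python
import random
import torch


class TerminalStateBuffer:
    """Stores terminal states and their log-rewards instead of full trajectories."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._states: list[torch.Tensor] = []
        self._log_rewards: list[torch.Tensor] = []

    def add(self, terminal_states: torch.Tensor, log_rewards: torch.Tensor) -> None:
        for s, r in zip(terminal_states, log_rewards):
            self._states.append(s)
            self._log_rewards.append(r)
        # Keep only the most recent `capacity` entries.
        self._states = self._states[-self.capacity:]
        self._log_rewards = self._log_rewards[-self.capacity:]

    def sample_training_trajectories(self, backward_sampler, n: int):
        idx = random.sample(range(len(self._states)), n)
        states = torch.stack([self._states[i] for i in idx])
        # Backward sampling can produce a different trajectory for the same
        # terminal state on every call, which diversifies the learning signal.
        return backward_sampler.sample_trajectories_from(states)
```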
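For point 3, the idea could look roughly like this: a single buffer class where prioritization is an optional constructor flag rather than a separate subclass. The score-based eviction below only stands in for the current reward-based prioritization and is not a final design:

```python
class UnifiedReplayBuffer:
    """One buffer class; prioritization is opt-in via a constructor flag."""

    def __init__(self, capacity: int, prioritized: bool = False):
        self.capacity = capacity
        self.prioritized = prioritized
        self._objects: list = []
        self._scores: list[float] = []

    def add(self, obj, score: float = 0.0) -> None:
        self._objects.append(obj)
        self._scores.append(score)
        if self.prioritized:
            # Keep the highest-scoring entries when over capacity.
            order = sorted(range(len(self._scores)),
                           key=lambda i: self._scores[i], reverse=True)
            keep = sorted(order[: self.capacity])
            self._objects = [self._objects[i] for i in keep]
            self._scores = [self._scores[i] for i in keep]
        else:
            # Plain FIFO eviction.
            self._objects = self._objects[-self.capacity:]
            self._scores = self._scores[-self.capacity:]
```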
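For point 4, one possible mitigation is to pin the buffer to a device and move everything onto it when it is added, so CUDA-sampled objects and stored ones never end up on different devices. The `.to(device)` call on the stored object is assumed here, since whether the containers support it is exactly what needs checking:

```python
import torch


class DeviceAwareBuffer:
    """Keeps all stored objects on a single, explicitly chosen device."""

    def __init__(self, capacity: int, device: str = "cpu"):
        self.capacity = capacity
        self.device = torch.device(device)
        self._storage: list = []

    def add(self, obj) -> None:
        # Move incoming objects to the buffer's device before storing them.
        self._storage.append(obj.to(self.device))
        self._storage = self._storage[-self.capacity:]
```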