Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use With Vectorized Environments #130

Closed
wbrenton opened this issue May 23, 2023 · 6 comments
Closed

Use With Vectorized Environments #130

wbrenton opened this issue May 23, 2023 · 6 comments

Comments

@wbrenton
Copy link

What are the best practices for use with a vectorized environment? Any help is appreciated thank you

@acassirer
Copy link
Contributor

Hey,

Not really sure what you are actually asking but the type of environment should not really have much impact on how you use Reverb.

@wbrenton
Copy link
Author

wbrenton commented May 24, 2023

@acassirer Vectorized environments meaning you interact with a batch of environments every time you call .reset() or .step() on your environment api.

Here is a motivating example for why I think the question is worth while.

envs = make_envs(num_parallel_env=N, env_id="Breakout-v5")
obs = envs.reset()
print(obs.shape) # (N, 4, 86, 86)

# one trajectory writer for each env
trajectory_writers = [rb_client.trajectory_writer(num_keep_alive_refs=args.rollout_length) for _ in range(N)]
while True:
     next_obs, rewards, dones, infos = envs.step(actions)
     # next_obs.shape = (N, 4, 86, 86)
     # rewards = (N,) # scalar rewards
     
     # loop over every environment and write the experience to it's respective writer
     for idx in range(args.num_envs):
            trajectory_writer = trajectory_writers[idx]
            trajectory_writer.append({
                'obs': obs[idx],
                'actions': actions[idx],
                'rewards': rewards[idx],
                'dones': dones[idx]
            })
            if trajectory_writer.epsiode_steps >= 2:
                trajectory_writer.create_item(
                    table='uniform_experience_replay',
                    priority=1.,
                    trajectory={
                        'obs': trajectory_writer.history['obs'][:-1],
                        'next_obs': trajectory_writer.history['obs'][-1:],
                        'actions': trajectory_writer.history['actions'][:-1],
                        'rewards': trajectory_writer.history['rewards'][:-1],
                        'dones': trajectory_writer.history['dones'][:-1],
                })

Having to iterate over every environment is quite slow and defeats the purpose of using a vectorized environment. Surely there must be a better way, I'm just unable to find it in the codebase.

In case it's still not 100% clear what I'm looking for is a way to write a batch of experiences from N environments to the table without having to maintain a writer for each one of the N environments.

@thomasbbrunner
Copy link

This is a very relevant and common use-case in modern DRL. I also came looking for an answer to this.

Ideally there would be a section in the documentation regarding batched writing of trajectories.

Also, don't understand why this issue was closed. It's clearly not resolved.

@thomasbbrunner
Copy link

This topic is also discussed in this other issue: #78

@wbrenton
Copy link
Author

@thomasbbrunner glad you replied to this thread, I killed so much time trying to use reverb with vectorized envs. What framework are you using (PyTorch, JAX, etc.)?

@thomasbbrunner
Copy link

I'm using a combination of PyTorch + Numpy. Currently facing problems, as between 50% and 90% of the time in my rollouts is spend on reverb, with the remaining being spent on stepping the environment + metrics.

I tried using multithreading as described in #72 (comment), however, it did not lead to improvements (prob. limited by the GIL). Multiprocessing is a pain in Python, so prob. not an option (data has to pickleable).

Not sure what to try next. Did you end up finding a solution for your use-case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants