feat: switch sebulba to using shard_map like mava #127
base: main
Conversation
Thanks so much, I'll try to review this and test it tomorrow on a GPU.
```diff
@@ -684,7 +706,7 @@ def run_experiment(_config: DictConfig) -> float:
     )

     # Get initial parameters
-    initial_params = unreplicate(learner_state.params)
+    initial_params = jax.device_put(learner_state.params, actor_devices[0])
```
This is the only thing I am not sure of. Why do we put the initial params on the first actor device instead of the CPU or something?
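For reference, here is a minimal sketch (toy params, not the PR's actual `LearnerState`) of the two placements being discussed: pinning the initial params to the first actor device, as the diff does, versus keeping them on the host CPU.

```python
import jax
import jax.numpy as jnp

# Toy stand-in for learner_state.params.
params = {"w": jnp.ones((4, 4)), "b": jnp.zeros(4)}

actor_devices = jax.local_devices()   # assumption: local devices act as the actor devices
cpu = jax.devices("cpu")[0]           # the host CPU backend is always available

# What the diff does: commit the initial params to the first actor device.
params_on_actor = jax.device_put(params, actor_devices[0])

# The alternative raised here: keep them on the CPU until the params source
# fans them out to each actor device.
params_on_cpu = jax.device_put(params, cpu)

print(params_on_actor["w"].sharding)  # single-device sharding on actor_devices[0]
print(params_on_cpu["w"].sharding)    # single-device sharding on the CPU device
```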
I just did a comparison, and it seems like sebulba on main is faster. Looking at all the timing statistics, it's the pipeline that is slowing things down; everything else on this branch is faster. I am trying this on a 2-GPU system. I'll try to figure out why it's slow.

So basically, because the actors are faster now, the pipeline fills up faster, which then causes the insertions to slow down a lot. I think the extra waiting and blocking increases the overheads. I'm not sure of the solution to it though.
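To make the blocking effect concrete, here is a toy sketch (plain Python queues, not the actual Stoix/Mava pipeline) of why faster actors inflate the measured insert time once a bounded pipeline fills up.

```python
import queue
import threading
import time

# Small bounded buffer, like the rollout pipeline between actors and the learner.
pipeline: queue.Queue = queue.Queue(maxsize=4)

def actor(n: int) -> None:
    for i in range(n):
        time.sleep(0.01)                    # fast rollout
        start = time.perf_counter()
        pipeline.put(f"trajectory_{i}")     # blocks while the queue is full
        print(f"insert {i}: waited {time.perf_counter() - start:.3f}s")

def learner(n: int) -> None:
    for _ in range(n):
        pipeline.get()
        time.sleep(0.05)                    # slower learn step keeps the queue full

threads = [
    threading.Thread(target=actor, args=(20,)),
    threading.Thread(target=learner, args=(20,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Once the queue is full, any extra actor speed just shows up as time spent waiting inside `put`, which matches the timing statistics described above.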
```diff
@@ -809,9 +809,9 @@ def run_experiment(_config: DictConfig) -> float:
     logger.log(train_metrics, t, eval_step, LogEvent.TRAIN)

     # Evaluate the current model and log the metrics
-    learner_state_cpu = jax.device_get(learner_state)
+    eval_learner_state = jax.device_put(learner_state, evaluator_device)
```
I think CPU for this is best because it should never block the actors or learners, but I'm not sure, because it would be much faster on an accelerator 🤔
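As a point of comparison, a minimal sketch (illustrative names, not the PR's code) of the two evaluation placements being weighed up:

```python
import jax
import jax.numpy as jnp

# Toy stand-in for the real learner state.
learner_state = {"params": {"w": jnp.ones((8, 8))}}

cpu = jax.devices("cpu")[0]
evaluator_device = jax.local_devices()[-1]  # assumption: one accelerator reserved for eval

# CPU option: like the old jax.device_get, evaluation can never contend with
# the actor or learner devices, but the eval rollouts run on the host.
eval_state_cpu = jax.device_put(learner_state, cpu)

# Accelerator option (what the diff does): much faster evaluation, but it
# shares the accelerator budget with actors and learners.
eval_state_acc = jax.device_put(learner_state, evaluator_device)
```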
I think faster actors should almost always be better; maybe it just needs some hyperparameter tuning?
What?
Some mava upgrades, as promised 😄
Quite a few nice changes by upgrading to `shard_map` over `pmap` here, which avoids some unnecessary device transfers. I found these in mava using the transfer guard, quite a nice tool (a usage sketch is included after the NOTE below).

Why?
To make sebulba go brrr
How?
- Use `shard_map` instead of `pmap`.
- Explicit `put` of the params (see the diffs above).
- `block_until_ready` on the params transfer from the params source to the learner. I think the `unreplicate` that was in the learner before was doing this; without this block we get weird and undefined seg faults (see the sketch below).
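A minimal sketch of the `block_until_ready` point (illustrative names, not the PR's actual params source): block on the cross-device params transfer before another thread is allowed to consume it.

```python
import queue
import jax
import jax.numpy as jnp

params_source: queue.Queue = queue.Queue(maxsize=1)  # toy stand-in for a params source
target_device = jax.local_devices()[0]

def publish_params(new_params) -> None:
    # device_put dispatches asynchronously and returns immediately.
    on_device = jax.device_put(new_params, target_device)
    # Block until the transfer has actually finished before handing the arrays
    # to another thread; previously the unreplicate call synchronised implicitly,
    # and skipping this is the kind of race that can end in a seg fault.
    jax.block_until_ready(on_device)
    if params_source.full():
        params_source.get_nowait()
    params_source.put(on_device)

publish_params({"w": jnp.ones((4, 4))})
```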
NOTE:
This is very much not benchmarked. I pulled in Mava's changes in about an hour and tested it locally; it solves CartPole, but I haven't checked on a TPU or with a harder environment.
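As mentioned in the description, here is a self-contained sketch (not the PR's actual learner) of the two tools it leans on: `shard_map` for the data-parallel update and `jax.transfer_guard` to surface any unintended device transfers while it runs.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.experimental.shard_map import shard_map
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(np.array(jax.local_devices()), axis_names=("d",))

def learner_step(params, batch):
    # Toy SGD step on the local batch shard, with gradients averaged across devices.
    grads = jax.grad(lambda p, x: jnp.mean((x @ p) ** 2))(params, batch)
    grads = jax.lax.pmean(grads, axis_name="d")
    return params - 0.1 * grads

sharded_step = jax.jit(
    shard_map(
        learner_step,
        mesh=mesh,
        in_specs=(P(), P("d")),  # params replicated, batch sharded over the "d" axis
        out_specs=P(),           # updated params come back replicated
    )
)

n_dev = len(jax.local_devices())
params = jax.device_put(jnp.ones((4, 2)), NamedSharding(mesh, P()))
batch = jax.device_put(jnp.ones((8 * n_dev, 4)), NamedSharding(mesh, P("d")))

# With "disallow", any implicit host<->device copy inside the step raises an
# error, which is how hidden transfers like the ones mentioned above show up.
with jax.transfer_guard("disallow"):
    params = sharded_step(params, batch)
```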