I've found something odd and I'd like to know whether I'm missing something or whether this is expected behavior.
In the buffers, you define the action_log_probs to have "act_shape" as their last dimension (https://github.com/marlbenchmark/on-policy/blob/d53c4902cf2c291c93ced2c42c621371982ca2eb/onpolicy/utils/shared_buffer.py#L79C9-L80C100).
With continuous actions, this means the last dimension of action_log_probs equals the dimension of the action. But the actual log probability of an action is a single value: when actions are evaluated, the model outputs one value per action, and we then store it in an array of shape (ep_len, n_rollouts, n_agents, act_dim), which broadcasts that single value across act_dim.
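To make the broadcasting concrete, here is a minimal sketch (my own illustration, not code from the repo) of a diagonal Gaussian policy: the joint log probability of a continuous action is a single scalar, and assigning it into a buffer slot whose last dimension is act_dim silently copies the same value act_dim times.

```python
import numpy as np

# Hypothetical sketch (names are mine): log probability of a continuous
# action under a standard diagonal Gaussian policy, N(0, I).
rng = np.random.default_rng(0)
act_dim = 3
action = rng.standard_normal(act_dim)

# Joint log prob of a diagonal Gaussian = sum of per-dimension log probs,
# i.e. one scalar regardless of act_dim.
log_prob = np.sum(-0.5 * action**2 - 0.5 * np.log(2 * np.pi))

# A buffer slot declared with act_shape as its last dimension broadcasts
# that scalar: every one of the act_dim entries ends up holding it.
buffer_slot = np.empty(act_dim)
buffer_slot[:] = log_prob
assert np.allclose(buffer_slot, log_prob)
```

So no information is lost, the value is just stored redundantly.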
Now, this doesn't actually cause any problem during training, so I guess the shape was chosen to fit the needs of other action spaces (multi-discrete, perhaps?).
And I suppose that, for continuous actions only, I could replace "act_shape" with "1" in the dimensions of action_log_probs in the buffer.
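A quick sketch of what I mean (assumed shapes, not the repo's actual code): with a trailing dimension of 1, each (step, rollout, agent) slot holds exactly one scalar log prob, and downstream operations like the PPO importance ratio still broadcast fine.

```python
import numpy as np

# Hypothetical buffer with the trailing dimension set to 1 instead of act_dim.
ep_len, n_rollouts, n_agents = 5, 2, 3
old_log_probs = np.zeros((ep_len, n_rollouts, n_agents, 1))

# One scalar log prob per (step, rollout, agent) fits exactly, no broadcast:
old_log_probs[0, 0, 0, 0] = -1.23

# The PPO-style importance ratio exp(new - old) works elementwise as before:
new_log_probs = np.full_like(old_log_probs, -1.0)
ratio = np.exp(new_log_probs - old_log_probs)
assert ratio.shape == old_log_probs.shape
```

The stored information is identical either way; the act_dim version just replicates each scalar.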
Have I understood this correctly? Or is there something I'm missing?
Thank you!