
Shape of buffered log_probs #101

Open
Maxtoq opened this issue Feb 19, 2024 · 0 comments

Hi,

I found something odd and would like to know whether I'm missing something or whether this is expected.

In the buffers, action_log_probs is defined with "act_shape" as its last dimension (https://github.com/marlbenchmark/on-policy/blob/d53c4902cf2c291c93ced2c42c621371982ca2eb/onpolicy/utils/shared_buffer.py#L79C9-L80C100).
With continuous actions, this means the last dimension of action_log_probs equals the dimension of the action. However, the log probability of an action is a single value. When actions are evaluated, the model outputs one value per action, and that value is then stored in an array of shape (ep_len, n_rollouts, n_agents, act_dim), which broadcasts the single value across act_dim.
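To make the shape mismatch concrete, here is a minimal NumPy sketch, not the repository's code. The sizes, the `diag_gaussian_log_prob` helper, and the variable names are hypothetical, and it assumes a diagonal Gaussian policy whose per-dimension log densities are summed into one value per action.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
ep_len, n_rollouts, n_agents, act_dim = 4, 2, 3, 5
rng = np.random.default_rng(0)


def diag_gaussian_log_prob(actions, mean, std):
    """Log density of a diagonal Gaussian, summed over the action dimension."""
    per_dim = (
        -0.5 * ((actions - mean) / std) ** 2
        - np.log(std)
        - 0.5 * np.log(2.0 * np.pi)
    )
    # One scalar per action, kept as a trailing axis of size 1.
    return per_dim.sum(axis=-1, keepdims=True)


mean = np.zeros((n_rollouts, n_agents, act_dim))
std = np.ones((n_rollouts, n_agents, act_dim))
actions = rng.normal(size=(n_rollouts, n_agents, act_dim))

step_log_probs = diag_gaussian_log_prob(actions, mean, std)

# Buffer laid out as described in the issue: last axis is act_dim.
buffer_log_probs = np.zeros(
    (ep_len, n_rollouts, n_agents, act_dim), dtype=np.float32
)

# Assigning a (..., 1) array into a (..., act_dim) slot broadcasts the value.
buffer_log_probs[0] = step_log_probs

print(step_log_probs.shape)       # trailing axis of size 1
print(buffer_log_probs[0].shape)  # trailing axis of size act_dim
# Every column along the last axis holds the same number.
print(np.allclose(buffer_log_probs[0], buffer_log_probs[0][..., :1]))
```

The assignment succeeds silently, which is why the wider buffer never raises a shape error: each stored row is just `act_dim` copies of the same log probability.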

This doesn't cause any problems during training, so I guess this shape was chosen to accommodate other action spaces (maybe multi-discrete?).
For continuous actions only, I assume I could replace "act_shape" with "1" in the dimensions of action_log_probs in the buffer.
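As a rough check of that reasoning, the sketch below compares a PPO-style probability ratio computed from a width-1 buffer against one computed from an act_dim-wide buffer of repeated values. It is an illustration under assumed names and sizes (`old_lp`, `new_lp`, `batch`), not the repository's update code.

```python
import numpy as np

# Hypothetical sizes and values, for illustration only.
batch, act_dim = 6, 5
rng = np.random.default_rng(0)

# One log probability per action, under the old and the current policy.
old_lp = rng.normal(size=(batch, 1))
new_lp = old_lp + 0.1 * rng.normal(size=(batch, 1))

# What an act_dim-wide buffer would hold: the same value repeated.
old_lp_wide = np.broadcast_to(old_lp, (batch, act_dim))

ratio_narrow = np.exp(new_lp - old_lp)     # shape (batch, 1)
ratio_wide = np.exp(new_lp - old_lp_wide)  # shape (batch, act_dim)

# The wide ratio is the narrow ratio repeated along the last axis.
same_values = np.allclose(ratio_wide, ratio_narrow)

# A mean over all entries is unaffected by the repetition ...
same_mean = np.allclose(ratio_wide.mean(), ratio_narrow.mean())
# ... but a sum over the last axis is scaled by act_dim.
sum_scaled = np.allclose(
    ratio_wide.sum(axis=-1), act_dim * ratio_narrow.sum(axis=-1)
)

print(same_values, same_mean, sum_scaled)
```

The ratio values themselves are identical in both layouts. Whether the resulting loss is identical depends on how the last axis is reduced afterward: a mean is unchanged, while a sum over that axis scales with act_dim, so the reduction in the loss code is worth checking before changing the buffer shape.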

Have I understood this correctly? Or is there something I'm missing?

Thank you!
