-
Notifications
You must be signed in to change notification settings - Fork 426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Is the priority computation correct in prioritized DQN #205
Comments
Hi @ethanluoyc , Indeed it seems that the TD error in Acme is NOT clipped. I am not sure if this causes any performance issues, however, do you have an indication that it does so? I believe @nino-vieillard recently managed to successfully reproduce DQN results, maybe he can shine some light into whether this matters. FWIW, it seems that it would be easy to clip the td_error for the priority update. Would this help? |
I have not had the chance to investigate this. Actually, I came across this problem in a different context where I want to implement prioritized sampling in a continuous control agent. I would like to implement a behavior where newly inserted experience will be immediately sampled following the insertion, this can be implemented if the adder knows about the current max priority, which seems tricky to do right now in Acme. |
I'm approaching this naively and without much knowledge of what you are trying to do. But could you squash the priority in some fixed range (e.g. (0, 1]) and insert fresh experience with priority of 1? |
Hmm I guess squashing would cause issues with changing the relative importance. My solution now is to just not use reverb for this case :D |
Hi,
Cross posting a related issue here
google-deepmind/reverb#91
After looking at the DQN agent implementation in Acme, I am not sure if the prioritized variant matches the original PER paper.
Specifically, in the original PER, the priority for newly inserted transitions is the maximum of the priority seen so far. However, in Acme, the priority is uniformly set to be 1.0. This deviates from the original PER in that the newly inserted transitions will not be sampled immediately if 1.0 is comparably smaller than the typical TD error in training.
I looked into if it's possible to get the maximum priority in reverb but haven't found a nice way to do so.
The text was updated successfully, but these errors were encountered: