
Is the priority computation correct in prioritized DQN #205

Open

ethanluoyc opened this issue Feb 22, 2022 · 4 comments


@ethanluoyc
Contributor

Hi,

Cross-posting a related issue here:
google-deepmind/reverb#91

After looking at the DQN agent implementation in Acme, I am not sure whether the prioritized variant matches the original PER paper.

Specifically, in the original PER, the priority for newly inserted transitions is the maximum priority seen so far. In Acme, however, the priority is uniformly set to 1.0. This deviates from the original PER: newly inserted transitions will not be sampled promptly if 1.0 is small relative to the typical TD errors seen during training.
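For reference, here is a minimal sketch of the PER-style insertion rule from the paper, using a toy buffer (the class and its API are illustrative only, not Acme's or Reverb's):

```python
import numpy as np

class PrioritizedBuffer:
    """Toy PER buffer illustrating max-priority initialization (hypothetical)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.storage = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.max_priority = 1.0  # Starting value before any TD errors are observed.
        self.pos = 0

    def add(self, transition):
        # New transitions get the max priority seen so far, so they are
        # guaranteed to be sampled soon after insertion (Schaul et al., 2015).
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = self.max_priority
        self.pos = (self.pos + 1) % self.capacity

    def update_priorities(self, indices, td_errors, eps: float = 1e-6):
        # After a learning step, priorities become |TD error| + eps, and the
        # running max is updated so future insertions track it.
        new_priorities = np.abs(np.asarray(td_errors)) + eps
        self.priorities[indices] = new_priorities
        self.max_priority = max(self.max_priority, float(new_priorities.max()))
```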

I looked into whether it's possible to get the maximum priority in Reverb but haven't found a nice way to do so.

@nikolamomchev
Collaborator

Hi @ethanluoyc ,

Indeed, it seems that the TD error in Acme is NOT clipped. I am not sure whether this causes any performance issues; do you have an indication that it does?

I believe @nino-vieillard recently managed to successfully reproduce DQN results; maybe he can shed some light on whether this matters.

FWIW, it seems that it would be easy to clip the td_error for the priority update. Would this help?
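For concreteness, the clipping could look something like the sketch below (the function name and the 1.0 ceiling are assumptions, not the actual Acme code):

```python
import jax.numpy as jnp

def priorities_from_td_errors(td_errors: jnp.ndarray, ceiling: float = 1.0) -> jnp.ndarray:
    # Clip |TD error| to a fixed ceiling so that transitions inserted with
    # priority 1.0 are never dwarfed by very large training-time TD errors.
    return jnp.clip(jnp.abs(td_errors), 0.0, ceiling)
```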

@ethanluoyc
Contributor Author

I have not had the chance to investigate this.

Actually, I came across this problem in a different context, where I want to implement prioritized sampling in a continuous-control agent. I would like newly inserted experience to be sampled immediately after insertion. This can be implemented if the adder knows the current max priority, which seems tricky to do in Acme right now; see the sketch below.
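To illustrate what I mean, here is a hypothetical wrapper that tracks the running max priority client-side, since Reverb does not expose it (none of these names exist in Acme):

```python
class MaxPriorityAdder:
    """Hypothetical adder wrapper: tracks the running max priority so that
    fresh experience can be inserted at the top of the distribution."""

    def __init__(self, adder):
        self._adder = adder
        self._max_priority = 1.0  # Starting value before any priority updates.

    def observe_priorities(self, priorities):
        # Would need to be called with every batch of freshly computed priorities.
        self._max_priority = max(self._max_priority,
                                 max(float(p) for p in priorities))

    def add(self, *args, **kwargs):
        # Insert new experience at the current max so it is sampled promptly.
        self._adder.add(*args, priority=self._max_priority, **kwargs)
```

The catch is that priorities are computed by the learner while insertion happens in the actors, so in a distributed setup the running max would have to be shuttled between processes, which is exactly the awkward part.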

@nikolamomchev
Collaborator

I'm approaching this naively and without much knowledge of what you are trying to do, but could you squash the priority into some fixed range (e.g. (0, 1]) and insert fresh experience with a priority of 1?
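Something like this, e.g. squashing with tanh (purely illustrative; the scale would need tuning):

```python
import numpy as np

def squashed_priority(td_error: float, scale: float = 1.0, eps: float = 1e-6) -> float:
    # Squash |TD error| into (0, 1), so fresh experience inserted with
    # priority 1.0 always sits strictly above every replayed transition.
    return float(np.tanh((np.abs(td_error) + eps) / scale))
```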

@ethanluoyc
Contributor Author

Hmm, I guess squashing would cause issues by changing the relative importance of transitions. My solution for now is to just not use Reverb for this case :D
In case you are interested, I would like to replicate this paper with Acme: https://arxiv.org/abs/2107.00591. It uses a prioritized-replay setup that is currently difficult to reproduce.
