agents/dtqn.py

In the `observe()` function, we tile `self.context.obs`, `self.context.action`, `self.bag.obss`, and `self.bag.actions` and pass them into `self.policy_network` to get `possible_bag_actions` and `possible_bag_obss`; finally, the bag's observations are updated with the optimal ones. But the updated bag is only used in the `get_action()` function. Why?
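If I understand the flow correctly, the update works roughly like the sketch below. This is only my mental model, not the repository's actual code: `toy_policy_network`, `update_bag`, and all shapes here are illustrative stand-ins. Each candidate bag swaps the newest observation into one slot; a scoring network evaluates every candidate, and the bag keeps the highest-scoring variant.

```python
import numpy as np

rng = np.random.default_rng(0)

BAG_SIZE = 3  # number of slots in the persistent-memory bag
OBS_DIM = 4   # observation dimensionality

def toy_policy_network(context_obs, bag_obss):
    # Illustrative stand-in for self.policy_network: returns a scalar
    # score (think: max Q-value) for a (context, bag) pair.
    return float(context_obs.sum() + bag_obss.sum())

def update_bag(context_obs, new_obs, bag_obss):
    # "Tile": build one candidate bag per slot, each with new_obs
    # swapped into a different position, plus the unchanged bag.
    candidates = [bag_obss.copy()]
    for i in range(BAG_SIZE):
        cand = bag_obss.copy()
        cand[i] = new_obs
        candidates.append(cand)
    # Score every candidate and keep the best-scoring bag.
    scores = [toy_policy_network(context_obs, c) for c in candidates]
    return candidates[int(np.argmax(scores))]

context_obs = rng.standard_normal(OBS_DIM)
bag = np.zeros((BAG_SIZE, OBS_DIM))  # start with an empty (zero) bag
new_obs = np.ones(OBS_DIM)           # newest observation to consider

bag = update_bag(context_obs, new_obs, bag)
print(bag)  # the new observation has replaced one zero slot
```

Under this reading, `observe()` only decides *what the bag contains*, so it makes sense that the bag's contents are then consumed wherever an action is computed — which is my question about `get_action()`.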
in replay_buffer.py

In the `sample_with_bag()` function, we just randomly sample records from before the current step. Why can this randomly sampled data be used to train the "persistent memory" in the bag-attention network? I'm quite puzzled here and hope for your reply. Thanks!
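To make the question concrete, here is a minimal sketch of how I understand the sampling (again, an assumption, not the repo's API: `sample_bag`, `episode_obss`, and `context_start` are names I made up). For a sampled context window, the bag is filled with observations drawn uniformly at random from timesteps strictly before that window:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_bag(episode_obss, context_start, bag_size):
    # episode_obss: (T, obs_dim) array of one episode's observations.
    # Draw bag_size indices uniformly from [0, context_start), i.e.
    # only from steps that precede the sampled context window.
    if context_start <= 0:
        # Nothing precedes the window: fall back to a zero-padded bag.
        return np.zeros((bag_size, episode_obss.shape[1]))
    idx = rng.integers(0, context_start, size=bag_size)
    return episode_obss[idx]

# Toy episode: 10 steps, obs_dim=2, step i holds values [2i, 2i+1].
episode = np.arange(20, dtype=float).reshape(10, 2)
bag = sample_bag(episode, context_start=6, bag_size=3)
print(bag)  # every row comes from steps 0..5 of the episode
```

My puzzle is exactly this: since the sampled bag is uniform over earlier steps rather than the bag the agent actually held at that time, why does training on it still teach the bag-attention network a useful "persistent memory"?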