Puzzled about the bag #14

915288938lx · 2025-03-06T08:33:25Z

agents/dtqn.py
In the observe() function, we tile self.context.obs, self.context.action, self.bag.obss and self.bag.actions and pass them into self.policy_network to get possible_bag_actions and possible_bag_obss. Finally, we update the obss of the bag with the optimal obss. But the updated bag is only used in the get_action() function. Why?
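For concreteness, the step described above can be sketched roughly as follows. This is only an illustration of the question, not the repository's code: `candidate_bags`, `select_bag` and the `score` callable are hypothetical names, the rule for building candidates (swap a new observation into each slot) is an assumption, and `score` stands in for whatever value self.policy_network produces for a context paired with a candidate bag.

```python
from typing import Callable, List, Sequence, Tuple

Obs = Tuple[float, ...]


def candidate_bags(bag: Sequence[Obs], new_obs: Obs) -> List[List[Obs]]:
    """Return the unchanged bag plus one copy per slot with new_obs swapped in.

    Assumption: this is one plausible way to form the "possible" bags.
    """
    candidates = [list(bag)]
    for i in range(len(bag)):
        swapped = list(bag)
        swapped[i] = new_obs
        candidates.append(swapped)
    return candidates


def select_bag(
    context: Sequence[Obs],
    bag: Sequence[Obs],
    new_obs: Obs,
    score: Callable[[Sequence[Obs], Sequence[Obs]], float],
) -> List[Obs]:
    """Pair the same context with every candidate bag and keep the best one."""
    candidates = candidate_bags(bag, new_obs)
    # "Tiling": the context is repeated once per candidate bag.
    scores = [score(context, candidate) for candidate in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]
```

In this sketch the selected bag is simply returned; where that bag is consumed afterwards (only in get_action(), or also during training) is exactly what the question asks about.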

In replay_buffer.py
In the sample_with_bag() function, we just randomly sample records from before the current step. Why can this randomly sampled data be used to train the "persistent memory" in the bag_attention network? I am quite confused here and hope you can reply. Thanks.
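The sampling behaviour described above can be illustrated with a small stand-alone sketch. It is not the code from replay_buffer.py: `sample_bag_indices` and its parameters are hypothetical, and it assumes the bag is filled by drawing distinct step indices uniformly from the steps that precede the current one.

```python
import random
from typing import List


def sample_bag_indices(current_step: int, bag_size: int, rng: random.Random) -> List[int]:
    """Pick up to bag_size distinct step indices from before current_step."""
    earlier_steps = range(current_step)  # steps 0 .. current_step - 1
    k = min(bag_size, current_step)      # cannot draw more steps than exist
    return sorted(rng.sample(earlier_steps, k))


# Example: at step 10 of an episode, draw a bag of 3 earlier steps.
rng = random.Random(0)
bag_steps = sample_bag_indices(current_step=10, bag_size=3, rng=rng)
```

Under this assumption the bag seen during training is a random subset of the past, whereas the bag at acting time is chosen by the network, which is the mismatch the question is pointing at.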
