Policy Gradient, SAC doesn't learn #65
pybullet_envs was originally used for the test environment, InvertedPendulumBulletEnv-v0, but unfortunately pybullet hasn't updated its code to comply with the new gym API specifications, hence the errors when you try to import it. I'm dealing with this problem in my Academy right now (and, spoiler, I'm writing a deep RL framework that I'll release a 0.1.dev build of very soon), so I'll be able to address these particular issues, and more, in the coming days.
But to get you started: make sure you actually read the truncated boolean flag returned by env.step(). The done flag doesn't flip to True when max_steps is reached; the truncated flag handles that instead. So your while loop should be while not (done or truncated), otherwise you can end up with an infinite loop.
As for the learning issues, I'll have to come back and update. I'm validating the initial commit of my framework and will test SAC today.
Thank you so much for your answer! Please keep me updated on the learning issues whenever you have time to test it; it'd be greatly appreciated. I hope you have a good day!
Hi! I have a few more questions about the code that I don't quite get.
First, I was wondering what pybullet_envs is for. I installed the library but got errors when I tried to import it, and I also don't see where it's being used.
Second, I was getting really bad scores when I ran the code. I cloned the code from your repo and changed a few things. First, I changed the environment to
env = gym.make("InvertedPendulum-v4")
and, to match, I also changed the reset and step calls to
obs, _ = env.reset()
and
obs_, reward, done, *_ = env.step(action)
Finally, I commented out the lines in sac_torch.py that use reparameterize=True, since I ran into NaN tensors when calling rsample(). That's all I've changed, and when I run the code the score actually decreases (oddly enough): it starts around 10, like a random agent, and drops to 3 or 4 after 250 episodes.
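(A common cause of NaNs in rsample() is an unconstrained log standard deviation coming out of the policy network. A hedged sketch of the usual fix, clamping log_std before building the Normal; the bounds and the mu/log_std names are typical SAC conventions, not taken from this repo:)

```python
import torch
from torch.distributions import Normal

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # typical SAC bounds (assumption)

def sample_action(mu, log_std):
    # Clamp log-std so std = exp(log_std) stays in a numerically safe
    # range; an unbounded log_std is a frequent source of NaNs in rsample().
    log_std = torch.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
    dist = Normal(mu, log_std.exp())
    u = dist.rsample()            # reparameterized sample (differentiable)
    a = torch.tanh(u)             # squash action into (-1, 1)
    # tanh change-of-variables correction; the 1e-6 guards log(0)
    log_prob = dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)
    return a, log_prob.sum(-1, keepdim=True)
```

Even with an absurdly large log_std input, the clamp keeps both the action and its log-probability finite.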
Would you have any idea why this is happening? Any pointers would be greatly appreciated!
Thanks a lot for your time