I've downloaded your code and made the following small changes:

- removed all loading/checkpointing/saving functions/calls
- switched the gym environment to `env = gym.make("InvertedPendulum-v2")`

After some training (a variable amount of time passes before the error occurs) I get the following bug:

```
  File "C:\Users\john\Desktop\project\Clone\sac_torch.py", line 32, in choose_action
    actions, _ = self.actor.sample_normal(state, reparameterize=False)
  File "C:\Users\john\Desktop\project\Clone\networks.py", line 105, in sample_normal
    probabilities = Normal(mu, sigma)
  File "C:\Users\john\anaconda3\lib\site-packages\torch\distributions\normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "C:\Users\john\anaconda3\lib\site-packages\torch\distributions\distribution.py", line 53, in __init__
    raise ValueError("The parameter {} has invalid values".format(param))
ValueError: The parameter loc has invalid values
```

I print out mu and sigma and see that immediately before the error they have become NaN:

```
tensor([[nan]], device='cuda:0', grad_fn=<AddmmBackward>) tensor([[nan]], device='cuda:0', grad_fn=<ClampBackward1>)
```

(This appears to be happening during a forward pass, not buffer sampling, since the tensor is one-dimensional.)
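For anyone trying to localize this, one way to catch the first NaN at its source (instead of inside torch.distributions' validation) is a small guard right after the actor's forward pass. The helper below is a hypothetical debugging sketch, not code from the repo:

```python
import torch as T

# Hypothetical debugging helper: raise at the first non-finite tensor,
# so the failure points at the forward pass rather than at Normal().
def assert_finite(name, tensor):
    if not T.isfinite(tensor).all():
        raise RuntimeError(f"{name} became non-finite: {tensor}")

# Assumed placement inside ActorNetwork.sample_normal:
#   mu, sigma = self.forward(state)
#   assert_finite("mu", mu)
#   assert_finite("sigma", sigma)
#   probabilities = Normal(mu, sigma)
```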
Thanks again for the quick reply in your video!
In the sample_normal method of ActorNetwork, we compute log_probs using log(1 - actions + epsilon); however, action was defined earlier as tanh(actions) * max_actions, which may be greater than 0. Computing log_probs with tanh(actions) instead of tanh(actions) * max_actions may work (it solves the same issue you have; in my case it did, using the Pendulum-v0 env).
Edit: tanh(actions) * max_actions may be greater than 1, which in turn means the argument of the log goes negative, and the log of a negative number returns NaN.
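To make the suggestion concrete, here is a minimal sketch of sample_normal with the correction applied. It assumes the structure from the video code; identifiers like self.max_action and the reparam_noise epsilon are assumptions, not necessarily the exact names in networks.py. The key point is that the tanh-squashing correction log(1 - tanh(u)² + ε) must use the unscaled tanh(actions): with action = tanh(actions) * max_action and max_action > 1 (InvertedPendulum-v2's actions are bounded at ±3), 1 - action² can go negative, the log returns NaN, and that NaN then propagates through the actor's gradients into its weights, which would explain why mu and sigma come out NaN on a later forward pass.

```python
import torch as T
from torch.distributions.normal import Normal

# Sketch of the fixed sample_normal (a method of ActorNetwork);
# identifier names are assumptions based on the discussion above.
def sample_normal(self, state, reparameterize=True):
    mu, sigma = self.forward(state)
    probabilities = Normal(mu, sigma)

    if reparameterize:
        actions = probabilities.rsample()  # differentiable (pathwise) sample
    else:
        actions = probabilities.sample()

    tanh_actions = T.tanh(actions)  # always inside (-1, 1)
    # Scale to the env's action range only for the action we return.
    action = tanh_actions * T.tensor(self.max_action).to(self.device)

    log_probs = probabilities.log_prob(actions)
    # Squashing correction: use the UNSCALED tanh here, so the argument
    # of the log stays >= reparam_noise > 0 even when max_action > 1.
    log_probs -= T.log(1 - tanh_actions.pow(2) + self.reparam_noise)
    log_probs = log_probs.sum(1, keepdim=True)

    return action, log_probs
```

Strictly speaking, scaling by max_action also adds a constant -log(max_action) per action dimension to the exact log-density (change of variables), but since it is constant it doesn't affect the gradients, so dropping it as above is the usual shortcut.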