
ValueError: The parameter loc has invalid values #36

Open
johnschwarcz opened this issue Aug 22, 2021 · 3 comments

Comments

@johnschwarcz

I've downloaded your code and made the following small changes:
- removed all loading/checkpointing/saving functions/calls
- switched the gym environment to env = gym.make("InvertedPendulum-v2")

After some training (the amount of time before the error occurs varies) I get the following error:

File "C:\Users\john\Desktop\project\Clone\sac_torch.py", line 32, in choose_action
    actions, _ = self.actor.sample_normal(state, reparameterize=False)
File "C:\Users\john\Desktop\project\Clone\networks.py", line 105, in sample_normal
    probabilities = Normal(mu, sigma)
File "C:\Users\john\anaconda3\lib\site-packages\torch\distributions\normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
File "C:\Users\john\anaconda3\lib\site-packages\torch\distributions\distribution.py", line 53, in __init__
    raise ValueError("The parameter {} has invalid values".format(param))
ValueError: The parameter loc has invalid values

I printed out mu and sigma and see that, immediately before the error, both have become nan:
tensor([[nan]], device='cuda:0', grad_fn=<AddmmBackward>) tensor([[nan]], device='cuda:0', grad_fn=<ClampBackward1>)
(This appears to be occurring during a forward pass, not replay-buffer sampling, since the tensor is one-dimensional.)
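For reference, PyTorch's Normal validates its parameters at construction time, so a nan mean reproduces this exact exception. A minimal repro, independent of the repo's code (the exact wording of the message varies between PyTorch versions):

```python
import torch
from torch.distributions import Normal

# A nan loc (mean) fails Normal's real-valued constraint check.
mu = torch.tensor([[float("nan")]])
sigma = torch.tensor([[1.0]])
try:
    # validate_args=True forces the check regardless of PyTorch version defaults.
    Normal(mu, sigma, validate_args=True)
except ValueError as err:
    print("construction failed:", err)
```

So the Normal() call is only the messenger; the nan is produced upstream in the actor's forward pass.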

Thanks again for the quick reply in your video!

@otouat

otouat commented Sep 4, 2021

In the sample_normal method of ActorNetwork, we compute log_probs using log(1 - actions + epsilon); however, action was defined earlier as tanh(actions) * max_actions, which may be greater than 0. Computing log_probs with tanh(actions) instead of tanh(actions) * max_actions may fix it (it solves the same issue for me using the Pendulum-v0 env).

Edit: tanh(actions) * max_actions may be greater than 1, which in turn could mean that we get NaN.
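The suggested fix, sketched in code. Class and method names follow the thread (ActorNetwork, sample_normal, reparameterize); the layer sizes, max_action value, and the 1e-6 noise term are placeholder assumptions, not the repo's actual values:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class ActorNetwork(nn.Module):
    """Minimal sketch of the squashed-Gaussian actor discussed above."""
    def __init__(self, state_dim=4, action_dim=1, max_action=3.0, reparam_noise=1e-6):
        super().__init__()
        self.fc = nn.Linear(state_dim, 64)
        self.mu = nn.Linear(64, action_dim)
        self.sigma = nn.Linear(64, action_dim)
        self.max_action = max_action
        self.reparam_noise = reparam_noise

    def forward(self, state):
        x = torch.relu(self.fc(state))
        mu = self.mu(x)
        # Clamp sigma to keep it strictly positive and bounded.
        sigma = torch.clamp(self.sigma(x), min=self.reparam_noise, max=1.0)
        return mu, sigma

    def sample_normal(self, state, reparameterize=True):
        mu, sigma = self.forward(state)
        dist = Normal(mu, sigma)
        raw = dist.rsample() if reparameterize else dist.sample()
        squashed = torch.tanh(raw)           # always in (-1, 1)
        action = squashed * self.max_action  # scaled for the environment
        # Key point: the tanh change-of-variables correction must use the
        # UNSCALED squashed value. With squashed * max_action, |action| can
        # exceed 1, the log argument goes negative, and log_probs becomes nan.
        log_probs = dist.log_prob(raw) - torch.log(1 - squashed.pow(2) + self.reparam_noise)
        return action, log_probs.sum(1, keepdim=True)
```

With this version the argument of the log stays in (epsilon, 1], so log_probs remains finite regardless of max_action.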

@peter890331

Thanks @otouat, I just ran into the same tensor([[nan]]) problem, and here is a big THANK YOU after two years!
