I've downloaded your code and made the following small changes:

- removed all loading/checkpointing/saving functions/calls
- switched the gym environment to `env = gym.make("InvertedPendulum-v2")`

After some training (a variable amount of time passes before the error occurs) I get the following bug:

```
  File "C:\Users\john\Desktop\project\Clone\sac_torch.py", line 32, in choose_action
    actions, _ = self.actor.sample_normal(state, reparameterize=False)
  File "C:\Users\john\Desktop\project\Clone\networks.py", line 105, in sample_normal
    probabilities = Normal(mu, sigma)
  File "C:\Users\john\anaconda3\lib\site-packages\torch\distributions\normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "C:\Users\john\anaconda3\lib\site-packages\torch\distributions\distribution.py", line 53, in __init__
    raise ValueError("The parameter {} has invalid values".format(param))
ValueError: The parameter loc has invalid values
```

I print out mu and sigma and see that immediately before the error they have become NaN:

```
tensor([[nan]], device='cuda:0', grad_fn=<AddmmBackward>) tensor([[nan]], device='cuda:0', grad_fn=<ClampBackward1>)
```

(This appears to be happening during a forward pass, not buffer sampling, since the tensor is one-dimensional.)
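For anyone trying to localize this, one way to catch the first NaN at its source (instead of inside torch.distributions' validation) is a small guard right after the actor's forward pass. The helper below is a hypothetical debugging sketch, not code from the repo:

```python
import torch as T

# Hypothetical debugging helper: raise at the first non-finite tensor,
# so the failure points at the forward pass rather than at Normal().
def assert_finite(name, tensor):
    if not T.isfinite(tensor).all():
        raise RuntimeError(f"{name} became non-finite: {tensor}")

# Assumed placement inside ActorNetwork.sample_normal:
#   mu, sigma = self.forward(state)
#   assert_finite("mu", mu)
#   assert_finite("sigma", sigma)
#   probabilities = Normal(mu, sigma)
```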
Thanks again for the quick reply in your video!
In the sample_normal method of ActorNetwork, we compute log_probs using log(1 - actions + epsilon); however, action was defined earlier as tanh(actions) * max_actions, which may be greater than 0. Computing log_probs with tanh(actions) instead of tanh(actions) * max_actions may work (it solves the same issue you have; in my case it did, using the Pendulum-v0 env).
Edit: tanh(actions) * max_actions may be greater than 1, which in turn means the argument of the log goes negative, and the log of a negative number returns NaN.
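To make the suggestion concrete, here is a minimal sketch of sample_normal with the correction applied. It assumes the structure from the video code; identifiers like self.max_action and the reparam_noise epsilon are assumptions, not necessarily the exact names in networks.py. The key point is that the tanh-squashing correction log(1 - tanh(u)² + ε) must use the unscaled tanh(actions): with action = tanh(actions) * max_action and max_action > 1 (InvertedPendulum-v2's actions are bounded at ±3), 1 - action² can go negative, the log returns NaN, and that NaN then propagates through the actor's gradients into its weights, which would explain why mu and sigma come out NaN on a later forward pass.

```python
import torch as T
from torch.distributions.normal import Normal

# Sketch of the fixed sample_normal (a method of ActorNetwork);
# identifier names are assumptions based on the discussion above.
def sample_normal(self, state, reparameterize=True):
    mu, sigma = self.forward(state)
    probabilities = Normal(mu, sigma)

    if reparameterize:
        actions = probabilities.rsample()  # differentiable (pathwise) sample
    else:
        actions = probabilities.sample()

    tanh_actions = T.tanh(actions)  # always inside (-1, 1)
    # Scale to the env's action range only for the action we return.
    action = tanh_actions * T.tensor(self.max_action).to(self.device)

    log_probs = probabilities.log_prob(actions)
    # Squashing correction: use the UNSCALED tanh here, so the argument
    # of the log stays >= reparam_noise > 0 even when max_action > 1.
    log_probs -= T.log(1 - tanh_actions.pow(2) + self.reparam_noise)
    log_probs = log_probs.sum(1, keepdim=True)

    return action, log_probs
```

Strictly speaking, scaling by max_action also adds a constant -log(max_action) per action dimension to the exact log-density (change of variables), but since it is constant it doesn't affect the gradients, so dropping it as above is the usual shortcut.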