NaN in gradients #6
Comments
What do you mean by NaN safe mode? |
You shouldn't need that here. |
Why do you get a NaN in the first place? |
It looks like the problem comes from the loss computation using ForwardDiff
x0′ = ForwardDiff.Dual{:tag}.(x0, 1)
test_p = SciMLStructures.replace(Tunable(), prob.p, x0′)
test_prob = remake(prob, p = test_p)
test_sol = solve(test_prob, Rodas4(autodiff=false), saveat=sol_ref.t)
sum(sqrt.(abs2.(get_vars(test_sol, 1) .- get_refs(sol_ref, 1)))) gives
I also see that NaNs appear if I print inside the loss with:
for i in eachindex(new_sol.u)
    loss += sum(sqrt.(abs2.(get_vars(new_sol, i) .- get_refs(sol_ref, i))))
    if any(isnan.(ForwardDiff.partials(loss)))
        @info i
    end
end |
What's the first spot of nan? |
It's due to |
What are the values? |
|
Yes but what are the values that go in? |
Hmm, let me check why they are the same 🤔 |
Aah, it's because we start with the same initial conditions
gives |
Yeah the gradient at zero is NaN for sqrt. That seems like a loss function issue. |
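To spell out why the square root is the culprit, here is a short derivation for a single loss term. The symbols $r$ (the residual between the simulated and reference value) and $\ell$ (one term of the loss) are introduced only for this illustration.

```latex
\ell(r) = \sqrt{r^2},
\qquad
\frac{d\ell}{dr}
  = \underbrace{\frac{1}{2\sqrt{r^2}}}_{\to\,\infty \text{ as } r \to 0}
    \cdot
    \underbrace{2r}_{=\,0 \text{ at } r = 0}
```

Forward-mode AD evaluates the two factors separately, so at $r = 0$ it computes $\infty \cdot 0$, which is NaN in IEEE floating-point arithmetic, even though $\ell(r) = |r|$ is perfectly finite there.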
I started with the same initial conditions as in https://docs.sciml.ai/Overview/stable/showcase/missing_physics/, which means that at the very first time point we get 0 and NaN in the gradient, which ends up poisoning the whole loss. |
So not a bug, but we should document this. |
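As a starting point for documenting this, here is a minimal sketch of the failure mode and two possible ways around it, assuming ForwardDiff is in its default (non NaN-safe) mode. The scalar functions `l_sqrt`, `l_abs`, and `l_sq` are hypothetical stand-ins for one term of the loss, not code from this repository.

```julia
using ForwardDiff

# One term of the loss as written in this thread: sqrt(abs2(r)), r being the residual.
l_sqrt(r) = sqrt(abs2(r))
# Mathematically the same quantity, written without the square root.
l_abs(r) = abs(r)
# Squared-error variant (a different loss, shown only as an alternative).
l_sq(r) = abs2(r)

# A zero residual is what happens at the first time point when the
# simulation and the reference share the same initial condition.
d_sqrt = ForwardDiff.derivative(l_sqrt, 0.0)
d_abs  = ForwardDiff.derivative(l_abs, 0.0)
d_sq   = ForwardDiff.derivative(l_sq, 0.0)

# Report which of the three derivatives came out as NaN.
@show isnan(d_sqrt) isnan(d_abs) isnan(d_sq)
```

Other options in the same spirit would be to skip the first saved time point or to add a small constant under the square root; which one is appropriate depends on the loss you actually want.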
When working locally on this I initially encountered an issue where the gradient would always be NaN, which I think is what's causing #5. I enabled NaN safe mode and that seemed to fix the issue. Is that a bug or should we just document this?
Also, what's the best way of setting this up in CI?
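For reference, a sketch of how NaN-safe mode is typically switched on, assuming a ForwardDiff version that reads the `nansafe_mode` preference through Preferences.jl (older versions may require a different mechanism). Whether this belongs in CI at all is a separate question.

```julia
using ForwardDiff, Preferences

# Writes the setting to LocalPreferences.toml in the active project.
# `force = true` overwrites any value that is already stored there.
set_preferences!(ForwardDiff, "nansafe_mode" => true; force = true)

# The preference is read when ForwardDiff is (pre)compiled, so Julia
# must be restarted before the new mode takes effect.
```

Under that assumption, one possible CI setup is to commit the resulting `LocalPreferences.toml` next to the test environment's `Project.toml`, so the same setting is picked up on the runner.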