I don't really know what to title this issue, but in short: I'm having trouble differentiating my model with Enzyme.jl when it uses Lux.jl, and the error output I'm getting is incredibly sparse, so it doesn't say much about what's going on.

Basically, I defined a one-layer NN and added it into the RHS of my model, and I now want to use Enzyme to compute derivatives (a minimal sketch of the setup is included after the list below). I'm seeing the following behaviors when testing everything:
1. When I run a one-day integration with Enzyme.jl, without checkpointing, everything seems to be okay and the model finishes running.
2. When I run a ten-day integration with Enzyme.jl, still without checkpointing, I get the following output in my terminal:
   ```
   swilliamson@CRIOS-A66253 ~/D/G/S/eddy-stresses> julia --project=. eddy_paper.jl &>flux_nn_output_nocp_10days.txt :) main!?#
   [1] 68956 killed julia --project=. eddy_paper.jl &> flux_nn_output_nocp_10days.txt
   ```
   with no actual error output. This looks like what happens when I've run out of memory in the past, but I shouldn't be running out of memory in a ten-day integration.
3. In response to (2), I instead tried running the ten-day integration with Enzyme.jl and Checkpointing.jl; my terminal now gives me:
   ```
   swilliamson@CRIOS-A66253 ~/D/G/S/eddy-stresses> julia --project=. eddy_paper.jl &>flux_nn_output_withcp_10days.txt :( main!?#
   [1] 69589 abort julia --project=. eddy_paper.jl &> flux_nn_output_withcp_10days.txt
   ```
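Since the repo is private, here is a minimal sketch of the kind of setup I mean. All of the names, the sizes, and the placeholder "physics" term are made up for illustration; the real model is a longer time-stepping code with the NN added into its RHS:

```julia
using Lux, Enzyme, Random

rng = Random.default_rng()
nn = Dense(2 => 2, tanh)             # the one-layer NN
ps, st = Lux.setup(rng, nn)          # Lux parameters and state

# RHS = a placeholder "physics" tendency plus the NN correction.
function rhs(nn, u, ps, st)
    correction, _ = nn(u, ps, st)
    return -0.1 .* u .+ correction
end

# Explicit Euler loop; nsteps stands in for the 1-day vs. 10-day runs.
function integrate(nn, ps, st, u0, dt, nsteps)
    u = u0
    for _ in 1:nsteps
        u = u .+ dt .* rhs(nn, u, ps, st)
    end
    return sum(abs2, u)              # scalar loss for reverse mode
end

u0 = [1.0, 0.5]
dps = Enzyme.make_zero(ps)           # shadow that accumulates parameter gradients
Enzyme.autodiff(Reverse, integrate, Active,
                Const(nn), Duplicated(ps, dps), Const(st),
                Const(u0), Const(0.01), Const(100))
```

The one-day run corresponds to a small `nsteps` here; scaling it up by roughly 10x is what triggers the kill/abort for me.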
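And this is roughly how I'm invoking Checkpointing.jl in (3), following the `Revolve`/`@checkpoint_struct` pattern from its docs. Again, the `Model` struct, the step counts, and the placeholder time step are made up, and this assumes a Checkpointing.jl version whose `@checkpoint_struct` hooks into Enzyme:

```julia
using Checkpointing, Enzyme

mutable struct Model
    u::Vector{Float64}
end

# One placeholder time step (stands in for the real RHS + NN).
function advance!(model::Model)
    model.u .= model.u .+ 0.01 .* (-0.1 .* model.u)
    return nothing
end

# Checkpointed time loop: snapshots `model` on the Revolve schedule
# instead of taping every step of the integration.
function integrate!(model::Model, scheme, nsteps)
    @checkpoint_struct scheme model for i in 1:nsteps
        advance!(model)
    end
    return sum(abs2, model.u)
end

nsteps, snaps = 1000, 50
scheme = Revolve{Model}(nsteps, snaps)
model = Model(ones(2))
dmodel = Enzyme.make_zero(model)
Enzyme.autodiff(Reverse, integrate!, Active,
                Duplicated(model, dmodel), Const(scheme), Const(nsteps))
```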
I'm also seeing substantial slowdowns when I use Enzyme on the Lux NN versus a hand-written single-layer NN (roughly the comparison sketched below), so getting to these errors takes a long time. All my code is in a private repo, but @wsmoses should have access. I'm also happy to elaborate if anything is unclear. Any and all advice and assistance here is greatly appreciated!!
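For reference, this is the kind of hand-written layer I'm comparing against: the same math as the `Dense(2 => 2, tanh)` above, but with the weights and bias as plain arrays (placeholder sizes again, not the actual code):

```julia
using Enzyme

# Hand-written equivalent of the one-layer NN: W and b are plain arrays,
# no Lux parameter/state plumbing.
rhs_handwritten(u, W, b) = -0.1 .* u .+ tanh.(W * u .+ b)

W, b = randn(2, 2), zeros(2)
dW, db = Enzyme.make_zero(W), Enzyme.make_zero(b)
Enzyme.autodiff(Reverse, (u, W, b) -> sum(abs2, rhs_handwritten(u, W, b)),
                Active, Const([1.0, 0.5]), Duplicated(W, dW), Duplicated(b, db))
```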