Enzyme gradient example broken #2554
Comments
Note that the linked example has
Edit: I can reproduce this on Enzyme v0.13.21, but upgrading to 0.13.22 removes it:
(jl_3JKgRP) pkg> st Flux Enzyme
Status `/private/var/folders/yq/4p2zwd614y59gszh7y9ypyhh0000gn/T/jl_3JKgRP/Project.toml`
[7da242da] Enzyme v0.13.22
[587475ba] Flux v0.15.2
julia> VERSION
v"1.10.4"
0.13.22 also solved some perhaps-related problems I encountered. Maybe that's all?
Below are my attempts to isolate this on v0.13.21. The last Flux function appearing in the stacktrace has the job of converting the input eltype to match the parameters (so that BLAS etc. is used rather than much slower fallbacks). Here are some simple ways to call it, and to call Enzyme on it, all of which work fine:
julia> Flux._match_eltype("justforprinting", Float32, [1, 2f0]) # does nothing
2-element Vector{Float32}:
1.0
2.0
julia> Flux._match_eltype("justforprinting", Float32, [1, 2]) # convert integers, silently
2-element Vector{Float32}:
1.0
2.0
julia> Flux._match_eltype("justforprinting", Float32, [1, 2.0]) # convert Float64, print warning once
┌ Warning: Layer with Float32 parameters got Float64 input.
│ The input will be converted, but any earlier layers may be very slow.
│ layer = "justforprinting"
│ summary(x) = "2-element Vector{Float64}"
└ @ Flux ~/.julia/packages/Flux/5vIRy/src/layers/stateless.jl:60
2-element Vector{Float32}:
1.0
2.0
julia> Enzyme.gradient(Reverse, sum∘Flux._match_eltype, "str", Float32, [1, 2f0])
(nothing, nothing, Float32[1.0, 1.0])
julia> Enzyme.gradient(Reverse, sum∘Flux._match_eltype, "str", Float32, [1, 2.0])
(nothing, nothing, [1.0, 1.0])
In a fresh session, to ensure the
julia> Enzyme.gradient(Reverse, sum∘Flux._match_eltype, "str", Float32, [1, 2f0])
(nothing, nothing, Float32[1.0, 1.0])
julia> Enzyme.gradient(Reverse, sum∘Flux._match_eltype, "str", Float32, [1, 2.0])
┌ Warning: Layer with Float32 parameters got Float64 input.
│ The input will be converted, but any earlier layers may be very slow.
│ layer = "str"
│ summary(x) = "2-element Vector{Float64}"
└ @ Flux ~/.julia/packages/Flux/HBF2N/src/layers/stateless.jl:60
(nothing, nothing, [1.0, 1.0])
(jl_vUrrev) pkg> st Enzyme
Status `/private/var/folders/yq/4p2zwd614y59gszh7y9ypyhh0000gn/T/jl_vUrrev/Project.toml`
[7da242da] Enzyme v0.13.21
julia> VERSION
v"1.10.4"
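For readers following along, the eltype-matching behaviour probed in the REPL sessions above can be approximated with a few lines of plain Julia. This is only a sketch under stated assumptions: `match_eltype_sketch` is a hypothetical name, it is not Flux's actual `_match_eltype` source, and it only mirrors the three observed cases (no-op for matching eltype, silent conversion of integers, one-time warning for other floats). It does show where the logging macro and the `summary(x)` call sit relative to the conversion.

```julia
# Hypothetical stand-in for the behaviour described above; NOT Flux's real implementation.
function match_eltype_sketch(layer, ::Type{T}, x::AbstractArray) where {T}
    # Input already matches the parameter eltype: return it untouched.
    eltype(x) === T && return x
    # Floating-point input of a different precision: warn once, then convert.
    if eltype(x) <: AbstractFloat
        @warn "Layer with $T parameters got $(eltype(x)) input." layer summary(x) maxlog=1
    end
    # Integers (and the warned floats) are converted to the parameter eltype.
    return convert(AbstractArray{T}, x)
end

a = match_eltype_sketch("demo", Float32, Float32[1, 2])  # no-op
b = match_eltype_sketch("demo", Float32, [1, 2])         # silent conversion
c = match_eltype_sketch("demo", Float32, [1, 2.0])       # warns once, then converts
```

The `@warn` call evaluates `summary(x)` and stores the resulting string as log metadata, which is the kind of side computation a logging macro introduces into an otherwise purely numeric function.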
Oh, you are right; looks like I copied the
I'm surprised that you cannot reproduce the issue. Even if I run the example with 1.10.4 (instead of the latest LTS patch release, 1.10.7), I see the same issue. I see it on Enzyme 0.13.21, 0.13.22, and main.
julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 12 default, 1 interactive, 6 GC (on 12 virtual cores)
Environment:
JULIA_SYSIMAGE_LIB_DIR = /home/lassepe/dotfiles/dotfiles/builds
JULIA_LSP_JULIA_BIN = /home/lassepe/helperScripts/bin/julia_lsp_julia_bin
JULIA_NUM_THREADS = auto,auto
Output of st -m

Do any of my other tests trigger this error for you? For me it also works on 1.11:
julia> reproducer()
((layers = ((weight = Float32[-0.00023009 -0.0014316047 … -0.0033742953 0.0007075156; -0.002407181 -0.014977321 … -0.035301577 0.007401965; …
julia> VERSION
v"1.11.0"
(jl_wutXbb) pkg> st
Status `/private/var/folders/yq/4p2zwd614y59gszh7y9ypyhh0000gn/T/jl_wutXbb/Project.toml`
[7da242da] Enzyme v0.13.22
[587475ba] Flux v0.16.0 `~/.julia/dev/Flux`
julia> versioninfo()
Julia Version 1.11.0
Commit 501a4f25c2b (2024-10-07 11:40 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 11 × Apple M3 Pro
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, apple-m3)

FWIW, I see that you are on libLLVM 16 and I'm on 15.

I see the failure again on another machine: Intel Linux, but with libLLVM-16.0.6. Somehow this must have to do with the logging macro.
Maybe these rules don't cover enough?
julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, broadwell)
Threads: 4 default, 0 interactive, 2 GC (on 12 virtual cores)
Environment:
JULIA_NUM_THREADS = 4
(jl_7brNeg) pkg> st
Status `/tmp/jl_7brNeg/Project.toml`
[7da242da] Enzyme v0.13.22
[587475ba] Flux v0.16.0
julia> reproducer()
ERROR: Constant memory is stored (or returned) to a differentiable variable.
As a result, Enzyme cannot provably ensure correctness and throws this error.
This might be due to the use of a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Runtime-Activity).
If Enzyme should be able to prove this use non-differentable, open an issue!
To work around this issue, either:
a) rewrite this variable to not be conditionally active (fastest, but requires a code change), or
b) set the Enzyme mode to turn on runtime activity (e.g. autodiff(set_runtime_activity(Reverse), ...) ). This will maintain correctness, but may slightly reduce performance.
Mismatched activity for: store {} addrspace(10)* %61, {} addrspace(10)* addrspace(10)* %.repack32, align 8, !dbg !262, !tbaa !264, !alias.scope !202, !noalias !266 const val: %61 = call fastcc nonnull {} addrspace(10)* @julia_summary_15589({} addrspace(10)* nocapture noundef nonnull readonly align 8 dereferenceable(32) %1) #213, !dbg !252
Type tree: {[-1]:Pointer}
llvalue= %61 = call fastcc nonnull {} addrspace(10)* @julia_summary_15589({} addrspace(10)* nocapture noundef nonnull readonly align 8 dereferenceable(32) %1) #213, !dbg !252
Stacktrace:
[1] #invokelatest#2
@ ./essentials.jl:1057
[2] invokelatest
@ ./essentials.jl:1052
[3] macro expansion
@ ./logging/logging.jl:388
[4] _match_eltype
@ ~/.julia/packages/Flux/Mhg1r/src/layers/stateless.jl:60
Stacktrace:
[1] #invokelatest#2
@ ./essentials.jl:1057 [inlined]
[2] invokelatest
@ ./essentials.jl:1052 [inlined]
[3] macro expansion
@ ./logging/logging.jl:388 [inlined]
[4] _match_eltype
@ ~/.julia/packages/Flux/Mhg1r/src/layers/stateless.jl:60
[5] _match_eltype
@ ~/.julia/packages/Flux/Mhg1r/src/layers/stateless.jl:85 [inlined]
[6] Dense
@ ~/.julia/packages/Flux/Mhg1r/src/layers/basic.jl:198
[7] macro expansion
@ ~/.julia/packages/Flux/Mhg1r/src/layers/basic.jl:68 [inlined]
[8] _applychain
@ ~/.julia/packages/Flux/Mhg1r/src/layers/basic.jl:68
[9] Chain
@ ~/.julia/packages/Flux/Mhg1r/src/layers/basic.jl:65 [inlined]
[10] #2
@ ./REPL[5]:8 [inlined]
[11] diffejulia__2_15066_inner_184wrap
@ ./REPL[5]:0
[12] macro expansion
@ ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:5204 [inlined]
[13] enzyme_call
@ ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:4750 [inlined]
[14] CombinedAdjointThunk
@ ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:4622 [inlined]
[15] autodiff(::EnzymeCore.ReverseMode{…}, ::EnzymeCore.Const{…}, ::Type{…}, ::EnzymeCore.Duplicated{…}, ::EnzymeCore.Const{…}, ::EnzymeCore.Const{…})
@ Enzyme ~/.julia/packages/Enzyme/haqjK/src/Enzyme.jl:503
[16] _enzyme_gradient(::Function, ::EnzymeCore.Duplicated{Flux.Chain{…}}, ::Vararg{Union{…}}; zero::Bool)
@ FluxEnzymeExt ~/.julia/packages/Flux/Mhg1r/ext/FluxEnzymeExt/FluxEnzymeExt.jl:49
[17] gradient(::Function, ::EnzymeCore.Duplicated{Flux.Chain{…}}, ::Vararg{Union{…}}; zero::Bool)
@ Flux ~/.julia/packages/Flux/Mhg1r/src/gradient.jl:122
[18] reproducer()
@ Main ./REPL[5]:7
[19] top-level scope
@ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.
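Workaround (b) from the error message above can be tried in isolation on a toy function. The sketch below assumes Enzyme v0.13.x is installed; `toy_loss` and `x` are hypothetical names chosen for illustration and are not part of the reproducer. It only shows the call shape for turning on runtime activity, not a fix for the underlying activity-analysis problem.

```julia
using Enzyme

# Hypothetical toy objective; stands in for the model loss in the reproducer.
toy_loss(x) = sum(abs2, x)

x = [1.0, 2.0, 3.0]

# Same call shape as Enzyme.gradient(Reverse, ...), but with runtime activity enabled,
# as the error message suggests. Enzyme 0.13 returns one gradient per argument, as a tuple.
grads = Enzyme.gradient(Enzyme.set_runtime_activity(Enzyme.Reverse), toy_loss, x)
dx = grads[1]
```

As the error text notes, runtime activity preserves correctness at some performance cost, so it is better treated as a diagnostic or stopgap than as the default mode.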
Following the Enzyme.jl example:
I run into the following error on both Julia 1.10.7 and Julia 1.11.2.
Interestingly, each of the following changes resolves the issue:
- Enzyme.gradient(Enzyme.set_runtime_activity(Enzyme.Reverse), ...), as suggested by the error message, works.
- Converting input to Float32 before passing it to Flux.gradient also resolves the issue.
However, it seems that the conversion to Float32 alone is not the culprit. The following works as expected: