
Enzyme gradient example broken #2554

Open
lassepe opened this issue Dec 14, 2024 · 5 comments
lassepe commented Dec 14, 2024

Following the Enzyme.jl example:

using Flux: Flux
using Enzyme: Enzyme

function reproducer()
    model = Flux.Chain(Flux.Dense(28^2 => 32, Flux.sigmoid), Flux.Dense(32 => 10), Flux.softmax)
    # this allocates space for the gradient
    model_shadow = Enzyme.Duplicated(model)
    input = randn(28^2, 1)
    label = [i == 3 for i in 1:10]
    grad_model = Flux.gradient(
        (m, x, y) -> sum(abs2, m(x) .- y),
        model_shadow,
        # input and label are const because we differentiate w.r.t. the model parameters, not the input or label.
        Enzyme.Const(input),
        Enzyme.Const(label),
    )
end

reproducer()

I run into the following error on both Julia 1.10.7 and Julia 1.11.2:

ERROR: Constant memory is stored (or returned) to a differentiable variable.
As a result, Enzyme cannot provably ensure correctness and throws this error.
This might be due to the use of a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Runtime-Activity).
If Enzyme should be able to prove this use non-differentable, open an issue!
To work around this issue, either:
 a) rewrite this variable to not be conditionally active (fastest, but requires a code change), or
 b) set the Enzyme mode to turn on runtime activity (e.g. autodiff(set_runtime_activity(Reverse), ...) ). This will maintain correctness, but may slightly reduce performance.
Mismatched activity for:   store {} addrspace(10)* %68, {} addrspace(10)* addrspace(10)* %.repack65, align 8, !dbg !331, !tbaa !333, !alias.scope !268, !noalias !335 const val:   %68 = call fastcc nonnull {} addrspace(10)* @julia_summary_3629({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %1) #308, !dbg !323
Type tree: {[-1]:Pointer}
 llvalue=  %68 = call fastcc nonnull {} addrspace(10)* @julia_summary_3629({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %1) #308, !dbg !323

Stacktrace:
 [1] #invokelatest#2
   @ ./essentials.jl:894
 [2] invokelatest
   @ ./essentials.jl:889
 [3] macro expansion
   @ ./logging.jl:377
 [4] _match_eltype
   @ ~/.julia/packages/Flux/fteq2/src/layers/stateless.jl:60

Interestingly, each of the following changes resolves the issue:

  1. Calling the gradient computation as Enzyme.gradient(Enzyme.set_runtime_activity(Enzyme.Reverse), ...), as suggested by the error message.
  2. Explicitly converting input to Float32 before passing it to Flux.gradient.
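
For concreteness, here is a sketch of the two workarounds applied to the reproducer above. This assumes the same Flux/Enzyme versions as in the report; Enzyme.gradient accepting Const-annotated arguments is as documented for recent Enzyme 0.13 releases.

```julia
using Flux: Flux
using Enzyme: Enzyme

model = Flux.Chain(Flux.Dense(28^2 => 32, Flux.sigmoid), Flux.Dense(32 => 10), Flux.softmax)
input = randn(28^2, 1)
label = [i == 3 for i in 1:10]
loss(m, x, y) = sum(abs2, m(x) .- y)

# Workaround 1: enable runtime activity, as the error message suggests.
# Non-Const arguments (here, the model) are differentiated.
Enzyme.gradient(Enzyme.set_runtime_activity(Enzyme.Reverse), loss,
    model, Enzyme.Const(input), Enzyme.Const(label))

# Workaround 2: convert the input to Float32 up front, so no eltype
# conversion (and hence no @warn) is triggered inside the model.
model_shadow = Enzyme.Duplicated(model)
Flux.gradient(loss, model_shadow,
    Enzyme.Const(Float32.(input)), Enzyme.Const(label))
```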

However, it seems that the conversion to Float32 alone is not the culprit. The following works as expected:

function sum_converted(x)
    sum(convert(AbstractArray{Float32}, x))
end

function reproducer_reduced()
    input = randn(10)
    Enzyme.gradient(Enzyme.Reverse, sum_converted, input)
end
# returns `([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)`

mcabbott commented Dec 15, 2024

Note that the linked example has x1 = randn32(28*28, 1) i.e. Float32 data, which avoids the issue -- matching change 2.

Edit: I can reproduce this on Enzyme v0.13.21, but upgrading to 0.13.22 removes it:

(jl_3JKgRP) pkg> st Flux Enzyme
Status `/private/var/folders/yq/4p2zwd614y59gszh7y9ypyhh0000gn/T/jl_3JKgRP/Project.toml`
  [7da242da] Enzyme v0.13.22
  [587475ba] Flux v0.15.2

julia> VERSION
v"1.10.4"

0.13.22 also solved some perhaps-related problems I encountered. Maybe that's all?

Below are my attempts to isolate this on v0.13.21.

The last Flux function appearing in the stacktrace has the job of converting the input eltype to match the parameters (so that BLAS etc. are used rather than much slower fallbacks). Here are some simple ways to call Enzyme on that function, which all work fine:

julia> Flux._match_eltype("justforprinting", Float32, [1, 2f0])  # does nothing
2-element Vector{Float32}:
 1.0
 2.0
 
julia> Flux._match_eltype("justforprinting", Float32, [1, 2])  # convert integers, silently
2-element Vector{Float32}:
 1.0
 2.0

julia> Flux._match_eltype("justforprinting", Float32, [1, 2.0])  # convert Float64, print warning once
┌ Warning: Layer with Float32 parameters got Float64 input.
│   The input will be converted, but any earlier layers may be very slow.
│   layer = "justforprinting"
│   summary(x) = "2-element Vector{Float64}"
└ @ Flux ~/.julia/packages/Flux/5vIRy/src/layers/stateless.jl:60
2-element Vector{Float32}:
 1.0
 2.0

julia> Enzyme.gradient(Reverse, sum∘Flux._match_eltype, "str", Float32, [1, 2f0])
(nothing, nothing, Float32[1.0, 1.0])

julia> Enzyme.gradient(Reverse, sum∘Flux._match_eltype, "str", Float32, [1, 2.0])
(nothing, nothing, [1.0, 1.0])

In a fresh session, to ensure the @warn maxlog=1 still prints, these Enzyme examples still work fine:

julia> Enzyme.gradient(Reverse, sum∘Flux._match_eltype, "str", Float32, [1, 2f0])
(nothing, nothing, Float32[1.0, 1.0])

julia> Enzyme.gradient(Reverse, sum∘Flux._match_eltype, "str", Float32, [1, 2.0])
┌ Warning: Layer with Float32 parameters got Float64 input.
│   The input will be converted, but any earlier layers may be very slow.
│   layer = "str"
│   summary(x) = "2-element Vector{Float64}"
└ @ Flux ~/.julia/packages/Flux/HBF2N/src/layers/stateless.jl:60
(nothing, nothing, [1.0, 1.0])

(jl_vUrrev) pkg> st Enzyme
Status `/private/var/folders/yq/4p2zwd614y59gszh7y9ypyhh0000gn/T/jl_vUrrev/Project.toml`
  [7da242da] Enzyme v0.13.21

julia> VERSION
v"1.10.4"


lassepe commented Dec 15, 2024

Oh, you are right; looks like I copied the randn call incorrectly from the example.

I'm surprised that you cannot reproduce the issue. Even if I run the example with 1.10.4 (instead of the latest LTS patch release, 1.10.7), I see the same issue. I see it on Enzyme 0.13.21, 0.13.22, and main.

julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 12 default, 1 interactive, 6 GC (on 12 virtual cores)
Environment:
  JULIA_SYSIMAGE_LIB_DIR = /home/lassepe/dotfiles/dotfiles/builds
  JULIA_LSP_JULIA_BIN = /home/lassepe/helperScripts/bin/julia_lsp_julia_bin
  JULIA_NUM_THREADS = auto,auto
Output of st -m

[621f4979] AbstractFFTs v1.5.0
[7d9f7c33] Accessors v0.1.39
[79e6a3ab] Adapt v4.1.1
[66dad0bd] AliasTables v1.1.3
[dce04be8] ArgCheck v2.4.0
[a9b6321e] Atomix v1.0.1
[198e06fe] BangBang v0.4.3
[9718e550] Baselet v0.1.1
[fa961155] CEnum v0.5.0
[082447d4] ChainRules v1.72.1
[d360d2e6] ChainRulesCore v1.25.0
[bbf7d656] CommonSubexpressions v0.3.1
[34da2185] Compat v4.16.0
[a33af91c] CompositionsBase v0.1.2
[187b0558] ConstructionBase v1.5.8
[6add18c4] ContextVariablesX v0.1.3
[9a962f9c] DataAPI v1.16.0
[864edb3b] DataStructures v0.18.20
[e2d170a0] DataValueInterfaces v1.0.0
[244e2a9f] DefineSingletons v0.1.2
[8bb1440f] DelimitedFiles v1.9.1
[163ba53b] DiffResults v1.1.0
[b552c78f] DiffRules v1.15.1
[ffbed154] DocStringExtensions v0.9.3
[7da242da] Enzyme v0.13.22
[f151be2c] EnzymeCore v0.8.8
[e2ba6199] ExprTools v0.1.10
[cc61a311] FLoops v0.2.2
[b9860ae5] FLoopsBase v0.1.1
[1a297f60] FillArrays v1.13.0
[587475ba] Flux v0.16.0
[f6369f11] ForwardDiff v0.10.38
[d9f16b24] Functors v0.5.2
[0c68f7d7] GPUArrays v11.1.0
[46192b85] GPUArraysCore v0.2.0
[61eb1bfa] GPUCompiler v1.0.1
[7869d1d1] IRTools v0.4.14
[22cec73e] InitialValues v0.3.1
[3587e190] InverseFunctions v0.1.17
[92d709cd] IrrationalConstants v0.2.2
[82899510] IteratorInterfaceExtensions v1.0.0
[692b3bcd] JLLWrappers v1.6.1
[b14d175d] JuliaVariables v0.2.4
[63c18a36] KernelAbstractions v0.9.31
[929cbde3] LLVM v9.1.3
[2ab3a3ac] LogExpFunctions v0.3.29
[7e8f7934] MLDataDevices v1.6.5
[d8e11817] MLStyle v0.4.17
[f1d291b0] MLUtils v0.4.4
[1914dd2f] MacroTools v0.5.13
[128add7d] MicroCollections v0.2.0
[e1d29d7a] Missings v1.2.0
[872c559c] NNlib v0.9.26
[77ba4419] NaNMath v1.0.2
[71a1bf82] NameResolution v0.1.5
[d8793406] ObjectFile v0.4.2
[0b1bfda6] OneHotArrays v0.2.6
[3bd65402] Optimisers v0.4.2
[bac558e1] OrderedCollections v1.7.0
[aea7be01] PrecompileTools v1.2.1
[21216c6a] Preferences v1.4.3
[8162dcfd] PrettyPrint v0.2.0
[33c8b6b6] ProgressLogging v0.1.4
[43287f4e] PtrArrays v1.2.1
[c1ae055f] RealDot v0.1.0
[189a3867] Reexport v1.2.2
[ae029012] Requires v1.3.0
[6c6a2e73] Scratch v1.2.1
[efcf1570] Setfield v1.1.1
[605ecd9f] ShowCases v0.1.0
[699a6c99] SimpleTraits v0.9.4
[a2af1166] SortingAlgorithms v1.2.1
[dc90abb0] SparseInverseSubset v0.1.2
[276daf66] SpecialFunctions v2.5.0
[171d559e] SplittablesBase v0.1.15
[90137ffa] StaticArrays v1.9.8
[1e83bf80] StaticArraysCore v1.4.3
[82ae8749] StatsAPI v1.7.0
[2913bbd2] StatsBase v0.34.4
⌃ [09ab397b] StructArrays v0.6.21
[53d494c1] StructIO v0.3.1
[3783bdb8] TableTraits v1.0.1
[bd369af6] Tables v1.12.0
[a759f4b9] TimerOutputs v0.5.26
[28d57a85] Transducers v0.4.84
[013be700] UnsafeAtomics v0.3.0
[e88e6eb3] Zygote v0.6.73
[700de1a5] ZygoteRules v0.2.5
[7cc45869] Enzyme_jll v0.0.168+0
[dad2f222] LLVMExtra_jll v0.0.34+0
[efe28fd5] OpenSpecFun_jll v0.5.5+0
[0dad84c5] ArgTools v1.1.1
[56f22d72] Artifacts
[2a0f44e3] Base64
[ade2ca70] Dates
[8ba89e20] Distributed
[f43a241f] Downloads v1.6.0
[7b1f6079] FileWatching
[9fa8497b] Future
[b77e0a4c] InteractiveUtils
[4af54fe1] LazyArtifacts
[b27032c2] LibCURL v0.6.4
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[ca575930] NetworkOptions v1.2.0
[44cfe95a] Pkg v1.10.0
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA v0.7.0
[9e88b42a] Serialization
[6462fe0b] Sockets
[2f01184e] SparseArrays v1.10.0
[10745b16] Statistics v1.10.0
[4607b0f0] SuiteSparse
[fa267f1f] TOML v1.0.3
[a4e569a6] Tar v1.10.0
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
[e66e0078] CompilerSupportLibraries_jll v1.1.1+0
[deac9b47] LibCURL_jll v8.4.0+0
[e37daf67] LibGit2_jll v1.6.4+0
[29816b5a] LibSSH2_jll v1.11.0+1
[c8ffd9c3] MbedTLS_jll v2.28.2+1
[14a3606d] MozillaCACerts_jll v2023.1.10
[4536629a] OpenBLAS_jll v0.3.23+4
[05823500] OpenLibm_jll v0.8.1+2
[bea87d4a] SuiteSparse_jll v7.2.1+1
[83775a58] Zlib_jll v1.2.13+1
[8e850b90] libblastrampoline_jll v5.8.0+1
[8e850ede] nghttp2_jll v1.52.0+1
[3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌃ have new versions available and may be upgradable.

@mcabbott

Do any of my other tests trigger this error for you?

For me it also works on 1.11:

julia> reproducer()
((layers = ((weight = Float32[-0.00023009 -0.0014316047  -0.0033742953 0.0007075156; -0.002407181 -0.014977321  -0.035301577 0.007401965;    

julia> VERSION
v"1.11.0"

(jl_wutXbb) pkg> st
Status `/private/var/folders/yq/4p2zwd614y59gszh7y9ypyhh0000gn/T/jl_wutXbb/Project.toml`
  [7da242da] Enzyme v0.13.22
  [587475ba] Flux v0.16.0 `~/.julia/dev/Flux`

julia> versioninfo()
Julia Version 1.11.0
Commit 501a4f25c2b (2024-10-07 11:40 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 11 × Apple M3 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m3)


lassepe commented Dec 15, 2024

FWIW, I see that you are on libLLVM 16 and I'm on 15.


mcabbott commented Dec 15, 2024

I see the failure again on another machine, Intel Linux but with libLLVM-16.0.6.

Somehow this must have to do with the logging macro @warn. The reference is to this line:

 [3] macro expansion
   @ ./logging/logging.jl:388

Maybe these rules don't cover enough?
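
If the @warn path (in particular the captured summary(x) string) really is the culprit, one possible direction, sketched here under the assumption that EnzymeCore's EnzymeRules.inactive mechanism applies, is to mark the warning helper as inactive so Enzyme never tries to differentiate through it. The function _warn_mismatch below is hypothetical, named only for illustration; Flux's actual warning lives inline in _match_eltype.

```julia
using EnzymeCore: EnzymeRules

# Hypothetical refactor: pull the warning out of _match_eltype into its
# own function, so an inactivity rule can be attached to it.
function _warn_mismatch(layer, x)
    @warn "Layer with Float32 parameters got Float64 input." layer summary(x) maxlog=1
    nothing
end

# Declaring the function inactive tells Enzyme that its result and side
# effects carry no derivative information, so the summary(x) call inside
# the logging machinery is never treated as (conditionally) active.
EnzymeRules.inactive(::typeof(_warn_mismatch), args...) = nothing
```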

julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, broadwell)
Threads: 4 default, 0 interactive, 2 GC (on 12 virtual cores)
Environment:
  JULIA_NUM_THREADS = 4

(jl_7brNeg) pkg> st
Status `/tmp/jl_7brNeg/Project.toml`
  [7da242da] Enzyme v0.13.22
  [587475ba] Flux v0.16.0
  
julia> reproducer()
ERROR: Constant memory is stored (or returned) to a differentiable variable.
As a result, Enzyme cannot provably ensure correctness and throws this error.
This might be due to the use of a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Runtime-Activity).
If Enzyme should be able to prove this use non-differentable, open an issue!
To work around this issue, either:
 a) rewrite this variable to not be conditionally active (fastest, but requires a code change), or
 b) set the Enzyme mode to turn on runtime activity (e.g. autodiff(set_runtime_activity(Reverse), ...) ). This will maintain correctness, but may slightly reduce performance.
Mismatched activity for:   store {} addrspace(10)* %61, {} addrspace(10)* addrspace(10)* %.repack32, align 8, !dbg !262, !tbaa !264, !alias.scope !202, !noalias !266 const val:   %61 = call fastcc nonnull {} addrspace(10)* @julia_summary_15589({} addrspace(10)* nocapture noundef nonnull readonly align 8 dereferenceable(32) %1) #213, !dbg !252
Type tree: {[-1]:Pointer}
 llvalue=  %61 = call fastcc nonnull {} addrspace(10)* @julia_summary_15589({} addrspace(10)* nocapture noundef nonnull readonly align 8 dereferenceable(32) %1) #213, !dbg !252

Stacktrace:
 [1] #invokelatest#2
   @ ./essentials.jl:1057
 [2] invokelatest
   @ ./essentials.jl:1052
 [3] macro expansion
   @ ./logging/logging.jl:388
 [4] _match_eltype
   @ ~/.julia/packages/Flux/Mhg1r/src/layers/stateless.jl:60

Stacktrace:
  [1] #invokelatest#2
    @ ./essentials.jl:1057 [inlined]
  [2] invokelatest
    @ ./essentials.jl:1052 [inlined]
  [3] macro expansion
    @ ./logging/logging.jl:388 [inlined]
  [4] _match_eltype
    @ ~/.julia/packages/Flux/Mhg1r/src/layers/stateless.jl:60
  [5] _match_eltype
    @ ~/.julia/packages/Flux/Mhg1r/src/layers/stateless.jl:85 [inlined]
  [6] Dense
    @ ~/.julia/packages/Flux/Mhg1r/src/layers/basic.jl:198
  [7] macro expansion
    @ ~/.julia/packages/Flux/Mhg1r/src/layers/basic.jl:68 [inlined]
  [8] _applychain
    @ ~/.julia/packages/Flux/Mhg1r/src/layers/basic.jl:68
  [9] Chain
    @ ~/.julia/packages/Flux/Mhg1r/src/layers/basic.jl:65 [inlined]
 [10] #2
    @ ./REPL[5]:8 [inlined]
 [11] diffejulia__2_15066_inner_184wrap
    @ ./REPL[5]:0
 [12] macro expansion
    @ ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:5204 [inlined]
 [13] enzyme_call
    @ ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:4750 [inlined]
 [14] CombinedAdjointThunk
    @ ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:4622 [inlined]
 [15] autodiff(::EnzymeCore.ReverseMode{…}, ::EnzymeCore.Const{…}, ::Type{…}, ::EnzymeCore.Duplicated{…}, ::EnzymeCore.Const{…}, ::EnzymeCore.Const{…})
    @ Enzyme ~/.julia/packages/Enzyme/haqjK/src/Enzyme.jl:503
 [16] _enzyme_gradient(::Function, ::EnzymeCore.Duplicated{Flux.Chain{…}}, ::Vararg{Union{…}}; zero::Bool)
    @ FluxEnzymeExt ~/.julia/packages/Flux/Mhg1r/ext/FluxEnzymeExt/FluxEnzymeExt.jl:49
 [17] gradient(::Function, ::EnzymeCore.Duplicated{Flux.Chain{…}}, ::Vararg{Union{…}}; zero::Bool)
    @ Flux ~/.julia/packages/Flux/Mhg1r/src/gradient.jl:122
 [18] reproducer()
    @ Main ./REPL[5]:7
 [19] top-level scope
    @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.
