
Use destructure from Optimisers.jl #1901

Merged (8 commits) Mar 21, 2022

Conversation

@mcabbott (Member) commented Mar 5, 2022

This deletes destructure here to use the function from Optimisers.jl, of the same name and almost the same behaviour, except that:

But, of course, there are bugs; Flux's tests fail locally.

Should also add more tests... Now includes tests from these issues (a short destructure usage sketch follows this list):

Fixes #1826
Fixes #1767
Fixes #1601
Fixes #1727

Fixes #1733, closes #1742 (replaces)
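
For context, both the deleted Flux method and its Optimisers.jl replacement follow the same basic calling convention: they return a flat parameter vector plus a function that rebuilds a model of the same structure from such a vector. A minimal sketch of that shared usage (the layer sizes here are arbitrary, chosen only for illustration):

julia> using Flux

julia> m = Chain(Dense(2, 3, tanh), Dense(3, 1));

julia> v, re = Flux.destructure(m);   # flat parameter vector, plus a rebuilding closure

julia> length(v)                      # 3*2 + 3 from the first Dense, 1*3 + 1 from the second
13

julia> m2 = re(v);                    # reconstructed model; m2(x) should match m(x) for the same input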

@mcabbott mcabbott marked this pull request as draft March 5, 2022 22:45
@mcabbott mcabbott added this to the v0.13 milestone Mar 5, 2022
@mcabbott (Member, Author) commented Mar 7, 2022

Failures in Metalhead look like this; I don't see them on its master, and I'm not sure whether they are caused by Flux:

Inception3: Error During Test at /home/runner/work/Flux.jl/Flux.jl/downstream/test/convnets.jl:74
  Test threw exception
  Expression: size(m(rand(Float32, 299, 299, 3, 2))) == (1000, 2)
  MethodError: no method matching cat_channels(::Array{Float32, 4}, ::Array{Float32, 4}, ::Array{Float32, 4}, ::Array{Float32, 4})
  Closest candidates are:
    cat_channels(::Any, ::Any) at ~/work/Flux.jl/Flux.jl/downstream/src/utilities.jl:37
  Stacktrace:
    [1] (::Parallel{typeof(Metalhead.cat_channels), Tuple{Chain{Tuple{Conv{2, 4, typeof(identity),

Failures in FastAI and OperatorLearning are about the params export. I haven't checked whether they run the rest of their tests after a failure.

@darsnack (Member) commented Mar 7, 2022

Metalhead failures are unrelated to this PR: FluxML/Metalhead.jl#127

Project.toml (Outdated)
-version = "0.13.0-DEV"
+version = "0.12.99"

mcabbott (Member, Author): Shouldn't forget to change this back! Just trying to ensure the downstream tests do run.

@@ -4,8 +4,6 @@ using Zygote: IdSet
import Functors: Functors, @functor, functor, fmap, isleaf
using SparseArrays: AbstractSparseArray

-trainable(m) = functor(m)[1]

mcabbott (Member, Author): Keeping Flux's own generic trainable definition caused:

julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
WARNING: Method definition trainable(Any) in module Optimisers at /Users/me/.julia/dev/Optimisers/src/interface.jl:69 overwritten in module Flux at /Users/me/.julia/dev/Flux/src/functor.jl:7.
  ** incremental compilation may be fatally broken for this module **
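
The warning arises because Optimisers.jl already owns a generic function called trainable, and a second generic definition of trainable(::Any) inside Flux overwrites it. A sketch of the usual way to avoid this, extending the existing function rather than redefining it (MyLayer and its fields are hypothetical, for illustration only):

import Optimisers: trainable          # extend Optimisers' function rather than defining a new one

struct MyLayer                        # hypothetical layer: one trainable field, one fixed buffer
    weight::Matrix{Float32}
    cache::Vector{Float32}
end

trainable(m::MyLayer) = (; weight = m.weight)   # adds a method; does not overwrite trainable(::Any)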

@mcabbott (Member, Author) commented Mar 8, 2022

@gabrevaya this might solve the example here: #1353 (comment). But what is opt?

julia> res = GalacticOptim.solve(optprob, opt, data, cb = cb)
ERROR: UndefVarError: opt not defined

@mcabbott mcabbott marked this pull request as ready for review March 8, 2022 05:10
@codecov-commenter commented Mar 8, 2022

Codecov Report

Merging #1901 (91d88fe) into master (ed78e8a) will decrease coverage by 0.14%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1901      +/-   ##
==========================================
- Coverage   86.57%   86.42%   -0.15%     
==========================================
  Files          18       18              
  Lines        1445     1422      -23     
==========================================
- Hits         1251     1229      -22     
+ Misses        194      193       -1     
Impacted Files Coverage Δ
src/functor.jl 88.13% <ø> (-0.20%) ⬇️
src/utils.jl 95.71% <ø> (+0.03%) ⬆️


@@ -27,7 +28,7 @@ jobs:
- {user: SciML, repo: DiffEqFlux.jl, group: Layers}
- {user: SciML, repo: NeuralPDE.jl, group: NNPDE}
- {user: SciML, repo: OperatorLearning.jl, group: All}
-if: contains(github.event.pull_request.labels.*.name, 'run downstream test')
+# if: contains(github.event.pull_request.labels.*.name, 'run downstream test')

Reviewer (Member): This change should be removed before merging.

mcabbott (Member, Author): Are you thinking of saving CI time?

Only the NNPDE one is long, 45 minutes. So I think downstream tests are quicker in total than the main tests. I was wondering whether it would be simplest to always run them, but not sure.

Maybe the ideal would be to set things up so that the basic tests on Julia 1 must pass before any others run. I think CUDA.jl has something like that, but I'm not sure how easy it is to set up.

Reviewer (Member): Even if they don't take more time, it's still wasted energy in most cases. Adding a label to sensitive PRs seems an effortless compromise.

@CarloLucibello (Member): As an extra layer of caution, we could rerun the downstream tests for NeuralPDE and DiffEqFlux with group All.

@mcabbott (Member, Author) commented Mar 12, 2022

Tests for DiffEqFlux.jl don't run; it sounds like there is a manifest checked in somewhere which wants NNlib 0.7, but I'm not sure where:

┌ Warning: Could not use exact versions of packages in manifest, re-resolving
└ @ Pkg.Operations /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Pkg/src/Operations.jl:1488
┌ Info: Not compatible with this release. No problem.
│   exception =
│    Unsatisfiable requirements detected for package NNlib [872c559c]:

Tests for NeuralPDE.jl fail; I'm not sure why:

Neural adapter Poisson equation, strategy: QuasiRandomTraining
NeuralAdapter: Test Failed at /home/runner/work/Flux.jl/Flux.jl/downstream/test/neural_adapter_tests.jl:117
  Expression: ≈(u_predict, u_real, rtol = 0.1)
   Evaluated: [-0.0854166448535627 -0.08352062584364406 … 0.07674157080594485 0.07800274129936807; -0.08466057494953605 -0.08276370383344656 … 0.07734593215866198 0.0786038386001856; … ; -0.010857086294706109 -0.009036067653524105 … 0.1257043933646262 0.12661947080281216; -0.010144510500895154 -0.008325800824682428 … 0.1260858784271197 0.12699759752385437] ≈ [0.0 0.0 … 0.0 0.0; 0.0 4.998355282382539e-5 … 4.9983552823825304e-5 1.9487653202985864e-19; … ; 0.0 4.9983552823825304e-5 … 4.99835528238252e-5 1.9487653202985828e-19; 0.0 1.9487653202985864e-19 … 1.9487653202985828e-19 7.597871817923733e-34] (rtol=0.1)
Stacktrace:
 [1] top-level scope
   @ /opt/hostedtoolcache/julia/1.7.2/x64/share/julia/stdlib/v1.7/Test/src/Test.jl:445

Maybe @ChrisRackauckas knows?

Both packages use destructure a lot.

@ChrisRackauckas (Member): We don't intentionally restrict NNlib to v0.7 anywhere that I know of. I'm rather surprised that it's not using v0.8, given the compats I see on the repos.

@ChrisRackauckas (Member): Test deps were done incorrectly. A new patch fixes that.

@mcabbott (Member, Author): Great. Any idea why the NeuralPDE tests fail, or how to isolate one?

@ChrisRackauckas (Member): Re-run the NeuralPDE one and see if you get the same result, or if it was just a bad initial condition? It's not failing other downstream tests.

@mcabbott mcabbott closed this Mar 13, 2022
@mcabbott mcabbott reopened this Mar 13, 2022
@ChrisRackauckas (Member): @yuehhua and @avik-pal was there another breaking change in GeometricFlux?

@mcabbott the NeuralPDE failure looks real, as in, that's one of the few cases that uses Chain instead of FastChain, and it just so happens to be failing now, but I cannot reproduce it in the repo's CI. @zoemcc can maybe help isolate this?

@yuehhua (Member) commented Mar 13, 2022

@ChrisRackauckas If there are breaking changes, I will tag a minor release for GeometricFlux. Currently, the breaking changes in GeometricFlux are in the message-passing network part. Did something happen here?

@mcabbott (Member, Author) commented Mar 14, 2022

For this test:
https://github.com/SciML/NeuralPDE.jl/blob/master/test/neural_adapter_tests.jl#L1-L121
I can't seem to load the necessary packages. Can anyone isolate this? Is it a bug in the new implementation, just a behavioural change such as flattening only trainable parameters, or reliance on a bug in the old one?
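
One of the candidate behaviour changes mentioned above, flattening only the trainable parameters, can be checked directly on a layer that carries non-trainable state. A minimal sketch, assuming (as in Flux) that BatchNorm's running statistics are not marked trainable:

julia> using Flux, Optimisers

julia> bn = BatchNorm(2);             # β and γ are trainable; the running μ and σ² are not

julia> v, _ = Optimisers.destructure(bn);

julia> length(v)                      # 4 if only the trainable fields (β, γ) are flattened;
                                      # 8 would mean the running statistics were included too

Comparing this count against what the old Flux.destructure returns for the same model is one quick way to tell whether a downstream discrepancy comes from this change.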

For DiffEqFlux, the current error looks unrelated:

┌ Warning: Assignment to `gradsc` in soft scope is ambiguous because a global variable by the same name exists: `gradsc` will be treated as a new local. Disambiguate by using `local gradsc` to suppress this warning or `global gradsc` to assign to the existing global variable.
└ @ ~/work/Flux.jl/Flux.jl/downstream/test/neural_de.jl:447
Test Summary:   | Pass  Broken  Total
Neural DE Tests |  149      50    199
Test Summary:             | Pass  Total
Augmented Neural DE Tests |    8      8
Neural Graph DE: Error During Test at /home/runner/.julia/packages/SafeTestsets/A83XK/src/SafeTestsets.jl:25
  Got exception outside of a @test
  LoadError: MethodError: no method matching GeometricFlux.GCNConv(::GraphSignals.FeaturedGraph{GraphSignals.SparseGraph{false, SparseArrays.SparseMatrixCSC{Float32, Int64}, Vector{Int64}, Int64}, FillArrays.Fill{Float32, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, FillArrays.Fill{Float32, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, FillArrays.Fill{Float32, 1, Tuple{Base.OneTo{Int64}}}}, ::Pair{Int64, Int64})
  Closest candidates are:
    GeometricFlux.GCNConv(::A, ::B, ::F) where {A<:(AbstractMatrix), B, F} at ~/.julia/packages/GeometricFlux/wJWol/src/layers/conv.jl:27
    GeometricFlux.GCNConv(::Pair{Int64, Int64}, ::Any; init, bias) at ~/.julia/packages/GeometricFlux/wJWol/src/layers/conv.jl:32

but it's unclear to me whether https://github.com/SciML/DiffEqFlux.jl/blob/master/test/runtests.jl runs all tests after one failure, or gives up.

@mcabbott (Member, Author): Any luck isolating the NeuralPDE failure? (I think Chris meant to tag @zoemcc above, but the version with a link won't notify.)

@ChrisRackauckas (Member): Looks like it was a fluke 🎉

@mcabbott (Member, Author): Note that 91d88fe reverts the version and downstream-CI changes; the downstream tests now pass in about 2 minutes for packages which don't declare compatibility with Flux 0.13.

It's still easy to investigate locally via using Optimisers: destructure. But perhaps 10 days is enough, and we should merge this and revisit if there are in fact bugs?

@ChrisRackauckas (Member): Ahh, I thought that was a rebase. Yeah, sorry, it seems like I messed up the ping earlier. Feel free to merge, and if NeuralPDE has a failure then at least we know why.
