
Fourier Layer Tests #35

Open · wants to merge 1 commit into master
Conversation


@ba2tro ba2tro commented Feb 19, 2022

I have added a couple of simple tests for the Fourier layer and the DeepONet layer. What more tests can we add for these? One thing I wanted to add to test/deeponet.jl:

  # `a` and `sensors` are the test inputs defined earlier in the file
  model1 = DeepONet((16, 22, 30), (1, 16, 24, 30), σ, tanh; init_branch=Flux.glorot_normal, bias_trunk=false)
  parameters = params(model1)

  branch = Chain(Dense(16, 22, init=Flux.glorot_normal), Dense(22, 30, init=Flux.glorot_normal))
  trunk = Chain(Dense(1, 16, bias=false), Dense(16, 24, bias=false), Dense(24, 30, bias=false))
  model2 = DeepONet(branch, trunk)

  # forward pass
  model1(a, sensors)
  model2(a, sensors)
  @test model1(a, sensors) ≈ model2(a, sensors)

  # gradients
  m1grad = Flux.Zygote.gradient((x, p) -> sum(model1(x, p)), a, sensors)
  m2grad = Flux.Zygote.gradient((x, p) -> sum(model2(x, p)), a, sensors)

  @test !iszero(m1grad)
  @test !iszero(m2grad)
  @test m1grad[1] ≈ m2grad[1] rtol=1e-12
  @test m1grad[2] ≈ m2grad[2] rtol=1e-12

but the problem is that making the parameters the same for model1 and model2 doesn't seem feasible here. Besides, I wanted to know how to formulate a test for training FNO and DeepONet.
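One possible way around the parameter problem, assuming both models enumerate structurally identical weights, would be to copy one model's parameters into the other with Flux.loadparams! before comparing. A minimal sketch of the idea on plain Chains (not this PR's actual test code):

```julia
using Flux

# Sketch: copy the weights of one model into a structurally identical
# one, so forward passes and gradients can be compared exactly.
m1 = Chain(Dense(16, 22), Dense(22, 30))
m2 = Chain(Dense(16, 22), Dense(22, 30))

# loadparams! overwrites m2's parameters with m1's, in iteration order
Flux.loadparams!(m2, Flux.params(m1))

x = randn(Float32, 16, 4)
@assert m1(x) ≈ m2(x)  # identical weights ⇒ identical outputs
```

This only works when both models yield parameters in the same order with the same shapes, which may not hold between DeepONet's tuple constructor and hand-built branch/trunk Chains.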

@ChrisRackauckas (Member)

For the training test, let's start with regression testing. The tutorial has examples using these layers to solve some equations. Do that on a PDE with a known analytical solution and take the difference against the analytical solution, putting the tolerance just above the error you get locally. The test would then trigger if the training ever gets worse. Usually taking the current loss and multiplying it by something like 3 or 5 is a safe regression value.
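A sketch of that regression-test pattern; the analytical function, the predict stand-in, and the locally observed error value here are all placeholders, not the package's API:

```julia
using Test, Statistics

# Regression test sketch: compare a trained model against a known
# analytical PDE solution, with a tolerance a few times the locally
# observed error so the test only fires if training ever regresses.
analytical(x) = sin(2π * x)          # placeholder analytical solution
grid = range(0f0, 1f0; length=128)

# stand-in for a trained model's prediction on the grid
predict(xs) = sin.(2π .* xs) .+ 1f-3 .* randn(Float32, length(xs))

error_now = mean(abs2, predict(collect(grid)) .- analytical.(grid))
tolerance = 5 * 1f-5                 # ≈ 5 × the error observed locally

@test error_now < tolerance
```

The key design point is that the tolerance is pinned to a measured local error rather than an absolute target, so the test tracks "did training get worse" instead of "is the model good".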

@ba2tro (Author) commented Feb 19, 2022

Would we need a dataset in the library for that?

@ba2tro (Author) commented Feb 19, 2022

Maybe we can add a smaller version of the Burgers' equation dataset that contains just what we need for the test, because the whole data file is around 600 MB.

@ba2tro (Author) commented Feb 19, 2022

It has data for 2048 initial conditions at 8192 points each 😅. We could use around 100-200 ICs at 1024 points (just like the tutorial); would that work?
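Such a reduced file could be produced once with MAT.jl by slicing the original dataset and writing the subset back out. A sketch; the input filename and the a/u variable names follow the tutorial's dataset and are an assumption here:

```julia
using MAT

# Hypothetical one-off script: cut the full Burgers' dataset down to
# 300 initial conditions sampled at 1024 spatial points each.
vars = matread("burgers_data_R10.mat")   # assumed name of the full file

stride = size(vars["a"], 2) ÷ 1024       # 8192 ÷ 1024 ⇒ keep every 8th point
a_small = vars["a"][1:300, 1:stride:end]
u_small = vars["u"][1:300, 1:stride:end]

matwrite("burgerset.mat", Dict("a" => a_small, "u" => u_small))
```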

@ba2tro (Author) commented Feb 20, 2022

So I tried implementing a training test for the Fourier layer. I think there could be a bug here; I followed the Burgers' equation example.
Code:

using Flux, MAT, OperatorLearning

vars = matread("burgerset.mat")
xtrain = vars["a"][1:280, :]
xtest = vars["a"][end-19:end, :]
ytrain = vars["u"][1:280, :]
ytest = vars["u"][end-19:end, :]

grid = collect(range(0, 1, length=size(xtrain, 2)))

# append the spatial grid as a second channel
xtrain = cat(reshape(xtrain, (280, 1024, 1)),
             reshape(repeat(grid, 280), (280, 1024, 1));
             dims=3)
ytrain = cat(reshape(ytrain, (280, 1024, 1)),
             reshape(repeat(grid, 280), (280, 1024, 1));
             dims=3)
xtest = cat(reshape(xtest, (20, 1024, 1)),
            reshape(repeat(grid, 20), (20, 1024, 1));
            dims=3)
ytest = cat(reshape(ytest, (20, 1024, 1)),
            reshape(repeat(grid, 20), (20, 1024, 1));
            dims=3)

# reorder to (channel, grid, batch)
xtrain, xtest = permutedims(xtrain, (3, 2, 1)), permutedims(xtest, (3, 2, 1))
ytrain, ytest = permutedims(ytrain, (3, 2, 1)), permutedims(ytest, (3, 2, 1))

train_loader = Flux.Data.DataLoader((xtrain, ytrain), batchsize=20, shuffle=true)
test_loader = Flux.Data.DataLoader((xtest, ytest), batchsize=20, shuffle=false)

layer = FourierLayer(128, 128, 1024, 16, gelu, bias_fourier=false)

model = Chain(Dense(2, 128; bias=false), layer, layer, layer, layer,
              Dense(128, 2; bias=false))

learning_rate = 0.001
opt = ADAM(learning_rate)

parameters = params(model)

loss(x, y) = Flux.Losses.mse(model(x), y)
evalcb() = @show(loss(xtest, ytest))
throttled_cb = Flux.throttle(evalcb, 5)

Flux.@epochs 500 Flux.train!(loss, parameters, train_loader, opt, cb=throttled_cb)

Error:

MethodError: no method matching batched_gemm(::Char, ::Char, ::Array{ComplexF64, 3}, ::Array{ComplexF32, 3})
Closest candidates are:
  batched_gemm(::AbstractChar, ::AbstractChar, ::AbstractArray{ComplexF64, 3}, !Matched::AbstractArray{ComplexF64, 3}) at C:\Users\user\.julia\packages\BatchedRoutines\4RDBA\src\blas.jl:137
  batched_gemm(::AbstractChar, ::AbstractChar, !Matched::ComplexF32, ::AbstractArray{ComplexF32, 3}, !Matched::AbstractArray{ComplexF32, 3}) at C:\Users\user\.julia\packages\BatchedRoutines\4RDBA\src\blas.jl:134
  batched_gemm(::AbstractChar, ::AbstractChar, !Matched::AbstractArray{ComplexF32, 3}, ::AbstractArray{ComplexF32, 3}) at C:\Users\user\.julia\packages\BatchedRoutines\4RDBA\src\blas.jl:137

...
in eval at base\boot.jl:373
in top-level scope at Juno\n6wyj\src\progress.jl:119
in macro expansion at Flux\qAdFM\src\optimise\train.jl:144
in at Flux\qAdFM\src\optimise\train.jl:105
in var"#train!#36" at Flux\qAdFM\src\optimise\train.jl:107
in macro expansion at Juno\n6wyj\src\progress.jl:119
in macro expansion at Flux\qAdFM\src\optimise\train.jl:109
in gradient at Zygote\FPUm3\src\compiler\interface.jl:75
in pullback at Zygote\FPUm3\src\compiler\interface.jl:352
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at Flux\qAdFM\src\optimise\train.jl:110
in _pullback at ZygoteRules\AIbCs\src\adjoint.jl:65
in adjoint at Zygote\FPUm3\src\lib\lib.jl:200
in _apply at base\boot.jl:814
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at fourier_tests.jl:135
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at Flux\qAdFM\src\layers\basic.jl:49
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at Flux\qAdFM\src\layers\basic.jl:47
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at Flux\qAdFM\src\layers\basic.jl:47
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at dev\OperatorLearning\src\FourierLayer.jl:115
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at OMEinsum\EMISk\src\interfaces.jl:204
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl:9
in macro expansion at Zygote\FPUm3\src\compiler\interface2.jl
in chain_rrule at Zygote\FPUm3\src\compiler\chainrules.jl:216
in rrule at ChainRulesCore\uxrij\src\rules.jl:134
in rrule at OMEinsum\EMISk\src\autodiff.jl:33
in einsum at OMEinsum\EMISk\src\interfaces.jl:200
in einsum at OMEinsum\EMISk\src\binaryrules.jl:98
in einsum at OMEinsum\EMISk\src\binaryrules.jl:226
in _batched_gemm at OMEinsum\EMISk\src\utils.jl:119

Both xtrain and grid are Float64 here. When I make them Float32 explicitly, the error resolves and the model trains normally. I did the same to avoid it here:

https://github.com/Abhishek-1Bhatt/OperatorLearning.jl/blob/c92d3ed1eca77ea61b756864bded99e6f42dc878/test/fourierlayer.jl#L34-L35
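The workaround boils down to converting the data before it reaches the layer. A minimal sketch of the idea, using the variable names from the script above:

```julia
# Convert the Float64 arrays read from the .mat file to Float32, so the
# data matches the layer's single-precision complex Fourier weights
# instead of forcing a mixed ComplexF64/ComplexF32 batched_gemm call.
xtrain = Float32.(xtrain)
xtest  = Float32.(xtest)
ytrain = Float32.(ytrain)
ytest  = Float32.(ytest)
grid   = Float32.(grid)
```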

@ba2tro (Author) commented Feb 20, 2022

The test for DeepONet works fine

@ChrisRackauckas (Member)

Then split out the DeepONet tests so those can merge quicker while the other ones are investigated.

codecov bot commented Feb 20, 2022

Codecov Report

Merging #35 (c92d3ed) into master (9b16e02) will increase coverage by 16.43%.
The diff coverage is n/a.


@@             Coverage Diff             @@
##           master      #35       +/-   ##
===========================================
+ Coverage   41.09%   57.53%   +16.43%     
===========================================
  Files           6        6               
  Lines          73       73               
===========================================
+ Hits           30       42       +12     
+ Misses         43       31       -12     
Impacted Files         Coverage Δ
src/DeepONet.jl        60.00% <0.00%> (+20.00%) ⬆️
src/FourierLayer.jl    74.19% <0.00%> (+29.03%) ⬆️


@pzimbrod (Contributor)

Yeah, somehow FourierLayer doesn't promote its parameters' data type to match the inputs, although it should. I haven't found the culprit for sure, but the no. 1 suspects are the tensor multiplications:

@ein linear[batch, out, grid] := Wl[out, in] * xp[batch, in, grid]

@ein 𝔉[batch, out, grid] := Wf[in, out, grid] * rfft(xp, 3)[batch, in, grid]

I'm working on switching those implementations out for more specialized code anyway in #31, but the problem might well be elsewhere; that's just my best guess so far.
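If that guess is right, one hedged fix would be to promote the input to the real element type of the layer's complex weights at the start of the forward pass. A sketch of the idea only, not the actual FourierLayer code:

```julia
# Hypothetical promotion step for a layer's forward pass: convert the
# input to match the element type of the layer's weights before any
# batched multiplications, so Float64 data meeting ComplexF32 weights
# no longer produces mixed-precision batched_gemm calls.
function promote_input(Wf::AbstractArray{Complex{T}}, x::AbstractArray) where {T}
    return convert(AbstractArray{T}, x)
end

Wf = rand(ComplexF32, 4, 4, 8)    # stand-in for the Fourier weight tensor
x  = rand(Float64, 4, 16, 8)      # Float64 input, as read from the .mat file
xp = promote_input(Wf, x)
@assert eltype(xp) == Float32
```

The alternative direction, converting the data at the call site as done in the test above, works too; doing it inside the layer just makes the layer robust to whatever precision the user feeds it.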

@ba2tro (Author) commented Feb 21, 2022

There's one last thing: for reading the data from the .mat file, we'd need MAT.jl as one of the dependencies. Do I run add MAT with the OperatorLearning environment activated to add it to Project.toml?

@ba2tro ba2tro changed the title Some tests for forward pass and gradients Fourier Layer Tests Feb 21, 2022
@pzimbrod (Contributor)

There's one last thing: for reading the data from the .mat file, we'd need MAT.jl as one of the dependencies. Do I run add MAT with the OperatorLearning environment activated to add it to Project.toml?

Yep. However, I wouldn't include MAT as a package dependency, only for testing. We could put it as a test-specific dependency in the main Project.toml, but that approach will be deprecated in the future, as described here. I would rather have a completely separate environment for tests by creating test/Project.toml that includes MAT, as advised in the linked docs above.
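For reference, such a test/Project.toml might look roughly like this; Pkg generates the file for you if you pkg> activate ./test and then pkg> add MAT Test, and the UUIDs below should be double-checked against what Pkg writes:

```toml
[deps]
MAT = "23992714-dd62-5051-b70f-ba57cb901cac"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
```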
