
compatibility with Flux #396

Open
CarloLucibello opened this issue Jan 8, 2021 · 25 comments

@CarloLucibello

Hi,
this is to enquire about the possibility of using Interpolations.jl to build image upsampling or downsampling layers in Flux.jl. We recently added a bilinear upsampling function in FluxML/NNlib.jl#262, but I was wondering if we could instead leverage some of the code here. The requirements would be:

  • handle the batch dimension
  • gradient computation
  • compatibility with CuArrays

I'm not familiar with the codebase here, so maybe this is a long shot, but it seems worth making the attempt and raising awareness of this kind of interaction. Moreover, GPU- and automatic-differentiation-friendly interpolations would benefit the ML ecosystem generally.
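
For concreteness, here is a rough sketch (hypothetical, not the NNlib implementation) of what an Interpolations.jl-based bilinear upsample over the two spatial dimensions of a WHCN array might look like; the scalar inner loop is exactly the part that would need a GPU-friendly formulation:

using Interpolations

function upsample_bilinear_itp(x::AbstractArray{T,4}, k::NTuple{2,Int}) where T
    W, H, C, N = size(x)
    Wo, Ho = W * k[1], H * k[2]
    y = similar(x, Wo, Ho, C, N)
    for n in 1:N, c in 1:C
        itp = interpolate(view(x, :, :, c, n), BSpline(Linear()))
        for j in 1:Ho, i in 1:Wo
            # map output pixel (i, j) to fractional input coordinates
            # (align_corners-style mapping; assumes Wo, Ho > 1)
            y[i, j, c, n] = itp(1 + (i - 1) * (W - 1) / (Wo - 1),
                                1 + (j - 1) * (H - 1) / (Ho - 1))
        end
    end
    return y
end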

Best,
CL

cc @johnnychen94

@mkitti
Collaborator

mkitti commented Jan 8, 2021

Would “compatible with CuArrays” create a dependency on CUDA?

@CarloLucibello
Author

Typically no: CuArrays just follow the abstract array interface, although scalar indexing leads to horrible performance.
To address such cases a CUDA kernel is needed (and with it the CUDA.jl dependency).

@maxfreu

maxfreu commented Feb 5, 2021

Having just completed the upsampling code: I would now rather use KernelAbstractions.jl for such work, as it saves you from writing almost the same code twice. Plus, I think it's a much lighter dependency than CUDA. In the end I imagine having super-portable (CPU, GPU (Nvidia & AMD!)) code with gradients, which can be recycled in JuliaMath, JuliaImages, and NNlib. It could even be used to write specialized methods for different dimension orderings and dispatch via NamedDims.
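
For illustration, a minimal sketch of that idea with the current KernelAbstractions.jl API (names are hypothetical, and older releases used a device/event-based launch API instead of get_backend/synchronize):

using KernelAbstractions

# one kernel definition that runs on CPU and GPU backends alike
@kernel function upsample_nearest_kernel!(y, @Const(x), k)
    i, j = @index(Global, NTuple)
    @inbounds y[i, j] = x[(i - 1) ÷ k + 1, (j - 1) ÷ k + 1]
end

function upsample_nearest_ka(x::AbstractMatrix, k::Int)
    y = similar(x, size(x, 1) * k, size(x, 2) * k)
    backend = get_backend(x)  # CPU() for Array, CUDABackend() for CuArray, ...
    upsample_nearest_kernel!(backend)(y, x, k; ndrange = size(y))
    KernelAbstractions.synchronize(backend)
    return y
end

The same kernel definition then runs on the CPU, CUDA, and ROCm backends, which is exactly the duplicate-code saving described above.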

@moesphere

Related to the gradient computation: it would be helpful to have rrule and frule functions defined (from the ChainRulesCore package) so that Interpolations can be used with AD, e.g. Zygote. A gradient function is already defined, so if I understand correctly, the only missing piece is a gradient with respect to the fields of Interpolations objects; see here
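
A hedged sketch of what such an rrule could look like, reusing the existing gradient function (this is roughly the shape of what #414 later added to the package; NoTangent is the ChainRulesCore 1.x name, and the first pullback slot is where a tangent for the interpolant's data fields would eventually go):

using ChainRulesCore, Interpolations

function ChainRulesCore.rrule(itp::Interpolations.AbstractInterpolation, x::Number)
    y = itp(x)
    function itp_pullback(ȳ)
        # Interpolations.gradient returns a 1-element SVector here;
        # unwrap it so the tangent of a scalar input is a scalar
        ∂x = only(Interpolations.gradient(itp, x)) * ȳ
        return NoTangent(), ∂x
    end
    return y, itp_pullback
end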

@DhairyaLGandhi

Could you point to the gradient function?

@rick2047
Contributor

rick2047 commented Apr 8, 2021

I don't know too much about the code, but simple debugging tells me there are 6 gradient functions

These have special definitions

  • indexing.jl (×2)
  • monotonic.jl

These call the general gradient functions

  • Interpolations.jl
  • extrapolation.jl
  • scaling.jl

I suspect the indexing.jl one is the one which needs to be used here.

@DhairyaLGandhi

Found it!

function gradient(itp::AbstractInterpolation, x::Vararg{UnexpandedIndexTypes})

@rick2047
Contributor

Now that #414 is finished, how do we rewrite the upsampling functions? Going through the NNlib code, it seems like now that we have ChainRules integration we can just rewrite stuff like upsample_nearest and its gradient. If it's just that, I will open a PR there.

@mkitti
Collaborator

mkitti commented Apr 17, 2021

Have you taken a look at http://juliamath.github.io/Interpolations.jl/latest/devdocs/ yet?

@rick2047
Contributor

I have read that page and I think I mostly get it. But I was wondering whether we should instead write them as simple functions using interpolation objects, so that upsample_nearest becomes something like

itp = interpolate(x, (NoInterp(), NoInterp()))

We already have parent axes, which can be used to determine the size.
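
For nearest-neighbour specifically, BSpline(Constant()) may be closer to what's needed than NoInterp; a rough sketch (hypothetical name, and the border semantics differ slightly from NNlib's pixel replication):

using Interpolations

function upsample_nearest_itp(x::AbstractMatrix, k::Int)
    itp = interpolate(x, BSpline(Constant()))  # Constant() == nearest neighbour
    xs = range(1, size(x, 1), length = size(x, 1) * k)
    ys = range(1, size(x, 2), length = size(x, 2) * k)
    return [itp(i, j) for i in xs, j in ys]
end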

@DhairyaLGandhi

Would it work similarly to the kernels we have now in NNlib? Do we get the correct gradients out, etc.?

Worth doing a comparison imo

@rick2047
Contributor

I don't know what the kernel system is in NNlib (I'm not familiar with that codebase at all). But I was thinking I could replace the code for upsample_nearest and its gradient function with the Zygote call. They have tests for these functions, so rewriting should be easier.

@maxfreu

maxfreu commented Apr 28, 2021

Hi! @DhairyaLGandhi, this serves as a reply to FluxML/NNlibCUDA.jl#5 (comment), but I think it might be better to keep the discussion in one place :)

Yes, I would find it cool if it works and we can get rid of duplicate code (of which I really wrote a lot, sorry for that). But I don't know if the performance can match the handwritten kernels I ported from PyTorch (at least for now). For example, I benchmarked the CPU implementation of bilinear upsampling against imresize! in ImageTransformations.jl (which uses Interpolations.jl under the hood), and NNlib was 2.2x faster, even single-threaded and with sub-optimal memory layout. Furthermore, the ImageTransformations.jl code doesn't work on GPUs when scalar getindex is disallowed.

So I don't really know what the best way forward is. Maybe to enhance Interpolations.jl and then use it? But how should all the specialized GPU code be handled, like the setup for the kernel calls (setting thread and block sizes)? Create CUDAinterpolations.jl so that Interpolations.jl doesn't have to depend on CUDA? Or simply proceed with collecting specialized code for neural networks / deep learning in NNlib, and just pick the stuff that really makes things faster and simpler from other packages? Note that, at least for upsampling / interpolations, things can be simplified a lot by smarter dispatch, which I can maybe hand in two PRs down the line or so.

@rick2047

I don't know what the kernel system is in NNlib

I don't know either 😄 I just ported PyTorch code by brute force and it turned out to be quite fast, but also a bit unwieldy, which can probably be improved by properly leveraging dispatch. The nearest-neighbour part was written by @mcabbott.

Lastly, maybe someone else can also benchmark the interpolations to check my results?
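
For anyone who wants to reproduce it, the comparison was roughly along these lines (a sketch; exact sizes, scales, and layouts may differ from what I ran):

using BenchmarkTools, NNlib, ImageTransformations

x = rand(Float32, 128, 128, 1, 1)     # WHCN layout expected by NNlib
@btime upsample_bilinear($x, (2, 2))

img = rand(Float32, 128, 128)
out = similar(img, 256, 256)
@btime imresize!($out, $img)          # uses Interpolations.jl internally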

@kiranshila

kiranshila commented May 26, 2021

Pretty confused as to the current state of this. I'm trying to run the same example from #414 but get the following error:

using Interpolations
using Zygote
y = sin.(1:10)
itp = interpolate(y,BSpline(Cubic(Reflect(OnCell()))))
Zygote.gradient(itp, 2)
ERROR: ArgumentError: unable to check bounds for indices of type Interpolations.WeightedAdjIndex{4, Ratios.SimpleRatio{Int64}}

Did something change in this package or Zygote?

@kiranshila

Ah I see, that merge hasn't made it to a release yet. Disregard me.

@jmsull

jmsull commented Dec 14, 2021

Sorry to revive this, but the above example does not work when I run it, despite the previous merge. Further, when I run the package tests in a fresh Julia installation

(@v1.7) pkg> add Interpolations
(@v1.7) pkg> test Interpolations

on the latest version (0.13.4) I find the error:

ChainRulesCore: Error During Test at /Users/jsull/.julia/packages/Interpolations/3gTQB/test/chainrules.jl:12
  Test threw exception
  Expression: (Zygote.gradient(itp, 1))[1] == Interpolations.gradient(itp, 1)
  MethodError: no method matching (::ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}})(::SVector{1, Float64})
  Closest candidates are:
    (::ChainRulesCore.ProjectTo{T})(::ChainRulesCore.AbstractZero) where T at ~/.julia/packages/ChainRulesCore/sHMAp/src/projection.jl:120
    (::ChainRulesCore.ProjectTo{<:Number})(::ChainRulesCore.Tangent{<:Number}) at ~/.julia/packages/ChainRulesCore/sHMAp/src/projection.jl:192
    (::ChainRulesCore.ProjectTo{T})(::ChainRulesCore.Tangent{<:T}) where T at ~/.julia/packages/ChainRulesCore/sHMAp/src/projection.jl:142
    ...

for the test related to the previously discussed Zygote gradients issue. It seems like the problem is in ChainRulesCore?

(I also see a many-times-repeated error when running the tests at test/gradient.jl:216, but that is probably a matter for another issue.)

@mkitti
Collaborator

mkitti commented Dec 14, 2021

Have you tried this on 1.6? There are known inference issues on 1.7.

@mcabbott
Contributor

The example from above reproduces this error.

julia> using Interpolations, Zygote
julia> y = sin.(1:10);
julia> itp = interpolate(y,BSpline(Cubic(Reflect(OnCell()))));

julia> z, back = Zygote.pullback(itp, 2.0);

julia> z
0.769963450028415

julia> back(1.0)
([-0.35017548837401463],)

julia> Zygote.gradient(itp, 2.0);
ERROR: MethodError: no method matching (::ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}})(::StaticArrays.SVector{1, Float64})

(jl_hY83L3) pkg> st
Status `/private/var/folders/yq/4p2zwd614y59gszh7y9ypyhh0000gn/T/jl_hY83L3/Project.toml`
  [a98d9a8b] Interpolations v0.13.4
  [e88e6eb3] Zygote v0.6.32

The error is coming from ChainRulesCore, but only because it's performing a sanity check on the gradient returned by the rule. Above, back(1.0) avoids this check, and shows that the rule produces a 1-element vector as the gradient of a number, which doesn't make sense.

Where the bug is, or whether this used to work, I don't know.

@mcabbott
Contributor

Oh, the problem is just that there has not been a release since #465. On master this works.

@jmsull

jmsull commented Dec 14, 2021

Just switching to master worked for me - thanks for pointing that out and for the explanation!

@mkitti
Collaborator

mkitti commented Dec 14, 2021

I just released a new version, 0.13.5

@DhairyaLGandhi

I'd love to have the gradient with respect to the original array as well, and at that point I believe we could close this. Is there any interest in putting together a PR to that end?

@stevengj
Member

stevengj commented Nov 1, 2022

@DhairyaLGandhi, me too, but implementing that seems quite a bit harder, since it has to be done separately for each type of interpolation. One simple case that might be a good starting point would be to support linear gridded interpolation.
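
To see why the linear gridded case is simple, here is a hand-written sketch (a hypothetical helper, not part of the package) of a 1-d linear interpolant together with its pullback with respect to the data: the value is (1 - t) * ys[i] + t * ys[i+1], so the cotangent scatters into just two entries.

function linear_itp_withgrad(xs::AbstractVector, ys::AbstractVector, x::Real)
    # locate the knot interval containing x (clamped for extrapolation)
    i = clamp(searchsortedlast(xs, x), 1, length(xs) - 1)
    t = (x - xs[i]) / (xs[i+1] - xs[i])
    val = (1 - t) * ys[i] + t * ys[i+1]
    function pullback_data(ȳ)   # gradient of val w.r.t. ys
        ∂ys = zero(ys)
        ∂ys[i]   = (1 - t) * ȳ
        ∂ys[i+1] = t * ȳ
        return ∂ys
    end
    return val, pullback_data
end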

@rs1909

rs1909 commented Nov 16, 2022

@DhairyaLGandhi @stevengj I would also love to have the gradient with respect to the data being interpolated, especially for cubic splines, because I need smoothness... The problem is that I have no clue where to start implementing it. Does the interpolation involve solving equations, or are there some constant weights pre-computed? Also, there is very little online about how cubic spline interpolation works in 2D or 3D. Any pointers to resources would be helpful. Is there any other package that could do this?

@rs1909

rs1909 commented Nov 17, 2022

@DhairyaLGandhi @stevengj :
Gradient with respect to the image is, after all, not that complicated. The interpolation depends linearly on the image, so this gradient is in fact independent of the image.

Internally, the gradient is calculated by the weightedindexes function and then multiplied with the image data. This has a sparse format, because the interpolation result depends only locally on the image. This is the same internal API throughout the various interpolations. The only task is to convert the tuple of 'WeightedAdjIndex' to a digestible format, which I can do.

I would be happy if 'weightedindexes' were exported as an API; then I could use it as the gradient. Could someone make that happen?
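
To illustrate the linearity argument (inefficient, O(N) per evaluation point, but it shows the principle): since itp(x) is linear in the data y, interpolating unit vectors yields the gradient components directly; the sparse weights from weightedindexes would give the same numbers without the loop. Hypothetical helper name:

using Interpolations

function grad_wrt_data(y::AbstractVector, x::Real)
    g = zeros(float(eltype(y)), length(y))
    for i in eachindex(y)
        e = zeros(float(eltype(y)), length(y))
        e[i] = 1
        itp = interpolate(e, BSpline(Cubic(Reflect(OnCell()))))
        g[i] = itp(x)   # = ∂ itp_y(x) / ∂ y[i], by linearity
    end
    return g
end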
