
Added rfft! and irfft! functionality through PaddedRFFTArray type. #54

Open
wants to merge 14 commits into base: master
Conversation

favba
Contributor

@favba favba commented Jan 5, 2018

This PR addresses #52. It is a rewrite of my code with fixes to address possible problems with the usage of unsafe_wrap raised here. I don't fully understand those issues, so an expert review is needed to see whether the changes I introduced indeed solve the problem.

The code is already functional. The tests I originally wrote for my package are passing. The code still needs documentation for the new type and the new functions.
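For readers unfamiliar with the padded layout this type relies on: an in-place r2c transform of an n-point real signal needs 2*(n÷2+1) real slots so that the n÷2+1 complex outputs fit in the same buffer. A minimal plain-Julia sketch of that size arithmetic (the helper names complex_length and padded_length are hypothetical, not part of this PR):

```julia
# Number of complex outputs of an n-point real-to-complex FFT, and the
# padded real length needed to hold them in place (hypothetical helpers).
complex_length(n) = n ÷ 2 + 1
padded_length(n)  = 2 * complex_length(n)

# An 8-point real signal yields 5 complex coefficients, occupying 10 real
# slots, i.e. 2 slots of padding; a 9-point signal also yields 5.
@assert complex_length(8) == 5
@assert padded_length(8)  == 10
@assert complex_length(9) == 5
```

This is why the PaddedRFFTArray built from rand(8,8) in the benchmarks below displays as a 5×8 complex array.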

@favba
Contributor Author

favba commented Jan 10, 2018

Due to the rapid changes to Julia right now, I'll wait for the v0.7 release to finish up this PR.

@appleparan
Contributor

Bump! Could you finish this up with v0.7-dev? It would be nice if this could be released before the official v0.7 release.

@favba
Contributor Author

favba commented Mar 28, 2018

I think all the major changes to Julia have been made. I'll work on it again.

@favba
Contributor Author

favba commented Mar 29, 2018

I managed to avoid using unsafe views and this PR is ready for technical review now. If it is acceptable, it will only be missing docstrings.
FYI: Using the new reinterpret still provides really poor performance:

julia> versioninfo()
Julia Version 0.7.0-DEV.4702
Commit 524710f60d (2018-03-26 19:44 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin17.4.0)
  CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 2

julia> using FFTW

julia> a = PaddedRFFTArray(rand(8,8))
5×8 PaddedRFFTArray{Float64,2,false}:
 0.712979+0.28724im   0.509384+0.0177532im     0.958592+0.884517im   0.685513+0.401894im
 0.361746+0.369876im  0.968775+0.193823im      0.0521561+0.815999im   0.924453+0.528454im
 0.835827+0.403952im  0.775697+0.187061im       0.106656+0.0442893im  0.321887+0.859492im
 0.784735+0.141277im  0.436122+0.198054im       0.580609+0.22239im    0.266962+0.252452im
      0.0+0.0im            0.0+0.0im                 0.0+0.0im             0.0+0.0im

julia> rfft!(a)
5×8 PaddedRFFTArray{Float64,2,false}:
  32.2515+0.0im       -2.36603+0.797907im       2.8308-1.49447im     -2.36603-0.797907im
 0.521591+1.17158im  -0.190847-0.681372im     -0.334622+1.19408im     -2.46524-3.4627im
  1.14537-2.0369im    0.383983-0.265645im      -1.52369-1.83912im    0.0772888+2.13619im
  1.39388-1.77041im    1.46133+3.4213im        -1.50413+0.570226im  0.00780449-0.57811im
   2.6889+0.0im        3.28556-0.875104im       1.79742+1.84643im      3.28556+0.875104im

julia> b = Array(a)
5×8 Array{Complex{Float64},2}:
  32.2515+0.0im       -2.36603+0.797907im       2.8308-1.49447im     -2.36603-0.797907im
 0.521591+1.17158im  -0.190847-0.681372im     -0.334622+1.19408im     -2.46524-3.4627im
  1.14537-2.0369im    0.383983-0.265645im      -1.52369-1.83912im    0.0772888+2.13619im
  1.39388-1.77041im    1.46133+3.4213im        -1.50413+0.570226im  0.00780449-0.57811im
   2.6889+0.0im        3.28556-0.875104im       1.79742+1.84643im      3.28556+0.875104im

julia> using BenchmarkTools

julia> @benchmark @inbounds getindex($(b),1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.542 ns (0.00% GC)
  median time:      1.549 ns (0.00% GC)
  mean time:        1.552 ns (0.00% GC)
  maximum time:     5.031 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark @inbounds getindex($(a.c),1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.493 ns (0.00% GC)
  median time:      15.699 ns (0.00% GC)
  mean time:        15.729 ns (0.00% GC)
  maximum time:     50.897 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> @benchmark @inbounds getindex($(a),1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.846 ns (0.00% GC)
  median time:      1.853 ns (0.00% GC)
  mean time:        1.856 ns (0.00% GC)
  maximum time:     9.709 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark sum($b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     34.567 ns (0.00% GC)
  median time:      34.638 ns (0.00% GC)
  mean time:        34.957 ns (0.00% GC)
  maximum time:     91.601 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     993

julia> @benchmark sum($(a.c))
BenchmarkTools.Trial:
  memory estimate:  208 bytes
  allocs estimate:  6
  --------------
  minimum time:     865.900 ns (0.00% GC)
  median time:      875.400 ns (0.00% GC)
  mean time:        1.011 μs (9.66% GC)
  maximum time:     705.103 μs (99.77% GC)
  --------------
  samples:          10000
  evals/sample:     60

julia> @benchmark sum($(a))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     35.123 ns (0.00% GC)
  median time:      35.186 ns (0.00% GC)
  mean time:        35.756 ns (0.00% GC)
  maximum time:     70.551 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     992

@favba
Contributor Author

favba commented Mar 31, 2018

Did I mess this pull request up with incorrect rebase/merge operations? Or is this the way to go? Should I start a new clean PR with those final changes?

@favba favba changed the title [W.I.P.] Added rfft! and irfft! functionality through PaddedRFFTArray type. Added rfft! and irfft! functionality through PaddedRFFTArray type. Jul 17, 2018
@favba
Contributor Author

favba commented Jul 17, 2018

I consider this finished now. Julia v0.7 probably won't have any other disruptive change.
If this PR is accepted I could (slowly) start working on documentation...

d = data(A)
i = $ip
I = $t
@boundscheck checkbounds(d,i,I...)
Member

Shouldn't it also check bounds with i-1?

Member

Oh, I guess since ip=2*I2[1], if it is in-bounds then i-1 must also be in-bounds.
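A tiny plain-Julia sketch of that reasoning: the imaginary slot of the pair at complex index k sits at real index 2k, so whenever 2k is in-bounds, the real slot at 2k - 1 is as well.

```julia
# For complex index k, the real/imaginary pair occupies real indices
# 2k-1 and 2k, so checking bounds on ip = 2k also covers ip - 1.
for k in 1:100
    ip = 2k
    @assert 1 <= ip - 1 < ip   # in-bounds at ip implies in-bounds at ip - 1
end
```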

src/rfft!.jl Outdated
(1 in region) || throw(ArgumentError("The first dimension must always be transformed"))
if flags&PRESERVE_INPUT != 0
a = similar(X)
return rFFTWPlan{Complex{T},BACKWARD,true,N}(complex_view(a), real(a), region, flags,timelimit)
Member

What's going on here? IIRC, PRESERVE_INPUT only affects out-of-place plans.

Contributor Author

@favba favba Jul 19, 2018

Oh my... That is a remnant of early-stage development, when I didn't notice that in-place r2c and c2r transforms were already exposed in Julia. I tried hacking out-of-place plans to work in-place, hence this PRESERVE_INPUT flag.
I'll change this.

@testset "Read binary file to PaddedRFFTArray" begin
for s in ((8,4,4),(9,4,4),(8,),(9,))
aa = rand(Float64,s)
f = Base.Filesystem.tempname()
Member

You should be able to test I/O with an IOBuffer rather than a file.

src/rfft!.jl Outdated
else
x = similar(X)
p = rFFTWPlan{T,FORWARD,true,N}(real(x), complex_view(x), region, flags, timelimit)
end
Member

I guess you are trying to avoid overwriting X during planning, but I don't think it's a good idea:

  • plan_fft is documented to overwrite the input during non-ESTIMATE planning, so we should be consistent.

  • Creating the plan for x = similar(X) could result in the plan not being applicable to X in the unlikely event that the X array is not 16-byte aligned.

@favba
Contributor Author

favba commented Jul 19, 2018

Sorry for this late change, but I just observed that performance is slightly better if we use Colons instead of UnitRanges for the array view:

julia> a = rand(128,128,128);

julia> b = view(a,1:126,:,:);

julia> c = view(a,1:126,1:128,1:128);

julia> using BenchmarkTools

julia> @benchmark sum($b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.533 ms (0.00% GC)
  median time:      4.620 ms (0.00% GC)
  mean time:        4.689 ms (0.00% GC)
  maximum time:     6.707 ms (0.00% GC)
  --------------
  samples:          1064
  evals/sample:     1

julia> @benchmark sum($c)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.150 ms (0.00% GC)
  median time:      5.208 ms (0.00% GC)
  mean time:        5.267 ms (0.00% GC)
  maximum time:     7.113 ms (0.00% GC)
  --------------
  samples:          947
  evals/sample:     1

julia> @benchmark reduce($(*),$b)
BenchmarkTools.Trial:
  memory estimate:  560 bytes
  allocs estimate:  14
  --------------
  minimum time:     4.531 ms (0.00% GC)
  median time:      4.608 ms (0.00% GC)
  mean time:        4.678 ms (0.00% GC)
  maximum time:     6.404 ms (0.00% GC)
  --------------
  samples:          1067
  evals/sample:     1

julia> @benchmark reduce($(*),$c)
BenchmarkTools.Trial:
  memory estimate:  560 bytes
  allocs estimate:  14
  --------------
  minimum time:     5.161 ms (0.00% GC)
  median time:      5.227 ms (0.00% GC)
  mean time:        5.291 ms (0.00% GC)
  maximum time:     7.780 ms (0.00% GC)
  --------------
  samples:          943
  evals/sample:     1

julia> versioninfo()
Julia Version 0.7.0-beta2.0
Commit b145832402* (2018-07-13 19:54 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin17.6.0)
  CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 2

src/rfft!.jl Outdated
rsize = (nx,rrsize[2:end]...)
r = view(rr,(1:l for l in rsize)...)
return new{T, N, N === 1 ? true : false}(rr,r,c)
r = view(rr, 1:nx, ntuple(i->Colon(),Nm1)...)
Member

You can pass Val{Nm1}() here to get a compiler-unrolled ntuple on 0.7. On 0.6 you need Val{Nm1}.

Contributor Author

Because Val(n) is a pure function now I can just put Val(Nm1) instead of Val{Nm1}(), right?
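A quick plain-Julia sanity check, independent of the PR, showing both spellings produce the same unrolled tuple:

```julia
# ntuple with a Val argument lets the compiler unroll the construction,
# so the result's length is known at compile time.
Nm1 = 2
t1 = ntuple(i -> Colon(), Val(Nm1))    # Val(n) constructor form
t2 = ntuple(i -> Colon(), Val{Nm1}())  # explicit instance form
@assert t1 == (Colon(), Colon()) == t2

# Typical use: a view spanning the full extent of all trailing dimensions.
rr = rand(10, 4, 4)
r = view(rr, 1:9, ntuple(i -> Colon(), Val(2))...)
@assert size(r) == (9, 4, 4)
```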

@stevengj
Member

LGTM, now just needs docs.

@stevengj
Member

stevengj commented May 8, 2019

Would be good to get documentation so that we can merge this.

@favba
Contributor Author

favba commented Jun 25, 2019

Just to give an update: I'm currently finishing my thesis, so I don't think I'll be able to work on this until a month from now...

@navidcy

navidcy commented Feb 2, 2020

@favba any news on this PR? It seems neat, and it would be very nice to have it merged into FFTW.jl.

@roflmaostc

I would also very much appreciate this feature!

@BioTurboNick

I took a stab at reviving this, but it has some issues. My motivation was to allow simply replacing irfft with irfft! in code like ImageFiltering.jl, where the new type can be used internally to manage the memory. rfft! is the trickier one, given that the output array needs to be bigger than the input array.

There's little difference in performance because most of the time is spent in the C routines, but there is less memory churn. My version can be found here: https://github.com/BioTurboNick/FFTW.jl/tree/in-place-rfft

I don't know the API that well, so I've hit a stumbling block in a couple places. Maybe someone else can pick up from here.

@favba
Contributor Author

favba commented Oct 9, 2021

The only missing piece for this PR is documentation, correct?
If someone could point me to any guidance on how to write the documentation properly, I could finally finish this PR.

@BioTurboNick, I didn't understand what missing functionality you are adding in your PR. Could you clarify that for me? Maybe it can be done here.

@BioTurboNick

A few issues:

  • Not able to use irfft! directly on the output of rfft
  • real function should probably be renamed, as it is doing something quite different from the real function in Base.
  • An inconsistency: the irfft! and brfft! functions here allow the form irfft!(b, (1, 3)) but the existing functions require an integer to be provided e.g. irfft(b, 8, (1, 3))

@BioTurboNick

Also, not sure if all the special indexing methods in this PR are necessary anymore, as those are noted as needed due to inefficiencies back in Julia v0.7.

@favba
Contributor Author

favba commented Oct 9, 2021

Both issues

* Not able to use `irfft!` directly on the output of `rfft`

and

* An inconsistency: the `irfft!` and `brfft!` functions here allow the form `irfft!(b, (1, 3))` but the existing functions require an integer to be provided e.g. `irfft(b, 8, (1, 3))`

are related to the fact that there is not a one-to-one mapping between the size of the complex array and the corresponding real array after the inverse transform.

The PaddedRFFTArray already carries the size information of the real view of the array, so it is really not necessary to pass it to the irfft! function.
It does become inconsistent with irfft, but irfft is really the oddball here. The proposed function signature in this PR is consistent with all the other in-place inverse transforms: ifft!, bfft!, FFTW.idct! (FFTW.r2r! is an exception that requires a kind parameter...)
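The many-to-one mapping is easy to see in plain Julia: real lengths 2k and 2k+1 both produce k+1 complex coefficients, so the complex size alone cannot recover the real size:

```julia
# First-dimension length of the rfft output for an n-point real input
# (a hypothetical helper mirroring FFTW's r2c convention).
rfft_length(n) = n ÷ 2 + 1

@assert rfft_length(8) == 5
@assert rfft_length(9) == 5   # 8 and 9 both map to 5, so the inverse is
                              # ambiguous without the real length.
```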

Now, in order to be able to use irfft! directly on the output of rfft, it seems to me we'd have to change the output of rfft to return a PaddedRFFTArray (remember: we have to save the size information of the original real array...).
That would be quite a disruptive change... But it could be discussed in a separate issue after/if this PR is merged.

And I agree that

* `real` function should probably be renamed, as it is doing something quite different from the `real` function in Base.

@favba
Contributor Author

favba commented Oct 9, 2021

Also, not sure if all the special indexing methods in this PR are necessary anymore, as those are noted as needed due to inefficiencies back in Julia v0.7.

Unfortunately, it seems those issues are still present on Julia 1.6.3.
On my machine I have:

julia> versioninfo()
Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_CUDA_SILENT = true
  JULIA_MPI_PATH = /usr/local/Cellar/open-mpi/4.0.3
  JULIA_NUM_THREADS = 4

julia> a = rand(256,256);

julia> b = reinterpret(Complex{Float64},a);

julia> c = Array(b);

julia> using BenchmarkTools

julia> @benchmark sum($c)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   9.504 μs … 55.548 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      9.745 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.203 μs ±  2.793 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▅▁                                                        ▁
  ████▇▅▅▆▄▁▁▃▅▁▁▁▄▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▆▅▄▁▃▁▃█▇▇▅▄▃▃▃▆ █
  9.5 μs       Histogram: log(frequency) by time      28.5 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sum($b)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  20.482 μs … 130.839 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     21.765 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   23.524 μs ±   5.070 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▇██▅▃ ▁ ▃▃▄▄▄▄▄▄▂                        ▁▁     ▁           ▂
  ▆██████████████████▇▅▆▅▅▃▁▅▁▃▃▁▃▄▁▃▁▄▄▁▁▄▆██▇▅▅▄███▇▇▇▆▇▇▆▆▆ █
  20.5 μs       Histogram: log(frequency) by time      44.6 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

@BioTurboNick

BioTurboNick commented Oct 9, 2021

Does my PR fail in some way? Because it implements what I suggested with respect to being able to use irfft! on rfft output. That is, without changing the return type of rfft.

@favba
Contributor Author

favba commented Oct 9, 2021

Does my PR fail in some way? Because it implements what I suggested with respect to being able to use irfft! on rfft output. That is, without changing the return type of rfft.

Ok, I see it now... Since you kept the irfft signature, which explicitly requires the first dimension of the real array, you are able to provide that feature...

I think that can be easily incorporated into this PR, keeping different signatures for irfft! when it is applied to a PaddedRFFTArray or an Array{Complex{T}}.

And I think I know a way to make rfft! work, on a best-effort basis, at least with SubArrays, such that the following would work:

a = rand(256,256)
b = rfft!(irfft!(rfft(a),size(a,1))) # same as rfft(a)

@BioTurboNick, do you mind if I incorporate the features you provided in your PR into this one? Or would you rather keep working on that one?
The most important features already implemented in this PR, at least for me, are the efficient indexing and the capability of reading data directly into a PaddedRFFTArray, so that absolutely no copy of the data is necessary. (I've been using it to perform FFTs of real data that hardly fits in memory...)
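The zero-copy read works because each first-dimension column of the padded buffer is contiguous, so raw reals can be read straight into views of it. A minimal plain-Julia sketch of the idea (using an IOBuffer as a stand-in for a binary file; this is not the PR's actual I/O code):

```julia
nx, ny = 9, 4
data = rand(Float64, nx, ny)

io = IOBuffer()
write(io, data)        # stand-in for a binary file on disk
seekstart(io)

# Padded buffer: first dimension rounded up to 2*(nx ÷ 2 + 1) real slots.
padded = zeros(Float64, 2 * (nx ÷ 2 + 1), ny)
for j in 1:ny
    read!(io, view(padded, 1:nx, j))  # fill only the unpadded part of each column
end
@assert padded[1:nx, :] == data
```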

@BioTurboNick

Absolutely, go for it!

@favba
Contributor Author

favba commented Oct 9, 2021

Maintaining good index performance while adding those features is trickier than I thought it would be...
I think I'll work on that on a different PR, as it will involve finding a different solution for the slow reinterpret problem.

@BioTurboNick

Ah, dang. Fair enough. Wonder what would be necessary to resolve the underlying slowness... I have a tendency to poke around on problems like that.

@antoine-levitt
Contributor

If it's a Julia issue, can you open an issue there? Maybe it can be fixed.

@BioTurboNick
Copy link

BioTurboNick commented Oct 13, 2021

If it's a julia issue can you open an issue there? Maybe it can be fixed

I took a look and it appears to be a fairly low-level issue. At the bottom are one or more memcpy calls interposed between the array and the produced value.

I was about to suggest unsafe_wrap but you already moved away from that... But looking at the concerns around it, maybe there's a way to create the PaddedRFFTArray type to address these issues?

It sounds like the main issue is that you want to ensure consumers can't normally get a reference to the wrapped data. Thus, it could have a real mode or a complex mode as far as external code is concerned.

Something like:

data = rand(256, 256)
a = PaddedRFFTArray(data) # copies real data into expanded array, starts in Real mode. Getindex/setindex all access via internal real_view.
rfft!(a) # real_view and complex_view used internally, not exported
# a is now in Complex mode. Getindex/setindex all access via internal complex_view
irfft!(a)
# a is now in Real mode again.

@timholy
Member

timholy commented Oct 13, 2021

as it will involve finding a different solution for the slow reinterpret problem.

Try reinterpret(reshape, T, A). I've seen this be 20x faster than reshape(reinterpret(T, A), sz). Requires Julia 1.6+
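For reference, the two spellings differ in which dimension absorbs the reinterpretation (plain Julia 1.6+, independent of this PR):

```julia
# reshape-style: consumes a leading dimension whose extent matches the
# size ratio (here, 2 Float64s per ComplexF64).
ars = rand(2, 128, 256)
brs = reinterpret(reshape, ComplexF64, ars)
@assert size(brs) == (128, 256)

# Plain reinterpret keeps the dimensionality and halves the first extent.
a = rand(256, 256)
b = reinterpret(ComplexF64, a)
@assert size(b) == (128, 256)
@assert b[1] == complex(a[1, 1], a[2, 1])   # column-major pairing
```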

Also, in case it's helpful: https://github.com/HolyLab/RFFT.jl. Steal anything you like, keeping in mind that was originally written back in the Julia 0.2 days or so and only minimally updated since then. (Prior to docstrings, Documenter.jl, my current understanding of fast & inferrable tuple handling, keyword arguments, etc.)

@favba
Contributor Author

favba commented Oct 13, 2021

Try reinterpret(reshape, T, A). I've seen this be 20x faster than reshape(reinterpret(T, A), sz). Requires Julia 1.6+

I think the issue is at a lower level than that... see my benchmark here. It doesn't seem to involve any reshaping

I was about to suggest unsafe_wrap but you already moved away from that... But looking at the concerns around it, maybe there's a way to create the PaddedRFFTArray type to address these issues?

That is my plan... to fiddle with references and/or pointers again.
Basically, my idea is to change the data field from an Array{T,N} to something like WrappedArray{T,N,A}, where WrappedArray wraps either a real-valued or a complex-valued array (A being the type of the array), depending on how you initialize it, but indexing it always returns a real value.

I'd put the "unsafe" indexing routines inside this WrappedArray (in the case it is initialized with a complex array), so they are hidden in this extra layer, and maybe keep the current indexing strategy for the complex view of the array.
This WrappedArray would also be useful to initialize the PaddedRFFTArray directly with complex data, so that

a = rand(256,256)
irfft!(rfft(a),256)

should work.
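A minimal sketch of that WrappedArray idea (hypothetical names and code, not the PR's implementation): a wrapper parameterized on the backing array's type that always indexes as real values, even over complex storage.

```julia
# Hypothetical sketch: real-valued indexing over a real or complex backing array.
struct WrappedArray{T,N,A<:AbstractArray} <: AbstractArray{T,N}
    data::A
end
WrappedArray(a::AbstractArray{T,N}) where {T<:Real,N} = WrappedArray{T,N,typeof(a)}(a)
WrappedArray(a::AbstractArray{Complex{T},N}) where {T,N} = WrappedArray{T,N,typeof(a)}(a)

Base.size(w::WrappedArray{T,N,<:AbstractArray{T}}) where {T,N} = size(w.data)
Base.size(w::WrappedArray{T,N,<:AbstractArray{Complex{T}}}) where {T,N} =
    (2 * size(w.data, 1), Base.tail(size(w.data))...)

# Real backing: index straight through.
Base.getindex(w::WrappedArray{T,N,<:AbstractArray{T}}, i::Int) where {T,N} = w.data[i]
# Complex backing: element 2k-1 is the real part, 2k the imaginary part of data[k].
function Base.getindex(w::WrappedArray{T,N,<:AbstractArray{Complex{T}}}, i::Int) where {T,N}
    z = w.data[(i + 1) >> 1]
    return isodd(i) ? real(z) : imag(z)
end

c = [1.0 + 2.0im, 3.0 + 4.0im]
w = WrappedArray(c)
@assert collect(w) == [1.0, 2.0, 3.0, 4.0]
```

The complex-backed getindex here is the "safe" formulation; the PR's point is that any pointer-based fast path could be confined to this one type.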

@timholy
Member

timholy commented Oct 13, 2021

It doesn't seem to involve any reshaping

And that's the problem.

julia> a = rand(256,256); ars = rand(2, 128, 256);

julia> b = reinterpret(Complex{Float64},a); brs = reinterpret(reshape, Complex{Float64}, ars);

julia> @btime sum($b);
  20.360 μs (0 allocations: 0 bytes)

julia> @btime sum($brs);
  11.583 μs (0 allocations: 0 bytes)

julia> @btime sum($c);
  11.362 μs (0 allocations: 0 bytes)

@favba
Contributor Author

favba commented Oct 20, 2021

The new implementation is here.


8 participants