
Precompute grids and coeffs #61

Closed · wants to merge 3 commits

Conversation

@oxinabox (Member) commented Feb 2, 2020

The grids and coeffs are always the same for standard grids for a given p and q.

Precomputing them helps decrease the cost associated with adapt in the gradients.
This also fixes some of the type instability in the central gradient during adapt.

using BenchmarkTools
using FiniteDifferences

const _fdm = central_fdm(2, 1);   # 2-point central method, 1st derivative
const _fdm5 = central_fdm(5, 1);  # 5-point central method, 1st derivative

const xs = collect(1:0.1:200);
f(x) = sum(sin, x);
@btime grad($_fdm, $f, $xs);
@btime grad($_fdm5, $f, $xs);

Current:

julia> @btime grad($_fdm, $f, $xs);
  202.552 ms (154789 allocations: 4.26 MiB)

julia> @btime grad($_fdm5, $f, $xs);
  413.311 ms (162753 allocations: 5.45 MiB)

New, with precomputed coeffs:

julia> @btime grad($_fdm, $f, $xs);
  195.610 ms (53759 allocations: 1.96 MiB)

julia> @btime grad($_fdm5, $f, $xs);
  386.412 ms (53759 allocations: 1.87 MiB)

@oxinabox (Member, Author) commented Feb 2, 2020

From stacking this on top of #60:

julia> @btime grad($_fdm, $f, $xs);
  193.883 ms (51768 allocations: 1.93 MiB)

julia> @btime grad($_fdm5, $f, $xs);
  400.359 ms (51768 allocations: 1.84 MiB)

With #60 alone:

julia> @btime grad($_fdm, $f, $xs);
  200.205 ms (49777 allocations: 2.66 MiB)

julia> @btime grad($_fdm5, $f, $xs);
  398.832 ms (59732 allocations: 3.87 MiB)

Interesting that it seems to make the central_fdm(2) case slightly worse.
Not sure what's up with that.


@KristofferC commented

Instead of precomputing, you could memoize. That avoids having to determine up front exactly which combinations should be precomputed. Taking the cost once per session is probably OK.
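
A memoization cache keyed on (p, q) could look something like the sketch below. This is not code from the PR; make_grid_and_coefs is a hypothetical stand-in for whatever constructs the grid and coefficients, not a FiniteDifferences function.

# Minimal memoization sketch (hypothetical helper, not the PR's code).
const _GRID_CACHE = Dict{Tuple{Int,Int},Any}()

function memoized_grid_and_coefs(p::Integer, q::Integer)
    # Pay the construction cost once per (p, q) per session, then reuse.
    return get!(_GRID_CACHE, (p, q)) do
        make_grid_and_coefs(p, q)
    end
end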

src/methods.jl (outdated)
@eval begin
    # Compatibility layer over the "old" API
    function $fdmf(p::Integer, q::Integer; adapt=1, kwargs...)
        _dep_kwarg(kwargs)
        return $D(p, q; adapt=adapt, kwargs...)
    end

    # precompute a bunch of grids and coefs
    const $precomputes = ntuple($num_precomputed) do p
@oxinabox (Member, Author)

Should this be defined as a constant?
Or should it be declared outside the @eval and then interpolated in as a local literal,
so that it can constant fold?
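
As a toy sketch of the two options being weighed here (hypothetical names and a dummy computation, not the PR's code):

# Option 1: a generated `const` binding, as in the diff above.
num_precomputed = 10
@eval const central_precomputes = ntuple(p -> 2p + 1, $num_precomputed)

# Option 2: build the value outside `@eval` and splice it in as a literal,
# so uses of it inside the generated method can constant fold.
precomputes_value = ntuple(p -> 2p + 1, num_precomputed)
@eval central_grid_size(p) = $(precomputes_value)[p]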

@oxinabox (Member, Author)

> Instead of precomputing, you could memoize. That avoids having to determine up front exactly which combinations should be precomputed. Taking the cost once per session is probably OK.

Yeah, good idea.

I think I might have looked into it, and struggled to work out how to do it without incurring allocations.
The precomputes use tuples, which are very compiler-friendly.

But I can't recall if I did, so I should check again.
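
A rough illustration of that trade-off (not PR code): an NTuple's length and element type are part of its type, so indexing one is type stable, whereas a Dict-backed memo with an abstract value type is not.

# Tuple of precomputed vectors: concrete NTuple{5, Vector{Float64}}.
tup = ntuple(p -> collect(range(-p, p; length=2p + 1)), 5)
tup[3]   # inferred as Vector{Float64}

# Dict-backed memo with an Any value type loses that information.
memo = Dict{Int,Any}(p => collect(range(-p, p; length=2p + 1)) for p in 1:5)
memo[3]  # inferred as Any, so downstream code may box or allocate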

@oxinabox mentioned this pull request Apr 15, 2020
@oxinabox (Member, Author)

No point with #73

@oxinabox closed this Jun 10, 2020