Overhead #7
I think two things are happening.
I pushed a branch. The results are:

```julia
@btime f3($r_3)                                      # 1.871 ns  (0 allocations: 0 bytes)
@btime StaticPolynomials.evaluate($P3, $r_3)         # 15.597 ns (0 allocations: 0 bytes)
@btime StaticPolynomials.inline_evaluate($P3, $r_3)  # 11.098 ns (0 allocations: 0 bytes)

@btime f4($r_4)                                      # 8.719 ns
@btime StaticPolynomials.evaluate($P4, $r_4)         # 24.174 ns (0 allocations: 0 bytes)
@btime StaticPolynomials.inline_evaluate($P4, $r_4)  # 14.211 ns (0 allocations: 0 bytes)
```

So the overhead drops significantly. My guess is that the remaining overhead comes from the multiplications with the coefficients and from fetching the heap-allocated coefficient array. |
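(Aside for context, since `f3`, `P3`, and `r_3` are not defined anywhere in this thread: a minimal sketch of how such a comparison might be set up. The polynomial and all names below are invented for illustration.)

```julia
# Not from the thread: a made-up setup for the kind of benchmark above.
using BenchmarkTools, StaticArrays, StaticPolynomials
import DynamicPolynomials: @polyvar

# hand-coded evaluation of x*y^2 + x^2*z (unit coefficients)
f3(r) = r[1]*r[2]^2 + r[1]^2*r[3]

# the same polynomial, compiled by StaticPolynomials
@polyvar x y z
P3 = Polynomial(x*y^2 + x^2*z)

r_3 = @SVector rand(3)
@btime f3($r_3)
@btime StaticPolynomials.evaluate($P3, $r_3)
```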
Thank you. Have you tried using Calculus.simplify? |
I do not see where this would be useful; could you elaborate? |
```julia
using Calculus
ex = :( 1 * x^1 * y^2 )
Calculus.simplify(ex)
# :(x * y ^ 2)
```
|
Testing the derivatives now as well: the good news is that your derivatives are much faster than
As I said above, I am running these tests because (a) I really need the best performance I can get, and (b) it feels like with code generation it should in principle be possible to match a "hand-coded" implementation. |
I pushed some changes to the inlining branch which enable inlining for the Jacobian as well and reduce that overhead quite a bit. But for large problems the performance seems to get even worse, so this cannot be applied without some care.
I agree there, but one reason for the performance difference is that we currently do not generate the same source code as you wrote, since the code is optimized for general polynomials. Similarly, your specific polynomials only have coefficient one. That is a fact which can be nicely exploited in your application, but polynomials in general have non-unit coefficients, and dealing with this special case just didn't seem worth it to me at the moment. |
Ok, fair enough. For those small polynomials I can do everything by hand. Here is one component of a system that shows a bit better what I am interested in:
This is very impressive already, especially the fact that the cost of the gradient is only a factor of ~2. With a bit of hand-tuning I can get my |
It looks like I really need to give this a go - any concerns about "production use"? Any plans to register the package? |
Incidentally, while I have your attention :), I am trying to generate polynomials that are invariant w.r.t. certain permutation groups. Do you know of anything / anybody in the Julia environment that could help me? |
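(Aside, not from the thread: the elementary symmetric polynomials are the textbook example of the invariance meant here, namely polynomials unchanged under any permutation of their arguments. A quick check in Julia:)

```julia
# Illustration only: the elementary symmetric polynomials in three
# variables are invariant under every reordering of (x, y, z).
using Combinatorics  # provides permutations

e1(x, y, z) = x + y + z
e2(x, y, z) = x*y + x*z + y*z
e3(x, y, z) = x*y*z

v = [1.0, 2.0, 3.0]
all(e2(p...) == e2(v...) for p in permutations(v))  # true
```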
Okay, the evaluation cost really does not seem to be so great; maybe I can optimize this a little bit more when I have some spare time.

```julia
@btime StaticPolynomials.gradient($p10, $x)               # 93.121 ns (0 allocations: 0 bytes)
@btime StaticPolynomials.evaluate_and_gradient($p10, $x)  # 97.407 ns (0 allocations: 0 bytes)
```

So if you have something like Newton's method, where you need both values, this can come in really handy.
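(Aside, not code from the thread: for a single equation f(x) = 0 in several variables, the fused call supplies both ingredients of the minimum-norm Newton step x ← x − f(x)·∇f(x)/‖∇f(x)‖². A toy sketch, with the polynomial and starting point made up:)

```julia
# Toy sketch: minimum-norm Newton step for one polynomial equation,
# using one fused evaluate_and_gradient call per iteration.
using StaticArrays, StaticPolynomials
import DynamicPolynomials: @polyvar

@polyvar x y
p = Polynomial(x^2*y + x*y^2 - 1)

# v ← v - f(v) ∇f(v) / ‖∇f(v)‖²
function newton_step(p, v)
    f, g = StaticPolynomials.evaluate_and_gradient(p, v)
    v - (f / sum(abs2, g)) * g
end

function solve(p, v; iters = 8)
    for _ in 1:iters
        v = newton_step(p, v)
    end
    v
end

solve(p, SVector(1.0, 1.0))  # converges to a point with x²y + xy² = 1
```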
Not really. I tried it in another package of mine and it worked flawlessly. You only need to keep in mind that the
I don't know of anything in the Julia environment that could help with that. Is the tricky part figuring out the math or the software? If this only needs to be done once, you could maybe use a computer algebra system like Macaulay 2. Funny thing: my PhD supervisor suggested that I should start to look into invariant theory. So I am not sure whether I can currently help you with this, but I would love to know more about the concrete problem. Also, maybe Paul (@PBrdng) could help you. No promises, but maybe you could send an email with your concrete problem to the two of us? |
At the moment we are using MAGMA, but it takes forever and gives the most horrendous polynomials. We are now starting over, just "guessing" the primary invariants by hand, checking with MAGMA that they actually are correct, and then letting MAGMA construct the secondary invariants. |
So btw, you work with Sturmfels? Applications, algorithms, theory? He usually has some pretty exciting applications lined up. My own application is that we use invariant polynomials as descriptors of atomic environments and then construct high-accuracy approximations to the potential energy landscape (you can call it machine learning if you want, but I hate that term) |
I am trying your |
Btw, how would you rewrite this system (by hand) if you wanted to try and make
With |
I am working on the solution of systems of polynomials by (numerical) homotopy continuation. For this I am developing the package HomotopyContinuation.jl. There are already some software packages which do this (e.g. Bertini, PHCpack) but by using Julia it becomes much easier to adapt the algorithm to specific applications. Also things like this package help tremendously with the performance.
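(Aside for readers unfamiliar with the method; this illustration is not HomotopyContinuation.jl code. One tracks roots of the straight-line homotopy H(x, t) = (1 − t)G(x) + tF(x) from a start system G with known roots at t = 0 to the target F at t = 1, e.g. with an Euler predictor and a Newton corrector:)

```julia
# Deliberately naive sketch of the core idea, univariate for brevity;
# the real package handles systems and much more.
function track(F, F′, G, G′, x; steps = 100)
    H(x, t)  = (1 - t) * G(x) + t * F(x)
    Hx(x, t) = (1 - t) * G′(x) + t * F′(x)  # ∂H/∂x
    Ht(x)    = F(x) - G(x)                  # ∂H/∂t
    Δt = 1 / steps
    t = 0.0
    for _ in 1:steps
        x -= Δt * Ht(x) / Hx(x, t)          # predictor: dx/dt = -Ht/Hx
        t += Δt
        for _ in 1:3                        # corrector: Newton at fixed t
            x -= H(x, t) / Hx(x, t)
        end
    end
    x
end

# continue the root x = 1 of G(x) = x² - 1 to a root of F(x) = x² - 2
track(x -> x^2 - 2, x -> 2x, x -> x^2 - 1, x -> 2x, 1.0)  # ≈ √2
```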
Sorry, I updated the branch and just inlined the normal evaluate (as well as the gradient) in order to benchmark the changes and got rid of the special function. Here are the results of the benchmark, which in general look good but I am a little bit concerned about the regression in the larger system |
I just tagged a first version (without any inlining) to get the registration done since this can take some time in my experience. |
Thanks - I also saw that the performance was not much better for a slightly larger system. |
Hi, as Sascha asked me to enter the discussion: Could you explain in more detail what you are aiming at? |
Happy to write more detail. Here, or better in a Gitter channel? |
Email or here, either is fine for me. |
@saschatimme I've been experimenting with my own version of

One possibility is that I am just sacrificing numerical stability for speed. Is this at all a concern for you? (Is your Horner scheme numerically stable?)

EDIT: if this is the main issue then it might be interesting to have two types of

PS: I can post some benchmarks here, if it is of interest, once I've managed to polish the code and put it in a public repository. |
I think it would be great to have a faster evaluation implementation available. So please share what you have :)
The current evaluation and gradient implementation should be numerically stable, and for certain applications that is probably important. Since the current evaluation is already implemented, we could easily add another type parameter to give the user a choice of whether speed or accuracy is more important. |
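(Aside: a hypothetical sketch of what such a user choice could look like; the scheme types and the toy univariate evaluators below are invented here and are not StaticPolynomials API.)

```julia
# Sketch only: dispatch on a scheme type that selects between a
# Horner-style stable path and an expanded-powers path (stand-ins for
# the real multivariate generated code paths).
abstract type EvalScheme end
struct Stable <: EvalScheme end
struct Fast   <: EvalScheme end

# c holds coefficients in ascending order: c[1] + c[2]x + c[3]x^2 + ...
evalpoly_toy(c, x, ::Stable) = foldr((cᵢ, acc) -> muladd(acc, x, cᵢ), c)

function evalpoly_toy(c, x, ::Fast)
    s, xk = c[1], one(x)
    @inbounds for i in 2:length(c)   # build powers explicitly
        xk *= x
        s = muladd(c[i], xk, s)
    end
    s
end

evalpoly_toy(c, x) = evalpoly_toy(c, x, Stable())  # stable by default

evalpoly_toy([1.0, 2.0, 3.0], 0.5)          # 1 + 2x + 3x² at x = 0.5 → 2.75
evalpoly_toy([1.0, 2.0, 3.0], 0.5, Fast())  # 2.75
```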
Did you do something recently to |
I found some inefficiencies in the generated code and got rid of them with #10 |
I think there might still be some things to be exploited, but I just wanted to say I am eternally grateful for your package and wanted to share a quick benchmark:
(P.S.: and I don't have to implement gradients) |
@cortner That's great to hear! I also think that there are some things to be exploited. I think

Also note that in Julia 0.7 the compile times as well as the performance improve significantly. :)

```julia
julia> @btime evaluate($p10, $w)
  39.783 ns (0 allocations: 0 bytes)

julia> @btime gradient($p10, $w)
  76.963 ns (0 allocations: 0 bytes)
```

vs. 0.6:

```julia
julia> @btime evaluate($p10, $w)
  56.208 ns (0 allocations: 0 bytes)

julia> @btime gradient($p10, $w)
  103.934 ns (0 allocations: 0 bytes)
```
|
@cortner closing this since we now have for the example system:

```julia
julia> @btime f10($y)
  31.780 ns (0 allocations: 0 bytes)
394.1220308523676

julia> @btime $p10($y)
  28.717 ns (0 allocations: 0 bytes)
394.1220308523676

julia> @btime gradient($p10, $y)
  73.109 ns (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 0.7.0
Commit a4cb80f3ed (2018-08-08 06:46 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4
```
|
To further improve the evaluation performance I think #22 is the way to go. |
I've been playing around a bit, testing the performance of this package. Taking one of your own tests (from some .md I found):
This is quite a high overhead. I also tested more complex polynomials, where my "manual" implementation is more in the range of 50-100 ns, and in that case the overhead seems to shrink.
Questions:
I should say that my use-case is that I generate polynomials once and then call them millions of times, so a 20ns overhead can matter.
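(Aside on methodology, not from the issue: at the ~20 ns scale, BenchmarkTools' `$` interpolation matters, since otherwise the cost of accessing untyped globals is included in the measurement. The polynomial below is a made-up stand-in.)

```julia
# When chasing ~20 ns differences, interpolate globals with $.
using BenchmarkTools

f(v) = v[1]*v[2]^2 + v[1]^2*v[3]   # made-up polynomial stand-in
v = rand(3)

@btime f(v)    # includes the cost of looking up the global `v`
@btime f($v)   # interpolated: measures only the call itself
```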