compiler optimization tracker #3440
This is a really nice and useful tracker! We should also link these to our release notes.

It's not obvious to me that you gave yourself credit for your awesome recent work in making

Thanks for this list. Mildly embarrassed by how many of the issues were filed by me, without me being able to do much to fix most of them.

...but here I go again... In addition to "flatten arrays of tuples", is it at all feasible to "flatten arrays of types with fixed size"? See https://groups.google.com/d/msg/julia-dev/QOZfPkdRQwk/O-DgzNxbQegJ.

We did ask for "ungracious demands" from the outset ;-)

Great stuff! I believe this will take the performance of Julia to a new height when done.
- make it an error to resize an array with shared data (fixes #3430)
- now able to use realloc to grow arrays (part of #3440, helps #3441)
- the new scheme is simpler: one Array owns the data, instead of tracking the buffers separately as mallocptr_t
- Array data can be allocated inline, with malloc, or from a pool
What about the kind of constant folding Stefan mentions in #2741?

Thinking out loud: how hard would it be to set up something like http://speed.pypy.org/ ?

@IainNZ; I'm working on that right now. :) We have a small corpus of performance tests in test/perf2, but I do believe we will need some work to come up with small, self-contained, and relevant performance tests. I will open an issue regarding this in the near future, once I have more concrete work done.
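In the meantime, a local stand-in for such a suite is easy to sketch. The helper below is hypothetical (not the actual test/perf2 code): it runs a function several times and reports the best wall-clock time, which is less noisy than a single run.

```julia
# Minimal benchmarking sketch (hypothetical helper, not test/perf2):
# run f `trials` times and keep the best wall-clock time in seconds.
function bench(f; trials=5)
    best = Inf
    for _ in 1:trials
        t = @elapsed f()
        best = min(best, t)
    end
    return best
end

# Usage: bench(() -> sum(rand(10^6)))
```

Comparing the best-of-N times before and after a change gives a rough but serviceable local measure of a gain or regression.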
related: 620141b |
happens to make sparse mul faster; #942
this removes stores and loads, and if we can reduce the number of roots in a function to zero we save significant overhead. ref #3440
I'd be interested in helping out on one of these issues, but I don't want to step on anyone's toes or duplicate effort. Can I get some guidance on which of these issues is in need of manpower?

@JeffBezanson Is "bounds check elimination" covered by

No; ideally there should be some automatic bounds check removal.

What level of automaticity do you have in mind? I'd love it if bounds-checking were automatically turned off for loops that look like:

Yes, that's the kind of case that can be handled automatically.
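For illustration, here is the shape of loop in question (a hedged sketch, not the snippet from the original comment): the index range provably stays within the array bounds, so checks could in principle be elided automatically; `@inbounds` is shown as the manual equivalent.

```julia
# The loop index is drawn from 1:length(a), so every access a[i] is
# provably in bounds; the goal is for the compiler to infer what the
# manual @inbounds annotation asserts here.
function mysum(a::Vector{Float64})
    s = 0.0
    for i in 1:length(a)
        @inbounds s += a[i]
    end
    return s
end

mysum([1.0, 2.0, 3.0])  # 6.0
```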
By the way, the TBAA support in #5355 partially fixes the "hoist 1-d array metadata loads" item. It's partial because bounds checks will stop the hoisting, since LLVM can't prove it's legal and profitable to hoist a load up over a branch.
@staticfloat Is there any update on having a performance test suite and something similar to speed.pypy.org? I found it very useful when working on optimizations (in addition to running the current tests to make sure I'm not breaking anything) to have something like this to measure the performance gain (or regression in some cases). Would be awesome to have GitHub integration, but something I can run locally is also good enough.
Running
@yuyichao: There is also the Julia Speed Center. Edit: It appears to not have any new updates since June 2014 though, which was around when I last had a look at it. |
@staticfloat can shed some light on the history there. |
Yes, I think @yuyichao was hinting at that when he pinged me. Essentially, I just haven't had time to rewrite |
Is it fair to tick off SIMD types on this list with the tuple overhaul?
I think ensuring proper alignment for SIMD is still TODO.
My impression is that recent Intel SIMD hardware is much less sensitive to alignment, performance-wise, than it used to be, but I have not run experiments.
One bit that is (I think?) still missing is the ability to do vectorized arithmetic on tuples (or something that wraps them) without |
The optional (under -O) |
And getting good performance out of less-recent hardware shouldn't be ignored if it's reasonable to get. |
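On the tuple-arithmetic point above: elementwise operations on homogeneous tuples can already be expressed with `map`, which is the kind of code SIMD lowering would target. A hedged sketch with hypothetical helper names (whether these actually vectorize depends on the compiler and flags):

```julia
# Hypothetical elementwise operations on fixed-size tuples; with SIMD
# support the compiler could lower these to vector instructions.
vadd(a::NTuple{4,Float64}, b::NTuple{4,Float64}) = map(+, a, b)
vmul(a::NTuple{4,Float64}, b::NTuple{4,Float64}) = map(*, a, b)

vadd((1.0, 2.0, 3.0, 4.0), (4.0, 3.0, 2.0, 1.0))  # (5.0, 5.0, 5.0, 5.0)
```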
OK, this is great (thanks for the reference that led me here, @timholy). Which of the unchecked boxes are already largely in place? Thanks.
I certainly am not qualified to give a whole list, but "more general inlining" and "inline declaration" are presumably largely done (there are still ambitions to introduce inlining controlled from the call site), and there's already some bounds check elimination and SIMD support.
You may want to add |
Checked off a few more. We now have all of these, or they are tracked in other existing issues.
This is an umbrella issue for compiler and other low-level optimizations. I'm including ones that are already done, to make it more interesting.

compiler:
- apply(f, t::(T,S)) => f(t[1],t[2])
- apply(f, (x,y)) => f(x,y)
- isdefined on normal objects in codegen (#19334)

RTS:

larger projects:

performance-related features:
- select (inlined, eager-evaluated conditional function)
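The apply-to-direct-call items in the compiler list can be seen semantically in a few lines (a hedged sketch; `f` and `t` are placeholders):

```julia
f(a, b) = a + b
t = (1, 2)

# The optimization rewrites the generic splat apply(f, t) — spelled
# f(t...) in modern syntax — into the direct call f(t[1], t[2]),
# avoiding runtime tuple unpacking. Semantically the two agree:
f(t...) == f(t[1], t[2])  # true
```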