Question: Why is this not part of GCC/Clang/... #32
Comments
It's not at all a dumb question! To be sure, gcc and clang both do perform this optimization for compile-time constants. What they don't do is perform it for runtime constants, i.e. divisors that are only known at run time but stay fixed across many divisions.

Doing it for runtime constants would require identifying loops that include division or modulus and have trip counts high enough to make the optimization worthwhile. The next part is emitting code to compute the magic number; in some cases this can require e.g. a 128-bit division, which may mean a call to a library function. So it's doable, but might be ugly.

The big reason why we should do it anyway is vectorization. Division defeats vectorization, because vector hardware generally can't do integer division. libdivide eliminates the division, so it can enable vectorizing loops that are otherwise resistant to vectorization. I may take a crack at it myself one of these days.
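As a rough sketch of what "computing the magic number" for a runtime constant involves, here is the 32-bit unsigned case, following the round-up scheme from Granlund and Montgomery's "Division by Invariant Integers using Multiplication". The names `udiv32_t`, `udiv32_prepare`, `udiv32_apply` and `div_all` are made up for this illustration; they are not libdivide's API, and libdivide's real implementation differs in its details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not libdivide's API. */
typedef struct {
    uint32_t magic; /* precomputed multiplier */
    uint8_t  sh1;   /* first shift amount: min(l, 1) */
    uint8_t  sh2;   /* second shift amount: max(l - 1, 0) */
} udiv32_t;

/* One-time setup for a divisor d that is only known at run time.
 * This is the only place where a hardware division is executed. */
static inline udiv32_t udiv32_prepare(uint32_t d) {
    assert(d != 0);
    unsigned l = 0;
    while ((UINT64_C(1) << l) < d) l++;            /* l = ceil(log2(d)) */
    udiv32_t r;
    /* magic = floor(2^32 * (2^l - d) / d) + 1, which fits in 32 bits */
    r.magic = (uint32_t)((((UINT64_C(1) << l) - d) << 32) / d + 1);
    r.sh1 = (uint8_t)(l < 1 ? l : 1u);
    r.sh2 = (uint8_t)(l > 1 ? l - 1u : 0u);
    return r;
}

/* n / d using one widening multiply, one add, one subtract, two shifts. */
static inline uint32_t udiv32_apply(uint32_t n, udiv32_t r) {
    uint32_t t = (uint32_t)(((uint64_t)r.magic * n) >> 32); /* high half */
    return (t + ((n - t) >> r.sh1)) >> r.sh2;
}

/* The kind of loop discussed above: the divisor is loop-invariant,
 * so the setup cost is paid once and the body contains no division. */
static inline void div_all(uint32_t *a, size_t len, uint32_t d) {
    udiv32_t r = udiv32_prepare(d);
    for (size_t i = 0; i < len; i++)
        a[i] = udiv32_apply(a[i], r);
}
```

For 32-bit operands the setup step only needs a 64-bit division. The 64-bit analogue of `udiv32_prepare` would need a 128-bit dividend, which is where the library-call concern comes from. Whether the setup pays off depends on how many divisions share the same divisor, which is why the trip count matters.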
Closing, thanks for the interesting question! Feel free to continue the conversation here.
Thank you @ridiculousfish for the explanation. The vectorization of loops is of particular interest. I'm curious what you will find out when it comes to "inclusion" into compilers like GCC/Clang.
@ValZapod that is an interesting observation. About a year ago I did some research on state-of-the-art floating-point computation with a focus on decimals (i.e. not exactly what GMP is for, but still highly relevant IMHO), and GMP did not come out as well as it did 15 years ago, especially regarding "precision vs. computation time". The paper seems to have gathered some dust already: it is not on par with the methods used in the projects linked in the tree of links rooted in the quick research I linked above.
This sounds very interesting - any pointers on how to implement this in HW (ASIC or FPGA)?
This is a ridiculously dumb question, but why are the optimizations employed in libdivide not used by GCC/Clang/... by default (e.g. for -O2)?

An additional question: Did you try to ask the GCC/Clang/... upstreams to incorporate the optimizations you're using? If yes, what were the reactions and what were the outcomes?