
Question: Why is this not part of GCC/Clang/... #32

Closed
dumblob opened this issue Mar 14, 2017 · 7 comments

Comments

@dumblob

dumblob commented Mar 14, 2017

This is a ridiculously dumb question, but why are the optimizations employed in libdivide not used by GCC/Clang/... by default (e.g. for -O2)?

An additional question: did you ask the GCC/Clang/... upstream projects to incorporate the optimizations you're using? If so, how did they react, and what was the outcome?

@ridiculousfish
Owner

ridiculousfish commented Mar 15, 2017

It's not at all a dumb question!!

To be sure, gcc and clang both do perform this optimization when the divisor is a compile-time constant. What they don't do is perform it for runtime constants: divisors that stay fixed for the duration of a loop but are only known when the program runs.

Doing it for runtime constants would require identifying loops that contain a division or modulus by a loop-invariant divisor, with trip counts high enough to make the optimization worthwhile. The next part is emitting code to compute the magic number; in some cases this can require e.g. 128-bit division, which may mean a call to a library function. So it's doable, but might be ugly.

The big reason why we should do it anyway is vectorization. Division defeats vectorization, because vector hardware generally can't do integer division (x86 SSE/AVX, for example, has no packed integer divide instruction). libdivide replaces division with multiplication and shifts, so it can enable vectorizing loops that are otherwise resistant to vectorization.

I may take a crack at it myself one of these days.

@ridiculousfish
Owner

Closing, thanks for the interesting question! Feel free to continue the conversation here.

@dumblob
Author

dumblob commented Mar 15, 2017

Thank you @ridiculousfish for the explanation. Loop vectorization is of particular interest.

I'm curious what you will find out when it comes to "inclusion" into compilers like GCC/Clang.

@dumblob
Author

dumblob commented Nov 13, 2022

@ValZapod that is an interesting observation. About a year ago I did some research on state-of-the-art floating-point computation with a focus on decimals (i.e. not exactly what GMP is for, but still highly relevant IMHO), and GMP did not come out as well as it did 15 years ago, especially regarding "precision vs. computation time".

The paper seems to be gathering dust already; it is not on par with the methods used in the projects linked from the quick research I linked above.

Also, M1/M2 and Intel CPUs after Ice Lake may already be using libdivide in HW. :)

This sounds very interesting; any pointers on how to implement this in HW (ASIC or FPGA)?


3 participants
@ridiculousfish @dumblob and others