Question: Why is this not part of GCC/Clang/... #32

dumblob · 2017-03-14T16:00:34Z

This is a ridiculously dumb question, but why are the optimizations employed in libdivide not used by GCC/Clang/... by default (e.g. for -O2)?

An additional question: Did you try to ask GCC/Clang/... upstreams to incorporate the optimizations you're using? If yes, what were the reactions and what are the outcomes?

The text was updated successfully, but these errors were encountered:

ridiculousfish · 2017-03-15T05:11:58Z

It's not at all a dumb question!!

To be sure, gcc and clang both do perform this optimization for compile-time constants. What they don't do is perform the optimization for runtime constants.

To do it for runtime constants would require identifying loops that include division or modulus, with sufficiently high trip counts to make the optimization worthwhile. The next part is emitting code to compute the magic number; in some cases this can require e.g. 128 bit division, which may mean a library function. So it's doable, but might be ugly.

The big reason why we should do it anyways is vectorization. Division defeats vectorization, because vector hardware can't divide. libdivide eliminates division, so it can enable vectorizing loops that are otherwise resistant to vectorization.

I may take a crack at it myself one of these days.

ridiculousfish · 2017-03-15T05:15:37Z

Closing, thanks for the interesting question! Feel free to continue the conversation here.

dumblob · 2017-03-15T11:15:00Z

Thank you @ridiculousfish for explanation. The vectorization of loops is of a particular interest.

I'm curious what will you find out when it comes to "inclusion" into compilers like GCC/Clang.

dumblob · 2022-11-13T14:58:22Z

@ValZapod that is an interesting observation - about a year ago I made some research on the state-of-the-art floating point computations with focus on decimals (i.e. not exactly what GMP is for but still highly relevant IMHO). And GMP came out not that well as 15 years ago - especially regarding "precision vs. computation time".

The paper seems to gather some dust already - it is not on par with the methods used in projects linked in the tree of links rooted in the quick research I linked above.

Also on M1/M2 and Intel after Ice Lake may be using libdivide in HW already. :)

This sounds very interesting - any pointers how to implement this in HW (ASIC or FPGA)?

ridiculousfish closed this as completed Mar 15, 2017

dumblob mentioned this issue Mar 15, 2017

Enhancement: VM optimizations (compilation, GC, etc.) daokoder/dao#332

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: Why is this not part of GCC/Clang/... #32

Question: Why is this not part of GCC/Clang/... #32

dumblob commented Mar 14, 2017

ridiculousfish commented Mar 15, 2017 •

edited

Loading

ridiculousfish commented Mar 15, 2017

dumblob commented Mar 15, 2017

dumblob commented Nov 13, 2022

Question: Why is this not part of GCC/Clang/... #32

Question: Why is this not part of GCC/Clang/... #32

Comments

dumblob commented Mar 14, 2017

ridiculousfish commented Mar 15, 2017 • edited Loading

ridiculousfish commented Mar 15, 2017

dumblob commented Mar 15, 2017

dumblob commented Nov 13, 2022

ridiculousfish commented Mar 15, 2017 •

edited

Loading