I was optimizing an algorithm by replacing some multiply-add sequences with the fused multiply-add functions that this library provides, and was very confused to find that the change made the code a full 20% slower.
It turns out that the FMA instructions are a separate target feature from AVX2, and therefore must be enabled separately. Currently, all the fused multiply-add operations provided by this library will be compiled down to software fallbacks.
I believe that all CPUs that support AVX2 also support FMA (Haswell added support for both, and AMD added support for AVX2 in Excavator, three years after adding FMA support in Piledriver). Therefore, adding the extra "does this CPU also support FMA?" check should not cause any previously-supported CPUs to stop being able to use AVX2.
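For illustration, here is a minimal sketch of what the combined check could look like, using only `std::arch`. The function names (`mul_add_slices`, `mul_add_avx2_fma`, `mul_add_fallback`) are hypothetical and not part of this library's API. The sketch assumes that the SIMD path is selected at runtime and that both `avx2` and `fma` are enabled on the vectorized function.

```rust
// Hypothetical names throughout; this is a sketch, not this library's API.

/// Portable fallback: a separate multiply and add for each element.
fn mul_add_fallback(a: &[f32], b: &[f32], c: &[f32], out: &mut [f32]) {
    for i in 0..out.len() {
        out[i] = a[i] * b[i] + c[i];
    }
}

/// Vectorized path. Enabling `fma` alongside `avx2` is what lets
/// `_mm256_fmadd_ps` compile to a real FMA instruction.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2,fma")]
unsafe fn mul_add_avx2_fma(a: &[f32], b: &[f32], c: &[f32], out: &mut [f32]) {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let n = out.len();
    let mut i = 0;
    // Process eight f32 lanes per iteration.
    while i + 8 <= n {
        let va = _mm256_loadu_ps(a.as_ptr().add(i));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i));
        let vc = _mm256_loadu_ps(c.as_ptr().add(i));
        _mm256_storeu_ps(out.as_mut_ptr().add(i), _mm256_fmadd_ps(va, vb, vc));
        i += 8;
    }
    // Scalar tail for the remaining elements.
    while i < n {
        out[i] = a[i].mul_add(b[i], c[i]);
        i += 1;
    }
}

/// Computes out[i] = a[i] * b[i] + c[i], taking the SIMD path only when
/// the CPU reports both AVX2 and FMA.
pub fn mul_add_slices(a: &[f32], b: &[f32], c: &[f32], out: &mut [f32]) {
    assert!(a.len() == out.len() && b.len() == out.len() && c.len() == out.len());

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // Safe to call: both required features were just detected.
            unsafe { mul_add_avx2_fma(a, b, c, out) };
            return;
        }
    }

    mul_add_fallback(a, b, c, out);
}
```

With this shape, a CPU that reports AVX2 but not FMA would take the fallback path instead of executing an unsupported instruction, so the extra check only costs anything on such a CPU, if one exists.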