FMA instructions are not used #77

Open
valadaptive opened this issue Dec 19, 2024 · 0 comments · May be fixed by #78

I was optimizing an algorithm by replacing some multiply-add sequences with the fused multiply-add functions that this library provides, and was very confused to find that the change made it a full 20% slower.

It turns out that FMA is a separate target feature from AVX2, and therefore must be enabled separately. Currently, all the fused multiply-add operations provided by this library are compiled down to software fallbacks.

I believe that all CPUs that support AVX2 also support FMA (Haswell added support for both, and AMD added support for AVX2 in Excavator, three years after adding FMA support in Piledriver). Therefore, adding the extra "does this CPU also support FMA?" check should not cause any previously-supported CPUs to stop being able to use AVX2.
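
To illustrate the kind of check being proposed, here is a minimal sketch in plain Rust using only the standard library. It is not this library's actual dispatch code: the function names `mul_add_slices`, `mul_add_avx2_fma`, and `mul_add_fallback` are hypothetical and exist only for this example. The idea is that the fast path is compiled with both `avx2` and `fma` enabled, and is only taken when runtime detection confirms both features.

```rust
// Hypothetical example, not taken from the library discussed in this issue.

// Fast path: compiled with both AVX2 and FMA enabled, so `f32::mul_add`
// can be lowered to a hardware FMA instruction instead of a software routine.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2,fma")]
unsafe fn mul_add_avx2_fma(a: &[f32], b: &[f32], c: &mut [f32]) {
    for ((x, y), z) in a.iter().zip(b).zip(c.iter_mut()) {
        *z = x.mul_add(*y, *z);
    }
}

// Portable path: a separate multiply and add, with no FMA requirement.
fn mul_add_fallback(a: &[f32], b: &[f32], c: &mut [f32]) {
    for ((x, y), z) in a.iter().zip(b).zip(c.iter_mut()) {
        *z = x * y + *z;
    }
}

/// Computes `c[i] = a[i] * b[i] + c[i]` for each index shared by all slices.
pub fn mul_add_slices(a: &[f32], b: &[f32], c: &mut [f32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Check for FMA in addition to AVX2, since they are separate features.
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: both required CPU features were detected at runtime.
            unsafe { mul_add_avx2_fma(a, b, c) };
            return;
        }
    }
    mul_add_fallback(a, b, c);
}
```

Calling `mul_add_slices(&a, &b, &mut c)` picks the path at runtime. Note that the fused and unfused paths can differ in the last bit of the result, because FMA rounds once instead of twice.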

valadaptive linked a pull request on Dec 19, 2024 that will close this issue.