DepthRaster: Use vmaxvq_s32 to implement AnyZeroSignBit more efficiently on ARM64 #19892

Merged
merged 1 commit into from
Jan 20, 2025


hrydgard
Owner

Turns out ARM64 has some neat horizontal reduction instructions, such as vmaxvq_s32 (signed maximum across all lanes), which this PR uses for AnyZeroSignBit. Thanks to ryg for the pointer.
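
For readers unfamiliar with the reduction, here is a minimal sketch of how vmaxvq_s32 can answer "does any lane have a clear sign bit". It is not the actual PPSSPP code: the free-function form, the array parameter, and the scalar fallback are assumptions made so the example compiles on any platform. The idea is that the signed maximum of the four lanes is non-negative exactly when at least one lane is non-negative.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper, not the PPSSPP implementation: returns true if at least
// one of the four 32-bit lanes has its sign bit clear (i.e. the lane is >= 0).
inline bool AnyZeroSignBit(const int32_t lanes[4]) {
#if defined(__aarch64__)
    // vmaxvq_s32 reduces the vector to its largest signed lane.
    // That maximum is >= 0 exactly when some lane is >= 0.
    return vmaxvq_s32(vld1q_s32(lanes)) >= 0;
#else
    // Portable fallback: the sign bit of the AND of all lanes is set only if
    // every lane has it set, so a non-negative result means some lane was >= 0.
    return (lanes[0] & lanes[1] & lanes[2] & lanes[3]) >= 0;
#endif
}

// Small usage example.
inline void AnyZeroSignBitDemo() {
    const int32_t allNegative[4] = {-1, -2, -3, -4};
    const int32_t oneNonNegative[4] = {-1, -2, 0, -4};
    assert(!AnyZeroSignBit(allNegative));
    assert(AnyZeroSignBit(oneNonNegative));
}
```

On ARM64 the test boils down to a single across-lanes reduction followed by a scalar compare.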

@hrydgard hrydgard added this to the v1.19.0 milestone Jan 19, 2025
@hrydgard hrydgard merged commit 9ab8875 into master Jan 20, 2025
19 checks passed
@hrydgard hrydgard deleted the minor-depth-opt branch January 20, 2025 00:53
@fp64
Contributor

fp64 commented Jan 20, 2025

For the basic NEON version, wouldn't something like vshrn_n_u32 (or maybe even vshrn_n_u64) work, similar to what is described here with vshrn_n_u16: https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon ? Perhaps even for the more general SignBits(), e.g. (mask4x16*0x0001000200040008ull)>>48.
I don't know much about ARM, though.
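
A rough sketch of that suggestion, untested on real hardware and not taken from the PPSSPP source. It assumes a little-endian target, and that mask4x16 means a 64-bit value with a 0 or 1 in each 16-bit field. The names NarrowHigh16, AnyZeroSignBitNarrow and SignBitsMul are hypothetical. vshrn_n_u32 with a shift of 16 keeps the top half of every 32-bit lane, so each lane's sign bit ends up at bit 15 of a 16-bit field in a single 64-bit value.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper: packs the top 16 bits of each 32-bit lane into one
// 64-bit value (lane 0 in the lowest 16 bits, assuming little-endian).
inline uint64_t NarrowHigh16(const uint32_t lanes[4]) {
#if defined(__ARM_NEON) || defined(__aarch64__)
    uint16x4_t narrowed = vshrn_n_u32(vld1q_u32(lanes), 16);
    return vget_lane_u64(vreinterpret_u64_u16(narrowed), 0);
#else
    // Scalar emulation of the narrowing shift, for non-ARM builds.
    uint64_t packed = 0;
    for (int i = 0; i < 4; i++)
        packed |= (uint64_t)(lanes[i] >> 16) << (16 * i);
    return packed;
#endif
}

// True if at least one lane has its sign bit clear.
inline bool AnyZeroSignBitNarrow(const uint32_t lanes[4]) {
    const uint64_t kSignBits = 0x8000800080008000ull;
    return (NarrowHigh16(lanes) & kSignBits) != kSignBits;
}

// Gathers the four sign bits into bits 0..3 of the result (lane 0 -> bit 0).
inline uint32_t SignBitsMul(const uint32_t lanes[4]) {
    // Move each sign bit to bit 0 of its 16-bit field: every field is 0 or 1.
    uint64_t mask4x16 = (NarrowHigh16(lanes) >> 15) & 0x0001000100010001ull;
    // The multiply sums shifted copies so that field i lands at bit 48 + i;
    // the lower partial sums are too small to carry into bit 48.
    return (uint32_t)((mask4x16 * 0x0001000200040008ull) >> 48);
}
```

For example, lanes {0x80000000, 0, 0x80000000, 0} give SignBitsMul() == 5. The multiply only works when every 16-bit field holds 0 or 1, which is why the sketch shifts and masks first.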

@hrydgard
Owner Author

Yes, absolutely. I just haven't gotten around to testing it.
