-
-
Notifications
You must be signed in to change notification settings - Fork 61
Add count-trailing-zero libcalls #597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
2eae7db
to
a61a932
Compare
a61a932
to
98bec4b
Compare
tst a, 030h | ||
jr z, .high2 | ||
dec a | ||
and a, 014h | ||
ret po | ||
ld a, 5 | ||
ret | ||
.high2: | ||
add a, a | ||
sbc a, -8 | ||
ret p | ||
ld a, 6 | ||
ret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unfortunately breaks the optimization of the first trailing one functions relying on this to produce an output of 8 for an input of 0, but it might be worth it?
tst a, 030h | |
jr z, .high2 | |
dec a | |
and a, 014h | |
ret po | |
ld a, 5 | |
ret | |
.high2: | |
add a, a | |
sbc a, -8 | |
ret p | |
ld a, 6 | |
ret | |
add a, a | |
add a, a | |
jr z, .high2 | |
add a, a | |
add a, a | |
sbc a, -5 | |
ret | |
.high2: | |
sbc a, -7 | |
ret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this change would also hurt for maybe eventually implementing std::countr_zero
, wouldn't it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, that assumption is also made by the compiler (this libcall implements CTTZ which is output by __builtin_ctz
intrinsics on the Z80 target and not CTTZ_ZERO_UNDEF). Good optimization idea though, I'll have to think whether I can do something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this change would also hurt for maybe eventually implementing
std::countr_zero
, wouldn't it...
It's actually already been implemented in the toolchain recently, and makes use of the Z80 intrinsic behavior I mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurs to me that, except for that pesky 0 case, you could potentially rework this whole routine to be based on branches after pairs of left shifts. I think whether that would be faster or not depends on the input distribution, but it would certainly be smaller.
ld a, l | ||
or a, a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matching __icttz
's first 3 bytes might compress better.
ld a, l | |
or a, a | |
xor a, a | |
or a, l |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I always forget to think about compression alongside other size optimizations.
I've implemented optimized count-trailing-zero libcalls, and I have an ez80-clang commit ready to push to make it use them. They've also been added to ez80_builtin implementations where applicable, and tested via ez80_builtin/stdbit tests.