Implement @depositBits and @extractBits #23474
base: master
Conversation
This commit encompasses the progress made towards the implementation of these builtins in ziglang#18680.
Currently limited to integers of at most 64 bits due to implementation problems (see ziglang#19991). Since these builtins were implemented to enable bootstrapping the compiler, this is sufficient for current purposes.
Necessary as the new @depositBits and @extractBits builtins are used by the compiler.
Re-implements std.math.big.int.{deposit,extract}Bits to use the relevant builtins. This should improve the speed of @depositBits and @extractBits at comptime by orders of magnitude on supported architectures. It also cleans up the code nicely :)
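For readers unfamiliar with these builtins: `@depositBits` and `@extractBits` follow the semantics of the x86 `pdep`/`pext` instructions. A minimal reference sketch in C (function names here are illustrative, not from the PR):

```c
#include <stdint.h>

/* Sketch of @depositBits (pdep): scatter the low bits of `source`
 * into the positions of the set bits of `mask`. */
static uint64_t deposit_bits(uint64_t source, uint64_t mask) {
    uint64_t result = 0;
    uint64_t bb = 1; /* walks the low bits of `source` */
    while (mask != 0) {
        uint64_t bit = mask & ~(mask - 1); /* lowest set bit of mask */
        mask &= mask - 1;                  /* clear it */
        if (source & bb) result |= bit;
        bb <<= 1;
    }
    return result;
}

/* Sketch of @extractBits (pext): gather the bits of `source` selected
 * by the set bits of `mask` into the low bits of the result. */
static uint64_t extract_bits(uint64_t source, uint64_t mask) {
    uint64_t result = 0;
    uint64_t bb = 1;
    while (mask != 0) {
        uint64_t bit = mask & ~(mask - 1);
        mask &= mask - 1;
        if (source & bit) result |= bb;
        bb <<= 1;
    }
    return result;
}
```

For example, extracting with mask `0xF0` pulls bits 4–7 of the source down into bits 0–3.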
This code would cause an overflow in Sema, with a minimal reproduction: `@depositBits(1, 3 << 63)`. These implementations under big.int should be scrutinised for any other buffer overflows, although I believe this is the only one.
Adds randomly-generated test cases for the builtins. These cases have a number of different integer sizes. This has helped to uncover a few issues with the current implementation, which will be fixed in subsequent commits. Additionally skips tests for Windows with the x86 backend, as we are not providing support until progress is made on the self-hosted COFF linker.
This replaces the old (and broken) use of the old API with the new `select` API. Windows is not supported at all by this API, however the original old API implementation never worked properly anyway, and the correct course of action is to wait for the COFF linker to be fixed. TODO: This currently has a hacky copy-paste in compiler-rt. Wait for alexrp's PR. Co-authored-by: David Rubin <[email protected]>
There's currently a copy-paste in compiler-rt waiting for the required functions to be moved to
There is scope for a follow-up PR to optimise the codegen for
There's probably more issues, but they will be easier to see with these more obvious things fixed first.
```zig
.@"x86.bmi.pext.32" = .{
    .ret_len = 1,
    .params = &.{
        .{ .kind = .{ .type = .i32 } },
        .{ .kind = .{ .type = .i32 } },
        .{ .kind = .{ .type = .i32 } },
    },
    .attrs = &.{ .nocallback, .nofree, .nosync, .nounwind, .{ .memory = Attribute.Memory.all(.none) } },
},
```
Suggested change:

```zig
.@"x86.bmi.pext" = .{
    .ret_len = 1,
    .params = &.{
        .{ .kind = .overloaded },
        .{ .kind = .{ .matches = 0 } },
        .{ .kind = .{ .matches = 0 } },
    },
    .attrs = &.{ .nocallback, .nofree, .nosync, .nounwind, .willreturn, .{ .memory = .all(.none) } },
},
```
etc.
Oh right, these don't follow the correct pattern, I completely missed the missing i
...
Besides the other changes here, aren't you right to add `.willreturn` though?
Addresses requested changes in PR.
I think I've addressed those all now :) I'll take a look at any feedback you've given tomorrow. Thank you for the help so far! Also it'd probably be a good idea to give the workflows approval in case any tests aren't passing on platforms I haven't been able to test for :) (Although I imagine you don't want to be running a third-party zig1.wasm on the CI?)
Fixed the faulty assumption that the source and mask would have the same number of limbs in the
Overflow arithmetic and {deposit,extract}Bits can share the same function to generate their builtin calls.
```zig
    return pext_uX(u128, source, mask);
}

// BEGIN HACKY CODE COPY WAIT FOR ALEXRP PR
```
The x86_64 backend impl is looking much more solid, so I looked around a bit more at the other files.
```zig
},
.dst_temps = .{ .{ .mut_rc = .{ .rc = .general_purpose, .ref = .src1 } }, .unused },
.each = .{
    .once = if (mir_tag == .pext) &.{
```
These are begging to be `switch`es with explicit mir tags in each case for clarity.
```zig
const needs_extend = bits != compiler_rt_bits;
const extended_ty = if (needs_extend) try o.builder.intType(compiler_rt_bits) else llvm_ty;

const params = .{
    if (needs_extend) try self.wip.cast(.zext, source, extended_ty, "") else source,
    if (needs_extend) try self.wip.cast(.zext, mask, extended_ty, "") else mask,
};

const result = try self.wip.callIntrinsic(
    .normal,
    .none,
    intrinsic,
    &.{},
    &params,
    "",
);

return if (needs_extend) try self.wip.cast(.trunc, result, llvm_ty, "") else result;
```
`cast` already handles this logic.
Suggested change:

```zig
const extended_ty = try o.builder.intType(compiler_rt_bits);
const result = try self.wip.callIntrinsic(
    .normal,
    .none,
    intrinsic,
    &.{},
    &.{
        try self.wip.cast(.zext, source, extended_ty, ""),
        try self.wip.cast(.zext, mask, extended_ty, ""),
    },
    "",
);
return try self.wip.cast(.trunc, result, llvm_ty, "");
```
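The pattern under review (zero-extend operands to the intrinsic's width, call, truncate the result back, where the casts are no-ops when widths already match) can be sketched in C. The software `pext_u32` here is an illustrative stand-in for the fixed-width intrinsic, not code from the PR:

```c
#include <stdint.h>

/* Software stand-in for a fixed-width pext intrinsic (illustrative). */
static uint32_t pext_u32(uint32_t source, uint32_t mask) {
    uint32_t result = 0, bb = 1;
    while (mask != 0) {
        uint32_t bit = mask & ~(mask - 1);
        mask &= mask - 1;
        if (source & bit) result |= bb;
        bb <<= 1;
    }
    return result;
}

/* The widening pattern: operate on a u8 by zero-extending to the
 * intrinsic's width, calling it, then truncating back. When the widths
 * already match, the zext/trunc are no-ops, which is why a width-aware
 * `cast` helper can absorb the `needs_extend` branching. */
static uint8_t pext_u8(uint8_t source, uint8_t mask) {
    uint32_t wide = pext_u32((uint32_t)source, (uint32_t)mask); /* zext, call */
    return (uint8_t)wide;                                       /* trunc */
}
```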
```zig
    else => {},
}

return try self.genDepositExtractBitsEmulated(tag, bits, source, mask, llvm_ty);
```
Why does this function exist? It only has one call site and seems to just want most of the locals by the same name.
```c
        mask &= ~bit;\
        uint##w##_t source_bit = source & bit;\
        if (source_bit != 0) result |= bb;\
        bb += bb;\
```
No need to obscure that you are doing bit manipulation.
Suggested change:

```diff
-        bb += bb;\
+        bb <<= 1; \
```
```c
    \
    while (mask != 0) {\
        uint##w##_t bit = mask & ~(mask - 1);\
        mask &= ~bit;\
```
No need to extend this dependency chain when you can reuse an earlier expression.
Suggested change:

```diff
-        mask &= ~bit;\
+        mask &= mask - 1; \
```
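A quick check of the reviewer's point: with `bit = mask & ~(mask - 1)` (the lowest set bit), `mask &= ~bit` and `mask &= mask - 1` clear the same bit; the latter simply reuses an already-computed expression instead of extending the dependency chain. A small sketch:

```c
#include <stdint.h>

/* Clear the lowest set bit via an explicitly isolated bit. */
static uint64_t clear_lowest_via_bit(uint64_t mask) {
    uint64_t bit = mask & ~(mask - 1); /* isolate lowest set bit */
    return mask & ~bit;
}

/* Clear the lowest set bit directly, reusing `mask - 1`. */
static uint64_t clear_lowest_direct(uint64_t mask) {
    return mask & (mask - 1);
}
```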
```zig
assert(mask.positive);

result.positive = true;
@memset(result.limbs, 0);
```
The running time of these functions should not depend on how oversized `result` might be, or on the size of one of `source` and `mask` relative to the other.
Could you clarify this, please?
You could pass in a `result` with a billion limbs, and the `@memset` would require each one to be set to 0, and then the `normalize` at the end would have to iterate each and every one to make sure they are still 0. Instead, you should only access the prefix of the limbs that might possibly be nonzero, which certainly does not exceed the number of limbs in the mask, among other constraints.
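The bound the reviewer describes can be made concrete: `extractBits` produces at most `popcount(mask)` significant bits, so only a small prefix of the result limbs can ever be nonzero. A hedged sketch of computing that bound (limb width and function name are illustrative assumptions):

```c
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 64 /* assumed limb width for illustration */

/* Upper bound on nonzero result limbs for extractBits: the result has at
 * most popcount(mask) significant bits, so at most
 * ceil(popcount(mask) / LIMB_BITS) limbs need to be written or normalized. */
static size_t max_result_limbs(const uint64_t *mask_limbs, size_t mask_len) {
    size_t total_bits = 0;
    for (size_t i = 0; i < mask_len; i++)
        total_bits += (size_t)__builtin_popcountll(mask_limbs[i]);
    return (total_bits + LIMB_BITS - 1) / LIMB_BITS;
}
```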
I see, I guess we could memset as we iterate the limbs instead, so we only memset the limbs we actually write to? Or we could memset based on the length of the source or mask? Not sure which approach would be preferable here, but I like the first?
In fact, for depositBits a memset isn't necessary at all; we can just use `=` instead of `|=` on the limb.
Both impls only ever access the limbs monotonically, so you just store the current limb in a local variable, and then when you need to move on to the next limb, you store the local into the next element and track how many initialized elements there are, and then the `normalize` only ever has to check the actually initialized limbs for being zero.
```zig
var source_limb = source.limbs[shift_limbs] >> shift_bits;
if (shift_bits != 0 and shift_limbs + 1 < source.limbs.len) {
    source_limb += source.limbs[shift_limbs + 1] << @intCast(limb_bits - shift_bits);
```
Again, no need to hide bit manipulation.
Suggested change:

```diff
-    source_limb += source.limbs[shift_limbs + 1] << @intCast(limb_bits - shift_bits);
+    source_limb |= source.limbs[shift_limbs + 1] << @intCast(limb_bits - shift_bits);
```
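The cross-limb read in question combines the high bits of one limb with the low bits of the next; since the two parts occupy disjoint bit ranges, `|` expresses the merge without the carry semantics `+` would imply. A standalone sketch (names assumed for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 64-bit window starting `shift_bits` into limb `i`: the top of
 * limbs[i] and, if the window straddles a limb boundary, the bottom of
 * limbs[i + 1]. The two halves never overlap, so OR-ing is carry-free. */
static uint64_t read_window(const uint64_t *limbs, size_t len,
                            size_t i, unsigned shift_bits) {
    uint64_t word = limbs[i] >> shift_bits;
    if (shift_bits != 0 && i + 1 < len)
        word |= limbs[i + 1] << (64 - shift_bits);
    return word;
}
```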
```zig
    shift += @intCast(@popCount(mask_limb));
}

result.normalize(result.limbs.len);
```
Again, you have a much tighter bound on the number of possible non-zero limbs than this unbounded input.
```zig
result_limb.* |= pdep_limb;

shift += @intCast(@popCount(mask_limb));
```
Surely we can assume that `@popCount` of a `usize` fits in a `usize`.
Suggested change:

```diff
-shift += @intCast(@popCount(mask_limb));
+shift += @popCount(mask_limb);
```
```diff
@@ -2960,6 +2960,59 @@ fn popCountTest(val: *const Managed, bit_count: usize, expected: usize) !void {
     try testing.expectEqual(expected, val.toConst().popCount(bit_count));
 }

+test "big int extractBits" {
```
Based on how buggy the other tested big int functions have been, this needs more testing for being passed non-normalized inputs, producing normalized outputs, and using a result with fewer limbs than the source and/or mask.
Something like the gcd and bitwise or tests?
There's definitely room for improvement here relative to the existing tests, for example an `isNormalized` function and an expect that the return of any function is `true`, but anything that catches possible bugs is good enough for now.
Currently away from my computer, will get these changes addressed in a week or two :)
This PR takes #18680, updates the changes to master, and fixes the outstanding issues which prevented it from being merged.
There currently are outstanding issues related to the COFF linker which are blocking progress in the x86 backend:
.dll
I have been told that moving on and just skipping the failing tests is the appropriate course of action here.
The comptime implementation calls into `big.int.depositBits`/`extractBits`, which themselves call the corresponding builtins. This allows for significantly better performance at comptime, but obviously requires `zig1.wasm` to be updated. I understand that a core member will be required to redo this commit for me before this can be merged?

The C backend implementation for integer sizes above `u64` is blocked by #19991. I am subscribed to this issue, and once fixed, I have a work-in-progress implementation ready to be completed and included in a follow-up PR. Up to `u64`, the C backend implementation works fine, and is sufficient for bootstrapping the compiler.

I've included a number of behaviour tests to ensure that the implementations are correct.
Many thanks to @Rexicon226, @jacobly0 and others for your extensive help :)
Closes #14995