
Implement @depositBits and @extractBits #23474


Open

bnuuydev wants to merge 16 commits into master

Conversation


@bnuuydev bnuuydev commented Apr 5, 2025

This PR takes #18680, rebases the changes onto master, and fixes the outstanding issues that prevented it from being merged.

There are currently outstanding issues related to the COFF linker that are blocking progress in the x86 backend.

I have been told that moving on and just skipping the failing tests is the appropriate course of action here.

The comptime implementation calls into big.int.depositBits/extractBits, which themselves call the corresponding builtins. This allows for significantly better performance at comptime, but it obviously requires zig1.wasm to be updated. I understand that a core team member will need to redo this commit for me before this can be merged?

The C backend implementation for integer sizes above u64 is blocked by #19991. I am subscribed to this issue, and once it is fixed, I have a work-in-progress implementation ready to be completed and included in a follow-up PR. Up to u64, the C backend implementation works fine, and is sufficient for bootstrapping the compiler.

I've included a number of behaviour tests to ensure that the implementations are correct.
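
For reference, here is a minimal sketch of the intended semantics, written as a behaviour-style test. It assumes the @depositBits(source, mask) argument order used elsewhere in this PR (matching x86 pdep/pext), and it only compiles on a compiler that includes these builtins:

const std = @import("std");

test "deposit/extract bits semantics (sketch)" {
    // @extractBits gathers the source bits selected by the mask into the low bits (pext).
    try std.testing.expectEqual(@as(u8, 0b11), @extractBits(@as(u8, 0b1010_0000), @as(u8, 0b1010_0000)));
    // @depositBits scatters the low bits of the source out to the positions selected by the mask (pdep).
    try std.testing.expectEqual(@as(u8, 0b1010_0000), @depositBits(@as(u8, 0b11), @as(u8, 0b1010_0000)));
}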

Many thanks to @Rexicon226, @jacobly0 and others for your extensive help :)

Closes #14995

bnuuydev and others added 11 commits April 5, 2025 20:19
This commit encompasses the progress made towards the implementation of
these builtins in ziglang#18680

Currently doesn't go beyond 64-bit integers due to implementation problems
(see ziglang#19991). This has been implemented to enable bootstrapping the
compiler, so it is sufficient for current purposes.

Necessary as the new @depositBits and @extractBits builtins are used
by the compiler.

Re-implements std.math.big.int.{deposit,extract}Bits to use the relevant
builtins. This should improve the speed of @depositBits and @extractBits
at comptime by orders of magnitude on supported architectures. It also
cleans up the code nicely :)

This code would cause an overflow in Sema, with a minimal reproduction:
`@depositBits(1, 3 << 63)`. These implementations under big.int should be
scrutinised for any other buffer overflows, although I believe this is the
only one.

Adds randomly-generated test cases for the builtins. These cases cover a
number of different integer sizes. This has helped to uncover a few issues
with the current implementation, which will be fixed in subsequent commits.
Additionally skips tests for Windows with the x86 backend, as we are not
providing support until progress is made on the self-hosted COFF linker.

This replaces the old (and broken) API usage with the new `select` API.
Windows is not supported at all by this API; however, the original
implementation never worked properly anyway, and the correct course of
action is to wait for the COFF linker to be fixed.

TODO: This currently has a hacky copy-paste in compiler-rt. Wait for
alexrp's PR.

Co-authored-by: David Rubin <[email protected]>
@bnuuydev
Author

bnuuydev commented Apr 5, 2025

There's currently a copy-paste in compiler-rt waiting for the required functions to be moved to std/Target.zig. Once that change is made, I'll fix that commit up.

@bnuuydev
Author

bnuuydev commented Apr 5, 2025

There is scope for a follow-up PR to optimise the codegen for u128 and above to compile down to multiple invocations of pdep and pext. I intend to tackle that at a later date :)
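
For the record, the decomposition in question: a 128-bit extract splits into one pext per 64-bit half, joined by a popcount shift. A minimal sketch in user-level Zig, assuming the new @extractBits builtin from this PR (extractBits128 is an illustrative name, not anything in the codebase):

fn extractBits128(source: u128, mask: u128) u128 {
    const mask_lo: u64 = @truncate(mask);
    const mask_hi: u64 = @truncate(mask >> 64);
    // One pext-shaped extract per 64-bit half.
    const lo = @extractBits(@as(u64, @truncate(source)), mask_lo);
    const hi = @extractBits(@as(u64, @truncate(source >> 64)), mask_hi);
    // The high half's extracted bits land directly above the
    // @popCount(mask_lo) bits produced by the low half.
    return @as(u128, lo) | (@as(u128, hi) << @popCount(mask_lo));
}

@depositBits decomposes analogously, except the source is consumed from the low end: shift it right by @popCount(mask_lo) before depositing the high half.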

Member

@jacobly0 jacobly0 left a comment

There are probably more issues, but they will be easier to see with these more obvious things fixed first.

Comment on lines +4030 to +4038
.@"x86.bmi.pext.32" = .{
.ret_len = 1,
.params = &.{
.{ .kind = .{ .type = .i32 } },
.{ .kind = .{ .type = .i32 } },
.{ .kind = .{ .type = .i32 } },
},
.attrs = &.{ .nocallback, .nofree, .nosync, .nounwind, .{ .memory = Attribute.Memory.all(.none) } },
},
Member

@jacobly0 jacobly0 Apr 5, 2025

Suggested change
.@"x86.bmi.pext.32" = .{
.ret_len = 1,
.params = &.{
.{ .kind = .{ .type = .i32 } },
.{ .kind = .{ .type = .i32 } },
.{ .kind = .{ .type = .i32 } },
},
.attrs = &.{ .nocallback, .nofree, .nosync, .nounwind, .{ .memory = Attribute.Memory.all(.none) } },
},
.@"x86.bmi.pext = .{
.ret_len = 1,
.params = &.{
.{ .kind = .overloaded },
.{ .kind = .{ .matches = 0 } },
.{ .kind = .{ .matches = 0 } },
},
.attrs = &.{ .nocallback, .nofree, .nosync, .nounwind, .willreturn, .{ .memory = .all(.none) } },
},

etc.

Member

@jacobly0 jacobly0 Apr 5, 2025

Oh right, these don't follow the correct pattern, I completely missed the missing i...

Author

Besides the other changes here, you're still right to add .willreturn though, aren't you?

Addresses requested changes in PR.
@bnuuydev
Author

bnuuydev commented Apr 6, 2025

I think I've addressed those all now :) I'll take a look at any feedback you've given tomorrow. Thank you for the help so far! Also, it'd probably be a good idea to give the workflows approval in case any tests aren't passing on platforms I haven't been able to test for :) (Although I imagine you don't want to be running a third-party zig1.wasm on the CI?)

@bnuuydev
Author

bnuuydev commented Apr 6, 2025

Fixed the faulty assumption that the source and mask would have the same number of limbs in the big.int implementation. Also added a couple more tests to confirm we have the correct behaviour.

bnuuydev added 3 commits April 7, 2025 19:45
Overflow arithmetic and {deposit,extract}Bits can share the same function to
generate their builtin calls.
return pext_uX(u128, source, mask);
}

// BEGIN HACKY CODE COPY WAIT FOR ALEXRP PR
Member

@jacobly0 jacobly0 left a comment

The x86_64 backend impl is looking much more solid, so I looked around a bit more at the other files.

},
.dst_temps = .{ .{ .mut_rc = .{ .rc = .general_purpose, .ref = .src1 } }, .unused },
.each = .{
.once = if (mir_tag == .pext) &.{
Member

These are begging to be switches with explicit mir tags in each case for clarity.

Comment on lines +10843 to +10860
const needs_extend = bits != compiler_rt_bits;
const extended_ty = if (needs_extend) try o.builder.intType(compiler_rt_bits) else llvm_ty;

const params = .{
if (needs_extend) try self.wip.cast(.zext, source, extended_ty, "") else source,
if (needs_extend) try self.wip.cast(.zext, mask, extended_ty, "") else mask,
};

const result = try self.wip.callIntrinsic(
.normal,
.none,
intrinsic,
&.{},
&params,
"",
);

return if (needs_extend) try self.wip.cast(.trunc, result, llvm_ty, "") else result;
Member

cast already handles this logic.

Suggested change
const needs_extend = bits != compiler_rt_bits;
const extended_ty = if (needs_extend) try o.builder.intType(compiler_rt_bits) else llvm_ty;
const params = .{
if (needs_extend) try self.wip.cast(.zext, source, extended_ty, "") else source,
if (needs_extend) try self.wip.cast(.zext, mask, extended_ty, "") else mask,
};
const result = try self.wip.callIntrinsic(
.normal,
.none,
intrinsic,
&.{},
&params,
"",
);
return if (needs_extend) try self.wip.cast(.trunc, result, llvm_ty, "") else result;
const extended_ty = try o.builder.intType(compiler_rt_bits);
const result = try self.wip.callIntrinsic(
.normal,
.none,
intrinsic,
&.{
try self.wip.cast(.zext, source, extended_ty, ""),
try self.wip.cast(.zext, mask, extended_ty, ""),
},
&params,
"",
);
return try self.wip.cast(.trunc, result, llvm_ty, "");

else => {},
}

return try self.genDepositExtractBitsEmulated(tag, bits, source, mask, llvm_ty);
Member

Why does this function exist? It only has one call site and seems to just take most of that caller's locals as parameters with the same names.

mask &= ~bit;\
uint##w##_t source_bit = source & bit;\
if (source_bit != 0) result |= bb;\
bb += bb;\
Member

No need to obscure that you are doing bit manipulation.

Suggested change
bb += bb;\
bb <<= 1; \

\
while (mask != 0) {\
uint##w##_t bit = mask & ~(mask - 1);\
mask &= ~bit;\
Member

No need to extend this dependency chain when you can reuse an earlier expression.

Suggested change
mask &= ~bit;\
mask &= mask - 1; \

assert(mask.positive);

result.positive = true;
@memset(result.limbs, 0);
Member

The running time of these functions should not depend on how oversized result might be, or on how oversized one of source and mask is relative to the other.

Author

Could you clarify this, please?

Member

@jacobly0 jacobly0 Apr 11, 2025

You could pass in a result with a billion limbs, and the @memset would require each one to be set to 0 and then the normalize at the end would have to iterate each and every one to make sure they are still 0. Instead, you should only access the prefix of the limbs that might possibly be nonzero, which certainly does not exceed the number of limbs in the mask, among other constraints.

Author

I see, I guess we could memset as we iterate the limbs instead, so we only memset the limbs we actually write to? Or we could memset based on the length of the source or mask? Not sure which approach would be preferable here, but I like the first?

Author

In fact, for depositBits a memset isn't necessary at all; we can just use = instead of |= on the limb.

Member

Both impls only ever access the limbs monotonically, so you just store the current limb in a local variable, and then when you need to move on to the next limb, you store the local into the next element and track how many initialized elements there are, and then the normalize only ever has to check the actually initialized limbs for being zero.
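
A minimal sketch of that pattern, with illustrative names (not the actual std.math.big.int internals). Bits arrive low-to-high; they are buffered in a local limb, each finished limb is stored exactly once, and `len` records how many limbs were initialized so the final normalize never scans beyond them:

const std = @import("std");

const LimbWriter = struct {
    limbs: []u64,
    current: u64 = 0, // the limb currently being assembled
    filled: u7 = 0, // bits already placed in `current` (always < 64 between calls)
    len: usize = 0, // limbs flushed into `limbs` so far

    // Append the low `n` bits of `chunk` (whose higher bits must be zero)
    // above everything written so far.
    fn append(w: *LimbWriter, chunk: u64, n: u7) void {
        w.current |= std.math.shl(u64, chunk, w.filled);
        w.filled += n;
        if (w.filled >= 64) {
            w.limbs[w.len] = w.current; // flush the finished limb exactly once
            w.len += 1;
            w.filled -= 64;
            // carry the bits of `chunk` that did not fit into the next limb
            w.current = std.math.shr(u64, chunk, n - w.filled);
        }
    }

    // Flush any partial limb, then trim trailing zero limbs over only the
    // initialized prefix -- the cheap normalize being asked for here.
    fn finish(w: *LimbWriter) usize {
        if (w.filled > 0) {
            w.limbs[w.len] = w.current;
            w.len += 1;
        }
        while (w.len > 0 and w.limbs[w.len - 1] == 0) w.len -= 1;
        return w.len;
    }
};

test LimbWriter {
    var buf: [4]u64 = undefined;
    var w: LimbWriter = .{ .limbs = &buf };
    w.append(0b101, 3);
    w.append(0b1, 1);
    try std.testing.expectEqual(@as(usize, 1), w.finish());
    try std.testing.expectEqual(@as(u64, 0b1101), buf[0]);
}

extractBits would then call append once per mask limb with the pext'd bits and @popCount(mask_limb); depositBits follows the same shape on the write side.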


var source_limb = source.limbs[shift_limbs] >> shift_bits;
if (shift_bits != 0 and shift_limbs + 1 < source.limbs.len) {
source_limb += source.limbs[shift_limbs + 1] << @intCast(limb_bits - shift_bits);
Member

Again, no need to hide bit manipulation.

Suggested change
source_limb += source.limbs[shift_limbs + 1] << @intCast(limb_bits - shift_bits);
source_limb |= source.limbs[shift_limbs + 1] << @intCast(limb_bits - shift_bits);

shift += @intCast(@popCount(mask_limb));
}

result.normalize(result.limbs.len);
Member

Again, you have a much tighter bound on the number of possible non-zero limbs than this unbounded input.


result_limb.* |= pdep_limb;

shift += @intCast(@popCount(mask_limb));
Member

Surely we can assume that @popCount of a usize fits in a usize.

Suggested change
shift += @intCast(@popCount(mask_limb));
shift += @popCount(mask_limb);

@@ -2960,6 +2960,59 @@ fn popCountTest(val: *const Managed, bit_count: usize, expected: usize) !void {
try testing.expectEqual(expected, val.toConst().popCount(bit_count));
}

test "big int extractBits" {
Member

Based on how buggy the other tested big int functions have been, this needs more testing for being passed non-normalized inputs, producing normalized outputs, and using a result with fewer limbs than the source and/or mask.

Author

Something like the gcd and bitwise or tests?

Member

There's definitely room for improvement here relative to the existing tests, for example an isNormalized function and an expect that it returns true on the result of every function under test, but anything that catches possible bugs is good enough for now.
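
The suggested helper could be as small as the following sketch (hypothetical name and shape, not anything currently in std). A normalized big.int stores its value with a nonzero most significant limb, except for zero, which is the single limb 0:

const std = @import("std");

// Hypothetical helper: true if the constant is in normalized form.
fn isNormalized(x: std.math.big.int.Const) bool {
    return x.limbs.len == 1 or x.limbs[x.limbs.len - 1] != 0;
}

Each test could then follow the call under test with try testing.expect(isNormalized(result.toConst()));.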

@bnuuydev
Author

Currently away from my computer, will get these changes addressed in a week or two :)

Development

Successfully merging this pull request may close these issues.

Add @pext and @pdep builtins
4 participants