Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s390x vector facilities support #130869

Open
7 of 11 tasks
taiki-e opened this issue Sep 26, 2024 · 13 comments
Open
7 of 11 tasks

s390x vector facilities support #130869

taiki-e opened this issue Sep 26, 2024 · 13 comments
Labels
A-ABI Area: Concerning the application binary interface (ABI) A-inline-assembly Area: Inline assembly (`asm!(…)`) A-SIMD Area: SIMD (Single Instruction Multiple Data) O-SystemZ Target: SystemZ processors (s390x) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@taiki-e
Copy link
Member

taiki-e commented Sep 26, 2024

This tracks the status of s390x vector facilities support in rustc and standard libraries.

@rustbot label +O-SystemZ

@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. O-SystemZ Target: SystemZ processors (s390x) labels Sep 26, 2024
@taiki-e
Copy link
Member Author

taiki-e commented Sep 26, 2024

cc @uweigand

@saethlin saethlin added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 26, 2024
@uweigand
Copy link
Contributor

Thanks for putting together this list! I can certainly look into providing proper vector register ABI support.

@tgross35 tgross35 added A-inline-assembly Area: Inline assembly (`asm!(…)`) A-ABI Area: Concerning the application binary interface (ABI) labels Oct 4, 2024
@uweigand
Copy link
Contributor

I've looked into this a bit, but ran into a problem I'm not sure how to address. The main issue is that on s390x, we use a different ABI depending on whether or not the vector feature is active (i.e. the processor has vector registers). The difference affects only vector types (which I guess would correspond to #repr(simd) types in Rust), in the following manner:

  • Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.
  • Calling convention: If the vector ABI is not present, all vector types are passed and returned like aggregates. If the vector ABI is present, vector types up to 16 bytes in size are instead passed and returned in vector registers. For arguments (but not return values) the same applies also to aggregates containing only a single element of vector type.

The calling convention should be implemented in the rustc_target/src/abi/call/s390x.c. However, I'm not sure how to detect in this place whether or not the vector feature is active. This depends both on explicit feature flags, and also on the target processor (-march=) - processors from z13 onwards default to having the vector feature enabled, older processors have it disabled. Clang tracks this in the front-end so it can make ABI decisions based on it. I'm not sure how this is done in Rust, I'm not seeing any precedent in any of the other ABI files. (I believe there may be some similar issues e.g. on Intel for the larger vector registers like AVX-256 or AVX-512, but I wasn't able to find where this is handled.)

In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.

@taiki-e
Copy link
Member Author

taiki-e commented Oct 11, 2024

@uweigand Thanks for the investigation!

  • Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.

In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.

Hmm, I thought it is always 8-byte aligned since LLVM 16 (2e7a964).

The calling convention should be implemented in the rustc_target/src/abi/call/s390x.c. However, I'm not sure how to detect in this place whether or not the vector feature is active. This depends both on explicit feature flags, and also on the target processor (-march=) - processors from z13 onwards default to having the vector feature enabled, older processors have it disabled. Clang tracks this in the front-end so it can make ABI decisions based on it. I'm not sure how this is done in Rust, I'm not seeing any precedent in any of the other ABI files. (I believe there may be some similar issues e.g. on Intel for the larger vector registers like AVX-256 or AVX-512, but I wasn't able to find where this is handled.)

I think we can implement this by referring to wasm code (1, 2) that determines the ABI depending on the command line option.
I will try to see if I can implement that approach...

@taiki-e
Copy link
Member Author

taiki-e commented Oct 11, 2024

I think we can implement this by referring to wasm code (1, 2) that determines the ABI depending on the command line option.
I will try to see if I can implement that approach...

I haven't done much testing yet, but this approach appears to be working: master...taiki-e:rust:s390x-vector-abi

// no_vector: mvc 8(8,%r2), 8(%r3)
// no_vector-NEXT: mvc 0(8,%r2), 0(%r3)
// no_vector-NEXT: br %r14
// vector: vl %v24, 0(%r2), 3
// vector-NEXT: br %r14
#[no_mangle]
extern "C" fn vector(x: &i8x16) -> i8x16 {
    *x
}

@uweigand
Copy link
Contributor

@uweigand Thanks for the investigation!

  • Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.

In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.

Hmm, I thought it is always 8-byte aligned since LLVM 16 (2e7a964).

The point is that there's a difference between the LLVM IR vector types and the C/C++ language vector types. The LLVM IR vector types are indeed always 8-byte aligned now. But the C/C++ language vector types still use different ABI alignment rules depending on the vector feature. Clang handles this by mapping the C/C++ type not directly to an LLVM IR type (leaving LLVM to use its own alignment rules), but rather to an LLVM IR type with an explicit alignment override (which implements the appropriate ABI alignment rule).

The calling convention should be implemented in the rustc_target/src/abi/call/s390x.c. However, I'm not sure how to detect in this place whether or not the vector feature is active. This depends both on explicit feature flags, and also on the target processor (-march=) - processors from z13 onwards default to having the vector feature enabled, older processors have it disabled. Clang tracks this in the front-end so it can make ABI decisions based on it. I'm not sure how this is done in Rust, I'm not seeing any precedent in any of the other ABI files. (I believe there may be some similar issues e.g. on Intel for the larger vector registers like AVX-256 or AVX-512, but I wasn't able to find where this is handled.)

I think we can implement this by referring to wasm code (1, 2) that determines the ABI depending on the command line option. I will try to see if I can implement that approach...

Ah, interesting! It seems this

    let unstable_target_features = codegen_backend.target_features(sess, true);

calls back into the LLVM back-end, which does already know which features are available by default depending on the target-cpu setting. I wasn't aware of that ... I'm wondering whether/how this works when using another codegen backend like GCC or Cranelift?

Otherwise, your patch looks good to me, except for

    if abi == Vector && size.bits() == 128 && contains_vector(cx, ret.layout) {
  • We also pass vectors smaller than 128 bits in VRs if available, so I'm not sure this size check is correct. (Then again, your vector_small test case already seems to do the right thing, so I may be missing something here?)
  • The logic should be different between the argument and return case: single-element vector aggregates are treated like vectors only as arguments, not as return types (that's a bit of an unfortunate ABI choice, but it matches what we historically did for single-element float aggregates). The wrapper tests all show the incorrect behavior for return values.

@taiki-e
Copy link
Member Author

taiki-e commented Oct 12, 2024

@uweigand Thanks for the investigation!

  • Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.

In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.

Hmm, I thought it is always 8-byte aligned since LLVM 16 (2e7a964).

The point is that there's a difference between the LLVM IR vector types and the C/C++ language vector types. The LLVM IR vector types are indeed always 8-byte aligned now. But the C/C++ language vector types still use different ABI alignment rules depending on the vector feature. Clang handles this by mapping the C/C++ type not directly to an LLVM IR type (leaving LLVM to use its own alignment rules), but rather to an LLVM IR type with an explicit alignment override (which implements the appropriate ABI alignment rule).

Ah, thanks for the clarification. I'm not sure how to handle it either, but considering that the current Rust SIMD types cannot be used with C FFI in the first place, and that even if it could be used, the requirement would be that the target feature be enabled, we may actually not have to very worry about the case where the vector feature is disabled here.

RFC 2574 to allows using SIMD types in C FFI says:

> Architecture-specific vector types require #[target_feature]s to be FFI safe. That is, they are only safely usable as part of the signature of extern functions if the function has certain #[target_feature]s enabled.

EDIT: I think #130869 (comment) 's approach will work.

 let unstable_target_features = codegen_backend.target_features(sess, true);

calls back into the LLVM back-end, which does already know which features are available by default depending on the target-cpu setting. I wasn't aware of that ... I'm wondering whether/how this works when using another codegen backend like GCC or Cranelift?

rustc_codegen_gcc seems to have code to handle -C target-cpu and -C target-feature, but I have not tested if it works as expected here.

IIRC rustc_codegen_cranelift does not currently support the target feature at all (rust-lang/rustc_codegen_cranelift#1400), so I suspect that vector ABI cannot be enabled from any way.

  • We also pass vectors smaller than 128 bits in VRs if available, so I'm not sure this size check is correct. (Then again, your vector_small test case already seems to do the right thing, so I may be missing something here?)

It looks like the vector_small case is processed in the branch before processing the 128-bit vector.

if !ret.layout.is_aggregate() && size.bits() <= 64 {
ret.extend_integer_width_to(64);
return;
}

However, I think we should handle it explicitly in the if abi == Vector && branch. Posted fixed version in #131586.

  • The logic should be different between the argument and return case: single-element vector aggregates are treated like vectors only as arguments, not as return types (that's a bit of an unfortunate ABI choice, but it matches what we historically did for single-element float aggregates). The wrapper tests all show the incorrect behavior for return values.

Thanks for pointing that out. I tried to fix this, but it appears that the ABI of Wrapper<Vector> is considered Vector, even without #[repr(transparent)] / #[repr(C)]...

@taiki-e
Copy link
Member Author

taiki-e commented Oct 12, 2024

Rust plans to disallow passing vector types to non-Rust ABI if the required target feature is disabled (#127731), so I think the issue of ABI differences with C/C++ when vector target feature is disabled could be resolved by extending the ABI checks used for that to handle s390x as well.

@uweigand
Copy link
Contributor

Thanks for pointing that out. I tried to fix this, but it appears that the ABI of Wrapper is considered Vector, even without #[repr(transparent)] / #[repr(C)]...

Hmm. Your test case uses the default (Rust) representation for the Wrapper type - my understanding is that this does not actually guarantee interoperability with the native platform ABI. Only #[repr(C)] comes with that guarantee. Does the test work if you use this for the Wrapper type?

Rust plans to disallow passing vector types to non-Rust ABI if the required target feature is disabled (#127731), so I think the issue of ABI differences with C/C++ when vector target feature is disabled could be resolved by extending the ABI checks used for that to handle s390x as well.

That looks like a reasonable approach to me as well.

@taiki-e
Copy link
Member Author

taiki-e commented Oct 12, 2024

Hmm. Your test case uses the default (Rust) representation for the Wrapper type - my understanding is that this does not actually guarantee interoperability with the native platform ABI. Only #[repr(C)] comes with that guarantee. Does the test work if you use this for the Wrapper type?

Oh, you are right. Changing Wrapper to #[repr(C)] worked as expected. Thanks!

@workingjubilee workingjubilee added the A-SIMD Area: SIMD (Single Instruction Multiple Data) label Nov 1, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 9, 2024
…nieu

Stabilize s390x inline assembly

This stabilizes inline assembly for s390x (SystemZ).

Corresponding reference PR: rust-lang/reference#1643

---

From the requirements of stabilization mentioned in rust-lang#93335

> Each architecture needs to be reviewed before stabilization:

> - It must have clobber_abi.

Done in rust-lang#130630.

> - It must be possible to clobber every register that is normally clobbered by a function call.

Done in the PR that added support for clobber_abi.

> - Generally review that the exposed register classes make sense.

The followings can be used as input/output:

- `reg` (`r[0-10]`, `r[12-14]`): General-purpose register

- `reg_addr` (`r[1-10]`, `r[12-14]`): General-purpose register except `r0` which is evaluated as zero in an address context

  This class is needed because `r0`, which may be allocated when using the `reg` class, cannot be used as a register in certain contexts. This is identical to the `a` constraint in LLVM and GCC. See rust-lang#119431 for details.

- `freg` (`f[0-15]`): Floating-point register

The followings are clobber-only:

- `vreg` (`v[0-31]`): Vector register

  Technically `vreg` should be able to accept `#[repr(simd)]` types as input/output if the unstable `vector` target feature added is enabled, but `core::arch` has no s390x vector type and both `#[repr(simd)]` and `core::simd` are unstable. Everything related is unstable, so the fact that this is currently a clobber-only should not be considered a stabilization blocker. (rust-lang#130869 tracks unstable stuff here)

- `areg` (`a[2-15]`): Access register

All of the above register classes except `reg_addr` are needed for `clobber_abi`.

The followings cannot be used as operands for inline asm (see also [getReservedRegs](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp#L258-L282) and [SystemZELFRegisters](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.h#L107-L128) in LLVM):

- `r11`: frame pointer
- `r15`: stack pointer
- `a0`, `a1`: Reserved for system use
- `c[0-15]` (control register)  Reserved by the kernel

Although not listed in the above requirements, `preserves_flags` is implemented in rust-lang#111331.

---

cc `@uweigand`

r? `@Amanieu`

`@rustbot` label +O-SystemZ +A-inline-assembly
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Nov 10, 2024
…nieu

Stabilize s390x inline assembly

This stabilizes inline assembly for s390x (SystemZ).

Corresponding reference PR: rust-lang/reference#1643

---

From the requirements of stabilization mentioned in rust-lang#93335

> Each architecture needs to be reviewed before stabilization:

> - It must have clobber_abi.

Done in rust-lang#130630.

> - It must be possible to clobber every register that is normally clobbered by a function call.

Done in the PR that added support for clobber_abi.

> - Generally review that the exposed register classes make sense.

The followings can be used as input/output:

- `reg` (`r[0-10]`, `r[12-14]`): General-purpose register

- `reg_addr` (`r[1-10]`, `r[12-14]`): General-purpose register except `r0` which is evaluated as zero in an address context

  This class is needed because `r0`, which may be allocated when using the `reg` class, cannot be used as a register in certain contexts. This is identical to the `a` constraint in LLVM and GCC. See rust-lang#119431 for details.

- `freg` (`f[0-15]`): Floating-point register

The followings are clobber-only:

- `vreg` (`v[0-31]`): Vector register

  Technically `vreg` should be able to accept `#[repr(simd)]` types as input/output if the unstable `vector` target feature added is enabled, but `core::arch` has no s390x vector type and both `#[repr(simd)]` and `core::simd` are unstable. Everything related is unstable, so the fact that this is currently a clobber-only should not be considered a stabilization blocker. (rust-lang#130869 tracks unstable stuff here)

- `areg` (`a[2-15]`): Access register

All of the above register classes except `reg_addr` are needed for `clobber_abi`.

The followings cannot be used as operands for inline asm (see also [getReservedRegs](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp#L258-L282) and [SystemZELFRegisters](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.h#L107-L128) in LLVM):

- `r11`: frame pointer
- `r15`: stack pointer
- `a0`, `a1`: Reserved for system use
- `c[0-15]` (control register)  Reserved by the kernel

Although not listed in the above requirements, `preserves_flags` is implemented in rust-lang#111331.

---

cc ``@uweigand``

r? ``@Amanieu``

``@rustbot`` label +O-SystemZ +A-inline-assembly
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Nov 10, 2024
Rollup merge of rust-lang#131258 - taiki-e:s390x-stabilize-asm, r=Amanieu

Stabilize s390x inline assembly

This stabilizes inline assembly for s390x (SystemZ).

Corresponding reference PR: rust-lang/reference#1643

---

From the requirements of stabilization mentioned in rust-lang#93335

> Each architecture needs to be reviewed before stabilization:

> - It must have clobber_abi.

Done in rust-lang#130630.

> - It must be possible to clobber every register that is normally clobbered by a function call.

Done in the PR that added support for clobber_abi.

> - Generally review that the exposed register classes make sense.

The followings can be used as input/output:

- `reg` (`r[0-10]`, `r[12-14]`): General-purpose register

- `reg_addr` (`r[1-10]`, `r[12-14]`): General-purpose register except `r0` which is evaluated as zero in an address context

  This class is needed because `r0`, which may be allocated when using the `reg` class, cannot be used as a register in certain contexts. This is identical to the `a` constraint in LLVM and GCC. See rust-lang#119431 for details.

- `freg` (`f[0-15]`): Floating-point register

The followings are clobber-only:

- `vreg` (`v[0-31]`): Vector register

  Technically `vreg` should be able to accept `#[repr(simd)]` types as input/output if the unstable `vector` target feature added is enabled, but `core::arch` has no s390x vector type and both `#[repr(simd)]` and `core::simd` are unstable. Everything related is unstable, so the fact that this is currently a clobber-only should not be considered a stabilization blocker. (rust-lang#130869 tracks unstable stuff here)

- `areg` (`a[2-15]`): Access register

All of the above register classes except `reg_addr` are needed for `clobber_abi`.

The followings cannot be used as operands for inline asm (see also [getReservedRegs](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp#L258-L282) and [SystemZELFRegisters](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.h#L107-L128) in LLVM):

- `r11`: frame pointer
- `r15`: stack pointer
- `a0`, `a1`: Reserved for system use
- `c[0-15]` (control register)  Reserved by the kernel

Although not listed in the above requirements, `preserves_flags` is implemented in rust-lang#111331.

---

cc ``@uweigand``

r? ``@Amanieu``

``@rustbot`` label +O-SystemZ +A-inline-assembly
RalfJung pushed a commit to RalfJung/miri that referenced this issue Nov 10, 2024
Stabilize s390x inline assembly

This stabilizes inline assembly for s390x (SystemZ).

Corresponding reference PR: rust-lang/reference#1643

---

From the requirements of stabilization mentioned in rust-lang/rust#93335

> Each architecture needs to be reviewed before stabilization:

> - It must have clobber_abi.

Done in rust-lang/rust#130630.

> - It must be possible to clobber every register that is normally clobbered by a function call.

Done in the PR that added support for clobber_abi.

> - Generally review that the exposed register classes make sense.

The followings can be used as input/output:

- `reg` (`r[0-10]`, `r[12-14]`): General-purpose register

- `reg_addr` (`r[1-10]`, `r[12-14]`): General-purpose register except `r0` which is evaluated as zero in an address context

  This class is needed because `r0`, which may be allocated when using the `reg` class, cannot be used as a register in certain contexts. This is identical to the `a` constraint in LLVM and GCC. See rust-lang/rust#119431 for details.

- `freg` (`f[0-15]`): Floating-point register

The followings are clobber-only:

- `vreg` (`v[0-31]`): Vector register

  Technically `vreg` should be able to accept `#[repr(simd)]` types as input/output if the unstable `vector` target feature added is enabled, but `core::arch` has no s390x vector type and both `#[repr(simd)]` and `core::simd` are unstable. Everything related is unstable, so the fact that this is currently a clobber-only should not be considered a stabilization blocker. (rust-lang/rust#130869 tracks unstable stuff here)

- `areg` (`a[2-15]`): Access register

All of the above register classes except `reg_addr` are needed for `clobber_abi`.

The followings cannot be used as operands for inline asm (see also [getReservedRegs](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp#L258-L282) and [SystemZELFRegisters](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.h#L107-L128) in LLVM):

- `r11`: frame pointer
- `r15`: stack pointer
- `a0`, `a1`: Reserved for system use
- `c[0-15]` (control register)  Reserved by the kernel

Although not listed in the above requirements, `preserves_flags` is implemented in rust-lang/rust#111331.

---

cc ``@uweigand``

r? ``@Amanieu``

``@rustbot`` label +O-SystemZ +A-inline-assembly
mati865 pushed a commit to mati865/rust that referenced this issue Nov 12, 2024
…nieu

Stabilize s390x inline assembly

This stabilizes inline assembly for s390x (SystemZ).

Corresponding reference PR: rust-lang/reference#1643

---

From the requirements of stabilization mentioned in rust-lang#93335

> Each architecture needs to be reviewed before stabilization:

> - It must have clobber_abi.

Done in rust-lang#130630.

> - It must be possible to clobber every register that is normally clobbered by a function call.

Done in the PR that added support for clobber_abi.

> - Generally review that the exposed register classes make sense.

The followings can be used as input/output:

- `reg` (`r[0-10]`, `r[12-14]`): General-purpose register

- `reg_addr` (`r[1-10]`, `r[12-14]`): General-purpose register except `r0` which is evaluated as zero in an address context

  This class is needed because `r0`, which may be allocated when using the `reg` class, cannot be used as a register in certain contexts. This is identical to the `a` constraint in LLVM and GCC. See rust-lang#119431 for details.

- `freg` (`f[0-15]`): Floating-point register

The followings are clobber-only:

- `vreg` (`v[0-31]`): Vector register

  Technically `vreg` should be able to accept `#[repr(simd)]` types as input/output if the unstable `vector` target feature added is enabled, but `core::arch` has no s390x vector type and both `#[repr(simd)]` and `core::simd` are unstable. Everything related is unstable, so the fact that this is currently a clobber-only should not be considered a stabilization blocker. (rust-lang#130869 tracks unstable stuff here)

- `areg` (`a[2-15]`): Access register

All of the above register classes except `reg_addr` are needed for `clobber_abi`.

The followings cannot be used as operands for inline asm (see also [getReservedRegs](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp#L258-L282) and [SystemZELFRegisters](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.h#L107-L128) in LLVM):

- `r11`: frame pointer
- `r15`: stack pointer
- `a0`, `a1`: Reserved for system use
- `c[0-15]` (control register)  Reserved by the kernel

Although not listed in the above requirements, `preserves_flags` is implemented in rust-lang#111331.

---

cc ``@uweigand``

r? ``@Amanieu``

``@rustbot`` label +O-SystemZ +A-inline-assembly
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 21, 2024
…er-errors,uweigand

Support s390x z13 vector ABI

cc rust-lang#130869

This resolves the following fixmes:
- https://github.com/rust-lang/rust/blob/58420a065b68ecb3eec03b942740c761cdadd5c4/compiler/rustc_target/src/abi/call/s390x.rs#L1-L2
- https://github.com/rust-lang/rust/blob/58420a065b68ecb3eec03b942740c761cdadd5c4/compiler/rustc_target/src/spec/targets/s390x_unknown_linux_gnu.rs#L9-L11

Refs: Section 1.2.3 "Parameter Passing" and section 1.2.5 "Return Values" in ELF Application Binary Interface s390x Supplement, Version 1.6.1 (lzsabi_s390x.pdf in https://github.com/IBM/s390x-abi/releases/tag/v1.6.1)

This PR extends ~~rust-lang#127731 rust-lang#132173 (merged) 's ABI check to handle cases where `vector` target feature is disabled.
If we do not do ABI check, we run into the ABI problems as described in rust-lang#116558 and rust-lang#130869 (comment), and the problem of the compiler generating strange code (rust-lang#131586 (comment)).

cc `@uweigand`

`@rustbot` label +O-SystemZ +A-ABI
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Nov 21, 2024
Rollup merge of rust-lang#131586 - taiki-e:s390x-vector-abi, r=compiler-errors,uweigand

Support s390x z13 vector ABI

cc rust-lang#130869

This resolves the following fixmes:
- https://github.com/rust-lang/rust/blob/58420a065b68ecb3eec03b942740c761cdadd5c4/compiler/rustc_target/src/abi/call/s390x.rs#L1-L2
- https://github.com/rust-lang/rust/blob/58420a065b68ecb3eec03b942740c761cdadd5c4/compiler/rustc_target/src/spec/targets/s390x_unknown_linux_gnu.rs#L9-L11

Refs: Section 1.2.3 "Parameter Passing" and section 1.2.5 "Return Values" in ELF Application Binary Interface s390x Supplement, Version 1.6.1 (lzsabi_s390x.pdf in https://github.com/IBM/s390x-abi/releases/tag/v1.6.1)

This PR extends ~~rust-lang#127731 rust-lang#132173 (merged) 's ABI check to handle cases where `vector` target feature is disabled.
If we do not do ABI check, we run into the ABI problems as described in rust-lang#116558 and rust-lang#130869 (comment), and the problem of the compiler generating strange code (rust-lang#131586 (comment)).

cc `@uweigand`

`@rustbot` label +O-SystemZ +A-ABI
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 25, 2024
…anieu

Support input/output in vector registers of s390x inline assembly (under asm_experimental_reg feature)

This extends currently clobber-only vector registers (`vreg`) support to allow passing `#[repr(simd)]` types, floats (f32/f64/f128), and integers (i32/i64/i128) as input/output.

This is unstable and gated under new `#![feature(asm_experimental_reg)]` (tracking issue: rust-lang#133416). If the feature is not enabled, only clober is supported as before.

| Architecture | Register class | Target feature | Allowed types |
| ------------ | -------------- | -------------- | -------------- |
| s390x | `vreg` | `vector` | `i32`, `f32`, `i64`, `f64`, `i128`, `f128`, `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2` |

This matches the list of types that are supported by the vector registers in LLVM:
https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.td#L301-L313

In addition to `core::simd` types and floats listed above, custom `#[repr(simd)]` types of the same size and type are also allowed. All allowed types other than i32/f32/i64/f64/i128, and relevant target features are currently unstable.

Currently there is no SIMD type for s390x in `core::arch`, but this is tracked in rust-lang#130869.

cc rust-lang#130869 about vector facility support in s390x
cc rust-lang#125398 & rust-lang#116909 about f128 support in asm

`@rustbot` label +O-SystemZ +A-inline-assembly
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Nov 25, 2024
Rollup merge of rust-lang#131664 - taiki-e:s390x-asm-vreg-inout, r=Amanieu

Support input/output in vector registers of s390x inline assembly (under asm_experimental_reg feature)

This extends currently clobber-only vector registers (`vreg`) support to allow passing `#[repr(simd)]` types, floats (f32/f64/f128), and integers (i32/i64/i128) as input/output.

This is unstable and gated under new `#![feature(asm_experimental_reg)]` (tracking issue: rust-lang#133416). If the feature is not enabled, only clober is supported as before.

| Architecture | Register class | Target feature | Allowed types |
| ------------ | -------------- | -------------- | -------------- |
| s390x | `vreg` | `vector` | `i32`, `f32`, `i64`, `f64`, `i128`, `f128`, `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2` |

This matches the list of types that are supported by the vector registers in LLVM:
https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.td#L301-L313

In addition to `core::simd` types and floats listed above, custom `#[repr(simd)]` types of the same size and type are also allowed. All allowed types other than i32/f32/i64/f64/i128, and relevant target features are currently unstable.

Currently there is no SIMD type for s390x in `core::arch`, but this is tracked in rust-lang#130869.

cc rust-lang#130869 about vector facility support in s390x
cc rust-lang#125398 & rust-lang#116909 about f128 support in asm

`@rustbot` label +O-SystemZ +A-inline-assembly
bjorn3 pushed a commit to rust-lang/rustc_codegen_cranelift that referenced this issue Nov 28, 2024
Support input/output in vector registers of s390x inline assembly (under asm_experimental_reg feature)

This extends currently clobber-only vector registers (`vreg`) support to allow passing `#[repr(simd)]` types, floats (f32/f64/f128), and integers (i32/i64/i128) as input/output.

This is unstable and gated under new `#![feature(asm_experimental_reg)]` (tracking issue: rust-lang/rust#133416). If the feature is not enabled, only clober is supported as before.

| Architecture | Register class | Target feature | Allowed types |
| ------------ | -------------- | -------------- | -------------- |
| s390x | `vreg` | `vector` | `i32`, `f32`, `i64`, `f64`, `i128`, `f128`, `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2` |

This matches the list of types that are supported by the vector registers in LLVM:
https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.td#L301-L313

In addition to `core::simd` types and floats listed above, custom `#[repr(simd)]` types of the same size and type are also allowed. All allowed types other than i32/f32/i64/f64/i128, and relevant target features are currently unstable.

Currently there is no SIMD type for s390x in `core::arch`, but this is tracked in rust-lang/rust#130869.

cc rust-lang/rust#130869 about vector facility support in s390x
cc rust-lang/rust#125398 & rust-lang/rust#116909 about f128 support in asm

`@rustbot` label +O-SystemZ +A-inline-assembly
@folkertdev
Copy link
Contributor

folkertdev commented Jan 12, 2025

Adding some vector intrinsics sounds fun, but is there a design for the api for core::arch::s390x?

I'm guessing that function-wise it should roughly follow the api of vecintrin.h, which is reasonably straightforward, but I'm wondering how we should represent the C vector type.

Based on the llvm test suite, this type is fairly flexible. What I'm looking for, I guess, is something like __m128i and __m128d, but for s390x, and it looks like that does not really exist (yet).

So there could just be a type vector, defined as (roughly)

pub struct vector([f32; 4]);

Which could then specify that

    /// Internally this type may be viewed as:
    ///
    /// * `i8x16` - sixteen `i8` variables packed together
    /// * `i16x8` - eight `i16` variables packed together
    /// * `i32x4` - four `i32` variables packed together
    /// * `i64x2` - two `i64` variables packed together
    /// * `f32x4` - four `f32` variables packed together
    /// * `f64x2` - two `f64` variables packed together

That is minimal type safety, but we don't have to invent something new.

Alternatively a vectori and vectord could be added, where

  • vectori covers all the integer variants
  • vector is just f32x4
  • vectord is just f64x2

But, I don't really have experience with using the s390x intrinsics in practice, and I haven't been able to find code (besides LLVM) that uses these functions, so I'm not sure what the best approach is.


edit: maybe this approach from PowerPC could work?

https://github.com/rust-lang/stdarch/blob/b04d87ca1324187a382af0c6950a97e899a3e178/crates/core_arch/src/powerpc/altivec.rs#L23-L48

types! {
    #![unstable(feature = "stdarch_powerpc", issue = "111145")]

    /// PowerPC-specific 128-bit wide vector of sixteen packed `i8`
    pub struct vector_signed_char(16 x i8);
    /// PowerPC-specific 128-bit wide vector of sixteen packed `u8`
    pub struct vector_unsigned_char(16 x u8);

    /// PowerPC-specific 128-bit wide vector mask of sixteen packed elements
    pub struct vector_bool_char(16 x i8);
    /// PowerPC-specific 128-bit wide vector of eight packed `i16`
    pub struct vector_signed_short(8 x i16);
    /// PowerPC-specific 128-bit wide vector of eight packed `u16`
    pub struct vector_unsigned_short(8 x u16);
    /// PowerPC-specific 128-bit wide vector mask of eight packed elements
    pub struct vector_bool_short(8 x i16);
    // pub struct vector_pixel(???);
    /// PowerPC-specific 128-bit wide vector of four packed `i32`
    pub struct vector_signed_int(4 x i32);
    /// PowerPC-specific 128-bit wide vector of four packed `u32`
    pub struct vector_unsigned_int(4 x u32);
    /// PowerPC-specific 128-bit wide vector mask of four packed elements
    pub struct vector_bool_int(4 x i32);
    /// PowerPC-specific 128-bit wide vector of four packed `f32`
    pub struct vector_float(4 x f32);
}

@uweigand
Copy link
Contributor

I'm guessing that function-wise it should roughly follow the api of vecintrin.h, which is reasonably straightforward, but I'm wondering how we should represent the C vector type.

Based on the llvm test suite, this type is fairly flexible. What I'm looking for, I guess, is something like __m128i and __m128d, but for s390x, and it looks like that does not really exist (yet).

Well, the s390x (or ppc64) vector isn't really a type, it's a keyword used to define a family of types. In a sense, it's just syntactic sugar for __attribute__((vector_size (16))) in GCC or LLVM (which would be sort-of equivalent to repr(simd) in Rust I guess). Note that the current ppc64 vector keyword is a direct descendant of the original AltiVec specification, and the s390x vector keyword was directly modeled after and is mostly compatible with the ppc64 one.

Now, my understanding of the __m128i or __m128d types on other platforms is that those represent (partially) untyped contents of a vector register, and serve as the interface to vector intrinsics, but cannot really be used in any other contexts. So you might use vector types defined via __attribute__((vector_size)) to perform language-level operations and then cast to these special types when using vector intrinsics.

We don't have any equivalent of such types on s390x/ppc64. Rather, the recommendation is to use vector types defined by the vector keyword (or else __attribute__((vector_size)) - those are generally equivalent types) for both language-level operations and vector intrinsics.

This means that most of the intrinsics are overloaded - they can operate on multiple different vector types. E.g. something like vec_perm basically takes any vector type, while vec_mulh only takes signed integer types etc. This can be seen most easily when looking at the LLVM version of vecintrin.h (https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/vecintrin.h), which actually uses the LLVM extension __attibute__((__overloadable__)) to simulate C++ overloaded functions in C.

Now, I'm not sure what the best way would be to model an equivalent concept in Rust. The ppc64 way you show above is correct in the sense that it describes the list of instantiated types that can be defined via the vector keyword. But it doesn't model in any way that this keyword actually provides a generic mechanism to define vector types. Maybe a Rust equivalent could use generics in some form, where vector would be a trait, and vector intrinsics would be defined on that trait? But I don't think I understand details of Rust language semantics enough to come up with a full implementation of this ...

@folkertdev
Copy link
Contributor

The knowledge that the api is based on powerpc is very useful. That means the powerpc implementation is probably a good starting point.

The powerpc functions are actually generic: there is a lot of boilerplate and macro logic to provide a somewhat ergonomic api for the powerpc types. e.g.

https://doc.rust-lang.org/core/arch/powerpc64/fn.vec_add.html

pub unsafe fn vec_add<T, U>(a: T, b: U) -> <T as VectorAdd<U>>::Result
where
    T: VectorAdd<U>,

where VectorAdd is implemented for pairs of types where addition makes sense. And so on for the other functions.

    #[inline]
    #[target_feature(enable = "altivec")]
    #[cfg_attr(test, assert_instr(vaddubm))]
    pub unsafe fn vec_add_sc_sc(
        a: vector_signed_char,
        b: vector_signed_char,
    ) -> vector_signed_char {
        simd_add(a, b)
    }
    #[unstable(feature = "stdarch_powerpc", issue = "111145")]
    impl VectorAdd<vector_signed_char> for vector_signed_char {
        type Result = vector_signed_char;
        #[inline]
        #[target_feature(enable = "altivec")]
        unsafe fn vec_add(self, other: vector_signed_char) -> Self::Result {
            vec_add_sc_sc(self, other)
        }
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ABI Area: Concerning the application binary interface (ABI) A-inline-assembly Area: Inline assembly (`asm!(…)`) A-SIMD Area: SIMD (Single Instruction Multiple Data) O-SystemZ Target: SystemZ processors (s390x) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants