Standard Fixed-length Vector Calling Convention Variant #418

Open
wants to merge 12 commits into
base: master
from
173 changes: 173 additions & 0 deletions riscv-cc.adoc
@@ -428,6 +428,179 @@
NOTE: `setjmp`/`longjmp` follow the standard calling convention, which clobbers
all vector registers. Hence, the standard vector calling convention variant
won't disrupt the `jmp_buf` ABI.

NOTE: Functions that use the standard vector calling convention
variant follow an additional name mangling rule for {Cpp}.
For more details, see <<Name Mangling for Standard Calling Convention Variant>>.

=== Standard Fixed-length Vector Calling Convention Variant
Collaborator

The variant itself seems fine, modulo nits, but how are we planning to enable it?

If it's automatically used by -march=rva23 -mabi=ilp32d that will create major compatibility issues for binary distributions that use a fixed ABI and allow mixing packages at different architecture levels (either as an explicit user action, or as an implementation detail when rebuilding the distribution to change the architecture requirement).

If a new -mabi= value is required to enable use of the variant, it will be usable on closed systems where all packages are built at once, but not on binary distributions, since there is no expectation that binary code built with different -mabi= options is interoperable at all. This will include Debian and Alpine and might include Android and Fedora if their ABIs are finalized prior to the acceptance of this PR.

If it's enabled on a per-function basis using an attribute, or automatically for functions not visible across DSO boundaries, then it's effectively part of the definition of the attribute or a compiler implementation detail and may belong in riscv-c-api-doc or gccint, not here.

Collaborator Author

My expectation is that it should be enabled on a per-function basis by an attribute. I think that needs a riscv-c-api-doc PR, which I will send in the next few days.


This section defines the calling convention variant for fixed-length vectors.
The intention of this variant is to pass fixed-length vectors in vector
registers. For the definition of a fixed-length vector, see
<<Fixed-length vector>>.

This variant is based on the standard vector calling convention variant:
the register convention and the rules for passing arguments and return values
are the same.

NOTE: The reason we define a separate calling convention variant is that we
would like to define a flexible convention to utilize the variable length
feature in the vector extension, also considering embedded vector extensions,
such as `Zve32x`.

ABI_VLEN refers to the width of a vector register in the calling convention
variant.
kito-cheng marked this conversation as resolved.

ABI_VLEN must not exceed the ISA's VLEN: the ISA may provide wider vector
registers than the ABI assumes, but never narrower ones.

ABI_VLEN represents the width (in bits) of the vector register available in the
calling convention for fixed-length vectors. ABI_VLEN can vary from 32 bits
(as in `Zve32x`) up to the maximum supported by the ISA. The flexibility of
ABI_VLEN enables the convention to adapt to both low-end embedded systems and
high-performance processors that utilize wider vector registers.
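The two bounds stated above (at least 32 bits, at most the ISA's VLEN) can be sketched as a small check. This is only an illustration: the helper name `abi_vlen_is_valid` is hypothetical, and any further restriction, such as requiring a power of two, is not stated in this section and is therefore not checked.

```c
#include <stdbool.h>

/* Hypothetical helper: checks only the two bounds given in the text.
 * abi_vlen and isa_vlen are widths in bits. */
static bool abi_vlen_is_valid(unsigned abi_vlen, unsigned isa_vlen)
{
    return abi_vlen >= 32 && abi_vlen <= isa_vlen;
}
```

For instance, an ABI_VLEN of 128 is accepted only when the ISA's VLEN is at least 128.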

ABI_VLEN is a parameter of this calling convention variant. It can be set
by a compiler command-line option or specified by a function attribute in
the source code.

NOTE: We suggest the toolchain implementation set the default value of ABI_VLEN
Contributor

This isn't possible unless V or Zvl128b is in the ISA string since ABI_VLEN must be less than or equal to the ISA VLEN.

to 128, as it's the most common minimal requirement. However, it is not fixed
to 128, since the ISA allows the VLEN to be only 32 bits or 64 bits. A
configurable ABI_VLEN also makes it possible to exploit a longer VLEN: users can
build against a library optimized with a larger ABI_VLEN to better utilize cores
with a longer VLEN.

A fixed-length vector argument is passed in a vector argument register if the
size of the vector is less than or equal to ABI_VLEN bits.

[NOTE]
====
Even in the absence of specific vector extension support for certain element
types, such as `__bf16`, `_Float16`, `float`, or `double`, the standard
fixed-length vector calling convention rules still apply. For example,
even without the support of extensions like `Zvfbfmin`, `Zve32f`, or `Zve64d`,
these element types will be passed according to the calling convention rules
outlined here.

Additionally, data types such as `__int128_t`, which currently do not
have direct support in any vector extension, will also follow these rules.
This design keeps the calling convention forward-compatible, minimizing the
need for adjustments as new extensions and data types are introduced.

Applying these rules consistently to unsupported element types means that
future vector extensions can be adopted without significant changes to the
calling convention.
====

A fixed-length vector argument is passed in two vector argument registers,
kito-cheng marked this conversation as resolved.
similar to vector data arguments with LMUL=2, if the size of the vector is
greater than ABI_VLEN bits and less than or equal to 2×ABI_VLEN bits.

A fixed-length vector argument is passed in four vector argument registers,
similar to vector data arguments with LMUL=4, if the size of the vector is
greater than 2×ABI_VLEN bits and less than or equal to 4×ABI_VLEN bits.

A fixed-length vector argument is passed in eight vector argument registers,
similar to vector data arguments with LMUL=8, if the size of the vector is
greater than 4×ABI_VLEN bit and less than or equal to 8×ABI_VLEN bit.
Contributor

bit -> bits in two places on this line


[NOTE]
====
Fixed-length vectors whose size is not a power of 2 will be rounded up to
the next power-of-2 length for the purpose of register allocation and handling.
For instance, a vector type like `int32x3_t` (which contains three 32-bit
integers) will be treated as an `int32x4_t` (a 128-bit vector, as LMUL=1) in
the ABI, and passed accordingly. This ensures consistency in how vectors are
handled and simplifies the process of argument passing.

Example: Consider an `int32x3_t` vector (three 32-bit integers):

- The vector's total size is 96 bits, which is not a power of 2.
- The ABI will round up the size to 128 bits (corresponding to `int32x4_t`),
  meaning the vector will be passed using one vector argument register when
  ABI_VLEN=128.

This rule applies to all non-power-of-2 fixed-length vectors, ensuring they
are treated consistently across different ABI_VLEN settings.
====
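The rounding described in the note can be sketched as follows. The helper name `round_up_pow2` is hypothetical and not part of the specification; the sketch assumes sizes are given in bits and are small enough that the shift does not overflow.

```c
#include <stdint.h>

/* Hypothetical helper: round a vector size in bits up to the next power
 * of two, as the note describes for register allocation purposes. */
static uint64_t round_up_pow2(uint64_t bits)
{
    uint64_t p = 1;
    while (p < bits)
        p <<= 1;   /* double until p covers the requested size */
    return p;
}
```

With the `int32x3_t` example, a size of 96 bits rounds up to 128 bits, which fits in one vector argument register when ABI_VLEN=128.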

A fixed-length vector argument is passed by reference and is replaced in the
argument list with the address if it is larger than 8×ABI_VLEN bits or if
there is a shortage of vector argument registers.
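Taken together, the size thresholds above amount to a simple classification. The sketch below is an illustration only: the function name `vector_arg_regs` is hypothetical, it ignores the case where too few vector argument registers remain, and it takes the size as given. When ABI_VLEN is a power of two, rounding the size up to a power of two first does not change the result.

```c
#include <stdint.h>

/* Hypothetical helper: number of vector argument registers used for a
 * fixed-length vector of size_bits, or 0 if it is passed by reference.
 * Register shortage is not modeled here. */
static unsigned vector_arg_regs(uint64_t size_bits, uint64_t abi_vlen)
{
    if (size_bits <= abi_vlen)
        return 1;              /* like LMUL=1 */
    if (size_bits <= 2 * abi_vlen)
        return 2;              /* like LMUL=2 */
    if (size_bits <= 4 * abi_vlen)
        return 4;              /* like LMUL=4 */
    if (size_bits <= 8 * abi_vlen)
        return 8;              /* like LMUL=8 */
    return 0;                  /* larger than 8×ABI_VLEN: by reference */
}
```

For example, a 128-bit vector uses one register with ABI_VLEN=128 and four registers with ABI_VLEN=32.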

A struct whose members are all fixed-length vectors is passed in
vector argument registers like a vector tuple type if all members have the
same length, that length is less than or equal to 4×ABI_VLEN bits, and the size
of the whole struct is less than or equal to 8×ABI_VLEN bits.
If there are not enough vector argument registers to pass the entire struct,
it is passed by reference and is replaced in the argument list with the address.
Otherwise, it uses the rule defined in the hardware floating-point calling
convention.
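The three size conditions for a struct of fixed-length vectors can be sketched as a predicate. The name `struct_passed_like_tuple` is hypothetical. The sketch assumes the struct size equals the sum of the member sizes (no padding) and does not model register availability.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: member_bits holds the size in bits of each
 * fixed-length vector member. Returns true if the size conditions for
 * tuple-like passing hold. Padding and register shortage are ignored. */
static bool struct_passed_like_tuple(const uint64_t *member_bits, size_t n,
                                     uint64_t abi_vlen)
{
    if (n == 0)
        return false;
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (member_bits[i] != member_bits[0])
            return false;      /* all members must have the same length */
        total += member_bits[i];
    }
    return member_bits[0] <= 4 * abi_vlen && total <= 8 * abi_vlen;
}
```

Under these assumptions, a struct of two 128-bit vectors qualifies with ABI_VLEN=128, while a struct mixing a 128-bit and a 256-bit vector does not.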

A struct containing just one fixed-length vector, or a fixed-length vector
array of length one, is flattened into a single fixed-length vector argument
if the size of the vector is less than or equal to 8×ABI_VLEN bits.

Struct with zero-length fixed-length arrays use the rule defined in the hardware
floating-point calling convention, which means it won't consume vector argument
register eitehr in C or {Cpp}.
Contributor

either*


A struct containing just one fixed-length vector array is passed as though it
were a vector tuple type if the size of the base element for the array is less than
or equal to 8×ABI_VLEN bit, and the size of the array is less than 8×ABI_VLEN

Suggested change
or equal to 8×ABI_VLEN bit, and the size of the array is less than 8×ABI_VLEN
or equal to 4×ABI_VLEN bit, and the size of the array is less than or equal to 8×ABI_VLEN

Since the array would have >=2 elements here, I think the length can't be 8×ABI_VLEN.

Collaborator Author

Array with length 1 is legal :P

bit.
Contributor

bit -> bits

If there are not enough vector argument registers to pass the entire struct,
it is passed by reference and is replaced in the argument list with the address.
Otherwise, it uses the rule defined in the hardware floating-point
calling convention.

Unions with fixed-length vectors are always passed according to the integer
calling convention.

The details of vector argument register rules are the same as the standard
vector calling convention variant.

NOTE: Functions that use the standard fixed-length vector calling convention
variant must be marked with STO_RISCV_VARIANT_CC. See <<Dynamic Linking>>
for the meaning of STO_RISCV_VARIANT_CC.

NOTE: Functions that use the standard fixed-length vector calling convention
variant follow an additional name mangling rule for {Cpp}.
For more details, see <<Name Mangling for Standard Calling Convention Variant>>.

[NOTE]
====
When ABI_VLEN is smaller than the VLEN, the number of vector argument
registers utilized remains unchanged. However, in such cases, values are only
placed in a portion of these vector argument registers, corresponding to the
size of ABI_VLEN. The remaining portion of the vector argument registers, which
extends beyond the ABI_VLEN, will remain idle. This means that while the full
capacity of the vector argument registers may not be used, the allocation of
these registers does not change, ensuring consistent register usage regardless
of the ratio of ABI_VLEN to VLEN.

Example: With ABI_VLEN at 32 bits and VLEN at 128 bits, consider passing an
`int32x4_t` parameter (four 32-bit integers).

Allocation: Four vector argument registers are allocated for
`int32x4_t`, based on LMUL=4.

Utilization: All four integers are placed in the first vector register,
utilizing its full 128-bit capacity (VLEN), despite ABI_VLEN being 32 bits.

Remaining Registers: The other three allocated registers remain unused and idle.
====

NOTE: In a single compilation unit, different functions may use different
ABI_VLEN values, which allows function-specific optimization. ABI_VLEN is
therefore not uniform across the unit, and it is the user's responsibility to
verify that the ABI_VLEN matches on both sides of every function call to ensure
correct operation and data handling.

=== ILP32E Calling Convention

IMPORTANT: RV32E is not a ratified base ISA and so we cannot guarantee the
28 changes: 28 additions & 0 deletions riscv-elf.adoc
@@ -202,6 +202,34 @@
See the "Type encodings" section in _Itanium {Cpp} ABI_
for more detail on how to mangle types. Note that `__bf16` is mangled in the
same way as `std::bfloat16_t`.

=== Name Mangling for Standard Calling Convention Variant

Functions that use a standard calling convention variant must append an extra
ABI tag to the mangled function name; the rules are the same as in the
"ABI tags" section of the _Itanium {Cpp} ABI_.

.ABI Tag name for calling convention variants
[cols="5,2"]
[width=80%]
|===
| Name | ABI tag name

| Standard vector calling convention variant | riscv_vector_cc
|===


For example:
[,c]
----
__attribute__((riscv_vector_cc)) void foo();
----

is mangled as
[,c]
----
_Z3fooB15riscv_vector_ccv
----
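How the mangled name above is assembled can be sketched in a few lines. This is only an illustration for the simplest case, a global function taking no parameters with a single ABI tag; the helper name `mangle_void_fn_with_abi_tag` is hypothetical, and real mangling is done by the compiler following the full Itanium {Cpp} ABI rules.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "_Z" <len><name> "B" <len><tag> "v" for a
 * global function with no parameters and one ABI tag. Returns the number
 * of characters that snprintf would write, excluding the terminator. */
static int mangle_void_fn_with_abi_tag(char *out, size_t out_size,
                                       const char *name, const char *abi_tag)
{
    return snprintf(out, out_size, "_Z%zu%sB%zu%sv",
                    strlen(name), name, strlen(abi_tag), abi_tag);
}
```

Called with the name `foo` and the tag `riscv_vector_cc`, this reproduces `_Z3fooB15riscv_vector_ccv`: `3foo` is the length-prefixed function name, `B15riscv_vector_cc` is the length-prefixed ABI tag, and the trailing `v` encodes the empty parameter list.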

=== Name Mangling for Vector Data Types, Vector Mask Types and Vector Tuple Types.

The vector data types and vector mask types, as defined in the section