diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index f6ab1882..86f15346 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -428,6 +428,179 @@ NOTE: `setjmp`/`longjmp` follow the standard calling convention, which
 clobbers all vector registers. Hence, the standard vector calling convention
 variant won't disrupt the `jmp_buf` ABI.
 
+NOTE: Functions that use the standard vector calling convention
+variant follow an additional name mangling rule for {Cpp}.
+For more details, see <>.
+
+=== Standard Fixed-length Vector Calling Convention Variant
+
+This section defines the calling convention variant for fixed-length vectors.
+The intention of this variant is to pass fixed-length vectors in vector
+registers. For the definition of a fixed-length vector, see
+<>.
+
+This variant is based on the standard vector calling convention variant:
+the register convention and the rules for passing arguments and return values
+are the same.
+
+NOTE: We define a separate calling convention variant because we
+want a flexible convention that can exploit the variable-length
+feature of the vector extension while also covering embedded vector extensions
+such as `Zve32x`.
+
+ABI_VLEN refers to the width of a vector register in the calling convention
+variant.
+
+ABI_VLEN must be no wider than the ISA's VLEN: the ISA may
+support wider vector registers than the ABI, but ABI_VLEN cannot exceed
+the ISA's VLEN.
+
+ABI_VLEN represents the width (in bits) of the vector register available in the
+calling convention for fixed-length vectors. ABI_VLEN can vary from 32 bits
+(as in `Zve32x`) up to the maximum supported by the ISA. The flexibility of
+ABI_VLEN enables the convention to adapt to both low-end embedded systems and
+high-performance processors that utilize wider vector registers.
+
+ABI_VLEN is a parameter of this calling convention variant. It can be set
+by a compiler command-line option or specified by a function
+attribute in the source code.
+
+NOTE: We suggest that toolchain implementations set the default value of ABI_VLEN
+to 128, as it's the most common minimal requirement. However, it is not fixed
+to 128, since the ISA allows VLEN to be only 32 bits or 64 bits. A configurable
+ABI_VLEN also makes it possible to exploit longer VLEN: users can build
+against a library optimized for a larger ABI_VLEN to make better use of
+cores with longer VLEN.
+
+A fixed-length vector argument is passed in a vector argument register if the
+size of the vector is less than or equal to ABI_VLEN bits.
+
+[NOTE]
+====
+Even in the absence of specific vector extension support for certain element
+types, such as `__bf16`, `_Float16`, `float`, or `double`, the standard
+fixed-length vector calling convention rules still apply. For example,
+even without the support of extensions like `Zvfbfmin`, `Zve32f`, or `Zve64d`,
+these element types will be passed according to the calling convention rules
+outlined here.
+
+Additionally, data types such as `__int128_t`, which currently do not
+have direct support in any vector extension, will also follow these rules.
+This design ensures that the calling convention remains forward-compatible,
+minimizing the need for continuous adjustments as new extensions and data types
+are introduced in the future.
+
+The consistency in applying these rules to unsupported element types guarantees
+a smooth transition when future vector extensions become available, allowing for
+seamless integration of new features without requiring significant changes to
+the calling convention.
+====
+
+A fixed-length vector argument is passed in two vector argument registers,
+similar to vector data arguments with LMUL=2, if the size of the vector is
+greater than ABI_VLEN bits and less than or equal to 2×ABI_VLEN bits.
+
+A fixed-length vector argument is passed in four vector argument registers,
+similar to vector data arguments with LMUL=4, if the size of the vector is
+greater than 2×ABI_VLEN bits and less than or equal to 4×ABI_VLEN bits.
+
+A fixed-length vector argument is passed in eight vector argument registers,
+similar to vector data arguments with LMUL=8, if the size of the vector is
+greater than 4×ABI_VLEN bits and less than or equal to 8×ABI_VLEN bits.
+
+[NOTE]
+====
+Fixed-length vectors whose size is not a power of 2 will be rounded up to
+the next power-of-2 length for the purpose of register allocation and handling.
+For instance, a vector type like `int32x3_t` (which contains three 32-bit
+integers) will be treated as an `int32x4_t` (a 128-bit vector, LMUL=1 when ABI_VLEN=128) in
+the ABI, and passed accordingly. This ensures consistency in how vectors are
+handled and simplifies the process of argument passing.
+
+Example: Consider an `int32x3_t` vector (three 32-bit integers):
+- The vector's total size is 96 bits, which is not a power of 2.
+- The ABI will round up the size to 128 bits (corresponding to `int32x4_t`),
+  meaning the vector will be passed using one vector argument register when
+  ABI_VLEN=128.
+
+This rule applies to all non-power-of-2 fixed-length vectors, ensuring they
+are treated consistently across different ABI_VLEN settings.
+====
+
+A fixed-length vector argument is passed by reference and is replaced in the
+argument list with the address if it is larger than 8×ABI_VLEN bits or if
+there is a shortage of vector argument registers.
+
+A struct whose members are all fixed-length vectors will be passed in
+vector argument registers like a vector tuple type if all members have the
+same length, the length is less than or equal to 4×ABI_VLEN bits, and the size of
+the whole struct is less than or equal to 8×ABI_VLEN bits.
+If there are not enough vector argument registers to pass the entire struct,
+it is passed by reference and is replaced in the argument list with the address.
+Otherwise, it will use the rule defined in the hardware floating-point calling
+convention.
+
+A struct containing just one fixed-length vector or a fixed-length vector
+array of length one is flattened into a single fixed-length vector argument
+if the size of the vector is less than or equal to 8×ABI_VLEN bits.
+
+Structs with zero-length fixed-length vector arrays use the rule defined in the hardware
+floating-point calling convention, which means they won't consume vector argument
+registers in either C or {Cpp}.
+
+A struct containing just one fixed-length vector array is passed as though it
+were a vector tuple type if the size of the base element for the array is less than
+or equal to 8×ABI_VLEN bits, and the size of the array is less than 8×ABI_VLEN
+bits.
+If there are not enough vector argument registers to pass the entire struct,
+it is passed by reference and is replaced in the argument list with the address.
+Otherwise, it will use the rule defined in the hardware floating-point
+calling convention.
+
+Unions with fixed-length vectors are always passed according to the integer
+calling convention.
+
+The details of vector argument register rules are the same as in the standard
+vector calling convention variant.
+
+NOTE: Functions that use the standard fixed-length vector calling convention
+variant must be marked with STO_RISCV_VARIANT_CC. See <>
+for the meaning of STO_RISCV_VARIANT_CC.
+
+NOTE: Functions that use the standard fixed-length vector calling convention
+variant follow an additional name mangling rule for {Cpp}.
+For more details, see <>.
+
+[NOTE]
+====
+When ABI_VLEN is smaller than the VLEN, the number of vector argument
+registers utilized remains unchanged. However, in such cases, values are only
+placed in a portion of these vector argument registers, corresponding to the
+size of ABI_VLEN. The remaining portion of the vector argument registers, which
+extends beyond the ABI_VLEN, will remain idle. This means that while the full
+capacity of the vector argument registers may not be used, the allocation of
+these registers does not change, ensuring consistency in register usage regardless
+of the ABI_VLEN to VLEN ratio.
+
+Example: With ABI_VLEN at 32 bits and VLEN at 128 bits, consider passing an
+`int32x4_t` parameter (four 32-bit integers).
+
+Allocation: Four vector argument registers are allocated for
+`int32x4_t`, based on LMUL=4.
+
+Utilization: All four integers are placed in the first vector register,
+utilizing its full 128-bit capacity (VLEN), despite ABI_VLEN being 32 bits.
+
+Remaining Registers: The other three allocated registers remain unused and idle.
+====
+
+NOTE: In a single compilation unit, different functions may use different
+ABI_VLEN values. This means that ABI_VLEN is not uniform across the entire unit,
+allowing for function-specific optimization. However, this necessitates that
+users ensure consistency in ABI_VLEN between calling and called functions. It
+is the user's responsibility to verify that the ABI_VLEN matches on both sides
+of a function call to ensure correct operation and data handling.
+
 === ILP32E Calling Convention
 
 IMPORTANT: RV32E is not a ratified base ISA and so we cannot guarantee the
diff --git a/riscv-elf.adoc b/riscv-elf.adoc
index 08d948a5..f7cd7796 100644
--- a/riscv-elf.adoc
+++ b/riscv-elf.adoc
@@ -202,6 +202,34 @@ See the "Type encodings" section in _Itanium {Cpp} ABI_ for more detail on
 how to mangle types. Note that `__bf16` is mangled in the same way as
 `std::bfloat16_t`.
 
+=== Name Mangling for Standard Calling Convention Variant
+
+Functions that use a standard calling convention variant must append an extra ABI tag to
+the mangled function name; the rules are the same as in the "ABI tags" section of
+_Itanium {Cpp} ABI_.
+
+.ABI Tag name for calling convention variants
+[cols="5,2"]
+[width=80%]
+|===
+| Name | ABI tag name
+
+| Standard vector calling convention variant | riscv_vector_cc
+|===
+
+
+For example:
+[,c]
+----
+ __attribute__((riscv_vector_cc)) void foo();
+----
+
+is mangled as
+[,c]
+----
+ _Z3fooB15riscv_vector_ccv
+----
+
 === Name Mangling for Vector Data Types, Vector Mask Types and Vector Tuple Types.
 
 The vector data types and vector mask types, as defined in the section
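
As a reading aid for this patch (not part of it), the two mechanical rules it introduces can be sketched in C. Both helpers are hypothetical names invented for illustration; they are not defined by the psABI or by any toolchain. The first models only the size-based register rules for a single fixed-length vector argument, assuming ABI_VLEN is a power of two between 32 and 65536, and ignores the case where the vector argument registers run out. The second builds the mangled name only for the simplest case shown in the patch, a global function taking no arguments and carrying one ABI tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (illustration only): number of vector argument
 * registers used by one fixed-length vector of `size_bits` bits for a given
 * ABI_VLEN, or 0 when the size rules require passing it by reference.
 * The "shortage of vector argument registers" case is not modelled. */
static unsigned fixed_vec_arg_regs(unsigned size_bits, unsigned abi_vlen)
{
    /* Assumption: ABI_VLEN is a power of two in the range 32..65536. */
    assert(abi_vlen >= 32 && abi_vlen <= 65536);
    assert((abi_vlen & (abi_vlen - 1)) == 0);
    assert(size_bits > 0);

    /* Larger than 8*ABI_VLEN bits: passed by reference. */
    if (size_bits > 8 * abi_vlen)
        return 0;

    /* Round the vector size up to the next power of two. */
    unsigned rounded = 1;
    while (rounded < size_bits)
        rounded *= 2;

    /* Pick the smallest group of 1, 2, 4 or 8 registers that fits. */
    for (unsigned regs = 1; regs <= 8; regs *= 2)
        if (rounded <= regs * abi_vlen)
            return regs;
    return 0; /* not reached: rounded never exceeds 8*ABI_VLEN here */
}

/* Hypothetical helper (illustration only): mangled name of a global
 * `void name()` carrying one ABI tag, in the Itanium C++ ABI form
 * _Z<len><name>B<len><tag>v. Returns the snprintf result. */
static int mangle_void_fn_with_tag(char *out, size_t out_size,
                                   const char *name, const char *tag)
{
    return snprintf(out, out_size, "_Z%zu%sB%zu%sv",
                    strlen(name), name, strlen(tag), tag);
}
```

Under these assumptions, `fixed_vec_arg_regs(96, 128)` gives 1, matching the `int32x3_t` note, and `fixed_vec_arg_regs(128, 32)` gives 4, matching the `int32x4_t` example with ABI_VLEN at 32 bits. Calling `mangle_void_fn_with_tag` with `"foo"` and `"riscv_vector_cc"` reproduces the `_Z3fooB15riscv_vector_ccv` name from the patch.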