An important goal of the Slang project is that the rules for how shader parameters get assigned to register
s/binding
s is completely deterministic, so that users can rely on the compiler's behavior.
This document will attempt to explain the rules that Slang employs at a high level.
Eventually it might evolve into a formal specification of the expected behavior.
The whole point of having a deterministic layout approach is the guarantees that it gives to users, so we will start by explicitly stating the guarantees that users can rely upon:
-
A single top-level shader parameter will always occupy a contiguous range of bindings/registers for each resource type it consumes (e.g., a contiguous range of
t
registers, a contiguous range of bytes in acbuffer
, etc.). -
The amount of resources a parameter consumes depends only on its type, and top-level context in which it appears (e.g., is it in a
cbuffer
? an entry-point varying parameter? etc.). -
A shader parameter that is declared the same way in two different programs will get the same amount of resources (registers/bytes) allocated for it in both programs, but it might get a different starting offset/register.
-
Changing the bodies of functions in shader code cannot change the layout of shader parameters. In particular, just because a shader parameter is "dead" does not mean it gets eliminated.
-
If the user doesn't use explicit
register
/layout
modifiers to bind parameters, then each module will get a contiguous range of bindings, and the overall program will always use a contiguous range starting from zero for each resource type.
Layout is applied to a Slang compile request which comprises one or more translation units of user code, and zero or more import
ed modules.
The compile request also specifies zero or more entry points to be compiled, where each entry point identifies a function and a profile to use.
Layout is always done with respect to a chosen target, and different targets might compute the resource usage for types differently, or apply different alignment.
Within a single target there may also be different layout rules (e.g., the difference between GLSL std140
and std430
).
Layout proceeds in four main phases:
- Establish a global ordering on shader parameters
- Compute the resource requirements of each shader parameter
- Process shader parameters with fixed binding modifiers
- Allocate bindings to parameter without fixed binding modifiers
Shader parameters from the user's code always precede shader parameters from imported modules. The order of parameters in the user's code is derived by "walking" through the code as follows:
-
Walk through each translation unit in the order they were added via API (or the order they were listed on the command line)
-
Walk through each source file of a translation unit in the order they were added/listed
-
Walk through global-scope shader paramter declarations (global variables,
cbuffer
s, etc.) in the order they are listed in the (preprocessed) file. -
After all global parameters for a translation unit have been walked, walk through any entry points in the translation unit.
-
When walking through an entry point, walk through all of its function parameters (both uniforms and varyings) in order, and then walk the function result as a varying output parameter.
When dealing with global-scope parameters in the user's code, it is possible for the "same" parameter to appear in multiple translation units. Any two global shader parameters in user code with the same name are assumed to represent the same parameter, and will only be included in the global order at the first location where they are seen. It is an error for the different declarations to have a mismatch in type, or conflicting explicit bindings.
Parameters from import
ed modules are enumerated after the user code, using the order in which modules were first import
ed.
The order of parameters within each module is the same as when the module was compiled, which matches the ordering given above.
Each shader parameter computes its resource requirements based on its type, and how it is declared.
-
Global-scope parameters, entry point
uniform
parameters, andcbuffer
decalrations all use the "default" layout rules -
Entry point non-
uniform
parameters use "varying" layout rules, either input or output -
A few other special case rules exist (e.g., for laying out the elements of a
StructuredBuffer
), but most users will not need to worry about these
Note that the "default" rules are different for D3D and GL/Vulkan targets, because they have slightly different packing behavior.
Under the default rules simple scalar types (bool
, int
, float
, etc.) are laid out as "uniform" data (that is, bytes of ordinary memory).
In most cases, the size matches the expected data type size (although be aware that most targets treat bool
as a synonym for int
) and the alignment is the same as the size.
Vectors are laid out as N sequential scalars.
Under HLSL rules, a vector has the same alignment as its scalar type.
Under GLSL std140
rules, a vector has an alignment that is its size rounded up to the next power of two (so a float3
has float4
alignment).
"Opaque" types include resource/sampler types like Texture2D
and SamplerState
.
These consume a single "slot" of the appropriate category for the chosen API.
Note that when compiling for D3D, a Texture2D
and a SamplerState
will consume different resources (t
and s
registers, respectively), but when compiling for Vulkan, they both consume the same resource ("descriptor table slot").
Opaque types currently all have an alignment of one.
A structure is laid out by initializing a counter for each resource type, and then processing fields sequential (in declaration order):
-
Compute resource usage for the field's type
-
Adjust counters based on the alignment of the field for each resource type where it has non-zero usage
-
Assign an offset to the field for each resource type where it has non-zero usage
-
Add the resource usage of the field to the counters
An important wrinkle is that when doing layout for HLSL, we must ensure that if a field with uniform data that is smaller than 16 bytes would straddle a 16-byte boundary, we advance to the next 16-byte aligned offset.
The overall alignment of a struct
is the maximum alignment of its fields or the default alignment (if it is larger).
The default alignment is 16 for both D3D and Vulkan targets.
The final resource usage of a struct
is rounded up to a multiple of the alignment for each resource type. Note that we allow a struct
to consume zero bytes of uniform storage.
It is important to note that a struct
type can use resources of many different kinds, so in general we cannot talk about the "size" of a type, but only its size for a particular kind of resource (uniform bytes, texture registers, etc.).
For uniform data, the size of the element type is rounded up to the target-specific minimum (e.g., 16 for D3D and Vulkan constant buffers) to arrive at the stride of the array. The total size of the array is then the stride times the element count.
For opaque resource types, the D3D case simply takes the stride to be the number of registers consumed by each element, and multiplies this by the element count.
For Vulkan, an array of resources uses only a single binding
, so that the stride is always zero for these resource kinds, and the resource usage of an array is the same as its element type.
The uniform part of an unsized array has the same stride as for the sized case, but an effectively infinite size.
For register/binding resource usage, a Vulkan unsized array is just like a sized one, while a D3D array will consume a full register space instead of individual registers.
To determine the resource usage of a constant buffer (either a cbuffer { ... }
declaration or a ConstantBuffer<T>
) we look at the resource usage of its element type.
If the element uses any uniform data, the constant buffer will use at least one constant-buffer register (or whatever the target-specific resource is). If the element uses any non-uniform data, that usage will be added to that of the constant buffer.
A parameter block is similar to a constant buffer. If the element type uses any uniform data, we compute resource usage for a constant buffer. We then add in any non-uniform resource usage for the element types.
If the target requires use of register spaces (e.g., for Vulkan), then a parameter block uses a single register space; otherwise it exposes the resource usage of its element type directly.
If the user put an explicit binding modifier on a parameter, and that modifier applies to the current target, then we use it and "reserve" space in the overall binding range.
Traditional HLSL register
modifiers only apply for D3D targets.
Slang currently allows GLSL-style layout(binding =...)
modifiers to be attached to shader parameters, and will use those modifiers for GL/Vulkan targets.
If two parameters reserve overlapping ranges, we currently issue an error. This may be downgraded to a warning for targets that support overlapping ranges.
Once ranges have been reserved for parameters with explicit bindings, the compiler goes through all parameters again, in the global order and assigns them bindings based on their resource requirements.
For each resource type used by a parameter, it is allocated the first contiguous range of resources of that type that have not been reserved.
In order to support struct
types that mix uniform and non-uniform data, the Slang compiler always "splits" these types.
For example, given:
struct LightInfo { float3 pos; Texture2D shadowMap; };
LightInfo gLight;
Slang will generate code like:
float3 gLight_pos;
Texture2D gLight_shadowMap;
In a simple case like the above, this doesn't affect layout at all, but once arrays get involved, the layout can be more complicated. Consider this case:
struct Pair { Texture2D a; Texture2D b; };
Pair gPairs[8];
The output from the splitting step is equivalent to:
Texture2D gPairs_a[8];
Texture2D gPairs_b[8];
While this transformation is critical for having a type layout algorithm that applies across all APIs (and also it is pretty much required to work around various bugs in downstream compilers), it has the important down-side that the value gPairs[0]
does not occupy a contiguous range of registers (although the top-level shader parameter gPairs
does).
The Slang reflection API will correctly report the information about this situation:
-
The "stride" of the
gPairs
array will be reported as one, becausegPairs[n+1].a
is always one register aftergPairs[n].a
. -
The offset of the
gPairs.b
field will be reported as 8, becausegPairs[0].b
will be 8 registers after the starting register forgPairs
.
The Slang API tries to provide the best information it can in this case, but it is still important for users who mix arrays and complex struct
types to know how the compiler will lay them out.
Generic type parameters complicate these layout rules.
For example, we cannot compute the exact resource requirements for a vector<T,3>
without knowing what the type T
is.
When computing layouts for fully specialized types or programs, no special considerations are needed: the rules as described in this document still apply. One important consequence to understand is that given a type like:
struct MyStuff<T>
{
int a;
T b;
int c;
}
the offset computed for the c
field depends on the concrete type that gets plugged in for T
.
We think this is the least surprising behavior for programmers who might be familiar with things like C++ template specialization.
In cases where confusion about a field like c
getting different offsets in different specializations is a concern, users are encouraged to declare types so that all non-generic-dependent fields come before generic-dependent ones.