stream/stream/stages/generation/tiled_workload_generation.py, lines 232 to 243 in 2e8d92f:
q, rem = divmod(node_dim_size, outer_size)  # returns x//y, x%y
# Make sure that the outer_dim is divisible by the outer_size
if rem != 0:
    # Pad the dimension to a multiple of outer_size
    node_dim_size = (q + 1) * outer_size
    q += 1
tile_attrs.layer_dim_sizes[outer_dim] = q
In this code block, the tile's layer_dim_sizes are reduced to exclude the outer loop: the size of outer_dim becomes node_dim_size / outer_size. When node_dim_size % outer_size != 0, the dimension is first padded to the next multiple of outer_size.
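To make the arithmetic easy to check in isolation, here is a standalone sketch of the same padding logic. The function name pad_outer_dim is hypothetical and not part of Stream; it simply returns the padded dimension size together with the per-tile size that the snippet stores in tile_attrs.layer_dim_sizes[outer_dim].

```python
def pad_outer_dim(node_dim_size: int, outer_size: int) -> tuple[int, int]:
    """Hypothetical helper: pad node_dim_size to a multiple of outer_size.

    Returns (padded_size, tile_size), where tile_size is the value the
    snippet writes into tile_attrs.layer_dim_sizes[outer_dim].
    """
    q, rem = divmod(node_dim_size, outer_size)  # x // y, x % y
    if rem != 0:
        # Round up so that outer_size divides the padded size
        q += 1
    return q * outer_size, q


print(pad_outer_dim(17, 4))  # (20, 5)
print(pad_outer_dim(17, 3))  # (18, 6)
```

With the values from the example that follows, the same dimension of size 17 ends up padded to 20 in one layer and to 18 in the other.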
When two nodes in the same layer stack have different inter core tilings, the same outer_dim can be padded to different sizes in the two layers. As a result, the dependencies between nodes of the two layers can cross the boundary of the steady state group.
Example:
- Layers 0 and 1 both have {D: 34} as layer_dim_sizes.
- After intra core tiling of (D, 2), this becomes {D: 17}.
- Layer 0 has inter core tiling (D, 4) and layer 1 has (D, 3).
- D is padded to 20 and 18 respectively, the next multiples of 4 and 3, which gives tiles of size 5 and 6.
The dependencies between the tiles will look like this:
The red arrow causes scheduling problems.
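The crossing can be reproduced with a small, simplified model. It assumes, hypothetically, that the tiles of each layer are laid out consecutively along D and that each intra core iteration forms one steady state group; this is not Stream's actual dependency generation, and the helper names tile_ranges and cross_group_deps are made up for illustration. Whether the pair it finds is exactly the red arrow in the figure is also an assumption.

```python
def tile_ranges(dim_size: int, intra: int, inter: int) -> list[list[tuple[int, int]]]:
    """Hypothetical layout: D range of every tile, grouped per intra core iteration."""
    inner = -(-dim_size // intra)  # size per intra core iteration: ceil(34 / 2) = 17
    tile = -(-inner // inter)      # padded tile size, e.g. ceil(17 / 4) = 5
    group = tile * inter           # padded size of one steady state group, e.g. 20
    return [
        [(g * group + t * tile, g * group + (t + 1) * tile) for t in range(inter)]
        for g in range(intra)
    ]


def cross_group_deps(producer, consumer):
    """Overlapping (consumer, producer) tile pairs that belong to different groups."""
    deps = []
    for cg, c_tiles in enumerate(consumer):
        for cs, ce in c_tiles:
            for pg, p_tiles in enumerate(producer):
                for ps, pe in p_tiles:
                    overlaps = ps < ce and cs < pe
                    if overlaps and pg != cg:
                        deps.append(((cg, cs, ce), (pg, ps, pe)))
    return deps


layer0 = tile_ranges(34, intra=2, inter=4)  # groups of 20, tiles of 5
layer1 = tile_ranges(34, intra=2, inter=3)  # groups of 18, tiles of 6
print(cross_group_deps(producer=layer0, consumer=layer1))
# [((1, 18, 24), (0, 15, 20))]
```

In this model, the first tile of layer 1's second group, [18, 24), still needs layer 0's last tile of the first group, [15, 20), because the group boundaries sit at 18 and 20 respectively.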
To fix this, there are two options:
1. The padded size is a multiple of the intra core tiling factor and of all inter core tiling factors of the nodes in the stack.
2. Stream can deal with tile sizes that are not a multiple of the inter and intra core tiling factors.
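One possible reading of option 1 is sketched below: the size per intra core iteration is padded to a multiple of the least common multiple of all inter core tiling factors in the stack, so every layer gets the same group size and the group boundaries line up. The function name common_tile_sizes is hypothetical, and this is not necessarily how the implemented fix computes the size. math.lcm with several arguments requires Python 3.9 or later.

```python
from math import lcm


def common_tile_sizes(dim_size: int, intra: int, inter_factors: list[int]) -> dict[int, int]:
    """Hypothetical helper: pad each intra core iteration to a multiple of
    lcm(inter_factors) and return the resulting tile size per inter factor."""
    inner = -(-dim_size // intra)            # ceil(34 / 2) = 17
    step = lcm(*inter_factors)               # lcm(4, 3) = 12
    padded_inner = -(-inner // step) * step  # 17 is padded to 24
    return {f: padded_inner // f for f in inter_factors}


print(common_tile_sizes(34, 2, [4, 3]))  # {4: 6, 3: 8}
```

For the example, both layers are padded to 24 per intra core iteration (48 in total), so layer 0 gets tiles of 6 and layer 1 tiles of 8, and neither layer's tiles straddle a group boundary of the other.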
Additional dependency issues
The dependency generation (using NodeTensor) always uses the original loop ranges and not the extended ones. When the dimensions of a layer are extended to multiples of the divisors and then tiled, the last tiles can have a loop range that falls completely outside the original loop ranges. Such tiles are not recorded in NodeTensor, and tiles of subsequent layers that rely on them get an empty dependency instead.
To fix this, the dependency generation needs to know the updated sizes, but this is non-trivial.
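The following sketch shows how a padded tile can end up entirely outside the original loop range. It uses the same hypothetical consecutive layout along D as a simplified model; flat_tile_ranges and tiles_outside are illustrative names, not Stream functions.

```python
def flat_tile_ranges(tile_size: int, num_tiles: int) -> list[tuple[int, int]]:
    """Consecutive D ranges of the padded tiles (hypothetical layout)."""
    return [(i * tile_size, (i + 1) * tile_size) for i in range(num_tiles)]


def tiles_outside(original_size: int, ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Tiles with no overlap with the original loop range [0, original_size)."""
    return [(start, end) for start, end in ranges if start >= original_size]


# Layer 0 from the example: 2 x 4 tiles of size 5 over the original D = 34
print(tiles_outside(34, flat_tile_ranges(5, 8)))  # [(35, 40)]
```

Under this model the tile [35, 40) contains no element of the original range [0, 34), so a lookup based on the original loop ranges finds nothing for it.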
I have added a semi-fix for the additional dependency issues: instead of losing the dependencies of the last tile(s) because their loop ranges exceed the layer size, there will now always be a dependency on the very last element of the preceding layer.
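The fallback described above can be sketched as a clipping step. This assumes, for illustration only, that a consumer tile's D range maps one-to-one onto the producer's D range; producer_range is a hypothetical name and not the actual implementation of the semi-fix.

```python
def producer_range(consumer_range: tuple[int, int], producer_size: int) -> tuple[int, int]:
    """Hypothetical helper: clip a consumer tile's D range to the producer's
    original size, falling back to the producer's last element if nothing is left."""
    start, end = consumer_range
    clipped_start, clipped_end = min(start, producer_size), min(end, producer_size)
    if clipped_start >= clipped_end:
        # The whole tile lies in the padding: depend on the very last element only
        return (producer_size - 1, producer_size)
    return (clipped_start, clipped_end)


print(producer_range((30, 35), 34))  # (30, 34)
print(producer_range((35, 40), 34))  # (33, 34)
```

The fallback keeps the dependency non-empty, at the cost of a dependency that is coarser than the data the tile actually needs, which is presumably why it is described as a semi-fix.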
I have implemented solution 1, with one problem left: different layers can have different dimension names even though they refer to the same tensor dimension.