[GPU] Fix offsets calculation formula in MultiMmaOp distribution. (iree-org#18055)

The formula was

```
vtid: virtual thread id
tid: lane id
vtid = (tid floordiv stride_i) mod size_i
```

However, this does not take `element` into account. Each thread grabs
`element` contiguous values, so the vtid must be multiplied by
`element` to get the offset where that thread's data starts. The
corrected formula is

```
vtid: virtual thread id
tid: lane id
vtid = ((tid floordiv stride_i) mod size_i) * element_i
```
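
As a rough illustration of the two formulas above, the sketch below evaluates both for a single thread dimension. The helper names (`oldOffset`, `newOffset`, `checkExample`) are hypothetical and do not exist in IREE. The values `size = 4`, `stride = 16`, `element = 4` are an assumption inferred from the updated CHECK line in the test diff, not taken from the layout definition itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers mirroring the commit-message formulas; not IREE code.
// tid is assumed non-negative, so C++ integer division matches floordiv.

// Old formula: vtid = (tid floordiv stride) mod size.
int64_t oldOffset(int64_t tid, int64_t stride, int64_t size) {
  return (tid / stride) % size;
}

// New formula: vtid = ((tid floordiv stride) mod size) * element.
int64_t newOffset(int64_t tid, int64_t stride, int64_t size, int64_t element) {
  return ((tid / stride) % size) * element;
}

// Spot checks for the assumed values size = 4, stride = 16, element = 4.
void checkExample() {
  // Old formula: adjacent thread groups start only 1 apart.
  assert(oldOffset(0, 16, 4) == 0);
  assert(oldOffset(16, 16, 4) == 1);
  // New formula: adjacent thread groups start `element` apart.
  assert(newOffset(16, 16, 4, 4) == 4);
  assert(newOffset(48, 16, 4, 4) == 12);
}
```

With these values, lanes 0, 16, 32, and 48 get offsets 0, 1, 2, 3 under the old formula, so their 4-element slices overlap. Under the new formula they get 0, 4, 8, 12, which tile 16 elements without overlap.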

Fixes iree-org#17973

---------

Signed-off-by: hanhanW <[email protected]>
hanhanW authored Jul 31, 2024
1 parent 91433fc commit d8d1407
Showing 2 changed files with 11 additions and 8 deletions.
17 changes: 10 additions & 7 deletions compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUAttrs.cpp
@@ -677,27 +677,30 @@ static LogicalResult populateCanonicalOffsetsSizesAndStrides(
   OpFoldResult one = builder.getIndexAttr(1);
   canonicalStrides.append(rankReducedShape.size(), one);

+  // Each thread grabs `element` contiguous data, so the vtid needs to be
+  // multiplied by `element` to get the next bunch of data.
   // vtid: virtual thread id
   // tid: lane id
-  // vtid = (tid floordiv stride_i) mod size_i.
+  // vtid = ((tid floordiv stride_i) mod size_i) * element_i.
   SmallVector<OpFoldResult> vtids;
-  for (auto [dimSize, dimStride] :
-       llvm::zip_equal(subgroupLayout.thread, subgroupLayout.tstrides)) {
+  for (auto [dimSize, dimStride, element] :
+       llvm::zip_equal(subgroupLayout.thread, subgroupLayout.tstrides,
+                       subgroupLayout.element)) {
     if (dimSize == 1) {
       vtids.push_back(zero);
     }

-    // (tid floordiv stride) mod size
+    // ((tid floordiv stride) mod size) * element.
     AffineExpr tidExpr = builder.getAffineDimExpr(0);
     AffineMap vtidMap = AffineMap::get(
-        /*dims=*/1, /*syms=*/0, tidExpr.floorDiv(dimStride) % dimSize);
+        /*dims=*/1, /*syms=*/0,
+        (tidExpr.floorDiv(dimStride) % dimSize) * element);
     Value vtid = builder.create<affine::AffineApplyOp>(loc, vtidMap, laneId);
     vtids.push_back(vtid);
   }

   int64_t idx = 0;
-  for (auto [thread, element] :
-       llvm::zip_equal(subgroupLayout.thread, subgroupLayout.element)) {
+  for (int64_t element : subgroupLayout.element) {
     canonicalSizes.push_back(builder.getIndexAttr(element));
     canonicalOffsets.push_back(vtids[idx++]);
   }
@@ -29,7 +29,7 @@ module attributes { transform.with_named_sequence } {
 }

 // CHECK-DAG: #[[$MAP:.+]] = affine_map<(d0) -> (d0 mod 16)>
-// CHECK-DAG: #[[$MAP1:.+]] = affine_map<(d0) -> ((d0 floordiv 16) mod 4)>
+// CHECK-DAG: #[[$MAP1:.+]] = affine_map<(d0) -> ((d0 floordiv 16) * 4 - ((d0 floordiv 16) floordiv 4) * 16)>
 // CHECK-LABEL: func @distribute_multi_mma_16x16x16
 // CHECK-SAME: %[[LHS:[A-Za-z0-9]+]]: tensor<2x2x16x16xf16>
 // CHECK-SAME: %[[RHS:[A-Za-z0-9]+]]: tensor<2x2x16x16xf16>
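
The updated `$MAP1` line does not contain a literal `mod 4 * 4`. It appears to be an algebraically equivalent form of `((d0 floordiv 16) mod 4) * 4`, obtained by expanding `x mod 4` as `x - (x floordiv 4) * 4`. Why the printed map takes this shape is an assumption here; the commit does not say. The sketch below only checks that the two forms agree, using hypothetical helper names that are not part of IREE or MLIR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; d0 is assumed non-negative so that C++ integer
// division matches floordiv.

// Form written in the C++ change: ((d0 floordiv 16) mod 4) * 4.
int64_t asWritten(int64_t d0) { return ((d0 / 16) % 4) * 4; }

// Form in the CHECK line:
// (d0 floordiv 16) * 4 - ((d0 floordiv 16) floordiv 4) * 16.
int64_t asChecked(int64_t d0) {
  return (d0 / 16) * 4 - ((d0 / 16) / 4) * 16;
}

// Returns true if both forms agree for every lane id in [0, numLanes).
bool formsAgree(int64_t numLanes) {
  for (int64_t d0 = 0; d0 < numLanes; ++d0) {
    if (asWritten(d0) != asChecked(d0))
      return false;
  }
  return true;
}

// Spot check over a range of lane ids (64 is an arbitrary choice).
void checkForms() { assert(formsAgree(64)); }
```

Calling `checkForms()` exercises lane ids 0 through 63; the identity itself holds for any non-negative `d0`.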