
feat: support masked_scatter by lowering path #3438

Open · chohk88 wants to merge 1 commit into base: main
Conversation

chohk88 (Collaborator)
@chohk88 chohk88 commented Mar 12, 2025

Description

Implemented support for masked_scatter in the lowering path, referring to this implementation in PyTorch Inductor.

Fixes # (issue)
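For reference, the eager-mode semantics being lowered: masked_scatter copies elements from source into input at the positions where mask is True, consuming source in row-major order. A minimal illustration with made-up values (not taken from the PR's tests):

```python
import torch

# masked_scatter fills the True positions of `mask` with elements of
# `source`, taken in row-major order; False positions keep `x`'s values.
x = torch.zeros(2, 3)
mask = torch.tensor([[True, False, True],
                     [False, True, False]])
source = torch.tensor([10.0, 20.0, 30.0])

out = torch.masked_scatter(x, mask, source)
# out:
# [[10.,  0., 20.],
#  [ 0., 30.,  0.]]
```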

Type of change

Please delete options that are not relevant and/or add your own.

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that the relevant reviewers are notified

@chohk88 chohk88 requested review from peri044 and apbose March 12, 2025 13:00
@chohk88 chohk88 self-assigned this Mar 12, 2025
@github-actions github-actions bot added component: tests Issues re: Tests component: lowering Issues re: The lowering / preprocessing passes component: conversion Issues re: Conversion stage component: converters Issues re: Specific op converters component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Mar 12, 2025
@github-actions github-actions bot requested a review from narendasan March 12, 2025 13:44
Comment on lines +589 to +606
# 1) Broadcast the input and mask tensors to a common shape
input_b, mask_b = aten.broadcast_tensors([input, mask])

# 2) Flatten the broadcasted tensors and the source tensor
input_flat = input_b.flatten()
mask_flat = mask_b.flatten()
source_flat = source.flatten()

# 3) Compute gather indices: (cumsum of mask as int64) - 1
source_idx = mask_flat.to(torch.int64).cumsum(0) - 1

# 4) Gather elements from source_flat using these indices
gathered = source_flat.gather(0, source_idx)

# 5) Replace positions where mask is True with gathered values, otherwise keep original
replaced = torch.where(mask_flat, gathered, input_flat)

# 6) Reshape the result back to the broadcasted shape
return replaced.view(input_b.shape)
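Outside the lowering path, the same six steps can be sanity-checked in eager PyTorch against torch.masked_scatter. A sketch with illustrative values; note that the first mask element is deliberately True here, so the cumsum-based indices stay non-negative:

```python
import torch

# Eager-mode walk-through of the decomposition above (illustrative
# shapes and values, not the PR's actual test cases).
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
mask = torch.tensor([[True, False], [True, True]])
source = torch.tensor([10.0, 20.0, 30.0])

x_b, mask_b = torch.broadcast_tensors(x, mask)
x_flat, mask_flat = x_b.flatten(), mask_b.flatten()
source_flat = source.flatten()

source_idx = mask_flat.to(torch.int64).cumsum(0) - 1  # [0, 0, 1, 2]
gathered = source_flat.gather(0, source_idx)
replaced = torch.where(mask_flat, gathered, x_flat)
out = replaced.view(x_b.shape)

# Matches the eager op on this input
assert torch.equal(out, torch.masked_scatter(x, mask, source))
```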

@chohk88 I have a question. I tried running this code in a separate Python session for the test input size of (2, 3, 4), and I see the following error. Do you know why this happens? Am I missing something here?

import torch
shape=(2, 3, 4)

ax=torch.randn(*shape, dtype=torch.float32, device="cuda")
mask=torch.rand(*shape, device="cuda") > 0.5
num_trues = mask.sum().item()
source = torch.arange(num_trues, dtype=torch.float32, device="cuda")
ax_b, mask_b = torch.ops.aten.broadcast_tensors([ax, mask])
ax_flat = ax_b.flatten()
mask_flat = mask_b.flatten()
source_flat = source.flatten()
source_idx = mask_flat.to(torch.int64).cumsum(0) - 1
gathered = source_flat.gather(0, source_idx)
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [1,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
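One possible reading of this error (an observation, not necessarily the fix the PR will adopt): when the first element of mask_flat is False, cumsum(0) - 1 produces -1 at that position, which is out of bounds for gather on CUDA. Since torch.where discards the gathered values at False positions anyway, clamping the indices to zero is one way to make the gather safe:

```python
import torch

# Hypothetical guard, shown on CPU for reproducibility: clamp the
# cumsum-derived indices so that False positions (which would otherwise
# yield -1) read a valid element that torch.where later discards.
mask_flat = torch.tensor([False, True, False, True])
input_flat = torch.zeros(4)
source_flat = torch.tensor([10.0, 20.0])

source_idx = mask_flat.to(torch.int64).cumsum(0) - 1  # [-1, 0, 0, 1]
safe_idx = source_idx.clamp(min=0)                    # [ 0, 0, 0, 1]
gathered = source_flat.gather(0, safe_idx)
result = torch.where(mask_flat, gathered, input_flat)

# Agrees with the eager op on this input
assert torch.equal(result,
                   torch.masked_scatter(input_flat, mask_flat, source_flat))
```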
