[BH] Watcher asserts ncrisc_noc_nonposted_writes_flushed for matmul and conv ops #18341

Open
s-jovic opened this issue Feb 26, 2025 · 1 comment

s-jovic commented Feb 26, 2025

Description

On Blackhole, the watcher reports an ncrisc_noc_nonposted_writes_flushed assert for some matmul and convolution ops, even though they pass normally when run without the watcher.

I discovered this while developing SD 1.4 on Blackhole. Adding noc_async_write_barrier() at the end of the reader kernel resolves the watcher assert in both cases I encountered. However, when I tried to add the barrier at the end of all matmul and conv kernels that issue writes but lack this barrier, I hit a hang in one convolution.

Since I am not very knowledgeable about this problem: should it be debugged globally, or should each op owner debug it separately?

Matmul that triggers the assert

# SPDX-FileCopyrightText: © 2025 Tenstorrent Inc.
# SPDX-License-Identifier: Apache-2.0
import torch
import ttnn

def test_matmul_with_watcher_assert(
    device,
):
    grid_size = (5, 8)
    input_shape = [1, 1, 8192, 320]
    weights_shape = [1, 1, 320, 1280]
    bias_shape = [1, 1, 1, 1280]

    block_sharded_mem_config = ttnn.MemoryConfig(
        memory_layout=ttnn.TensorMemoryLayout.BLOCK_SHARDED,
        buffer_type=ttnn.BufferType.L1,
    )

    dram_mem_config = ttnn.MemoryConfig(
        memory_layout=ttnn.TensorMemoryLayout.INTERLEAVED,
        buffer_type=ttnn.BufferType.DRAM,
    )

    input = torch.randn(input_shape).bfloat16().float()
    weights = torch.randn(weights_shape).bfloat16().float()
    bias = torch.randn(bias_shape).bfloat16().float()

    input_t = ttnn.Tensor(input, ttnn.bfloat16).to(ttnn.TILE_LAYOUT).to(
        device, ttnn.MemoryConfig(
            ttnn.TensorMemoryLayout.BLOCK_SHARDED,
            ttnn.BufferType.L1,
            ttnn.ShardSpec(
                ttnn.CoreRangeSet(
                    {
                        ttnn.CoreRange(
                            ttnn.CoreCoord(0, 0),
                            ttnn.CoreCoord(4, 7)
                        ),
                    }
                ),
                (1024, 64),
                ttnn.ShardOrientation.ROW_MAJOR,
            )
        ))
    weights_t = ttnn.Tensor(weights, ttnn.bfloat8_b).to(ttnn.TILE_LAYOUT).to(device, dram_mem_config)
    bias_t = ttnn.Tensor(bias, ttnn.bfloat8_b).to(ttnn.TILE_LAYOUT).to(device, dram_mem_config)

    program_config = ttnn.MatmulMultiCoreReuseMultiCastProgramConfig(
        compute_with_storage_grid_size=grid_size,
        in0_block_w=2,
        out_subblock_h=1,
        out_subblock_w=8,
        per_core_M=32,
        per_core_N=8,
        transpose_mcast=False,
        fused_activation=None,
    )
    output_t = ttnn.linear(
        input_t,
        weights_t,
        bias=bias_t,
        program_config=program_config,
        memory_config=block_sharded_mem_config,
        dtype=ttnn.bfloat8_b,
        compute_kernel_config=ttnn.WormholeComputeKernelConfig(
            math_fidelity=ttnn.MathFidelity.LoFi,
            math_approx_mode=False,
            fp32_dest_acc_en=False,
            packer_l1_acc=False,
        )
    )

    tt_out = output_t.cpu().to_torch()

$ TT_METAL_WATCHER=1 pytest <name-of-the-file>.py  # triggers the assert

Adding the barrier to ttnn/cpp/ttnn/operations/matmul/device/kernels/dataflow/reader_bmm_tile_layout_in0_sender_receiver_padding_block_sharded.cpp resolves the watcher assert.
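
For reference, a minimal sketch of what that change looks like, assuming the usual tt-metal dataflow kernel structure with a kernel_main() entry point; the reader's actual body is elided here, and only the trailing barrier is the fix described above:

#include "dataflow_api.h"

void kernel_main() {
    // ... existing reader logic that issues noc_async_write() calls ...

    // Block until all outstanding non-posted NOC writes have been flushed,
    // so the kernel does not return with writes still in flight; this is
    // the condition the ncrisc_noc_nonposted_writes_flushed watcher assert
    // checks.
    noc_async_write_barrier();
}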

Conv that hangs when the barrier is added to the reader kernel

The convolution that hangs with the barrier is already on main and can be invoked with:

$ pytest "tests/ttnn/unit_tests/operations/test_new_conv2d.py::test_conv_ws[tilized-auto_shard-activations_dtype=DataType.BFLOAT16-weights_dtype=DataType.BFLOAT16-has_bias=True-batch_size=2-output_channels=576-input_channels=576-input_height=9-input_width=9-filter_height=3-filter_width=3-pad_h=0-pad_w=0-act_block_w_div=1-stride=1-device_params={'l1_small_size': 16384}]"

If we add the barrier to the end of the reader kernel (ttnn/cpp/ttnn/operations/conv/conv2d/device/kernels/activation_reader_width_sharded.cpp), the test hangs; otherwise it passes.

s-jovic commented Feb 26, 2025

@pavlejosipovic
