Replies: 1 comment 16 replies
-
Hi, yes, Triton supports int8. Could you please share the error that you see? Maybe I can help fix it.
-
Is anyone doing fp32-to-int8 quantization on x86? I've been looking at some examples and trying to do this for test_matmul.py in the triton_shared/python/examples dir, but haven't been able to get it working. Does Triton support int8 at all? For example, if I modify the MLIR file produced by --triton-to-linalg-experimental by changing f32 to i8, I run into translation errors when lowering to affine loops. If I instead change float32 to int8 at the source level in test_matmul.py, I get an error suggesting int8 isn't supported in triton_shared: RuntimeError: "normal_kernel_cpu" not implemented for 'Char'. I have seen https://pytorch.org/blog/int8-quantization/ but could not get that approach to work for test_matmul.py.
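For what it's worth, that particular RuntimeError comes from PyTorch's input setup rather than from Triton or triton_shared: torch.randn samples a normal distribution, which is only implemented for floating-point dtypes, so requesting int8 ('Char') fails before any kernel runs. A minimal sketch of a workaround, assuming the test inputs are built with torch.randint instead (shapes and value ranges here are illustrative, not taken from test_matmul.py):

```python
import torch

# torch.randn(...) with dtype=torch.int8 raises:
#   RuntimeError: "normal_kernel_cpu" not implemented for 'Char'
# because normal sampling only exists for float dtypes.
# Draw random integers directly instead (high bound is exclusive,
# so 128 keeps values in the int8 range [-128, 127]):
a = torch.randint(-128, 128, (512, 512), dtype=torch.int8)
b = torch.randint(-128, 128, (512, 512), dtype=torch.int8)

# Reference matmul: accumulate in int32 to avoid overflowing int8
# when summing products of int8 values.
c_ref = torch.matmul(a.to(torch.int32), b.to(torch.int32))
```

This only fixes the host-side initialization; whether the lowered kernel itself handles i8 operands is a separate question about the triton-to-linalg pipeline.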