Support mixed MX element dtype in mx_mm function and MXLinear #1667
base: main
Conversation
Following the MXFP and quantization literature, it is useful to support different element dtypes for activations, weights and gradients. This PR simply adds a more general interface to mx_mm; a similar choice could be made for MXLinear.

General issue: #1666
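To make the terminology concrete, here is a rough, self-contained sketch of what casting to an "MX element dtype" means in the simulated setting: each tensor is fake-quantized in blocks sharing a power-of-two scale, and activations, weights and gradients can each use a different element dtype. The helper name, block handling and scale choice below are simplifications for illustration, not torchao's mx_formats implementation.

```python
# Illustrative only: simplified block-scaled fake quantization, *not*
# torchao's mx_formats code. `fake_quant_mx` and the power-of-two scale
# heuristic are assumptions made for this sketch.
import torch


def fake_quant_mx(x: torch.Tensor, elem_dtype: torch.dtype, block_size: int = 32) -> torch.Tensor:
    """Quantize-dequantize x along the last dim in blocks of block_size."""
    orig_shape = x.shape
    x = x.reshape(-1, block_size)
    # Shared power-of-two scale per block, derived from the block max.
    amax = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = torch.exp2(torch.floor(torch.log2(amax)))
    x_q = (x / scale).to(elem_dtype).to(x.dtype) * scale
    return x_q.reshape(orig_shape)


# Mixed element dtypes: e.g. fp8 e4m3 for activations and weights,
# while gradient tensors (not shown here) could use fp8 e5m2.
x_hp = torch.randn(4, 64)
w_hp = torch.randn(32, 64)
y = fake_quant_mx(x_hp, torch.float8_e4m3fn) @ fake_quant_mx(w_hp, torch.float8_e4m3fn).t()
```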
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1667
Note: links to docs will display an error until the docs builds have been completed.
❗ 1 active SEV. If your PR is affected, please view it below.
✅ No failures as of commit 5c8eb6d with merge base 8afd10e.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -23,25 +23,31 @@ class mx_mm(torch.autograd.Function):
# 1. input @ weight_t = output (forward pass)
# 2. grad_output @ weight = grad_input (backward pass)
# 3. input_t @ grad_output = grad_weight (backward pass)
#
# input, weight and grad_output have each their own MX element dtype.
nit: "can have"?
Done
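For context on the three matmuls annotated in the diff above, here is a hedged, self-contained sketch of an autograd function where input, weight and grad_output each get their own element dtype. The class and helper names, the bare quantize-dequantize, and the argument layout are assumptions for illustration; torchao's actual mx_mm is more involved.

```python
# Illustrative stand-in showing one element dtype per tensor. `_fq` is a bare
# quantize-dequantize (no MX block scaling) so the sketch stays self-contained.
import torch


def _fq(x: torch.Tensor, elem_dtype: torch.dtype) -> torch.Tensor:
    # Quantize-dequantize stand-in for an MX cast.
    return x.to(elem_dtype).to(x.dtype)


class MixedDtypeMM(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_hp, weight_hp, in_dtype, w_dtype, grad_dtype):
        ctx.save_for_backward(input_hp, weight_hp)
        ctx.dtypes = (in_dtype, w_dtype, grad_dtype)
        # forward: input @ weight_t = output
        return _fq(input_hp, in_dtype) @ _fq(weight_hp, w_dtype).t()

    @staticmethod
    def backward(ctx, grad_output_hp):
        input_hp, weight_hp = ctx.saved_tensors
        in_dtype, w_dtype, grad_dtype = ctx.dtypes
        grad_q = _fq(grad_output_hp, grad_dtype)
        # backward: grad_output @ weight = grad_input
        grad_input = grad_q @ _fq(weight_hp, w_dtype)
        # backward: grad_output_t @ input = grad_weight
        grad_weight = grad_q.t() @ _fq(input_hp, in_dtype)
        return grad_input, grad_weight, None, None, None


# e.g. e4m3 for input/weight, e5m2 for grad_output
x = torch.randn(4, 64, requires_grad=True)
w = torch.randn(32, 64, requires_grad=True)
y = MixedDtypeMM.apply(x, w, torch.float8_e4m3fn, torch.float8_e4m3fn, torch.float8_e5m2)
y.sum().backward()
```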
this makes sense, it would be great to cover with a test. The easiest place to test it would be here (MXLinear). Would you be interested in doing that in this PR?

by the way, pytorch/pytorch#146414 outlines bringing MX dtypes to PyTorch core, and we plan to evolve …
…er factory method. Passing a tuple of 3 element dtypes avoids introducing a breaking change in the current interface of `MXLinear` and `swap_linear_with_mx_linear`. Some additional unit test coverage has been added on MXLinear.
I added the support of this feature in the mx_mm function and MXLinear. I expanded the coverage in the test you mentioned (plus a small test on the factory side to check the 2 cases above are working properly).

Thanks for the link on the PyTorch MX plan 👍 I would assume that the MX "simulated" mode is going to stay in TorchAO for some time, as it is very useful for testing + getting ready for MX hardware until it is widely available.
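A sketch of the kind of coverage being discussed, assuming the tuple-of-three-element-dtypes interface described in the commit message; the import path and exact signatures are assumptions and may not match the merged code.

```python
# Hedged test sketch: checks that MXLinear runs forward/backward with either a
# single element dtype or a (input, weight, grad_output) tuple.
import pytest
import torch
import torch.nn as nn

from torchao.prototype.mx_formats.mx_linear import MXLinear  # assumed path


@pytest.mark.parametrize(
    "elem_dtype",
    [
        torch.float8_e4m3fn,  # single dtype for input, weight and grad_output
        (torch.float8_e4m3fn, torch.float8_e4m3fn, torch.float8_e5m2),  # mixed
    ],
)
def test_mx_linear_mixed_elem_dtypes(elem_dtype):
    m = nn.Linear(64, 32, bias=False)
    mx = MXLinear.from_float(m, elem_dtype, block_size=32)  # signature assumed
    x = torch.randn(4, 64, requires_grad=True)
    y = mx(x)
    y.sum().backward()
    assert y.shape == (4, 32)
    assert x.grad is not None and mx.weight.grad is not None
```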
""" | ||
|
||
@classmethod | ||
@torch.no_grad() | ||
def from_float(cls, mod, elem_dtype, block_size): | ||
mod.__class__ = MXLinear | ||
mod.elem_dtype = elem_dtype | ||
# Single element dtype passed for input, weight and gradient. |
nit: can we do
def from_float(
    ...,
    elem_dtype,
    ...,
    elem_dtype_weight_override=None,
    elem_dtype_grad_output_override=None,
    ...
): ...
we plan to create a proper config object for this in the future, but for now would be good to keep things simple and avoid mixing types in the API (such as dtype vs tuple)
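A minimal stand-in showing how the suggested override kwargs could resolve to per-tensor dtypes inside from_float, falling back to the single elem_dtype when no override is passed. This is a sketch of the suggestion above, not the merged implementation; the attribute names are illustrative.

```python
import torch
import torch.nn as nn


class MXLinearSketch(nn.Linear):
    """Stand-in for MXLinear, only to show how the overrides could resolve."""

    @classmethod
    @torch.no_grad()
    def from_float(
        cls,
        mod,
        elem_dtype,
        block_size,
        elem_dtype_weight_override=None,
        elem_dtype_grad_output_override=None,
    ):
        mod.__class__ = cls
        # A single elem_dtype applies to input, weight and grad_output
        # unless an override is passed for the latter two.
        mod.in_elem_dtype = elem_dtype
        mod.w_elem_dtype = (
            elem_dtype_weight_override
            if elem_dtype_weight_override is not None
            else elem_dtype
        )
        mod.grad_elem_dtype = (
            elem_dtype_grad_output_override
            if elem_dtype_grad_output_override is not None
            else elem_dtype
        )
        mod.block_size = block_size
        return mod


# e.g. MXLinearSketch.from_float(nn.Linear(64, 32), torch.float8_e4m3fn, 32,
#      elem_dtype_grad_output_override=torch.float8_e5m2)
```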
Should I then enforce named arguments in MXLinear.from_float and swap_linear_with_mx_linear for block_size and filter_fn? And have a default block_size=32?
sounds reasonable!
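Concretely, the keyword-only arguments agreed on here could look like the sketch below; the `*` marker is what enforces passing block_size and filter_fn by name, and block_size=32 is the default proposed above. Parameter names follow this thread, not necessarily the merged code.

```python
# Sketch only: everything after `*` must be passed by keyword; the body is elided.
def swap_linear_with_mx_linear(
    model,
    elem_dtype,
    elem_dtype_weight_override=None,
    elem_dtype_grad_output_override=None,
    *,
    block_size=32,
    filter_fn=None,
):
    ...


# Callers must now name these arguments, e.g.:
# swap_linear_with_mx_linear(model, torch.float8_e4m3fn, block_size=32, filter_fn=my_filter)
```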
yep! great to hear this is useful.