
[PT2E][X86] Migrate fusion passes in Inductor to torchao #2140


Open
wants to merge 14 commits into
base: main

Conversation


@Xia-Weiwen Xia-Weiwen commented Apr 28, 2025

Summary

In this PR, we migrate the fusion passes for quantized ops of the X86Inductor backend from the PyTorch Inductor source code to torchao. This is the first step in migrating quantization-related fusion passes from PyTorch core to torchao.
Once this PR lands, fusion passes for new ops can be added in torchao instead of in PyTorch core, so we would like it merged early.
We plan to do the migration in the following steps:

  1. Copy fusion passes from PyTorch core to Torchao (this PR)
  2. Deprecate and remove the fusion passes in PyTorch core (TODO)
  3. Switch to new quantize/dequantize ops in Torchao (TODO)

(Steps 2 and 3 do not depend on each other and can be reordered.)

Fusion passes need to be registered with Inductor before torch.compile is called, and it would be less user-friendly to ask users to register them in their own code. So, we decided to put the registration inside the lowering function. In other words, this PR wraps the registration in the API lower_pt2e_quantized_to_x86: users call lower_pt2e_quantized_to_x86 instead of torch.compile to get the lowered model compiled with torch.compile. For eager mode, users use the same API. The API is designed as below:

def lower_pt2e_quantized_to_x86(
    model: torch.fx.GraphModule,
    example_inputs: Optional[tuple[torch.Tensor, ...]] = None,
    compile: bool = True,
    **compile_options: Optional[dict],
) -> torch.fx.GraphModule

The compile flag controls whether torch.compile is used. For eager mode, users simply set compile=False.
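
For illustration, here is a minimal usage sketch under the design above. The PT2E prepare/convert steps follow the standard torch.ao flow; the import path of lower_pt2e_quantized_to_x86 is an assumption for this example.

import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

# Import path of the new API is an assumption for illustration.
from torchao.quantization.pt2e import lower_pt2e_quantized_to_x86

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(2, 16),)

# Standard PT2E flow: export, prepare, calibrate, convert.
exported = torch.export.export_for_training(model, example_inputs).module()
quantizer = X86InductorQuantizer().set_global(
    get_default_x86_inductor_quantization_config()
)
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibration
converted = convert_pt2e(prepared)

# Compiled path: the fusion passes are registered inside the API, then torch.compile runs.
compiled = lower_pt2e_quantized_to_x86(converted, example_inputs, compile=True)

# Eager path: the same API, with compilation disabled.
eager = lower_pt2e_quantized_to_x86(converted, compile=False)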

Test plan

We copied the related unit tests from https://github.com/pytorch/pytorch/blob/main/test/inductor/test_mkldnn_pattern_matcher.py
The test cases run only with torch nightly since some required features, such as onednn.qconv_pointwise, are only available in nightly builds.
Use the following command to run the tests:

pytest test/quantization/pt2e/test_x86inductor_fusion.py

Explanation of implementation

  • In this PR, we mostly copy the code from torch Inductor (https://github.com/pytorch/pytorch/blob/main/torch/_inductor/fx_passes/quantization.py) and import Inductor's internal functions, methods, and utilities directly. We think this is the simplest way to register the fusion passes.
  • For now, the fusion passes in torch Inductor co-exist with the passes registered in torchao. This is not an issue because duplicate passes are not applied twice: once a pattern has been fused, it no longer exists in the graph and cannot be matched again (see the sketch after this list).
  • In the future, we will switch to the new quantize/dequantize ops in torchao when they are ready. At that point, the patterns registered in torchao will differ from those in Inductor, and the passes in torch Inductor will be deprecated and eventually removed.
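
To make the duplicate-registration point from the second bullet concrete, here is a toy, library-free sketch (not the actual Inductor pattern matcher): once a pattern has been rewritten, a second application of the same fusion finds nothing to match and leaves the graph unchanged.

# Toy stand-in for a fusion pass: rewrite [dequantize, conv, quantize] -> [qconv].
def fuse_qconv(ops):
    pattern = ["dequantize", "conv", "quantize"]
    fused, i = [], 0
    while i < len(ops):
        if ops[i : i + len(pattern)] == pattern:
            fused.append("qconv")      # pattern matched, replaced by the fused op
            i += len(pattern)
        else:
            fused.append(ops[i])
            i += 1
    return fused

graph = ["dequantize", "conv", "quantize", "relu"]
once = fuse_qconv(graph)    # ['qconv', 'relu']
twice = fuse_qconv(once)    # ['qconv', 'relu'] -- nothing left to match
assert once == twice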


pytorch-bot bot commented Apr 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2140


❌ 1 New Failure

As of commit 8e4532f with merge base 137b079:

NEW FAILURE - The following job has failed:


@facebook-github-bot facebook-github-bot added the CLA Signed label Apr 28, 2025
@Xia-Weiwen Xia-Weiwen added the topic: new feature label and removed the CLA Signed label Apr 28, 2025
@facebook-github-bot facebook-github-bot added the CLA Signed label Apr 28, 2025
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review April 29, 2025 01:46
@Xia-Weiwen
Collaborator Author

Hi @jerryzh168 @jansel, could you please review this PR? We would especially like your comments on (1) whether it is acceptable that we copy Inductor code here in torchao and use Inductor's internal utilities, and (2) whether it is OK that we keep duplicate passes for now. Thanks!

@jansel

jansel commented Apr 29, 2025

I think out of tree passes are fine.

Do we need a better registration system so the changes can be local to a specific torch.compile() call rather than mutating globals?

cc @eellison

Contributor

@jerryzh168 jerryzh168 Apr 29, 2025


should this be in prototype? I think under torchao/quantization/pt2e might be better?

also the folder name can probably be something like inductor_passes to be more specific

I'd recommend: torchao/quantization/pt2e/inductor_passes/x86.py

Collaborator Author


Thanks. I have moved it as you suggested.

jerryzh168 approved these changes Apr 29, 2025
@jerryzh168
Contributor

I also feel that hiding the compile API inside lower_pt2e_quantized_to_x86 is not a good idea, and the compile stack should allow registering fusion passes out of tree.

Comment on lines 40 to 52
global FUSION_PATH_REGISTERED
if not FUSION_PATH_REGISTERED:
global torch
import torch._inductor.config

from torchao.prototype.inductor.fx_passes.quantization import (
_register_quantization_weight_pack_pass,
quant_lift_up,
)

torch._inductor.config.pre_grad_custom_pass = quant_lift_up
_register_quantization_weight_pack_pass()
FUSION_PATH_REGISTERED = True
Contributor


can this part happen during import of x86_inductor_quantizer?

Collaborator Author

@Xia-Weiwen Xia-Weiwen Apr 30, 2025


Thanks. I have modified it per your suggestion.
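
For reference, a minimal sketch of what import-time registration could look like (the guard name and module placement are illustrative; the pass names come from the snippet above, which was later moved out of the prototype folder):

# Hypothetical module-level registration, e.g. run when the x86 quantizer module is imported.
_FUSION_PASSES_REGISTERED = False

def _register_x86_fusion_passes() -> None:
    """Register the X86 Inductor fusion passes exactly once."""
    global _FUSION_PASSES_REGISTERED
    if _FUSION_PASSES_REGISTERED:
        return
    import torch._inductor.config

    from torchao.prototype.inductor.fx_passes.quantization import (
        _register_quantization_weight_pack_pass,
        quant_lift_up,
    )

    # quant_lift_up runs as the pre-grad custom pass; the weight-pack fusions
    # are registered with Inductor's pattern matcher.
    torch._inductor.config.pre_grad_custom_pass = quant_lift_up
    _register_quantization_weight_pack_pass()
    _FUSION_PASSES_REGISTERED = True

# Executed at import time so users only need to import the quantizer module.
_register_x86_fusion_passes()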

@jerryzh168 jerryzh168 self-requested a review April 29, 2025 19:41
@Xia-Weiwen
Collaborator Author

I think out of tree passes are fine.

Do we need a better registration system so the changes can be local to a specific torch.compile() call rather than mutating globals?

cc @eellison

Thanks for your comments. We will just keep the current implementation then.
As for a new registration system, maybe we can have something similar to pre_grad_custom_pass?

@jansel

jansel commented Apr 30, 2025

Yeah, I think this might be cleaner with something like pre_grad_custom_pass instead of global registration.

Collaborator

@leslie-fang-intel leslie-fang-intel left a comment


Hi @Xia-Weiwen, will you add the registration system in PyTorch first and then refine this PR?

@Xia-Weiwen
Collaborator Author

Hi @Xia-Weiwen, will you add the registration system in PyTorch first and then refine this PR?

No. I plan to keep the current implementation. When the new registration system is added to Inductor by the Meta Inductor team, I will switch to it in another PR.

)

torch._inductor.config.pre_grad_custom_pass = quant_lift_up
_register_quantization_weight_pack_pass()
Collaborator

@leslie-fang-intel leslie-fang-intel Apr 30, 2025


I'm a bit concerned about this. I'm not sure how we should handle it, but it seems that

  • The patterns from quantization.py in TorchAO will be registered here once
  • And inside torch.compile, when freezing is turned on, the same patterns from pytorch/torch/_inductor/fx_passes/quantization.py inside Torch Inductor will be registered again.

Collaborator Author


Thanks for the comments. As we discussed offline and as explained in the summary above, duplicate passes are effectively applied only once: after the first application, the pattern is gone from the graph and cannot be matched again.

quant_lift_up,
)

torch._inductor.config.pre_grad_custom_pass = quant_lift_up
Collaborator

@leslie-fang-intel leslie-fang-intel Apr 30, 2025


Be careful to check whether any other pre_grad_custom_pass has been registered before; see issue pytorch/pytorch#151876. cc @Valentine233, who is working on it.

Collaborator Author

@Xia-Weiwen Xia-Weiwen Apr 30, 2025


Thanks for the comment. It is potentially unsafe. I will modify this part after the pre_grad_custom_pass is refactored, probably in another PR.
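
A hedged sketch of one way to avoid clobbering a previously registered pass, along the lines of the concern above (the chaining helper is illustrative, not an existing Inductor API):

import torch._inductor.config as inductor_config

def _chain_pre_grad_pass(new_pass):
    """Compose the new pre-grad pass with any pass already registered,
    instead of silently overwriting it."""
    existing = inductor_config.pre_grad_custom_pass
    if existing is None:
        inductor_config.pre_grad_custom_pass = new_pass
        return

    def chained(graph):
        existing(graph)
        new_pass(graph)

    inductor_config.pre_grad_custom_pass = chained

# e.g. _chain_pre_grad_pass(quant_lift_up) instead of assigning it directly.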

@Xia-Weiwen
Collaborator Author

I also feel that hiding the compile API inside lower_pt2e_quantized_to_x86 is not a good idea, and the compile stack should allow registering fusion passes out of tree.

Thanks for the comments. I have moved the registration out of the lowering function. Please review again. Thanks.
