Remove preserve_zero and zero_point_domain from choose_qparams_affine #2149

Draft

jainapurva wants to merge 13 commits into main from qparam_args

Contributor

jainapurva commented Apr 29, 2025

No description provided.

jainapurva added 2 commits on April 28, 2025 13:05:

Split choose_qparams_affine (b133369)
Remove preserve_zero and zero_point_domain (8e4bca8)

pytorch-bot bot commented Apr 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2149

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures

As of commit 414df66 with merge base 137b079:

NEW FAILURES - The following jobs have failed:

Run Regression Tests / test (CPU 2.5.1, linux.4xlarge, torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
    test/integration/test_integration.py::TestWeightOnlyInt8Quant::test_weight_only_groupwise_embedding_quant
Run Regression Tests / test (CPU 2.6, linux.4xlarge, torch==2.6.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
    test/quantization/test_quant_api.py::TestQuantFlow::test_int4wo_cpu_float32_x_dim_3_use_hqq_True
Run Regression Tests / test (CPU 2.7, linux.4xlarge, torch==2.7.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
    test/quantization/pt2e/test_quantize_pt2e.py::TestQuantizePT2EAffineQuantization::test_dynamic_per_tok_act_per_group_weights
Run Regression Tests / test (CUDA 2.5.1, linux.g5.12xlarge.nvidia.gpu, torch==2.5.1 --index-url https://download.pytorch... / linux-job (gh)
    test/quantization/test_qat.py::TestQAT::test_qat_4w_quantizer
Run Regression Tests / test (CUDA 2.6, linux.g5.12xlarge.nvidia.gpu, torch==2.6.0, cuda, 12.6) / linux-job (gh)
    test/quantization/test_quant_api.py::TestQuantFlow::test_int4wo_cpu_float32_x_dim_3_use_hqq_True
Run Regression Tests / test (CUDA 2.7, linux.g5.12xlarge.nvidia.gpu, torch==2.7.0, cuda, 12.6) / linux-job (gh)
    test/quantization/pt2e/test_quantize_pt2e.py::TestQuantizePT2EAffineQuantization::test_dynamic_per_tok_act_per_group_weights
Run Regression Tests / test-nightly (CPU Nightly, linux.4xlarge, --pre torch --index-url https://download.pytorch.org/wh... / linux-job (gh)
    test/quantization/pt2e/test_quantize_pt2e.py::TestQuantizePT2EAffineQuantization::test_dynamic_per_tok_act_per_group_weights
Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh)
    test/quantization/pt2e/test_quantize_pt2e.py::TestQuantizePT2EAffineQuantization::test_dynamic_per_tok_act_per_group_weights
Run TorchAO Experimental Tests / test-cpu-ops (macos-14) (gh)
    test_replace_q_dq_patterns_with_quantized_linear_ops_pass

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label

jainapurva added 4 commits on April 29, 2025 12:26:

Update choose_qparams_affine_min_max (a68a679)
Update float8 choose_qparams (b9c7c53)
Use float8 choose/quantize/dequantize (ea5525e)
Updates to choose_qparams_affine uses (cff885b)

jainapurva added the topic: not user facing and topic: for developers labels

jerryzh168 reviewed

test/quantization/test_quant_primitives.py Outdated

jerryzh168 reviewed

torchao/quantization/observer.py

jerryzh168 reviewed

torchao/quantization/quant_primitives.py

+def choose_qparams_affine_tiny_gemm(
+    input: torch.Tensor,
+    mapping_type: MappingType,
+    block_size: Tuple[int, ...],

Contributor

jerryzh168 Apr 30, 2025

nit: change this to Tuple[int] as well to be consistent, assuming it means the same thing

Contributor Author

jainapurva Apr 30, 2025

Block size is a tuple of multiple integers, so this needs to be Tuple[int, ...].
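
For context on the annotation discussed in this thread, here is a minimal sketch in plain Python typing (not torchao code). `Tuple[int]` describes a tuple holding exactly one int, while `Tuple[int, ...]` describes a tuple of ints of any length. `OneInt`, `AnyInts`, and `numel_per_block` are hypothetical names introduced only for illustration.

```python
from typing import Tuple, get_args

# Tuple[int] means "a tuple containing exactly one int".
OneInt = Tuple[int]
# Tuple[int, ...] means "a tuple of ints of any length".
AnyInts = Tuple[int, ...]


def numel_per_block(block_size: Tuple[int, ...]) -> int:
    """Hypothetical helper: number of elements covered by one block."""
    n = 1
    for dim in block_size:
        n *= dim
    return n


# Inspect the type arguments to see the difference between the two annotations.
print(get_args(OneInt))
print(get_args(AnyInts))
# A two-dimensional block size such as (1, 32) only fits the variadic form.
print(numel_per_block((1, 32)))
```

A static type checker would flag `(1, 32)` against `Tuple[int]`, which is why a multi-dimensional block size needs the variadic form.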

jerryzh168 reviewed

torchao/quantization/quant_primitives.py

Comment on lines +780 to +785

+    target_dtype: torch.dtype,
+    quant_min: Optional[Union[int, float]] = None,
+    quant_max: Optional[Union[int, float]] = None,
+    eps: Optional[float] = None,
+    scale_dtype: Optional[torch.dtype] = None,
+    zero_point_dtype: Optional[torch.dtype] = None,

Contributor

jerryzh168 Apr 30, 2025

I think we could probably simplify this list as well; only the configurable arguments are needed. This can be a separate PR.

Contributor Author

jainapurva Apr 30, 2025

Agreed

jerryzh168 reviewed

torchao/quantization/quant_primitives.py

Comment on lines +850 to +855

+    target_dtype: torch.dtype,
+    quant_min: Optional[Union[int, float, bool]] = None,
+    quant_max: Optional[Union[int, float, bool]] = None,
+    eps: Optional[float] = None,
+    scale_dtype: Optional[torch.dtype] = None,
+    zero_point_dtype: Optional[torch.dtype] = None,

Contributor

jerryzh168 Apr 30, 2025

same here


Test fixes (f747fff)

jerryzh168 reviewed

torchao/quantization/quant_primitives.py

-    MappingType.SYMMETRIC.name,
-    MappingType.SYMMETRIC_NO_CLIPPING_ERR.name,
-    MappingType.ASYMMETRIC.name,
+    MappingType.SYMMETRIC,

Contributor

jerryzh168 Apr 30, 2025

if this op has to be lowered, we'd need to use str instead of enum

Contributor Author

jainapurva Apr 30, 2025

For all the new ops I've used MappingType enum. Should I update them to str?
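
To illustrate the enum-versus-str point in this thread, here is a minimal sketch of passing an enum member by name and recovering it on the other side, which is the pattern the removed `.name` lines in the diff suggest. The `MappingType` class below is a local stand-in defined for this example, not the torchao enum, and `to_op_arg` / `from_op_arg` are hypothetical helper names.

```python
from enum import Enum, auto


class MappingType(Enum):
    """Local stand-in for illustration; not imported from torchao."""
    SYMMETRIC = auto()
    SYMMETRIC_NO_CLIPPING_ERR = auto()
    ASYMMETRIC = auto()


def to_op_arg(mapping_type: MappingType) -> str:
    """Convert the enum member to its name so it can be passed as a plain str."""
    return mapping_type.name


def from_op_arg(name: str) -> MappingType:
    """Recover the enum member from its name inside the op implementation."""
    return MappingType[name]


arg = to_op_arg(MappingType.ASYMMETRIC)
print(arg)
print(from_op_arg(arg) is MappingType.ASYMMETRIC)
```

Looking a member up by name raises `KeyError` for an unknown string, so a typo in the argument fails loudly instead of silently selecting a default.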

jainapurva added 3 commits on April 29, 2025 21:27:

Updates (694dab3)
Updates (62a99a1)
Split quantize_affine based on zero_point_domain (3a5efa7)

jainapurva marked this pull request as ready for review

April 30, 2025 17:36


Merge remote-tracking branch 'origin/main' into qparam_args (57d55b0)

jainapurva marked this pull request as draft

April 30, 2025 18:10

jainapurva added 2 commits on April 30, 2025 15:00:

Fix tests (6e42999)
dequantize_affine and test fixes (414df66)

jerryzh168 reviewed

torchao/quantization/quant_primitives.py

@@ -301,12 +305,6 @@ def quantize_affine(
     output_dtype (torch.dtype): requested dtype (e.g. torch.uint8) for output Tensor
     quant_min (Optional[int]): minimum quantized value for output Tensor, if not specified, it will be derived from dtype
     quant_max (Optional[int]): maximum quantized value for output Tensor, if not specified, it will be derived from dtype
-    zero_point_domain (ZeroPointDomain): the domain that zero_point is in, should be either integer or float

Contributor

jerryzh168 May 1, 2025

We should probably preserve these for now and move them to quant_api; same for the doc for the preserve_zero arg.
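
The docstring lines in the hunk above say that quant_min and quant_max are derived from the dtype when not specified, and that zero_point can live in an integer or a float domain. Below is a minimal pure-Python sketch of the integer-domain case only, assuming the usual affine formula q = clamp(round(x / scale) + zero_point, quant_min, quant_max). `derive_quant_range` and `quantize_affine_int_zp` are hypothetical names, not torchao functions, and the dtype is modeled here as a bit width plus signedness rather than a torch.dtype.

```python
from typing import List, Optional, Tuple


def derive_quant_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Hypothetical helper: integer range implied by a bit width and signedness."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def quantize_affine_int_zp(
    values: List[float],
    scale: float,
    zero_point: int,
    bits: int = 8,
    signed: bool = False,
    quant_min: Optional[int] = None,
    quant_max: Optional[int] = None,
) -> List[int]:
    """Sketch of affine quantization with an integer-domain zero point."""
    lo, hi = derive_quant_range(bits, signed)
    # Fall back to the range implied by the (modeled) dtype when bounds are omitted.
    quant_min = lo if quant_min is None else quant_min
    quant_max = hi if quant_max is None else quant_max
    # Scale, shift by the integer zero point, then clamp into the quantized range.
    return [
        min(max(round(v / scale) + zero_point, quant_min), quant_max)
        for v in values
    ]


print(derive_quant_range(8, signed=False))
print(quantize_affine_int_zp([0.0, 1.0, 100.0], scale=0.5, zero_point=128))
```

In this sketch the real value 0.0 maps exactly to the integer zero_point, which is the property an integer-domain zero point provides; a float-domain zero point would need a different formula and is not modeled here.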

Labels

CLA Signed, topic: for developers, topic: not user facing