[Flow] Add pattern to reassociate dequantization + matmul `linalg.generic` ops

Dequantization ops that are consumed by matmuls are currently only fused
into a dispatch region, but we can do even better by reassociating these
fused operations (see iree-org#14951).

Note that this pattern affects precision: it is a trade-off between
precision and performance, so it is opt-in via
`--iree-flow-enable-quantized-matmul-reassociation`.

This pattern rewrites a sequence of dequantization->matmul `linalg.generic`
ops into a new sequence of `linalg.generic` ops. The new sequence of ops
is as follows:

  1. A sequence of `linalg.generic` ops that dynamically quantize the
     non-quantized input to the matmul. This is very cheap in skinny
     matmul cases, where the non-quantized input is small compared to
     the quantized input.
  2. A `linalg.generic` op that performs an integer matmul. This is the
     key performance optimization here. On CPU, we want to be doing
     integer matmuls where we can, but the matmul needs to be picked
     up by a VectorContractCustomKernel for now. Eventually it will
     be better to rewrite to `linalg.matmul` here to target ukernels.
  3. A final `linalg.generic` op that performs the dequantization
     scale and zero point math, as well as the remaining reduction of
     the matmul. The matmul from step 2 only reduces within quantized
     groups, while this op does the reduction across groups.
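
The arithmetic behind the three steps above can be sketched in plain
Python. This is an illustration only, not the pattern's actual
implementation: it assumes group-wise weights of the form
`scale * (q - zp)`, symmetric 8-bit per-group quantization of the
input, and a single input row (the skinny case). All function and
variable names here are hypothetical.

```python
import random


def make_example(n=3, groups=4, k=8, seed=0):
    """Build small random inputs: x[g][k], q[n][g][k], scale[n][g], zp[n][g]."""
    rng = random.Random(seed)
    x = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(groups)]
    q = [[[rng.randint(0, 15) for _ in range(k)] for _ in range(groups)]
         for _ in range(n)]
    scale = [[rng.uniform(0.05, 0.1) for _ in range(groups)] for _ in range(n)]
    zp = [[float(rng.randint(0, 15)) for _ in range(groups)] for _ in range(n)]
    return x, q, scale, zp


def dequant_matmul(x, q, scale, zp):
    """Reference path: dequantize the weights, then do a float matmul."""
    n, groups, k = len(q), len(q[0]), len(q[0][0])
    return [
        sum(x[g][i] * scale[row][g] * (q[row][g][i] - zp[row][g])
            for g in range(groups) for i in range(k))
        for row in range(n)
    ]


def quantize_input(x, bits=8):
    """Step 1: per-group dynamic quantization of the float input (assumed symmetric)."""
    qmax = 2 ** (bits - 1) - 1
    x_q, x_scale, x_sum = [], [], []
    for group in x:
        amax = max(abs(v) for v in group)
        s = amax / qmax if amax > 0 else 1.0
        x_q.append([round(v / s) for v in group])
        x_scale.append(s)
        x_sum.append(sum(group))  # float group sum, used for the zero point term
    return x_q, x_scale, x_sum


def reassociated_matmul(x, q, scale, zp):
    """Steps 2 and 3: integer matmul within groups, then scale and reduce across groups."""
    n, groups, k = len(q), len(q[0]), len(q[0][0])
    x_q, x_scale, x_sum = quantize_input(x)
    # Step 2: integer products, reduced only within each quantized group.
    acc = [[sum(x_q[g][i] * q[row][g][i] for i in range(k))
            for g in range(groups)] for row in range(n)]
    # Step 3: apply scales and zero points, then reduce across groups.
    return [
        sum(acc[row][g] * x_scale[g] * scale[row][g]
            - zp[row][g] * scale[row][g] * x_sum[g]
            for g in range(groups))
        for row in range(n)
    ]
```

Calling `dequant_matmul(*make_example())` and
`reassociated_matmul(*make_example())` gives results that are close but
not identical. The difference comes from rounding the input in step 1,
which is the precision cost mentioned above. In this sketch the zero
point term uses the unquantized group sums, so only the integer product
carries quantization error; that is a choice made for the illustration.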
Max191 authored and qedawkins committed Oct 12, 2023
1 parent 8d8357b commit c5ac55e
Showing 7 changed files with 681 additions and 18 deletions.
@@ -82,6 +82,7 @@ iree_compiler_cc_library(
],
deps = [
":PassesIncGen",
"//compiler/src/iree/compiler/Codegen/Dialect:IREECodegenDialect",
"//compiler/src/iree/compiler/Dialect/Flow/Conversion/TensorToFlow",
"//compiler/src/iree/compiler/Dialect/Flow/IR",
"//compiler/src/iree/compiler/Dialect/HAL/IR",

@@ -112,6 +112,7 @@ iree_cc_library(
MLIRTransformDialectTransforms
MLIRTransformUtils
MLIRTransforms
iree::compiler::Codegen::Dialect::IREECodegenDialect
iree::compiler::Dialect::Flow::Conversion::TensorToFlow
iree::compiler::Dialect::Flow::IR
iree::compiler::Dialect::HAL::IR