How to handle FC layers with non-constant inputs? #11941
jinevening started this conversation in General
Let's discuss how to handle FC layers with non-const inputs.
## Current status

The `--replace_non_const_fc_with_batch_matmul` pass converts FC with non-const weights (regardless of whether ifm is const or not) into batch matmul. This is for ease of quantization (circle quantizer does not support FC with non-const weights).

```
ifm (const) -----+--> FC --> ofm
wgt (non-const)--+
```

is converted to

```
ifm (const) -----+--> BMM --> ofm
wgt (non-const)--+
```

In this way, ifm is quantized per-tensor, because it is treated as an activation tensor.
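For concreteness, here is a minimal NumPy sketch of what the rewrite computes, assuming the TFLite/circle FullyConnected convention (weights stored as `[out_dim, in_dim]`, ofm = ifm · wgtᵀ) and a BatchMatMul that multiplies its inputs directly; the shapes and names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 2, 4, 3              # batch, input depth, output depth

ifm = rng.normal(size=(N, K))  # const input (the case discussed here)
wgt = rng.normal(size=(M, K))  # non-const weights, [out, in] layout

fc_ofm = ifm @ wgt.T           # FC(ifm, wgt)

# After the pass: the same product expressed as a batch matmul; the weight
# transpose is absorbed by the rewrite (e.g. an inserted Transpose or the
# matmul's adjoint attribute -- an assumption about the pass's mechanics).
wgt_t = wgt.T
bmm_ofm = ifm @ wgt_t          # BMM(ifm, wgt_t)

assert np.allclose(fc_ofm, bmm_ofm)
```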
## Alternative

We can do as follows.

1. Change `--replace_non_const_fc_with_batch_matmul` to do the conversion only when ifm is non-const. -> This will allow FC with const ifm and non-const weights.
2. Introduce a new pass (ex: `--replace_non_const_fc_with_transposed_fc`) which does the below conversion for FC with const ifm and non-const weights.

```
ifm (const) -----+--> FC --> ofm
wgt (non-const)--+
```

is converted to

```
wgt (non-const) --> Transpose --+--> FC --> Transpose --> ofm
ifm (const) --> Transpose ------+
```
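To see why the rewritten graph computes the same ofm, note that it is just the matrix identity (A·B)ᵀ = Bᵀ·Aᵀ. A minimal NumPy check, simplifying FC to a plain product ofm = ifm · wgt (the exact Transpose placement depends on circle's FC weight layout):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 2, 4, 3

ifm = rng.normal(size=(N, K))       # const
wgt = rng.normal(size=(K, M))       # non-const

ofm = ifm @ wgt                     # original FC (simplified)

# Rewritten graph: transpose both operands, feed them to FC in swapped
# positions, and transpose the result back: (ifm @ wgt).T == wgt.T @ ifm.T
ofm_rewritten = (wgt.T @ ifm.T).T   # Transpose(FC(Transpose(wgt), Transpose(ifm)))

assert np.allclose(ofm, ofm_rewritten)
```

After the rewrite, Transpose(ifm) occupies the FC weights position, which is what makes per-channel quantization of ifm possible.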
This will allow per-channel quantization of ifm, but Transpose Ops are added. So, the alternative can give better accuracy at the cost of a potential performance penalty (the extra Transpose Ops).
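Why per-channel granularity matters here: if the const ifm has channels with very different magnitudes, a single per-tensor scale wastes most of the int8 range on the small channels. A hedged sketch (symmetric int8 quantization; the data and the quantization axis are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# A const ifm whose rows (channels) differ in magnitude by ~1000x overall.
ifm = rng.normal(size=(4, 64)) * np.array([[100.0], [10.0], [1.0], [0.1]])

def fake_quantize(x, scale):
    """Symmetric int8 quantize, then dequantize back to float."""
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

# Per-tensor: one scale shared by the whole tensor.
per_tensor = fake_quantize(ifm, np.abs(ifm).max() / 127.0)

# Per-channel: one scale per row (the quantization axis is illustrative).
scales = np.abs(ifm).max(axis=1, keepdims=True) / 127.0
per_channel = fake_quantize(ifm, scales)

print("per-tensor  mean |err| per row:", np.abs(ifm - per_tensor).mean(axis=1))
print("per-channel mean |err| per row:", np.abs(ifm - per_channel).mean(axis=1))
# The small-magnitude rows are reconstructed far more accurately with
# per-channel scales; with one per-tensor scale they collapse toward zero.
```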
If the Transpose Ops can be canceled out by adjacent operators, this idea seems cool.
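A small sketch of the cancellation condition: two back-to-back Transpose Ops can be folded away exactly when the composition of their permutations is the identity (the helper names below are hypothetical, not an existing pass):

```python
import numpy as np

def compose(inner, outer):
    """Permutation computed by x.transpose(inner).transpose(outer)."""
    return [inner[i] for i in outer]

def can_cancel(inner, outer):
    """True when the two Transposes fold to the identity."""
    composed = compose(inner, outer)
    return composed == list(range(len(composed)))

assert can_cancel([1, 0], [1, 0])            # a 2-D transpose pair cancels
assert not can_cancel([0, 2, 1], [1, 0, 2])  # these two do not

# Numeric sanity check.
x = np.arange(6).reshape(2, 3)
assert np.array_equal(x.transpose([1, 0]).transpose([1, 0]), x)
```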
The alternative was proposed by @parjong
CC @parjong @ejjeong
---

**Reply:**

After offline discussion, we decided to do the first item of the alternative. The second item may be discussed later, if it can bring a clear benefit.