Fix LayerNorm fp16 precision #3272
base: main
Conversation
Force-pushed from 8148866 to 57fc8e9
Force-pushed from 57fc8e9 to 6805e39
return layer_norm.get_output(0), None, None
layer = ctx.net.add_normalization(input, weight, bias, axes)
layer.epsilon = eps
set_layer_name(layer, target, name, source_ir)
LGTM overall. Looks like you do not need to explicitly set ILayer.precision or ILayer.set_output_type to set the output type of this layer with fp16 inputs.
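For context, explicitly pinning the precision and output type mentioned above would look roughly like the sketch below. This is not code from the PR; the layer argument is a stand-in for the normalization layer the converter creates.

import tensorrt as trt

def force_fp16_output(layer: trt.ILayer) -> None:
    # Explicitly pin the layer's compute precision and output type to FP16.
    # Per the review comment, this is not required here: TensorRT already
    # handles the output type for this layer when the inputs are fp16.
    layer.precision = trt.float16          # ILayer.precision
    layer.set_output_type(0, trt.float16)  # ILayer.set_output_type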
LGTM
Description
Setting layer.compute_precision = input.dtype causes an accuracy issue in FP16 mode. The TensorRT documentation (https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Graph/Layers.html#inormalizationlayer) says: "By default TensorRT will run the normalization computation in DataType.kFLOAT32 even in mixed precision mode regardless of any set builder flags to avoid overflow errors."
Also, the only operator that actually takes effect is aten.native_layer_norm.default; the aten.layer_norm and aten.layer_norm.default converters are of no use and hence redundant.
To Reproduce
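The original reproduction script is not shown here. A minimal sketch of the kind of check involved, assuming a plain LayerNorm module compiled through the dynamo path with FP16 enabled (the module, shapes, and compile settings are illustrative, not taken from the PR):

# Minimal sketch of an FP16 LayerNorm accuracy check; the module, shapes and
# torch_tensorrt.compile settings are assumed, not taken from this PR.
import torch
import torch_tensorrt

class Net(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.norm = torch.nn.LayerNorm(256)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x)

model = Net().half().eval().cuda()
inputs = [torch.randn(8, 128, 256, dtype=torch.half, device="cuda")]

# Compile with FP16 enabled; before this patch the normalization computation
# could be forced into FP16 via compute_precision, hurting accuracy.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    enabled_precisions={torch.half},
)

with torch.no_grad():
    ref = model(*inputs)
    out = trt_model(*inputs)
    print("max abs diff vs. eager:", (ref - out).abs().max().item())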
Before Patch
After Patch
Type of change
Checklist: