Prototype - M (In Review)
MVP - M (In Review)
Phase 1: - n/a
Phase 2: - M
This could be unwieldy for large input tensors where a user still wants to specify values. IMO the core issue is that the data Torch-TRT runs shape inference on is not representative of the end user's data, correct? Why don't we let the user provide input data? We could give the option to provide a data loader, which resolves this issue and also makes DS + fallback easier. Thoughts?
Context
When using `Torch-TensorRT` to compile and run inference with BERT models, some users were experiencing a CUDA indexing error (Issue #1418, PR #1424). The error seemed to show up only when more than two arguments were passed into the model. The bug stemmed from the fact that the third argument to these BERT models is a tensor of `torch.Long` type, which accepts only the values `0` and `1` (documentation here). The shape analysis portion of partitioning, however, was initializing random Tensor inputs, sometimes with values outside of that range:
TensorRT/core/partitioning/shape_analysis.cpp
Line 23 in 5a7f00e
As a result, calls to `aten::embedding` and other indexing operations would fail, as they would be indexing out of bounds. A temporary fix was made in PR #1424, addressing the issue by decreasing the range of values selected for the tensor, but a more robust fix would allow the user to (optionally) specify the valid range of values for each input tensor.
Discussion
A rough framework for accomplishing this is to allow the user to specify a "low-inclusive" and "high-exclusive" value for each input, to ensure that the forward pass conducted in partitioning does not provide invalid inputs to the module. These (optionally) user-provided values would then replace the existing default choices:
TensorRT/core/partitioning/shape_analysis.cpp
Lines 17 to 18 in b494311
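To make the low-inclusive / high-exclusive convention concrete, here is a minimal pure-Python sketch of how a per-input range could feed the random values used during shape analysis. It is an illustration only, not the code in `shape_analysis.cpp`: `random_int_values`, `embedding_lookup`, and `DEFAULT_RANGE` are hypothetical names, and the default range shown is a placeholder rather than the actual default.

```python
import random

# Placeholder default range (assumption); the real defaults live in shape_analysis.cpp.
DEFAULT_RANGE = (0, 2)


def random_int_values(count, value_range=None, seed=None):
    """Draw `count` integers from [low, high): low is inclusive, high is exclusive.

    Falls back to DEFAULT_RANGE when the user gives no range.
    """
    low, high = value_range if value_range is not None else DEFAULT_RANGE
    if low >= high:
        raise ValueError(f"empty range: low={low} must be < high={high}")
    rng = random.Random(seed)
    return [rng.randrange(low, high) for _ in range(count)]


def embedding_lookup(table, indices):
    """Toy stand-in for an indexing op: rejects any index outside the table."""
    for i in indices:
        if not 0 <= i < len(table):
            raise IndexError(f"index {i} out of range for table of size {len(table)}")
    return [table[i] for i in indices]


# A two-row table, like an input that only accepts the values 0 and 1.
table = [[0.0, 0.0], [1.0, 1.0]]

# Values drawn from [0, 2) are always valid indices into the table.
safe = random_int_values(8, value_range=(0, 2), seed=0)
rows = embedding_lookup(table, safe)
```

With a wider range such as `(0, 100)`, the same lookup would usually raise `IndexError`, which mirrors the failure mode described in the Context section.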
If the user does not specify values, the defaults will be used. The main framework changes that would be required to implement this change are:
`Input` class specifying a two-element tuple with the minimum-inclusive and maximum-exclusive allowed input values to a Tensor, for example:
partitioning
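As a rough illustration of the two-element tuple described for the `Input` class, the sketch below uses a hypothetical stand-in dataclass. `InputSpec` and its `value_range` field are invented names for this example and are not the Torch-TensorRT API; the actual field name and validation are up to the implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InputSpec:
    """Hypothetical stand-in for the Input class (not the real Torch-TensorRT type)."""

    shape: Tuple[int, ...]
    dtype: str = "float32"
    # (minimum-inclusive, maximum-exclusive); None means "use the defaults".
    value_range: Optional[Tuple[float, float]] = None

    def __post_init__(self):
        if self.value_range is None:
            return
        if len(self.value_range) != 2:
            raise ValueError("value_range must be a (low, high) pair")
        low, high = self.value_range
        if not low < high:
            raise ValueError(f"low={low} must be strictly less than high={high}")


# An int64 input restricted to the values 0 and 1.
token_type_ids = InputSpec(shape=(1, 128), dtype="int64", value_range=(0, 2))

# No range given: shape analysis would fall back to its defaults.
unconstrained = InputSpec(shape=(1, 128))
```

Keeping the range optional preserves the current behavior for users who do not specify one, which matches the default fallback described above.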