Range Argument for `Input` Class #1425

gs-olive · 2022-10-28T20:34:06Z

gs-olive
Oct 28, 2022
Collaborator

Context

When using Torch-TensorRT to compile and run inference with BERT models, some users were experiencing a CUDA indexing error (Issue #1418, PR #1424). The error seemed to show up only when more than two arguments were passed into the model. The bug stemmed from the third argument to these BERT models, a tensor of type torch.long that accepts only the values 0 and 1 (documentation here).

The shape analysis portion of partitioning, however, was initializing random Tensor inputs, sometimes with values outside of that range:

TensorRT/core/partitioning/shape_analysis.cpp

Line 23 in 5a7f00e

auto in = at::randint(5, shape, {at::kCUDA}).to(type);

As a result, calls to aten::embedding and other indexing operations would fail, as they would be searching out of bounds. A temporary fix was made in PR #1424, addressing the issue by decreasing the range of values selected for the tensor, but a more robust fix would allow the user to (optionally) specify the valid range of values for each input tensor.
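To make the failure mode concrete, here is a minimal plain-Python sketch. It is a stand-in, not the Torch-TensorRT code path: a hypothetical two-row lookup table takes the place of the real aten::embedding weights, and the helper names lookup and out_of_bounds are made up for illustration. It lists which values from a given [low, high) sampling range would index past the table.

```python
# Plain-Python stand-in for an embedding lookup: one row per valid index.
# The table size (2 rows) mirrors an input that only accepts the values 0 and 1.
NUM_ROWS = 2
table = [[0.0, 0.0], [1.0, 1.0]]


def lookup(indices):
    """Return the table rows for `indices`; raise IndexError when out of bounds."""
    rows = []
    for i in indices:
        # Explicit bounds check, so negative values are rejected too.
        if not 0 <= i < NUM_ROWS:
            raise IndexError(f"index {i} is out of range for a table with {NUM_ROWS} rows")
        rows.append(table[i])
    return rows


def out_of_bounds(low, high):
    """List the values in [low, high) that the lookup would reject."""
    return [i for i in range(low, high) if not 0 <= i < NUM_ROWS]


print(out_of_bounds(0, 5))  # candidate values from the original [0, 5) range
print(out_of_bounds(0, 2))  # candidate values from the narrowed [0, 2) range
```

Under these assumptions, sampling from [0, 5) can produce 2, 3, or 4, all of which the lookup rejects, while sampling from [0, 2) never produces a rejected value.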

Discussion

A rough framework for accomplishing this is to allow the user to specify a "low-inclusive" and "high-exclusive" value for each input, so that the forward pass conducted in partitioning does not provide invalid inputs to the module. These (optionally) user-provided values would then replace the existing default choices:

TensorRT/core/partitioning/shape_analysis.cpp

Lines 17 to 18 in b494311

    
int LoValIncl = 0;
int HiValExcl = 2;

If the user does not specify values, the defaults will be used. The main framework changes that would be required to implement this change are:

An additional keyword argument to the Input class specifying a two-element tuple with the minimum-inclusive and maximum-exclusive allowed input values for a Tensor, for example:

value_range (Tuple or List of two int/None values, optional): Allowed range of values for Tensor. Acceptable values are assumed to be integers in the range [value_range[0], value_range[1]) (note the maximum is not an inclusive endpoint). Leave one or both of the values as None to indicate that the restrictions of the Tensor datatype are sufficient to specify this range.

This value range would then just be passed through to the compiler, which would provide the values to partitioning
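As a rough sketch of how such an argument might be normalized before being handed to partitioning, the following assumes a hypothetical helper named resolve_value_range. It is not part of the Torch-TensorRT API. The int32 bounds used when an endpoint is None are also an assumption made for illustration; the fallback of (0, 2) when nothing is specified follows the defaults quoted above.

```python
from typing import Optional, Sequence, Tuple

# Default from shape_analysis.cpp as quoted above: low-inclusive 0, high-exclusive 2.
DEFAULT_RANGE = (0, 2)

# Assumption for illustration: the input tensor holds 32-bit integers.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def resolve_value_range(
    value_range: Optional[Sequence[Optional[int]]] = None,
    dtype_min: int = INT32_MIN,
    dtype_max: int = INT32_MAX,
) -> Tuple[int, int]:
    """Return a concrete (low-inclusive, high-exclusive) pair (hypothetical helper)."""
    if value_range is None:
        # The user specified nothing, so fall back to the existing defaults.
        return DEFAULT_RANGE
    if len(value_range) != 2:
        raise ValueError("value_range must contain exactly two elements")
    low, high = value_range
    # A None endpoint defers to the limits of the tensor datatype.
    low = dtype_min if low is None else low
    high = dtype_max + 1 if high is None else high  # +1 because high is exclusive
    if low >= high:
        raise ValueError(f"empty range: [{low}, {high})")
    return (low, high)


print(resolve_value_range())            # no range given
print(resolve_value_range((0, 30522)))  # e.g. indices into a vocabulary-sized table
print(resolve_value_range((None, 10)))  # only the upper end is constrained
```

Returning a fully resolved pair keeps the compiler and partitioning code free of None handling: they would only ever see two integers.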

narendasan · 2022-11-07T19:46:02Z

narendasan
Nov 7, 2022
Collaborator

Prototype - M (In Review)

Int64 input type for the Input class, where internally we would insert a cast down to int32 through lowering (see Automatic int32 <=> int64 datatype conversion in fallback #1387 (comment))
Add a field which is a tuple / std::pair for the dynamic range that an input may cover

MVP - M (In Review)

Implement the lowering pass to insert an aten::to for int64 inputs
Implement the overhaul of shape analysis to use in-domain sampled inputs.

Phase 1: - n/a

Add support for the same behavior for FX

Phase 2: - M

Add an example tensor field, which is optional and allows the user to provide a ground-truth, in-domain tensor to be used in shape calculations.

1 reply

gs-olive Jan 11, 2023
Collaborator Author

Regarding Phase 1, I don't think the FX path requires a new field to support this kind of input. Both the acc_tracer and compile paths for FX already allow users to specify sample input tensors, so I believe that adding a field for tensor domain would be redundant: the user can construct this tensor themselves. Specifically, for the acc_tracer path:

# User defines inputs, as required by the model --> no need to additionally or alternatively specify a valid range
inputs = torch.rand((1, 3, 256, 256)).to(device)
traced_model = acc_tracer.trace(model, [inputs])

For the compile path:

# User defines inputs, as required by the model --> no need to additionally or alternatively specify a valid range
inputs = torch.rand((1, 3, 256, 256)).to(device)
compiled_model = torch_tensorrt.compile(model, ir="fx", inputs=[inputs])

Since FX already has support for sample inputs, I don't think it would be beneficial to add an optional tensor domain argument to the input/compile class, as the sample inputs are already user-provided.

Providing sample inputs for TorchScript, as in Phase 2, would be a key next step requiring a mechanism for translating Python-instantiated Torch Tensors to C++ Torch Tensors, for use in the partitioning phase dry-run.

ncomly-nvidia · 2022-11-15T22:21:45Z

ncomly-nvidia
Nov 15, 2022

This could be unwieldy for large input tensors where a user still wants to specify. IMO the core issue is that the data Torch-TRT does shape inference on is not representative of the end users, correct?

Why don't we let the user provide input data? Give the option to provide a data loader which resolves this issue, as well as makes DS + fallback easier. Thoughts?

3 replies

gs-olive Nov 22, 2022
Collaborator Author

the core issue is that the data Torch-TRT does shape inference on is not representative of the end users

Yes, this is correct - the end users may have constraints on the input tensors which are more stringent than the tensor type definitions (for example, if tensor entries need to be valid numerical keys in an embedding). I do agree it would ultimately be best to let the user provide input data.

Additionally, I don't think this suggested change would be more unwieldy than the current practice, as the shape analysis code already instantiates a tensor of the full input size and runs it through the network, so the user-specified tensor should not add much of a memory footprint.

ncomly-nvidia Jan 12, 2023

This assumes linearity in shape propagation which in general I don't think is safe. Also, this requires users to double the info in Input (size & range per tensor) as well as know the input ranges - do we expect that to be the case?

Can we do an RFC for user provided dataloader as well? If that is the long term solution, how much additional work is it instead?

Do we have any examples of how others solved this issue for BERT?

gs-olive Jan 12, 2023
Collaborator Author

Thanks for the comment. This makes sense, and I see that when combined with dynamic shape, providing a sample tensor becomes a bit complicated. With the current implementation, we would need to somehow obtain tensors of MIN, OPT, and MAX batch size.

As is the case currently, in the PR implementing range specification #1537, a user can specify a range of allowed values for a tensor, or omit the range and opt for the default $[0, 2)$. A user would never be required to provide this range information, and the feature as proposed would not break existing implementations.

Regarding user-provided data loaders, this solution would require some substantial changes to partitioning, as the current shape analysis with DS instantiates and runs Tensors of the minimum, optimal, and maximum batch dimension through the model to get shape information. I will discuss this more and either update this RFC, or create a new one for user-provided data loaders.
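For reference, the per-shape sampling described here can be sketched in plain Python. This is only an illustration under stated assumptions: random.Random.randrange stands in for the at::randint call used in shape analysis, the values are kept in a flat list rather than a real Tensor, and the function name sample_flat_input and the min/opt/max shapes are hypothetical.

```python
import math
import random


def sample_flat_input(shape, value_range=(0, 2), seed=0):
    """Draw prod(shape) integers from [low, high), returned as a flat list.

    Hypothetical stand-in for building a random sample Tensor of `shape`.
    """
    low, high = value_range
    rng = random.Random(seed)  # seeded so repeated runs give the same sample
    return [rng.randrange(low, high) for _ in range(math.prod(shape))]


# Hypothetical dynamic-shape profile: only the batch dimension varies.
shapes = {"min": (1, 8), "opt": (4, 8), "max": (8, 8)}

# One in-range sample per shape, all drawn from the same value range.
samples = {name: sample_flat_input(shape, (0, 2)) for name, shape in shapes.items()}

for name, values in samples.items():
    print(name, len(values), min(values), max(values))
```

A single value range covers all three shapes in this sketch. A user-provided data loader would instead have to supply real data at each of the three batch sizes.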

For existing resolutions of this issue on BERT, #1418 is an example, where the temporary solution was PR #1424, which changed the default ranges for sample tensors in partitioning from $[0, 5)$ to $[0, 2)$. The more robust solution is to allow arbitrary, optionally user-specified ranges as in #1537.
