[FEA] Pre-allocated input and output buffer #18

oyilmaz-nvidia · 2021-05-21T17:56:03Z

Input and output buffers are requested through the Triton APIs for every new request and are released once the output is sent. If we keep pre-allocated buffers for the input and output, we can reduce the query response time. The difficulty is knowing how large these buffers should be. If we know the max batch size, we can allocate space based on that, but if the max batch size is large, the buffers will occupy a significant amount of memory even when most requests are small.

Maybe we can keep pre-allocated input and output buffers for small batches and request new space only for larger batch sizes.
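
To make the idea concrete, here is a minimal sketch of the hybrid approach in plain Python. It does not use any Triton API: `HybridBufferPool`, `bytes_per_item`, and `small_batch_limit` are hypothetical names chosen for illustration, and a host-side `bytearray` stands in for whatever memory the backend would actually request.

```python
class HybridBufferPool:
    """Reuse one pre-allocated buffer for small batches; allocate on demand otherwise.

    Hypothetical illustration only: not part of Triton or any backend API.
    """

    def __init__(self, bytes_per_item, small_batch_limit):
        self.bytes_per_item = bytes_per_item
        self.small_batch_limit = small_batch_limit
        # Allocated once, sized for the largest "small" batch.
        self._buffer = bytearray(bytes_per_item * small_batch_limit)
        self._in_use = False

    def acquire(self, batch_size):
        """Return (view, pooled). `pooled` is True when the shared buffer was handed out."""
        needed = batch_size * self.bytes_per_item
        if batch_size <= self.small_batch_limit and not self._in_use:
            self._in_use = True
            # Slice the pre-allocated buffer without copying.
            return memoryview(self._buffer)[:needed], True
        # Large batch, or the shared buffer is busy: fall back to a fresh allocation.
        return memoryview(bytearray(needed)), False

    def release(self, pooled):
        """Mark the shared buffer as free again; fresh allocations need no bookkeeping."""
        if pooled:
            self._in_use = False


pool = HybridBufferPool(bytes_per_item=4, small_batch_limit=8)
small_view, small_pooled = pool.acquire(batch_size=2)   # served from the shared buffer
large_view, large_pooled = pool.acquire(batch_size=16)  # exceeds the limit, new allocation
pool.release(small_pooled)
```

The memory cost of the pool is fixed at `bytes_per_item * small_batch_limit`, so the threshold directly trades reserved memory against how many requests avoid an allocation. A real implementation would also need one buffer per concurrent request (or a lock), which this sketch sidesteps by falling back to a fresh allocation whenever the shared buffer is busy.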
