Depending on how this turns out, we may or may not find it worthwhile to add a graph optimizer that condenses multiple operators into a single TritonPythonModel. Doing so would still help us avoid the scheduling overhead of passing requests between models, but it might not be a big boost if combining operators no longer helps us avoid GPU-CPU round-trip conversions.
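To make the idea of condensing operators concrete, here is a minimal sketch of fusing a linear chain of operators into one callable, so intermediate results pass directly between ops inside a single model instead of being handed between separately scheduled models. Everything here is hypothetical and for illustration only: `fuse_operators`, `scale`, and `shift` are made-up names, and plain Python lists stand in for GPU tensors. It does not use the Triton or CuPy APIs.

```python
from typing import Callable, Dict, List

# Stand-in for a dict of named tensors; a real operator would hold GPU arrays.
Tensors = Dict[str, List[float]]
Operator = Callable[[Tensors], Tensors]


def fuse_operators(ops: List[Operator]) -> Operator:
    """Collapse a linear chain of operators into a single callable.

    Hypothetical helper: the fused callable runs each op in order and
    passes the intermediate tensors along in-process.
    """
    def fused(tensors: Tensors) -> Tensors:
        for op in ops:
            tensors = op(tensors)
        return tensors

    return fused


def scale(tensors: Tensors) -> Tensors:
    # Toy operator: multiply every value by 2.
    return {name: [v * 2 for v in values] for name, values in tensors.items()}


def shift(tensors: Tensors) -> Tensors:
    # Toy operator: add 1 to every value.
    return {name: [v + 1 for v in values] for name, values in tensors.items()}


# One fused "model" in place of two separately scheduled ones.
fused_model = fuse_operators([scale, shift])
result = fused_model({"x": [1, 2, 3]})
```

If operators already keep their tensors on the GPU between ops, the remaining benefit of a fusion like this would be limited to the saved scheduling hops, which is the trade-off described above.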
karlhigley changed the title from "Update operators to use CuPy where possible and keep tensors on GPU between ops" to "Update operators keep tensors on GPU between ops (where possible)" on Mar 14, 2022