Update operators keep tensors on GPU between ops (where possible) #17

karlhigley · 2022-03-02T20:57:51Z

- Create a NumPy/CuPy dispatch mechanism (like the pandas/cuDF dispatch in NVTabular)
- Apply DLPack to pass GPU tensors from the Python back end to other models
- Update FilterCandidates
- Update SoftmaxSampling
- Update Faiss and Feast ops to convert to GPU?
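The dispatch item could look roughly like the sketch below. It picks the array library from the input's type, so an op body never forces a host copy. CuPy ships `cupy.get_array_module(*args)` for this, but that requires CuPy to be importable, so this sketch reimplements the check and treats CuPy as optional. `get_array_module` and `softmax` here are hypothetical helpers. `softmax` is only a stand-in for an op body like SoftmaxSampling, not the Merlin implementation.

```python
import numpy as np

try:
    import cupy as cp  # only available on GPU-enabled installs
except ImportError:
    cp = None


def get_array_module(arr):
    """Hypothetical dispatch: return cupy for CuPy arrays, numpy otherwise."""
    if cp is not None and isinstance(arr, cp.ndarray):
        return cp
    return np


def softmax(scores):
    """Hypothetical op body that runs on whichever device `scores` lives on."""
    xp = get_array_module(scores)
    shifted = scores - scores.max()  # subtract the max for numerical stability
    exp = xp.exp(shifted)
    return exp / exp.sum()


p = softmax(np.array([1.0, 2.0, 3.0]))
print(get_array_module(p).__name__)  # numpy
```

The op code is written once against `xp`. A CuPy input produces a CuPy output, so consecutive ops can stay on the GPU.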
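For the DLPack item, the protocol lets a consumer wrap a producer's buffer without copying it. Below is a minimal host-side sketch, assuming NumPy 1.22 or later (which added `np.from_dlpack`). The hypothetical helper `host_view` reads `__dlpack_device__` and imports the tensor only if it already lives in host memory, so a GPU tensor is never silently pulled back to the CPU. Inside Triton, the Python backend's `pb_utils.Tensor.to_dlpack()` and `pb_utils.Tensor.from_dlpack(name, capsule)` would play the producer and consumer roles. That part is not shown here.

```python
import numpy as np

KDL_CPU = 1  # DLDeviceType value for host memory in the DLPack spec


def host_view(producer):
    """Hypothetical helper: import a DLPack producer as a NumPy array without
    copying, but only if its data already lives in host memory."""
    device_type, _device_id = producer.__dlpack_device__()
    if device_type != KDL_CPU:
        # A GPU tensor should be handed to a GPU consumer instead of copied.
        raise ValueError(f"tensor lives on device type {device_type}, not host")
    return np.from_dlpack(producer)


scores = np.arange(6, dtype=np.float32)
view = host_view(scores)
print(np.shares_memory(scores, view))  # True: same buffer, no copy
```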

karlhigley · 2022-03-02T21:00:53Z

Depending on how this turns out, we may or may not find it worthwhile to add a graph optimizer that condenses multiple operators into a single TritonPythonModel. That would still help us avoid the scheduling overhead of passing requests between models, but it might not be a big boost if combining operators no longer helps us avoid GPU-CPU round-trip conversions.
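A graph optimizer like that could start as a pass over a linear chain of operators. The pass would merge each run of consecutive Python-backend ops into one stage, and each stage would then be served as a single TritonPythonModel. This is a hypothetical sketch: `Op`, `runs_in_python`, `fuse_python_ops`, and the op names are illustrative, not Merlin APIs. Real operator graphs are also DAGs rather than simple chains.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    runs_in_python: bool  # True if the op can execute inside a TritonPythonModel


def fuse_python_ops(chain: List[Op]) -> List[List[Op]]:
    """Group consecutive Python-backend ops into one stage; every other op
    keeps a stage of its own."""
    stages: List[List[Op]] = []
    for op in chain:
        if op.runs_in_python and stages and stages[-1][0].runs_in_python:
            stages[-1].append(op)  # extend the current fused Python stage
        else:
            stages.append([op])  # start a new stage
    return stages


chain = [
    Op("feast_lookup", runs_in_python=True),
    Op("ranking_model", runs_in_python=False),
    Op("filter_candidates", runs_in_python=True),
    Op("softmax_sampling", runs_in_python=True),
]
print([[op.name for op in stage] for stage in fuse_python_ops(chain)])
# [['feast_lookup'], ['ranking_model'], ['filter_candidates', 'softmax_sampling']]
```

Each merged stage removes one inter-model hop. Whether that still pays off once tensors stay on the GPU is the open question raised in the comment.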

karlhigley added this to the 22.04 milestone Mar 2, 2022

karlhigley added the enhancement New feature or request label Mar 2, 2022

karlhigley modified the milestones: 22.04, Future Mar 11, 2022

karlhigley changed the title ~~Update operators to use CuPy where possible and keep tensors on GPU between ops~~ Update operators keep tensors on GPU between ops (where possible) Mar 14, 2022

karlhigley removed this from the Future milestone Nov 14, 2022
