Push the OptimumEmbedders to their performance limits #384

mathislucka · 2024-02-09T08:53:26Z

Background
The OptimumEmbedders are designed to speed up embedding inference. A basic version is almost ready. In this task, we want to push these embedding components to their performance limits.

Ideas

optimize data loading
optimize batching
optimize TensorRT engine build
try FP16 inference
play with further optimizations (ORTOptimizer, ORTQuantizer)
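
As a rough illustration of the "optimize batching" idea, one common approach is to group texts of similar length so that each batch needs less padding. The sketch below is only an assumption about what such an optimization could look like, not how the OptimumEmbedders implement batching. The names `length_sorted_batches`, `embed_in_batches`, and `embed_fn` are hypothetical, and character length is used as a cheap stand-in for token count.

```python
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def length_sorted_batches(
    texts: Sequence[str], batch_size: int
) -> Iterator[Tuple[List[int], List[str]]]:
    """Yield (original_indices, batch) pairs, grouping texts of similar length."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Assumption: character length is a rough proxy for token count.
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    for start in range(0, len(order), batch_size):
        indices = order[start:start + batch_size]
        yield indices, [texts[i] for i in indices]


def embed_in_batches(
    texts: Sequence[str],
    batch_size: int,
    embed_fn: Callable[[List[str]], List[Any]],
) -> List[Any]:
    """Run the hypothetical embed_fn per batch; return results in input order."""
    results: List[Any] = [None] * len(texts)
    for indices, batch in length_sorted_batches(texts, batch_size):
        # Write each result back to the position of its source text.
        for i, vector in zip(indices, embed_fn(batch)):
            results[i] = vector
    return results
```

Here `embed_fn` stands in for whatever call actually runs the model on a list of strings. Whether this helps in practice depends on how varied the input lengths are, so it would need to be benchmarked.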

shadeMe · 2024-02-12T09:51:57Z

Blocked on #137

masci added the P1 label Feb 9, 2024

masci assigned shadeMe Feb 12, 2024

shadeMe linked a pull request Feb 28, 2024 that will close this issue

feat!: Add support for Optimum optimizers and quantizers #496

Merged

shadeMe closed this as completed in #496 Feb 28, 2024
