Local text generation model inference appears slower than prompt tuning inference #233

Open

alex-jw-brooks (Collaborator) opened this issue Oct 11, 2023 · 0 comments
Labels: bug (Something isn't working)
Describe the bug

Inference for locally loaded prompt tuned models appears to be roughly an order of magnitude faster than for local text generation models (the gap may vary with model size). This should not be the case when the underlying parameters and data types are the same.

Sample Code

WIP

Expected behavior

Text generation model inference is on par with, or faster than, prompt tuned model inference.

Observed behavior

Text generation model inference is much slower than prompt tuned model inference.

Additional context

Note that generate calls have been consolidated; it would be best to have a repro case both inside and outside of Caikit NLP.
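
Until the WIP sample code lands, an out-of-Caikit timing sketch could look something like the following. This is only an assumption of how the comparison might be set up: it uses Hugging Face transformers plus a randomly initialized peft prompt tuning adapter as stand-ins for the two local model types, and the model name, prompt, and generation settings are placeholders, not the configuration in which the slowdown was observed.

```python
import time

import torch
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-560m"  # placeholder; swap in the model under test
PROMPT = "The quick brown fox"
N_RUNS = 10

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
base_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Same base weights wrapped with an untrained prompt tuning adapter, so the only
# difference between the two timed paths is the small virtual prompt.
peft_model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME),
    PromptTuningConfig(
        task_type=TaskType.CAUSAL_LM,
        prompt_tuning_init=PromptTuningInit.RANDOM,
        num_virtual_tokens=8,
        tokenizer_name_or_path=MODEL_NAME,
    ),
)

def avg_generate_seconds(model):
    inputs = tokenizer(PROMPT, return_tensors="pt")
    model.generate(**inputs, max_new_tokens=20)  # warmup call, not timed
    start = time.perf_counter()
    for _ in range(N_RUNS):
        with torch.no_grad():
            model.generate(**inputs, max_new_tokens=20)
    return (time.perf_counter() - start) / N_RUNS

print(f"base text generation: {avg_generate_seconds(base_model):.3f} s/call")
print(f"prompt tuned:         {avg_generate_seconds(peft_model):.3f} s/call")
```

If the two paths report comparable per-call times outside of Caikit NLP, that would point at the Caikit NLP serving path (e.g. the consolidated generate calls) rather than the underlying models.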

alex-jw-brooks added the bug label on Oct 11, 2023