Local text generation model inference appears slower than prompt tuning inference #233

Open

alex-jw-brooks (Collaborator) opened this issue Oct 11, 2023 · 0 comments
Labels: bug (Something isn't working)
Describe the bug

Inference for locally loaded prompt tuned models appears to be roughly an order of magnitude faster than for local text generation models (the gap may vary with model size). This should not be the case when the underlying parameters and data types are the same.

Sample Code

WIP

Expected behavior

Text generation model inference is on par with, or faster than, prompt tuned model inference.

Observed behavior

Text generation model inference is much slower than prompt tuned model inference.

Additional context

Note that generate calls have been consolidated; it would be best to have a repro case both inside and outside of Caikit NLP.
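
Until the WIP sample code lands, an out-of-Caikit timing sketch could look something like the following. This is only an assumption of how the comparison might be set up: it uses Hugging Face transformers plus a randomly initialized peft prompt tuning adapter as stand-ins for the two local model types, and the model name, prompt, and generation settings are placeholders, not the configuration in which the slowdown was observed.

```python
import time

import torch
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-560m"  # placeholder; swap in the model under test
PROMPT = "The quick brown fox"
N_RUNS = 10

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
base_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Same base weights wrapped with an untrained prompt tuning adapter, so the only
# difference between the two timed paths is the small virtual prompt.
peft_model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME),
    PromptTuningConfig(
        task_type=TaskType.CAUSAL_LM,
        prompt_tuning_init=PromptTuningInit.RANDOM,
        num_virtual_tokens=8,
        tokenizer_name_or_path=MODEL_NAME,
    ),
)

def avg_generate_seconds(model):
    inputs = tokenizer(PROMPT, return_tensors="pt")
    model.generate(**inputs, max_new_tokens=20)  # warmup call, not timed
    start = time.perf_counter()
    for _ in range(N_RUNS):
        with torch.no_grad():
            model.generate(**inputs, max_new_tokens=20)
    return (time.perf_counter() - start) / N_RUNS

print(f"base text generation: {avg_generate_seconds(base_model):.3f} s/call")
print(f"prompt tuned:         {avg_generate_seconds(peft_model):.3f} s/call")
```

If the two paths report comparable per-call times outside of Caikit NLP, that would point at the Caikit NLP serving path (e.g. the consolidated generate calls) rather than the underlying models.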

alex-jw-brooks added the bug label on Oct 11, 2023