
Question: Support for Continuous Batching and Asynchronous Requests #25

Open · opened by @Msiavashi on Jun 12, 2024 · 1 comment · Labels: enhancement (New feature or request)

@Msiavashi
Hi. I'm new to the LLM world and have a few questions regarding the engine. Does it support continuous batching? I'm asking because I'm trying to set a requests-per-second rate, and I'd like to know whether I should implement my own batching strategy or whether the framework provides any batching functionality.

I see from the paper: "Multiple sequences are batched until they either reach a maximum batch size of 16 or a maximum waiting time of one second, both parameters referenced from AlpaServe."

Given this, is there an async version of the engine that allows requests to be added at varying rates?
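
For reference, this is a minimal sketch of the kind of fixed-rate async client I have in mind (the queue and request objects here are placeholders, not the engine's actual API):

```python
import asyncio

async def drive_at_rate(queue: asyncio.Queue, requests, rps: float):
    """Enqueue one request every 1/rps seconds to simulate a steady arrival rate."""
    for req in requests:
        await queue.put(req)
        await asyncio.sleep(1.0 / rps)
```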

Thank you.

@drunkcoding (Contributor)
The batch engine is not provided yet. Auto-batching, which collects requests up to a maximum batch size or a maximum delay, is the simplest way to implement this; continuous batching is a work in progress.
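
As a starting point, here is a minimal sketch of such an auto-batcher, assuming an `asyncio` request queue and the batch size of 16 and one-second maximum wait quoted from the paper; `run_batch` is a hypothetical stand-in for the engine's batched inference call:

```python
import asyncio
import time

MAX_BATCH_SIZE = 16  # maximum batch size (value quoted from the paper)
MAX_DELAY = 1.0      # maximum waiting time in seconds (quoted from the paper)

async def batch_worker(queue: asyncio.Queue, run_batch):
    """Collect requests until MAX_BATCH_SIZE is reached or MAX_DELAY has
    elapsed since the first request arrived, then dispatch the batch."""
    while True:
        batch = [await queue.get()]                # block for the first request
        deadline = time.monotonic() + MAX_DELAY
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break                              # max delay hit; flush early
        run_batch(batch)                           # hypothetical engine call
```

A batch flushes as soon as either condition is met, so low-traffic periods pay at most the one-second delay while bursts fill batches immediately.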

@drunkcoding added the enhancement (New feature or request) label on Aug 29, 2024.