
Question: Support for Continuous Batching and Asynchronous Requests #25

Open · opened by @Msiavashi on Jun 12, 2024 · 1 comment · Labels: enhancement (New feature or request)

@Msiavashi
Hi. I'm new to the LLM world and have a few questions regarding the engine. Does it support continuous batching? I'm asking because I'm trying to set a requests-per-second rate, and I'd like to know whether I should implement my own batching strategy or whether the framework provides any batching functionality.

I see from the paper: "Multiple sequences are batched until they either reach a maximum batch size of 16 or a maximum waiting time of one second, both parameters referenced from AlpaServe."

Given this, is there an async version of the engine that allows requests to be added at varying rates?
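
For reference, this is a minimal sketch of the kind of fixed-rate async client I have in mind (the queue and request objects here are placeholders, not the engine's actual API):

```python
import asyncio

async def drive_at_rate(queue: asyncio.Queue, requests, rps: float):
    """Enqueue one request every 1/rps seconds to simulate a steady arrival rate."""
    for req in requests:
        await queue.put(req)
        await asyncio.sleep(1.0 / rps)
```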

Thank you.

@drunkcoding (Contributor)
The batch engine is not provided yet. Auto-batching, which collects requests up to a maximum batch size or a maximum delay, is the simplest way to implement this; continuous batching is a work in progress.
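
As a starting point, here is a minimal sketch of such an auto-batcher, assuming an `asyncio` request queue and the batch size of 16 and one-second maximum wait quoted from the paper; `run_batch` is a hypothetical stand-in for the engine's batched inference call:

```python
import asyncio
import time

MAX_BATCH_SIZE = 16  # maximum batch size (value quoted from the paper)
MAX_DELAY = 1.0      # maximum waiting time in seconds (quoted from the paper)

async def batch_worker(queue: asyncio.Queue, run_batch):
    """Collect requests until MAX_BATCH_SIZE is reached or MAX_DELAY has
    elapsed since the first request arrived, then dispatch the batch."""
    while True:
        batch = [await queue.get()]                # block for the first request
        deadline = time.monotonic() + MAX_DELAY
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break                              # max delay hit; flush early
        run_batch(batch)                           # hypothetical engine call
```

A batch flushes as soon as either condition is met, so low-traffic periods pay at most the one-second delay while bursts fill batches immediately.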

@drunkcoding added the enhancement (New feature or request) label on Aug 29, 2024.