🚀 Feature
Implement a consistent predict method interface that is independent of the batching configuration in LitServe.
Motivation
Example code I'm using ⤵️
import random
import litserve as ls
import os
import time


class LitServeBatchingDemoAPI(ls.LitAPI):
    def setup(self, device):
        print(f"Loading models in process {os.getpid()}")

    def decode_request(self, request):
        return request["inputs"]

    def predict(self, batch):
        print("Received batch of size", len(batch), batch)
        results = [random.random() for _ in batch]
        time.sleep(1.5)
        return results

    def encode_response(self, output):
        return {"output": output}


if __name__ == "__main__":
    print(f"Starting server in process {os.getpid()}")
    server = ls.LitServer(
        LitServeBatchingDemoAPI(),
        workers_per_device=1,
    )
    server.run(port=8000)
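For reference, a minimal client for the example above could look like the sketch below. It assumes LitServe's default /predict route on port 8000 and a payload key ("inputs") matching decode_request; adjust both if your setup differs.

import requests

# Sketch of a client call for the example server above. Assumes the default
# /predict route on port 8000; the "inputs" key matches what decode_request
# reads in the API.
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"inputs": "hello world"},
    timeout=10,
)
print(response.json())  # e.g. {"output": 0.42}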
Currently, enabling batching in LitServe (by setting max_batch_size and batch_timeout) changes the expected implementation of the predict method in LitAPI subclasses. This creates several issues:
The same API implementation behaves differently based on server configuration parameters
Developers need to maintain different implementations or add conditional logic based on whether batching is enabled
It violates the principle of separation of concerns: server configuration parameters should not affect the API contract
For example, with batching disabled, predict receives a single input:
def predict(self, batch):
    # batch is a single input; len(batch) will return the length of the string
    print("Received batch of size", len(batch), batch)
With batching enabled (max_batch_size >= 2):
def predict(self, batch):
    # batch is a list of inputs
    print("Received batch of size", len(batch), batch)
This inconsistency makes it harder to maintain and test APIs, especially when batching configuration might change between development and production environments.
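Today, the usual workaround is to normalize the input inside predict itself. The sketch below is illustrative only (and brittle: it misbehaves if a single decoded input happens to be a list), which is exactly the kind of conditional logic the API contract should not require.

import random

# Illustrative workaround: wrap a single input in a list so the body of
# predict can always iterate, then unwrap on the way out. Brittle if a
# single decoded input is itself a list.
def predict(self, batch):
    is_batched = isinstance(batch, list)
    inputs = batch if is_batched else [batch]
    results = [random.random() for _ in inputs]
    return results if is_batched else results[0]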
Pitch
LitServe should provide a consistent interface for the predict method regardless of batching configuration.
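One way the consistent contract could look (a sketch of the idea, not the concrete change proposed in the issue): predict always receives a list of decoded inputs, a list of length 1 when server-side batching is off, and returns one output per input.

import random

# Sketch of a configuration-independent contract: batch is always a list,
# so len(batch) is always the number of pending requests, regardless of
# max_batch_size or batch_timeout.
def predict(self, batch):
    print("Received batch of size", len(batch), batch)
    return [random.random() for _ in batch]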
Additional context
Other serving frameworks such as TorchServe and RayServe take similar approaches to batching, but this doesn't mean LitServe can't improve upon their design. A consistent API contract would make LitServe more intuitive and easier to use correctly. More context in the recent Discord discussion.
hey @marrrcin I see what you're saying, it's the usual tension of making it easy to start and making it possible to scale
btw are you proposing we always pass a collection or tensor with a batch dimension of 1 as the input to predict even when batching is off?
what if we found a way to opt in for the API to be batched even when batching is off? this way we would retain backward compatibility and keep the story simple for users who don't care / don't want to know about batching, while still achieving your goal (making the API implementation independent from batching)
there are a few ways to do so: setting an attribute of the instance in setup (self.batched = True), relying on the name of the argument to predict (if it contains batch*, then it's batched - weird, but pytest works that way), passing batched=True to the constructor of the API, or passing batched_api=True to LitServer (although this pertains to the API rather than the server). I'm not particularly fond of any of them, but I'm sure we can find something acceptable.
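e.g. the first option could look something like this (just a sketch: self.batched doesn't exist today, and the server-side handling it implies would still need to be built):

import random
import litserve as ls


class LitServeBatchingDemoAPI(ls.LitAPI):
    def setup(self, device):
        # hypothetical opt-in: declare that predict always expects a list,
        # even when max_batch_size/batch_timeout are not set
        self.batched = True

    def decode_request(self, request):
        return request["inputs"]

    def predict(self, batch):
        # with the opt-in, batch is always a list (length 1 without batching)
        return [random.random() for _ in batch]

    def encode_response(self, output):
        return {"output": output}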