Catch token count issue while streaming with customized models (#3241)
* Catch token count issue while streaming with customized models

If llama, llava, phi, or other customized models are used for streaming (with stream=True), the current design crashes after fetching the response because the prompt token count cannot be computed for those models.

A warning is enough in this case, just as in the non-streaming use cases.

* Only catch NotImplementedError

---------

Co-authored-by: Chi Wang <[email protected]>
Co-authored-by: Jack Gerrits <[email protected]>
3 people authored Sep 25, 2024
1 parent c1289b4 commit ece6924
Showing 1 changed file with 6 additions and 1 deletion.
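
To show where the NotImplementedError mentioned above comes from, here is a simplified token counter in the style of the OpenAI cookbook's num_tokens_from_messages(). It is only an illustrative sketch, not autogen's actual count_token implementation: customized model names such as llama, llava, or phi match no counting rule, so the counter raises NotImplementedError, which is exactly what the hunk below now catches.

import tiktoken

def count_prompt_tokens(messages, model):
    """Rough token count for OpenAI-style chat messages (illustrative only)."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model names fall back to a generic encoding.
        encoding = tiktoken.get_encoding("cl100k_base")
    if "gpt-3.5" in model or "gpt-4" in model:
        num_tokens = 0
        for message in messages:
            num_tokens += 3  # per-message priming overhead
            for value in message.values():
                if isinstance(value, str):
                    num_tokens += len(encoding.encode(value))
        return num_tokens + 3  # reply priming
    # Customized models (llama, llava, phi, ...) have no counting rule here,
    # so callers see NotImplementedError; that is the error this commit catches.
    raise NotImplementedError(f"Token counting is not implemented for model {model}.")
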
autogen/oai/client.py (6 additions, 1 deletion)
@@ -279,7 +279,12 @@ def create(self, params: Dict[str, Any]) -> ChatCompletion:
 
                 # Prepare the final ChatCompletion object based on the accumulated data
                 model = chunk.model.replace("gpt-35", "gpt-3.5")  # hack for Azure API
-                prompt_tokens = count_token(params["messages"], model)
+                try:
+                    prompt_tokens = count_token(params["messages"], model)
+                except NotImplementedError as e:
+                    # Catch token calculation error if streaming with customized models.
+                    logger.warning(str(e))
+                    prompt_tokens = 0
                 response = ChatCompletion(
                     id=chunk.id,
                     model=chunk.model,
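
For orientation, a hedged usage sketch of the path this hunk sits on: streaming a completion from a customized model behind an OpenAI-compatible endpoint. The model name, base_url, and api_key are placeholders; before this change the local prompt-token count raised NotImplementedError and aborted the call, while after it the error is only logged and prompt_tokens falls back to 0.

from autogen import OpenAIWrapper

# Placeholder config for an OpenAI-compatible server hosting a customized model
# (llama, llava, phi, ...); adjust model, base_url, and api_key to your setup.
client = OpenAIWrapper(
    config_list=[
        {
            "model": "llama-3-8b-instruct",
            "base_url": "http://localhost:8000/v1",
            "api_key": "EMPTY",
        }
    ]
)

# stream=True takes the chunk-accumulation path patched above: the final
# ChatCompletion is rebuilt locally and count_token() estimates prompt tokens.
# For customized models that raises NotImplementedError, which is now logged
# as a warning instead of crashing the call, with prompt_tokens set to 0.
response = client.create(
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
print(response.usage)  # prompt_tokens may be 0 when count_token cannot handle the model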
