Commit
Catch token count issue while streaming with customized models (#3241)
* Catch token count issue while streaming with customized models

  If llama, llava, phi, or some other models are used for streaming (with stream=True), the current design would crash after fetching the response. A warning is enough in this case, just as in the non-streaming use cases.

* Only catch NotImplementedError

---------

Co-authored-by: Chi Wang <[email protected]>
Co-authored-by: Jack Gerrits <[email protected]>
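The change described above downgrades a failed token count during streaming from a crash to a warning, and catches only NotImplementedError so other failures still surface. The sketch below illustrates that pattern; it is not the actual diff, and count_token and finalize_streamed_response are hypothetical stand-ins for the client's real streaming code path.

```python
import logging

logger = logging.getLogger(__name__)


def count_token(text: str, model: str) -> int:
    """Hypothetical token counter; raises NotImplementedError for models
    it does not recognize (e.g. llama, llava, phi)."""
    raise NotImplementedError(f"Token counting not implemented for model {model!r}")


def finalize_streamed_response(collected_text: str, model: str) -> dict:
    """Build the final response after the stream completes.

    Only NotImplementedError is caught: an unsupported model no longer
    crashes the streaming path, mirroring the non-streaming behavior,
    while any other exception still propagates.
    """
    try:
        prompt_tokens = count_token(collected_text, model)
    except NotImplementedError as e:
        logger.warning("Token counting skipped for model %r: %s", model, e)
        prompt_tokens = 0
    return {"model": model, "usage": {"prompt_tokens": prompt_tokens}}
```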