
llama.cpp: LlamaCppGenerator.run() raises 'TypeError' when passing {"stream": True} #678

Closed
paulgekeler opened this issue Apr 22, 2024 · 3 comments

@paulgekeler

Describe the bug
When calling LlamaCppGenerator.run() with generation_kwargs={"stream": True}, a TypeError ("'generator' object is not subscriptable") is raised at line 97, replies = [output["choices"][0]["text"]], because the create_completion function of the underlying llama-cpp-python module returns a generator object in this case.
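
For context, this is roughly how the underlying llama-cpp-python call behaves (a minimal sketch; the model path is a placeholder):

from llama_cpp import Llama

llm = Llama(model_path="path/to/model.gguf")  # placeholder path

# Without streaming, create_completion returns a dict with a "choices" list.
out = llm.create_completion("The purpose of life is", max_tokens=16)
print(out["choices"][0]["text"])

# With stream=True it returns a generator of chunk dicts instead, so
# out["choices"] raises TypeError: 'generator' object is not subscriptable.
out = llm.create_completion("The purpose of life is", max_tokens=16, stream=True)
for chunk in out:
    print(chunk["choices"][0]["text"], end="")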

To Reproduce
Reproducible whenever run is called with generation_kwargs={"stream": True}.
E.g.

from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator
g = LlamaCppGenerator(
    model="llama.cpp/models/llama-2-7b-chat/ggml-models-Q4_K_M.gguf",
    n_ctx=2048,
    n_batch=128,
    model_kwargs={"verbose": False, "use_mlock": True},  # happens no matter the model_kwargs
)
g.warm_up()
g.run("The purpose of life is", generation_kwargs={"stream": True})

(This won't run as-is because the model path is specific to my machine, obviously.)

Expected behaviour
The underlying create_completion function returns a generator in this case, so the run function should return one as well.

Fix suggestion
I guess the easiest fix would be to return the generator object in this case, roughly as sketched below.
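
A minimal sketch of what that could look like inside run() (the surrounding lines are an assumption based on the variable names visible in the traceback; the actual source of generators.py may differ):

# Sketch only: assumes run() builds updated_generation_kwargs and calls
# create_completion roughly like this; names may differ in generators.py.
output = self.model.create_completion(prompt, **updated_generation_kwargs)

if updated_generation_kwargs.get("stream") is True:
    # output is a generator of completion chunks; hand it back untouched
    return {"replies": [output], "meta": []}

# existing non-streaming path (line 97 in the report)
replies = [output["choices"][0]["text"]]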

Describe your environment (please complete the following information):

  • OS: Ubuntu Linux (WSL)
  • Haystack version: haystack_ai-2.0.1
  • Integration version: llama-cpp-haystack-0.3.0
paulgekeler added the bug label Apr 22, 2024
@masci
Contributor

masci commented May 10, 2024

This is unfortunately expected: currently no component is capable of returning a streamable object. We're working on a solution in Haystack; when it's ready, we'll roll it out to all the integrations that need it.

@paulgekeler
Author

paulgekeler commented May 10, 2024

Thanks for replying. Yes, I figured as much. For anyone interested, here is a workaround for the time being. Add the following lines to the run function in generators.py.

if "stream" in updated_generation_kwargs and updated_generation_kwargs["stream"] == True:
	return {"replies": [output], "meta": []}

Then, wherever the run function is called, you can iterate over the generator and retrieve each chunk like this:

for answer_chunk in answer_generator:
    answer_chunk["choices"][0]["text"]
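
Put together, consuming the stream could look like this (a sketch assuming the patched run() above and the generator g from the reproduction snippet):

# Assumes run() was patched as above to return the raw generator in "replies".
result = g.run("The purpose of life is", generation_kwargs={"stream": True})
answer_generator = result["replies"][0]

for answer_chunk in answer_generator:
    # each chunk follows llama-cpp-python's completion chunk format
    print(answer_chunk["choices"][0]["text"], end="", flush=True)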

@anakin87
Member

duplicate of #730

anakin87 closed this as not planned (duplicate) Oct 28, 2024