
llama.cpp: LlamaCppGenerator.run() raises 'TypeError' when passing {"stream": True} #678

Closed
paulgekeler opened this issue Apr 22, 2024 · 3 comments

@paulgekeler

Describe the bug
When calling LlamaCppGenerator.run() with generation_kwargs={"stream": True}, a TypeError ("'generator' object is not subscriptable") is raised at line 97, replies = [output["choices"][0]["text"]], because the create_completion function of the underlying llama-cpp-python module returns a generator object in this case.
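
For context, this is roughly how the underlying llama-cpp-python call behaves (a minimal sketch; the model path is a placeholder):

from llama_cpp import Llama

llm = Llama(model_path="path/to/model.gguf")  # placeholder path

# Without streaming, create_completion returns a dict with a "choices" list.
out = llm.create_completion("The purpose of life is", max_tokens=16)
print(out["choices"][0]["text"])

# With stream=True it returns a generator of chunk dicts instead, so
# out["choices"] raises TypeError: 'generator' object is not subscriptable.
out = llm.create_completion("The purpose of life is", max_tokens=16, stream=True)
for chunk in out:
    print(chunk["choices"][0]["text"], end="")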

To Reproduce
Reproducible whenever run is called with generation_kwargs={"stream": True}.
E.g.

from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator
g = LlamaCppGenerator(
    model="llama.cpp/models/llama-2-7b-chat/ggml-models-Q4_K_M.gguf",
    n_ctx=2048,
    n_batch=128,
    model_kwargs={"verbose": False, "use_mlock": True},  # happens no matter the model_kwargs
)
g.warm_up()
g.run("The purpose of life is", generation_kwargs={"stream": True})

(This won't run as-is because the model path is specific to my machine, obviously.)

Expected behaviour
The underlying create_completion function returns a generator in this case, so the run function should return one as well.

Fix suggestion
I guess the easiest fix would be to return the generator object in this case, roughly as sketched below.
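
A minimal sketch of what that could look like inside run() (the surrounding lines are an assumption based on the variable names visible in the traceback; the actual source of generators.py may differ):

# Sketch only: assumes run() builds updated_generation_kwargs and calls
# create_completion roughly like this; names may differ in generators.py.
output = self.model.create_completion(prompt, **updated_generation_kwargs)

if updated_generation_kwargs.get("stream") is True:
    # output is a generator of completion chunks; hand it back untouched
    return {"replies": [output], "meta": []}

# existing non-streaming path (line 97 in the report)
replies = [output["choices"][0]["text"]]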

Describe your environment (please complete the following information):

  • OS: Ubuntu Linux (WSL)
  • Haystack version: haystack_ai-2.0.1
  • Integration version: llama-cpp-haystack-0.3.0
paulgekeler added the bug label Apr 22, 2024
@masci
Contributor

masci commented May 10, 2024

This is unfortunately expected: currently no component is capable of returning a streamable object. We're working on a solution in Haystack; when it's ready, we'll roll it out to all the integrations that need it.

@paulgekeler
Author

paulgekeler commented May 10, 2024

Thanks for replying. Yes, I figured as much. For anyone interested, here is a workaround for the time being. Add the following lines to the run function in generators.py.

if "stream" in updated_generation_kwargs and updated_generation_kwargs["stream"] == True:
	return {"replies": [output], "meta": []}

Then, wherever the run function is called, you can iterate over the generator and retrieve each chunk like this:

for answer_chunk in answer_generator:
    answer_chunk["choices"][0]["text"]
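
Put together, consuming the stream could look like this (a sketch assuming the patched run() above and the generator g from the reproduction snippet):

# Assumes run() was patched as above to return the raw generator in "replies".
result = g.run("The purpose of life is", generation_kwargs={"stream": True})
answer_generator = result["replies"][0]

for answer_chunk in answer_generator:
    # each chunk follows llama-cpp-python's completion chunk format
    print(answer_chunk["choices"][0]["text"], end="", flush=True)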

@anakin87
Member

duplicate of #730

anakin87 closed this as not planned (duplicate) Oct 28, 2024