feat: extend PromptBuilder and deprecate DynamicPromptBuilder #7655

tstadel · 2024-05-06T15:10:18Z

Related Issues

Currently we cannot have both:

a default prompt template (PromptBuilder)
the ability to change prompt templates dynamically at runtime (DynamicPromptBuilder)

There are two options:

A: extend DynamicPromptBuilder and leave PromptBuilder as is
B: extend PromptBuilder and deprecate DynamicPromptBuilder

Edit 07.05.: We decided to go with B

This is Option B
See #7652 for Option A

Proposed Changes:

This extends PromptBuilder so that the prompt template can be changed at query time.

from haystack import Pipeline
from haystack.components.builders import PromptBuilder

default_template = "This is the default prompt: \n Query: {{query}}"
prompt_builder = PromptBuilder(template=default_template)

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)

# using the default prompt
result = pipe.run(
    data={
        "prompt_builder": {
            "query": "Where does the speaker live?",
        },
    }
)
#  "This is the default prompt: \n Query: Where does the speaker live?"

# using the dynamic prompt
result = pipe.run(
    data={
        "prompt_builder": {
            "template": "This is the dynamic prompt:\n Query: {{query}}",
            "query": "Where does the speaker live?",
        },
    }
)
#  "This is the dynamic prompt: \n Query: Where does the speaker live?"
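To make the proposed behaviour concrete without depending on Haystack, here is a stdlib-only sketch. `MiniPromptBuilder` is a hypothetical class, not PromptBuilder's code: it stores a default template at init and renders a runtime `template` instead whenever one is passed to `run()`. A regex substitution stands in for Jinja2 and only handles plain `{{ name }}` placeholders.

```python
import re
from typing import Any, Dict, Optional

# Plain placeholders such as {{query}} or {{ query }}; no filters or attribute access.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


class MiniPromptBuilder:
    """Hypothetical stand-in: default template at init, optional override in run()."""

    def __init__(self, template: str) -> None:
        self._default_template = template

    def run(self, template: Optional[str] = None, **kwargs: Any) -> Dict[str, str]:
        # A runtime template wins; otherwise fall back to the init-time default.
        active = template if template is not None else self._default_template
        # Fill each placeholder from kwargs; names without a value render as "".
        prompt = _PLACEHOLDER.sub(lambda m: str(kwargs.get(m.group(1), "")), active)
        return {"prompt": prompt}


builder = MiniPromptBuilder(template="This is the default prompt: \n Query: {{query}}")

# Default template
print(builder.run(query="Where does the speaker live?")["prompt"])

# Runtime template
print(builder.run(template="This is the dynamic prompt:\n Query: {{query}}",
                  query="Where does the speaker live?")["prompt"])
```

Rendering names without a value as empty strings mirrors Jinja2's default handling of undefined variables.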

How did you test it?

added tests

Notes for the reviewer

~~There is a breaking change: required_variables param has been changed to optional_variables as most variables of templates are required anyways. We can undo that if necessary.~~
DynamicPromptBuilder is being deprecated
The Chat counterpart to PromptBuilder is implemented in feat: add ChatPromptBuilder, deprecate DynamicChatPromptBuilder #7663

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
I documented my code
I ran pre-commit hooks and fixed any issue

tstadel · 2024-05-07T13:04:05Z

We decided to go with this approach B.

tstadel · 2024-05-07T13:15:54Z

I've removed all breaking changes. PromptBuilder should behave the same as before, extended by the dynamic template functionality.

coveralls · 2024-05-07T14:51:03Z

Pull Request Test Coverage Report for Build 9172968594

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.02%) to 90.575%

Totals
Change from base Build 9129529675:	0.02%
Covered Lines:	6602
Relevant Lines:	7289

💛 - Coveralls

vblagoje · 2024-05-10T13:07:14Z

@bilgeyucel let me try to answer but I'll leave it to @tstadel to confirm.

You don't have to use it; it is optional. No variables means no other components can provide data (e.g. documents) to PB.
It is an additional feature; to me it doesn't seem breaking, but @tstadel will confirm.
We need to use variables whenever other components provide PromptBuilder with template variable data/values. Without it, PB is not that usable in pipeline settings.

tstadel · 2024-05-10T13:36:30Z

Replying to @vblagoje's answer above:

Almost:
If you pass template but not variables, input slots will be inferred from template as before (No breaking change!). So

you don't have to use variables at all if you are good with the input slots inferred from template
if you don't pass template, you have to pass variables in order to use input slots in dynamic templates, there is no other way to define them
template_variables is optional, you'll never be forced to define them
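The slot-resolution rules above can be sketched roughly as follows, assuming two hypothetical helpers, `infer_variables` and `resolve_input_slots`. The regex-based inference is a crude stand-in for a real Jinja2 parse: it only understands `{{ ... }}` expressions and simple `{% for x in y %}` loops.

```python
import re
from typing import List, Optional, Set

# First identifier inside {{ ... }}, e.g. "document" in {{ document.content }}
_EXPR = re.compile(r"\{\{\s*([A-Za-z_]\w*)")
# Loop variable and iterable of a simple {% for x in y %} tag
_FOR = re.compile(r"\{%\s*for\s+([A-Za-z_]\w*)\s+in\s+([A-Za-z_]\w*)")


def infer_variables(template: str) -> Set[str]:
    """Roughly approximate the variables a simple Jinja template expects from outside."""
    loops = [m.groups() for m in _FOR.finditer(template)]
    loop_vars = {var for var, _ in loops}
    iterables = {iterable for _, iterable in loops}
    used = {m.group(1) for m in _EXPR.finditer(template)}
    # Loop variables and Jinja's built-in `loop` object are not inputs.
    return (used | iterables) - loop_vars - {"loop"}


def resolve_input_slots(template: Optional[str], variables: Optional[List[str]]) -> Set[str]:
    if variables is not None:
        # Explicit variables always define the input slots.
        return set(variables)
    if template is not None:
        # No variables given: infer the slots from the template, as before.
        return infer_variables(template)
    # Neither given: there are no input slots to connect or pass values into.
    return set()


rag_template = "{% for document in documents %}{{ document.content }}{% endfor %} Question: {{query}}"
print(sorted(resolve_input_slots(rag_template, None)))  # ['documents', 'query']
print(sorted(resolve_input_slots(rag_template, ["documents"])))  # ['documents']
```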

TuanaCelik · 2024-05-10T13:39:03Z

Ok I think I understand what's going on. Let me explain and you tell me if I'm correct, followed by some thoughts:
What's going on:

Solution B that @tstadel suggests extends the PromptBuilder to do the following:
Basically, template becomes not only an initialization argument but also a runtime variable for PromptBuilder
When user 'overrides' template at .run(), they may also change prompt input variables (like document, query) - this is inferred? Can I just override it with whatever variable and run it?

What I am worried about:

If I am correct that @vblagoje you're suggesting that the changed variables for the templates are not inferred, we have to provide variables separately to the .run() correct?
This would be quite complex to explain to users imo. If there's any way to avoid making it so that variables of any kind have to be provided separately, I would suggest we do that.

Please educate me here though, maybe I'm misunderstanding something

tstadel · 2024-05-10T13:39:13Z

Quoting @bilgeyucel:

If no other components can provide data otherwise, then the variables parameter becomes a must in most pipelines such as RAG
If I can eliminate "template_variables", and pass data={"prompt_builder": {"target_language": "Spanish"}} instead of data={"prompt_builder": {"template_variables": {"target_language": "Spanish"}}}, it's great. But the example code doesn't imply that.

Here's my understanding of how to use a static prompt with PromptBuilder in a pipeline. @tstadel please confirm 🙏

Before

The current implementation of a RAG pipeline:

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Assumed to be populated with documents beforehand
document_store = InMemoryDocumentStore()

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
    "retriever": {"query": query},
    "prompt_builder": {"query": query},
})

After

With this PR, the updated pipeline will look like this:

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Assumed to be populated with documents beforehand
document_store = InMemoryDocumentStore()

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

prompt_builder = PromptBuilder(template=template, variables=["documents"])

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
    "retriever": {"query": query},
    "prompt_builder": {"template_variables": {"query": query}},
})

1 - I added variables=["documents"] to my PromptBuilder because I'll inject documents coming from the retriever
2 - I added the "template_variables" key when running the pipeline

Fortunately no :-)
It will work exactly as before.

TuanaCelik · 2024-05-10T13:41:42Z

Additionally, looking at the code, I also see some inconsistencies that we shouldn't have if we must have variables.
In initialization, we optionally provide variables (my understanding, this is for when we override the template yes?)
But then, in the run function, we need to provide template_variables? Wouldn't these 2 be the same thing?

TuanaCelik · 2024-05-10T13:48:59Z

Ok so:

I can use the PromptBuilder exactly the same as before without providing variables/template_variables at all, even if, say, a retriever is forwarding documents to it in pipeline.connect()
I will have to provide variables if I'm overriding template
One thing I just don't yet fully understand is when we would use template_variables vs. variables and what the difference is (even if you say we don't need to use template_variables, @tstadel). Thanks for the explanations!!! Really helps

vblagoje · 2024-05-10T13:51:15Z

No, we need variables so PB can accept data from other pipeline components. The other ones are provided by the user at runtime

tstadel · 2024-05-10T13:57:12Z

Replying to @TuanaCelik's comment above:

@TuanaCelik @bilgeyucel @vblagoje
Ok here is an illustrative example that should help shed light on what's not obvious:

@bilgeyucel 's example

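from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Assumed to be populated with documents beforehand
document_store = InMemoryDocumentStore()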

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query}
})

Here the following input slots are inferred from template:

documents
query

Now let's change template at runtime having the same variables:

fancy_template = """
This is a super fancy dynamic template:

Documents:
{% for document in documents %}
    Document {{ document.id }}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": fancy_template}
})

Then this will work seamlessly as we use the same input slots:

documents
query

Now there are two more cases for dynamic templates:
Case A)
We use fewer input slots than during init:

query_only_template = """
Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": query_only_template}
})

This will also work seamlessly as all template variables (i.e. query) are covered by input slots.

Case B)
We use more input slots than during init:

even_fancier_template = """
{{ header }}
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": even_fancier_template}
})

Note that the passed template now requires:

documents
query
header

The first two are covered by input slots, but the third header is not. That means there is no way to pass header through pipeline params. There are two options to set header now:
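The coverage check described above amounts to a set difference between the variables a template needs and the declared input slots. A minimal sketch, assuming a hypothetical `uncovered_variables` helper and assuming the required variables have already been extracted (the names below are copied from the examples, not computed by Haystack):

```python
from typing import Set


def uncovered_variables(required: Set[str], input_slots: Set[str]) -> Set[str]:
    """Template variables that no input slot can supply (hypothetical helper)."""
    return required - input_slots


# Slots inferred from the init-time template
slots = {"documents", "query"}

# Case A: query_only_template needs only a subset of the slots
print(uncovered_variables({"query"}, slots))  # set()

# Case B: even_fancier_template additionally needs header
print(uncovered_variables({"documents", "query", "header"}, slots))  # {'header'}
```

Whatever is left over has to come either from template_variables (Case B1) or from an input slot declared via variables (Case B2).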

Case B1)
Set header via template_variables:

even_fancier_template = """
{{ header }}
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
header = "This is my header"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": even_fancier_template, "template_variables": {"header": header}}
})

Case B2)
Define header as input slot via variables at init:

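from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Assumed to be populated with documents beforehand
document_store = InMemoryDocumentStore()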

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template, variables=["query", "documents", "header"]))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

even_fancier_template = """
{{ header }}
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
header = "This is my header"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": even_fancier_template, "header": header}
})

Note that the variables init parameter is set to:

documents
query
header

Hence, we can pass header to prompt_builder via pipeline.

tstadel · 2024-05-10T13:58:34Z

Replying to @vblagoje's comment above:

@vblagoje please don't forget that variables are being inferred from template if template is set, but variables is not.

tstadel · 2024-05-10T14:11:56Z

Replying to @TuanaCelik's question above about variables vs. template_variables:

@TuanaCelik
I wouldn't mix them up: variables just defines the variables that the PromptBuilder instance expects to receive from the pipeline, whereas template_variables overwrites or extends the pipeline-provided variables with user-defined values.
Maybe we can find a better name for template_variables here.
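Put differently, variables fixes which values can arrive from the pipeline, and template_variables is merged on top of them at runtime, with user-supplied values taking precedence. A minimal sketch of that precedence rule, using a hypothetical `merge_template_inputs` helper (not PromptBuilder's actual code):

```python
from typing import Any, Dict, Optional


def merge_template_inputs(
    pipeline_inputs: Dict[str, Any],
    template_variables: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Combine input-slot values with user-supplied template_variables (hypothetical helper).

    In a dict merge the right-hand side wins, so template_variables overwrite
    pipeline-provided values with the same name and add any new names.
    """
    return {**pipeline_inputs, **(template_variables or {})}


from_pipeline = {"query": "What does Rhodes Statue look like?", "documents": ["doc A"]}
merged = merge_template_inputs(
    from_pipeline,
    {"header": "This is my header", "documents": ["doc B"]},
)
# query is kept, documents is overwritten, header is added
print(merged)
```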

tstadel · 2024-05-13T11:29:09Z

@vblagoje
The new documentation / explanation approach would look like this.
We start with
https://docs.haystack.deepset.ai/docs/promptbuilder and keep it the same*.
We add the following sections:

Changing the template at runtime (Prompt Engineering)

PromptBuilder allows you to switch the prompt template of an existing pipeline. The example below builds on the existing pipeline from the previous section and invokes it with a new prompt template:

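from haystack import Document

# p and question come from the pipeline example in the previous section
documents = [
    Document(content="Joe lives in Berlin", meta={"name": "doc1"}),
    Document(content="Joe is a software engineer", meta={"name": "doc1"}),
]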
new_template = """
    You are a helpful assistant.
    Given these documents, answer the question.
    Documents:
    {% for doc in documents %}
        Document {{ loop.index }}:
        Document name: {{ doc.meta['name'] }}
        {{ doc.content }}
    {% endfor %}

    Question: {{ query }}
    Answer:
    """
p.run({
    "prompt_builder": {
        "documents": documents,
        "query": question,
        "template": new_template,
    },
})

If you want to use different variables during prompt engineering than in the default template, you can do so by setting PromptBuilder's variables init parameter accordingly.

Overwriting variables at runtime

If you want to overwrite the values of variables, you can use template_variables at runtime, as illustrated below:

language_template = """
    You are a helpful assistant.
    Given these documents, answer the question.
    Documents:
    {% for doc in documents %}
        Document {{ loop.index }}:
        Document name: {{ doc.meta['name'] }}
        {{ doc.content }}
    {% endfor %}

    Question: {{ query }}
    Please provide your answer in {{ answer_language | default('English') }}
    Answer:
    """
p.run({
    "prompt_builder": {
        "documents": documents,
        "query": question,
        "template": language_template,
        "template_variables": {"answer_language": "German"},
    },
})

Note that language_template introduces the variable answer_language, which is not bound to any pipeline variable. If not set otherwise, it evaluates to its default value, 'English'. In this example, we overwrite its value with 'German'.
template_variables allows you to overwrite pipeline variables (such as documents) as well.
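Jinja2's `default` filter is what makes answer_language optional here: the fallback only applies when the variable is undefined. The sketch below imitates just that one behaviour with a hypothetical `render_with_defaults` helper and a regex; it is a stdlib illustration, not how Jinja2 or PromptBuilder actually render templates.

```python
import re
from typing import Any, Dict

# Matches {{ name }} or {{ name | default('fallback') }} (single-quoted fallback only)
_VAR = re.compile(r"\{\{\s*(\w+)\s*(?:\|\s*default\('([^']*)'\)\s*)?\}\}")


def render_with_defaults(template: str, values: Dict[str, Any]) -> str:
    def substitute(match: "re.Match[str]") -> str:
        name, fallback = match.group(1), match.group(2)
        if name in values:
            return str(values[name])
        # Undefined variable: use the default() fallback if present, else render empty
        return fallback if fallback is not None else ""

    return _VAR.sub(substitute, template)


line = "Please provide your answer in {{ answer_language | default('English') }}"
print(render_with_defaults(line, {}))                             # Please provide your answer in English
print(render_with_defaults(line, {"answer_language": "German"}))  # Please provide your answer in German
```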

* except for the already broken examples


dfokina · 2024-05-15T13:59:36Z

Hey @vblagoje @tstadel this last message with the docs suggestions looks reasonable to me, and the idea is pretty easy to understand :) We can adjust the examples slightly to fit into the docs, and it would look good.

vblagoje · 2024-05-15T15:56:59Z

I also very much like this user-perspective driven documentation rather than what I first suggested. And would even merge this straight into main. But let's proceed forward with what we all agree.

Shall we use the above written user perspective description in class pydocs as well @tstadel ?

tstadel · 2024-05-16T12:58:59Z

Replying to @vblagoje's comment above:

@vblagoje Yes, why not. I can update it.

tstadel · 2024-05-17T10:09:25Z

@vblagoje pydocs have been updated.

tstadel added 4 commits May 6, 2024 12:04

feat: add default template to DynamicPromptBuilder

e8ea80a

fix mypy

4f203bf

fix mypy

37f329a

extend PromptBuilder and deprecate DynamicPromptBuilder

f0b9e9f

github-actions bot added topic:tests 2.x Related to Haystack v2.0 labels May 6, 2024

tstadel mentioned this pull request May 6, 2024

feat: add default template to DynamicPromptBuilder #7652

Closed

make backward-compatible: optional -> required

90ba6bc

github-actions bot added the type:documentation Improvements on the docs label May 7, 2024

tstadel added 8 commits May 7, 2024 14:01

make backward-compatible: _template_string

f525094

make backward-compatible: missing_required_vars error

548316e

add test for no template case

9c9659e

better docstrings

097ac98

some chors

c3640ef

some chors

958f62f

add reno

946d387

revert test_dynamic_prompt_builder.py

b858935

tstadel marked this pull request as ready for review May 7, 2024 12:56

tstadel requested review from a team as code owners May 7, 2024 12:56

tstadel requested review from dfokina and davidsbatista and removed request for a team May 7, 2024 12:56

better docstring

4e8a57e

make backward-compatible: reorder init args

fb9601b

fix tests

70f5d8b

tstadel and others added 3 commits May 13, 2024 12:58

make default template required and rework docstrings

522023a

docs chores

99da238

Merge branch 'main' into feat/extend_promptbuilder

bdf6c85

tstadel added 3 commits May 13, 2024 13:46

keep to_dict in place for easier review

bc5bf9b

Merge branch 'feat/extend_promptbuilder' of github.com:deepset-ai/haystack into feat/extend_promptbuilder

53010d1

remove unnecessary logger

f7e2a64

update docstring

5adcc58

Merge branch 'main' into feat/extend_promptbuilder

62b8964

silvanocerza approved these changes May 23, 2024


silvanocerza self-assigned this May 23, 2024

silvanocerza merged commit 83d3970 into main May 23, 2024
25 checks passed

silvanocerza deleted the feat/extend_promptbuilder branch May 23, 2024 14:03

dfokina mentioned this pull request May 24, 2024

docs: PromptBuilder changes in 2.2 #7741

Closed

silvanocerza mentioned this pull request Jul 5, 2024

Pipeline run order wrong #7985

Closed


shadeMe mentioned this pull request Jul 25, 2024

chore: Remove deprecated DynamicPromptBuilder and DynamicChatPromptBuilder components #8085

Merged
