feat: extend PromptBuilder and deprecate DynamicPromptBuilder #7655

Merged: 25 commits into main from feat/extend_promptbuilder on May 23, 2024

tstadel
Member

@tstadel tstadel commented May 6, 2024

Related Issues

Currently we cannot have both:

  • a default prompt template defined (PromptBuilder)
  • dynamically change prompt templates at runtime (DynamicPromptBuilder)

There are two options:

  • A we extend DynamicPromptBuilder and leave PromptBuilder as is
  • B we extend PromptBuilder and deprecate DynamicPromptBuilder

Edit 07.05.: We decided to go with B

This is Option B
See #7652 for Option A

Proposed Changes:

This extends PromptBuilder to change prompts at query time.

from haystack import Pipeline
from haystack.components.builders import PromptBuilder

default_template = "This is the default prompt: \\n Query: {{query}}"
prompt_builder = PromptBuilder(template=default_template)

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)

# using the default prompt
result = pipe.run(
    data={
        "prompt_builder": {
            "query": "Where does the speaker live?",
        },
    }
)
#  "This is the default prompt: \n Query: Where does the speaker live?"

# using the dynamic prompt
result = pipe.run(
    data={
        "prompt_builder": {
            "template": "This is the dynamic prompt:\\n Query: {{query}}",
            "query": "Where does the speaker live?",
        },
    }
)
#  "This is the dynamic prompt: \n Query: Where does the speaker live?"
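To make the intended behavior concrete without depending on Haystack, here is a minimal stand-in. It is not the PR's implementation: `ToyBuilder` is a hypothetical class, it substitutes `{{name}}` placeholders with plain string replacement instead of Jinja2, and it assumes that a runtime `template` applies only to the call that passes it.

```python
from typing import Any, Dict, Optional


class ToyBuilder:
    """Hypothetical stand-in for a prompt builder with a default template."""

    def __init__(self, template: str) -> None:
        self.template = template  # default template, set once at init

    def run(self, template: Optional[str] = None, **values: Any) -> Dict[str, str]:
        # Use the runtime template if one is passed, else fall back to the default.
        active = template if template is not None else self.template
        # Toy rendering: only exact `{{name}}` placeholders (no spaces) are replaced.
        for name, value in values.items():
            active = active.replace("{{" + name + "}}", str(value))
        return {"prompt": active}


builder = ToyBuilder("Default. Query: {{query}}")
default_prompt = builder.run(query="Where does the speaker live?")["prompt"]
dynamic_prompt = builder.run(
    template="Dynamic. Query: {{query}}",
    query="Where does the speaker live?",
)["prompt"]
```

Under the stated assumption, the default template is left untouched by the override, so a later call without `template` renders the default again.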

How did you test it?

  • added tests

Notes for the reviewer

Checklist

@github-actions github-actions bot added the type:documentation Improvements on the docs label May 7, 2024
@tstadel tstadel marked this pull request as ready for review May 7, 2024 12:56
@tstadel tstadel requested review from a team as code owners May 7, 2024 12:56
@tstadel tstadel requested review from dfokina and davidsbatista and removed request for a team May 7, 2024 12:56
@tstadel
Member Author

tstadel commented May 7, 2024

We decided to go with this approach B.

@tstadel
Member Author

tstadel commented May 7, 2024

I've removed all breaking changes. PromptBuilder should behave the same as before, extended by the dynamic template functionality.

@coveralls
Collaborator

coveralls commented May 7, 2024

Pull Request Test Coverage Report for Build 9172968594

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 90.575%

Totals Coverage Status
Change from base Build 9129529675: 0.02%
Covered Lines: 6602
Relevant Lines: 7289

💛 - Coveralls

@vblagoje
Member

vblagoje commented May 10, 2024

@bilgeyucel let me try to answer but I'll leave it to @tstadel to confirm.

  • don't have to use it - it is optional, no variables -> no other components providing data (e.g. documents) to PB.
  • It is an additional feature, to me it doesn't seem like breaking, @tstadel will confirm
  • We need to use variables whenever other components provide PromptBuilder with template variables data/values. Without it PB is not that usable in pipeline settings.

@tstadel
Member Author

tstadel commented May 10, 2024

> @bilgeyucel let me try to answer but I'll leave it to @tstadel to confirm.
>
>   • don't have to use it - it is optional, no variables -> no other components providing data (e.g. documents) to PB.
>   • It is an additional feature, to me it doesn't seem like breaking, @tstadel will confirm
>   • We need to use variables whenever other components provide PromptBuilder with template variables data/values. Without it PB is not that usable in pipeline settings.

Almost:
If you pass template but not variables, input slots will be inferred from template as before (No breaking change!). So

  • you don't have to use variables at all if you are good with the input slots inferred from template
  • if you don't pass template, you have to pass variables in order to use input slots in dynamic templates, there is no other way to define them
  • template_variables is optional, you'll never be forced to define them
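The rules above can be sketched as a small function. This is a toy model, not the PR's code: `input_slots` is a hypothetical helper, and the regex is a crude substitute for Jinja2's variable discovery that only understands simple `{{ name }}` placeholders (it would misread loop variables such as `document` inside a `{% for %}` block).

```python
import re
from typing import List, Optional, Set

# Toy placeholder matcher: captures the first identifier after `{{`.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*)")


def input_slots(template: Optional[str], variables: Optional[List[str]]) -> Set[str]:
    """Return the input slots a builder would expose under the rules above."""
    if variables:
        # Explicitly declared variables define the slots.
        return set(variables)
    if template is not None:
        # No variables given: infer the slots from the template.
        return set(_PLACEHOLDER.findall(template))
    # Neither template nor variables: nothing can be passed in.
    return set()


inferred = input_slots("Context: {{ documents }}\nQuestion: {{query}}", None)
declared = input_slots("Question: {{query}}", ["query", "header"])
empty = input_slots(None, None)
```

The second call shows why `variables` matters for dynamic templates: `header` becomes a slot even though the default template never mentions it.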

@TuanaCelik
Contributor

Ok I think I understand what's going on. Let me explain and you tell me if I'm correct, followed by some thoughts:
What's going on:

  • Solution B that @tstadel suggests extends the PromptBuilder to do the following:
  • Basically, template becomes not only an initialization argument but also a runtime variable for PromptBuilder
  • When user 'overrides' template at .run(), they may also change prompt input variables (like document, query) - this is inferred? Can I just override it with whatever variable and run it?

What I am worried about:

  • If I am correct that @vblagoje you're suggesting that the changed variables for the templates are not inferred, we have to provide variables separately to the .run() correct?
  • This would be quite complex to explain to users imo. If there's any way to avoid making it so that variables of any kind have to be provided separately, I would suggest we do that.

Please educate me here though, maybe I'm misunderstanding something

@tstadel
Member Author

tstadel commented May 10, 2024

  • If no other components can provide data otherwise, then the variables parameter becomes a must in most pipelines such as RAG
  • If I can eliminate "template_variables", and pass data={"prompt_builder": {"target_language": "Spanish"}} instead of data={"prompt_builder": {"template_variables": {"target_language": "Spanish"}}}, it's great. But the example code doesn't imply that.

Here's my understanding of how to use a static prompt with PromptBuilder in a pipeline. @tstadel please confirm 🙏

Before

The current implementation of a RAG pipeline:

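from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

# document_store is assumed to be an already populated InMemoryDocumentStore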

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query}
})

After

With this PR, the updated pipeline will look like this:

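from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever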

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

prompt_builder = PromptBuilder(template=template, variables=["documents"])

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"template_variables": {"query": query}}
})

  • 1 - I added variables=["documents"] to my PromptBuilder because I'll inject documents coming from the retriever
  • 2 - I added "template_variables" key as I run the pipeline

Fortunately no :-)
It will work exactly as before.

@TuanaCelik
Contributor

Additionally, looking at the code, I also see some inconsistencies that we shouldn't have if we must have variables..
In initialization, we optionally provide variables (my understanding, this is for when we override the template yes?)
But then, in the run function, we need to provide template_variables? Wouldn't these 2 be the same thing?

@TuanaCelik
Contributor

Ok so:

  • I can use the PromptBuilder exactly the same as before without providing variables/template variables at all, even if, say, a retriever is forwarding documents to it in pipeline.connect()
  • I will have to provide variables if I'm overriding template
  • One thing I just don't yet fully understand is when we would use template_variables vs variables and what the difference is (even if you say we don't need to use template_variables). @tstadel - thanks for the explanations!!! Really helps

@vblagoje
Member

No, we need variables so PB can accept data from other pipeline components. The other ones are provided by the user in run time

@tstadel
Member Author

tstadel commented May 10, 2024

> Ok I think I understand what's going on. Let me explain and you tell me if I'm correct, followed by some thoughts: What's going on:
>
>   • Solution B that @tstadel suggests extends the PromptBuilder to do the following:
>   • Basically, template becomes not only an initialization argument but also a runtime variable for PromptBuilder
>   • When user 'overrides' template at .run(), they may also change prompt input variables (like document, query) - this is inferred? Can I just override it with whatever variable and run it?
>
> What I am worried about:
>
>   • If I am correct that @vblagoje you're suggesting that the changed variables for the templates are not inferred, we have to provide variables separately to the .run() correct?
>   • This would be quite complex to explain to users imo. If there's any way to avoid making it so that variables of any kind have to be provided separately, I would suggest we do that.
>
> Please educate me here though, maybe I'm misunderstanding something

@TuanaCelik @bilgeyucel @vblagoje
Ok here is an illustrative example that should help shed light on what's not obvious:

@bilgeyucel 's example

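from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

# document_store is assumed to be an already populated InMemoryDocumentStore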

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query}
})

Here the following input slots are inferred from template:

  • documents
  • query

Now let's change template at runtime having the same variables:

fancy_template = """
This is a super fancy dynamic template:

Documents:
{% for document in documents %}
    Document {{ document.id }}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": fancy_template}
})

Then this will work seamlessly as we use the same input slots:

  • documents
  • query

Now there are two more cases for dynamic templates:
Case A)
We use fewer input slots than during init:

query_only_template = """
Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": query_only_template}
})

This will also work seamlessly as all template variables (i.e. query) are covered by input slots.

Case B)
We use more input slots than during init:

even_fancier_template = """
{{ header }}
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": even_fancier_template}
})

Note that the passed template now requires:

  • documents
  • query
  • header

The first two are covered by input slots, but the third header is not. That means there is no way to pass header through pipeline params. There are two options to set header now:

Case B1)
Set header via template_variables:

even_fancier_template = """
{{ header }}
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
header = "This is my header"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": even_fancier_template, "template_variables": {"header": header}}
})

Case B2)
Define header as input slot via variables at init:

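from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

# document_store is assumed to be an already populated InMemoryDocumentStore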

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
basic_rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template, variables=["query", "documents", "header"]))
basic_rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

even_fancier_template = """
{{ header }}
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{query}}
Answer:
"""

query = "What does Rhodes Statue look like?"
header = "This is my header"
response = basic_rag_pipeline.run({
                 "retriever": {"query": query}, 
                 "prompt_builder": {"query": query, "template": even_fancier_template, "header": header}
})

Note that variables is now set to:

  • documents
  • query
  • header

Hence, we can pass header to prompt_builder via pipeline.
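Cases B, B1 and B2 differ only in which run inputs have a matching input slot. The sketch below models that check in isolation. `rejected_inputs` is a hypothetical helper, and it assumes that `template` and `template_variables` are always accepted as run inputs; how Haystack actually reports an input without a slot is not shown in this thread.

```python
from typing import Iterable, List

# Assumption: these two run inputs exist regardless of the declared slots.
ALWAYS_ACCEPTED = {"template", "template_variables"}


def rejected_inputs(declared_slots: Iterable[str], run_inputs: Iterable[str]) -> List[str]:
    """Return the run inputs that have no matching input slot, sorted by name."""
    accepted = set(declared_slots) | ALWAYS_ACCEPTED
    return sorted(set(run_inputs) - accepted)


# Slots inferred from the default template at init.
inferred_slots = ["documents", "query"]

# Case B: header is passed directly, but no slot exists for it.
case_b = rejected_inputs(inferred_slots, ["query", "template", "header"])

# Case B1: header travels inside template_variables instead.
case_b1 = rejected_inputs(inferred_slots, ["query", "template", "template_variables"])

# Case B2: header is declared via variables at init.
case_b2 = rejected_inputs(["query", "documents", "header"], ["query", "template", "header"])
```

Only Case B leaves an input (`header`) without a slot; B1 and B2 are the two ways of getting it accepted.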

@tstadel
Member Author

tstadel commented May 10, 2024

> No, we need variables so PB can accept data from other pipeline components. The other ones are provided by the user in run time

@vblagoje please don't forget that variables are being inferred from template if template is set, but variables is not.

@tstadel
Member Author

tstadel commented May 10, 2024

> Additionally, looking at the code, I also see some inconsistencies that we shouldn't have if we must have variables.. In initialization, we optionally provide variables (my understanding, this is for when we override the template yes?) But then, in the run function, we need to provide template_variables? Wouldn't these 2 be the same thing?

@TuanaCelik
I wouldn't mix them up: variables just defines the input slots that the prompt builder instance expects to receive from the pipeline. template_variables, on the other hand, overwrites or extends the pipeline-provided variables with user-defined values.
Maybe we can find a better name for template_variables here.

@tstadel
Member Author

tstadel commented May 13, 2024

@vblagoje
The new documentation / explanation approach would look like this.
We start with
https://docs.haystack.deepset.ai/docs/promptbuilder and keep it the same*.
We add the following sections:

Changing the template at runtime (Prompt Engineering)

PromptBuilder allows you to switch the prompt template of an existing pipeline. The example below builds on the pipeline from the previous section and invokes it with a new prompt template:

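from haystack import Document

# `p` (the pipeline) and `question` are assumed to be defined in the previous docs section.
documents = [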
    Document(content="Joe lives in Berlin", meta={"name": "doc1"}), 
    Document(content="Joe is a software engineer", meta={"name": "doc1"}),
]
new_template = """
    You are a helpful assistant.
    Given these documents, answer the question.
    Documents:
    {% for doc in documents %}
        Document {{ loop.index }}:
        Document name: {{ doc.meta['name'] }}
        {{ doc.content }}
    {% endfor %}

    Question: {{ query }}
    Answer:
    """
p.run({
      "prompt_builder": {
          "documents": documents, 
          "query": question, 
          "template": new_template,
      },
  })

If you want to use different variables during prompt engineering than in the default template, you can do so by setting PromptBuilder's variables init parameter accordingly.

Overwriting variables at runtime

In case you want to overwrite the values of variables, you can use template_variables during runtime as illustrated below:

language_template = """
    You are a helpful assistant.
    Given these documents, answer the question.
    Documents:
    {% for doc in documents %}
        Document {{ loop.index }}:
        Document name: {{ doc.meta['name'] }}
        {{ doc.content }}
    {% endfor %}

    Question: {{ query }}
    Please provide your answer in {{ answer_language | default('English') }}
    Answer:
    """
p.run({
      "prompt_builder": {
          "documents": documents, 
          "query": question, 
          "template": language_template, 
          "template_variables": {"answer_language": "German"},
      },
  })

Note that language_template introduces variable answer_language which is not bound to any pipeline variable. If not set otherwise, it would evaluate to its default value 'English'. In this example we are overwriting its value to 'German'.
template_variables allows you to overwrite pipeline variables (such as documents) as well.
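The precedence described here can be modeled with a plain dictionary merge. This is a sketch under assumptions, not the component's code: `resolve_values` and `answer_language` are hypothetical helpers, and `dict.get` stands in for the Jinja `default('English')` filter for this one variable.

```python
from typing import Any, Dict, Optional


def resolve_values(
    pipeline_values: Dict[str, Any],
    template_variables: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge both sources; on a name clash the template_variables entry wins."""
    return {**pipeline_values, **(template_variables or {})}


def answer_language(values: Dict[str, Any]) -> str:
    # Stand-in for `{{ answer_language | default('English') }}`.
    return values.get("answer_language", "English")


pipeline_values = {"documents": ["Joe lives in Berlin"], "query": "Where does Joe live?"}

# Nothing overrides answer_language, so the default applies.
without_override = answer_language(resolve_values(pipeline_values))

# template_variables extends the pipeline values with a new variable.
with_override = answer_language(
    resolve_values(pipeline_values, {"answer_language": "German"})
)

# template_variables can also replace a pipeline-provided value such as documents.
replaced = resolve_values(pipeline_values, {"documents": ["Joe is a software engineer"]})
```

Putting `template_variables` last in the merge is what gives user-defined values the final say over anything the pipeline delivers.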

* = except for the already broken examples

@dfokina
Contributor

dfokina commented May 15, 2024

Hey @vblagoje @tstadel this last message with the docs suggestions looks reasonable to me, and the idea is pretty easy to understand :) We can adjust the examples slightly to fit into the docs, and it would look good.

@vblagoje
Member

vblagoje commented May 15, 2024

I also very much like this user-perspective driven documentation rather than what I first suggested. And would even merge this straight into main. But let's proceed forward with what we all agree.

Shall we use the above written user perspective description in class pydocs as well @tstadel ?

@tstadel
Member Author

tstadel commented May 16, 2024

> I also very much like this user-perspective driven documentation rather than what I first suggested. And would even merge this straight into main. But let's proceed forward with what we all agree.
>
> Shall we use the above written user perspective description in class pydocs as well @tstadel ?

@vblagoje Yes, why not. I can update it.

@tstadel
Member Author

tstadel commented May 17, 2024

@vblagoje pydocs have been updated.

@silvanocerza silvanocerza self-assigned this May 23, 2024
@silvanocerza silvanocerza merged commit 83d3970 into main May 23, 2024
25 checks passed
@silvanocerza silvanocerza deleted the feat/extend_promptbuilder branch May 23, 2024 14:03
@silvanocerza silvanocerza mentioned this pull request Jul 5, 2024
1 task
Labels
2.x Related to Haystack v2.0 topic:tests type:documentation Improvements on the docs

7 participants