Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add pipeline run with kwargs to Haystack 2.x #6266

Closed
wants to merge 14 commits into from

Conversation

vblagoje
Copy link
Member

@vblagoje vblagoje commented Nov 9, 2023

Why:

The current implementation of Haystack 2.x pipelines requires users to explicitly map input data to specific components, which can be cumbersome and error-prone, especially as the complexity of the pipeline grows.

What:

We've streamlined the process of feeding inputs into the pipeline by allowing users to pass kwargs directly. The pipeline can now intelligently resolve which components can utilize the provided inputs, simplifying the interface and reducing the likelihood of user errors.

How can it be used:

With this enhancement, running a pipeline becomes more intuitive. Users can provide the inputs without specifying the components they correspond to:

@component
class Hello:
    @component.output_types(output=str)
    def run(self, word: str):
        """
        Takes a string in input and returns "Hello, <string>!"
        in output.
        """
        return {"output": f"Hello, {word}!"}

pipeline = Pipeline()
pipeline.add_component("hello", Hello())
pipeline.add_component("hello2", Hello())
pipeline.connect("hello.output", "hello2.word")

# current approach
result = pipeline.run(data={"hello": {"word": "world"}})
assert result == {"hello2": {"output": "Hello, Hello, world!!"}}

# new simplified input resolution approach
result = pipeline.run(word="world")
assert result == {"hello2": {"output": "Hello, Hello, world!!"}}

How did you test it:

This is a work in progress as we await a release of the latest canals with supporting methods for component input that this PR relies on. The unit tests will be in test/preview/test_pipeline.py

Notes for reviewer:

This is a draft for now so we can discuss the approach early. We are still awaiting the new release of canals.
DO NOT INTEGRATE

@vblagoje vblagoje requested a review from a team as a code owner November 9, 2023 15:28
@vblagoje vblagoje requested review from ZanSara and removed request for a team November 9, 2023 15:28
@vblagoje vblagoje marked this pull request as draft November 9, 2023 15:28
@github-actions github-actions bot added topic:tests 2.x Related to Haystack v2.0 type:documentation Improvements on the docs labels Nov 9, 2023
@masci
Copy link
Contributor

masci commented Nov 9, 2023

I'm hesitant to "pollute" the keyword args for the run method. What about keep using data, but without addressing the component explicitly? Something like

# current approach
result = pipeline.run(data={"hello": {"word": "world"}})

# new simplified input resolution approach
result = pipeline.run(data={"word": "world"})

@vblagoje
Copy link
Member Author

I'm hesitant to "pollute" the keyword args for the run method. What about keep using data, but without addressing the component explicitly? Something like

# current approach
result = pipeline.run(data={"hello": {"word": "world"}})

# new simplified input resolution approach
result = pipeline.run(data={"word": "world"})

One option we also have, which I'm not sure how smart is, is to leave the overloads along with bare kwargs run, while documenting and promoting data parameter run only.

@vblagoje vblagoje marked this pull request as ready for review November 17, 2023 17:08
@vblagoje vblagoje requested a review from a team as a code owner November 17, 2023 17:08
@vblagoje
Copy link
Member Author

vblagoje commented Nov 17, 2023

Increasing the visibility from Draft to Open, although this PR is not ready yet. I would love to get your general feedback one more time before I add more unit tests. See a single unit test to understand how we now support three pipeline run invocation variants:

@component
class Hello:
    @component.output_types(output=str)
    def run(self, word: str):
        """
        Takes a string in input and returns "Hello, <string>!"
        in output.
        """
        return {"output": f"Hello, {word}!"}

pipeline = Pipeline()
pipeline.add_component("hello", Hello())
pipeline.add_component("hello2", Hello())

pipeline.connect("hello.output", "hello2.word")

# Current approach
result = pipeline.run(data={"hello": {"word": "world"}})
assert result == {"hello2": {"output": "Hello, Hello, world!!"}}

# Massi's approach, see above
result = pipeline.run(data={"word": "world"})
assert result == {"hello2": {"output": "Hello, Hello, world!!"}}

# Raw kwargs, we can clearly comment that it is unsupported and leave it, or completely remove it
result = pipeline.run(word="world")
assert result == {"hello2": {"output": "Hello, Hello, world!!"}}

Let's have a discussion about what's best going forward.

@ZanSara
Copy link
Contributor

ZanSara commented Nov 20, 2023

I personally prefer the kwargs approach, like:

pipeline.run(hello="world")

indeed that would "pollute" the run method kwargs, but there is always the option of setting other parameters as attributes, like:

pipeline.max_loops = 10
pipeline.debug = True
pipeline.run(hello="world")

Not the most elegant, but could work I believe. We may even keep the "old" dict-based run method as a private method, such as:

pipeline._run(data={"greet": {"hello": "world"}}, max_loops=10)

to be used in case of need.

@masci
Copy link
Contributor

masci commented Nov 20, 2023

Please keep in mind reversibility and uncertainty here: at the time being we are not sure we won't need to pass additional keyword arguments to run, and the decision of going with the "kwargs" approach becomes irreversible.

On the other side, by simplifying the data dictionary we get an immediate improvement that can be later changed into the "kwargs" approach when we are sure we won't need any additional keyword argument.

@vblagoje
Copy link
Member Author

Can we go forward here with Massi's proposal while leaving the run with args and kwargs in the code, as in the PR right now, but slapping a warning that it is not a supported interface? LMK @masci @ZanSara

Having this decision in place, I can focus on additional pydocs and unit tests

@masci
Copy link
Contributor

masci commented Nov 23, 2023

I think the code can be much simpler than this if we keep the data={} approach, I would prefer we don't introduce complexity we're not sure we're going to need.

@vblagoje
Copy link
Member Author

No problem @masci , I'll simplify and add more unit tests!

@vblagoje
Copy link
Member Author

This should be it, more or less. If someone can figure out how to create a unit test to cause some unexpected or non-logical result - please do.

@vblagoje
Copy link
Member Author

@ZanSara and @masci this one should be ready to go now, we should be able to handle both types of inputs now via data parameter

@vblagoje vblagoje closed this Nov 27, 2023
@vblagoje vblagoje deleted the pipeline_run_input branch November 27, 2023 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.x Related to Haystack v2.0 topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow Simplified Input Slot Resolution for Haystack 2.x Pipelines
3 participants