feat: Add pipeline run with kwargs to Haystack 2.x #6266

vblagoje · 2023-11-09T15:28:38Z

Why:

The current implementation of Haystack 2.x pipelines requires users to explicitly map input data to specific components, which can be cumbersome and error-prone, especially as the complexity of the pipeline grows.

fixes Allow Simplified Input Slot Resolution for Haystack 2.x Pipelines #6101

What:

We've streamlined the process of feeding inputs into the pipeline by allowing users to pass kwargs directly. The pipeline can now intelligently resolve which components can utilize the provided inputs, simplifying the interface and reducing the likelihood of user errors.

How can it be used:

With this enhancement, running a pipeline becomes more intuitive. Users can provide the inputs without specifying the components they correspond to:

@component
class Hello:
    @component.output_types(output=str)
    def run(self, word: str):
        """
        Takes a string in input and returns "Hello, <string>!"
        in output.
        """
        return {"output": f"Hello, {word}!"}

pipeline = Pipeline()
pipeline.add_component("hello", Hello())
pipeline.add_component("hello2", Hello())
pipeline.connect("hello.output", "hello2.word")

# current approach
result = pipeline.run(data={"hello": {"word": "world"}})
assert result == {"hello2": {"output": "Hello, Hello, world!!"}}

# new simplified input resolution approach
result = pipeline.run(word="world")
assert result == {"hello2": {"output": "Hello, Hello, world!!"}}

How did you test it:

This is a work in progress as we await a release of the latest canals with supporting methods for component input that this PR relies on. The unit tests will be in test/preview/test_pipeline.py

Notes for reviewer:

This is a draft for now so we can discuss the approach early. We are still awaiting the new release of canals.
DO NOT INTEGRATE

masci · 2023-11-09T18:02:27Z

I'm hesitant to "pollute" the keyword args for the run method. What about keep using data, but without addressing the component explicitly? Something like

# current approach
result = pipeline.run(data={"hello": {"word": "world"}})

# new simplified input resolution approach
result = pipeline.run(data={"word": "world"})

vblagoje · 2023-11-10T09:11:14Z

I'm hesitant to "pollute" the keyword args for the run method. What about keep using data, but without addressing the component explicitly? Something like
# current approach
result = pipeline.run(data={"hello": {"word": "world"}})

# new simplified input resolution approach
result = pipeline.run(data={"word": "world"})

One option we also have, which I'm not sure how smart is, is to leave the overloads along with bare kwargs run, while documenting and promoting data parameter run only.

vblagoje · 2023-11-17T17:10:43Z

Increasing the visibility from Draft to Open, although this PR is not ready yet. I would love to get your general feedback one more time before I add more unit tests. See a single unit test to understand how we now support three pipeline run invocation variants:

@component
class Hello:
    @component.output_types(output=str)
    def run(self, word: str):
        """
        Takes a string in input and returns "Hello, <string>!"
        in output.
        """
        return {"output": f"Hello, {word}!"}

pipeline = Pipeline()
pipeline.add_component("hello", Hello())
pipeline.add_component("hello2", Hello())

pipeline.connect("hello.output", "hello2.word")

# Current approach
result = pipeline.run(data={"hello": {"word": "world"}})
assert result == {"hello2": {"output": "Hello, Hello, world!!"}}

# Massi's approach, see above
result = pipeline.run(data={"word": "world"})
assert result == {"hello2": {"output": "Hello, Hello, world!!"}}

# Raw kwargs, we can clearly comment that it is unsupported and leave it, or completely remove it
result = pipeline.run(word="world")
assert result == {"hello2": {"output": "Hello, Hello, world!!"}}

Let's have a discussion about what's best going forward.

ZanSara · 2023-11-20T09:36:43Z

I personally prefer the kwargs approach, like:

pipeline.run(hello="world")

indeed that would "pollute" the run method kwargs, but there is always the option of setting other parameters as attributes, like:

pipeline.max_loops = 10
pipeline.debug = True
pipeline.run(hello="world")

Not the most elegant, but could work I believe. We may even keep the "old" dict-based run method as a private method, such as:

pipeline._run(data={"greet": {"hello": "world"}}, max_loops=10)

to be used in case of need.

masci · 2023-11-20T18:52:15Z

Please keep in mind reversibility and uncertainty here: at the time being we are not sure we won't need to pass additional keyword arguments to run, and the decision of going with the "kwargs" approach becomes irreversible.

On the other side, by simplifying the data dictionary we get an immediate improvement that can be later changed into the "kwargs" approach when we are sure we won't need any additional keyword argument.

vblagoje · 2023-11-23T14:45:05Z

Can we go forward here with Massi's proposal while leaving the run with args and kwargs in the code, as in the PR right now, but slapping a warning that it is not a supported interface? LMK @masci @ZanSara

Having this decision in place, I can focus on additional pydocs and unit tests

masci · 2023-11-23T15:10:55Z

I think the code can be much simpler than this if we keep the data={} approach, I would prefer we don't introduce complexity we're not sure we're going to need.

vblagoje · 2023-11-23T15:54:51Z

No problem @masci , I'll simplify and add more unit tests!

vblagoje · 2023-11-23T17:40:53Z

This should be it, more or less. If someone can figure out how to create a unit test to cause some unexpected or non-logical result - please do.

vblagoje · 2023-11-24T10:31:15Z

@ZanSara and @masci this one should be ready to go now, we should be able to handle both types of inputs now via data parameter

Initial implementation

05167b8

vblagoje requested a review from a team as a code owner November 9, 2023 15:28

vblagoje requested review from ZanSara and removed request for a team November 9, 2023 15:28

vblagoje marked this pull request as draft November 9, 2023 15:28

github-actions bot added topic:tests 2.x Related to Haystack v2.0 type:documentation Improvements on the docs labels Nov 9, 2023

vblagoje added 3 commits November 14, 2023 13:48

Handle run with data parameter with flat inputs (no component names)

157222e

Remove def run(self, **kwargs) -> Dict[str, Any]: as not needed

476816e

Add API run note

33edfe5

vblagoje force-pushed the pipeline_run_input branch from 71ff449 to 33edfe5 Compare November 14, 2023 13:01

vblagoje added 3 commits November 17, 2023 17:27

Merge branch 'main' into pipeline_run_input

2506174

Add release note

0b37677

Minor updates

6a2fe46

vblagoje marked this pull request as ready for review November 17, 2023 17:08

vblagoje requested a review from a team as a code owner November 17, 2023 17:08

vblagoje added 5 commits November 23, 2023 17:08

Remove over-engineering

a8e4720

Update existing unit tests

51495da

Pylint fixes

659f095

Add more unit tests

92c24de

More unit tests

e5388b6

Simplify and improve pydocs

1f20ee2

Improve pydocs, typing

ee005c9

vblagoje closed this Nov 27, 2023

vblagoje deleted the pipeline_run_input branch November 27, 2023 08:20

vblagoje mentioned this pull request Nov 27, 2023

feat: Allow flat dictionary Pipeline.run() inputs #6413

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add pipeline run with kwargs to Haystack 2.x #6266

feat: Add pipeline run with kwargs to Haystack 2.x #6266

vblagoje commented Nov 9, 2023 •

edited

Loading

masci commented Nov 9, 2023

vblagoje commented Nov 10, 2023

vblagoje commented Nov 17, 2023 •

edited

Loading

ZanSara commented Nov 20, 2023

masci commented Nov 20, 2023

vblagoje commented Nov 23, 2023

masci commented Nov 23, 2023

vblagoje commented Nov 23, 2023

vblagoje commented Nov 23, 2023

vblagoje commented Nov 24, 2023

feat: Add pipeline run with kwargs to Haystack 2.x #6266

feat: Add pipeline run with kwargs to Haystack 2.x #6266

Conversation

vblagoje commented Nov 9, 2023 • edited Loading

Why:

What:

How can it be used:

How did you test it:

Notes for reviewer:

masci commented Nov 9, 2023

vblagoje commented Nov 10, 2023

vblagoje commented Nov 17, 2023 • edited Loading

ZanSara commented Nov 20, 2023

masci commented Nov 20, 2023

vblagoje commented Nov 23, 2023

masci commented Nov 23, 2023

vblagoje commented Nov 23, 2023

vblagoje commented Nov 23, 2023

vblagoje commented Nov 24, 2023

vblagoje commented Nov 9, 2023 •

edited

Loading

vblagoje commented Nov 17, 2023 •

edited

Loading