[Feature] Generic input/output pipeline components #199

eddiebergman · 2023-12-09T19:15:24Z

It could be useful to support pipelines where there is no concrete "thing" to build by piecing together all the components, as the sklearn_builder does. This is not implemented because, for a given component, we have no idea how to pass something into it and get something out of it to hand to the next step. sklearn solves this by assuming a fit and transform/predict on every component, which lets them chain components together.
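
To make the point about the fit and transform convention concrete, here is a minimal sketch of why a shared method contract is enough to chain arbitrary steps. The classes `Shift` and `Scale` and the helper `fit_transform_chain` are hypothetical names made up for illustration; this is not sklearn's implementation, only the same idea in miniature.

```python
class Shift:
    """Hypothetical step: subtracts the minimum seen during fit."""

    def fit(self, xs):
        self.min_ = min(xs)
        return self

    def transform(self, xs):
        return [x - self.min_ for x in xs]


class Scale:
    """Hypothetical step: divides by the maximum seen during fit."""

    def fit(self, xs):
        self.max_ = max(xs)
        return self

    def transform(self, xs):
        return [x / self.max_ for x in xs]


def fit_transform_chain(steps, xs):
    # Because every step exposes the same two methods, the chain never
    # needs to know what each step actually is.
    for step in steps:
        xs = step.fit(xs).transform(xs)
    return xs


result = fit_transform_chain([Shift(), Scale()], [2.0, 4.0, 6.0])
```

Without such a contract, the loop body has nothing it can safely call, which is the gap this issue is about.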

The current "workaround" is for users to pass their own builder= to Node.build(), which pieces together whatever object they would like. This is likely too complicated to expect of a new user, even for a simple linear sequence of components.

For a more general pipeline structure, we would likely need our own custom Pipeline, along with a user specification of how to handle the input/output problem.

Some possible approaches for this problem:

One option is to introduce a Pipe(lambda prv_output, thing: thing.get_output(prv_output)) that can sit between components, allowing a Pipeline to know how to funnel data through it. This has several issues:
1. It will run into trouble with things like a Split or Join, where the signature might get complicated.
2. Most of the functionality of a Pipeline would live outside of the actual definition once the pipe definitions get more complex. (I wish Python allowed multi-line anonymous functions, grrrr)
3. It will lengthen the Pipeline definition, as a Pipe would necessarily have to sit between every component in the sequence.
4. The operation might need extra context which is just not available in that scope, so we introduce another context passing issue, in which some context would also have to be passed through the pipe. For example, if you need to choose which method to call on the previous component based on a task type, it's not clear how this Pipe should be given the task type during the call through the Pipeline.

Sequential(
    Component(A),
    Pipe(lambda prv_output, a: a.do_something(prv_output)),
    Component(B),
    Pipe(lambda prv_output, b: b.do_something_else(prv_output)),
)
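
As a rough sketch of how a Pipeline could use such Pipe nodes to funnel data, the following stand-in classes show one possible reading. Everything here is hypothetical and not the library's API: the `Component`, `Pipe` and `Sequential` classes are simplified stand-ins, the `run` method is invented, and it is assumed that each wrapped class can be constructed with no arguments and that a Pipe applies to the Component directly before it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Component:
    item: type  # class to instantiate (assumed: no-argument constructor)


@dataclass
class Pipe:
    fn: Callable[[Any, Any], Any]  # (prv_output, thing) -> output


class Sequential:
    def __init__(self, *nodes):
        self.nodes = nodes

    def run(self, data):
        pending = None  # component instance waiting for its Pipe
        for node in self.nodes:
            if isinstance(node, Component):
                pending = node.item()
            elif isinstance(node, Pipe):
                if pending is None:
                    raise ValueError("A Pipe must directly follow a Component")
                data = node.fn(data, pending)
                pending = None
        return data


class Doubler:
    def double(self, xs):
        return [2 * x for x in xs]


class Summer:
    def total(self, xs):
        return sum(xs)


pipeline = Sequential(
    Component(Doubler),
    Pipe(lambda prv_output, d: d.double(prv_output)),
    Component(Summer),
    Pipe(lambda prv_output, s: s.total(prv_output)),
)
result = pipeline.run([1, 2, 3])
```

Even in this small form, issue 3 is visible (half of the nodes are Pipes), and nothing in `run` gives a Pipe access to outside context such as a task type.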

Another approach is to add attributes to the existing Node types themselves. This solves issue 3 and partially solves issue 1 (at least by enforcing some signature), but it is likely still not adequate.

Sequential(
    Component(
        A,
        operation=lambda initial_data, a: a.do_something(initial_data),
    ),
    Component(
        B,
        operation=lambda prev_output, b: b.do_something(prev_output),
    ),
)
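
A minimal sketch of how this attribute-based variant might be executed is below. As before, the classes are hypothetical stand-ins rather than the library's Node types: the `run` method is invented, and it is assumed that each wrapped class has a no-argument constructor and that `operation` always takes (data, instance).

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Component:
    item: type  # class to instantiate (assumed: no-argument constructor)
    operation: Callable[[Any, Any], Any]  # (data, instance) -> output


class Sequential:
    def __init__(self, *nodes):
        self.nodes = nodes

    def run(self, data):
        # Each node carries its own operation, so no separate Pipe
        # nodes are needed between components.
        for node in self.nodes:
            data = node.operation(data, node.item())
        return data


class Doubler:
    def double(self, xs):
        return [2 * x for x in xs]


class Summer:
    def total(self, xs):
        return sum(xs)


pipeline = Sequential(
    Component(
        Doubler,
        operation=lambda initial_data, d: d.double(initial_data),
    ),
    Component(
        Summer,
        operation=lambda prev_output, s: s.total(prev_output),
    ),
)
result = pipeline.run([1, 2, 3])
```

The definition is shorter than the Pipe version, but the fixed two-argument signature still leaves open how a Split or Join, or any extra context, would be handled.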


eddiebergman added the "feature" (A new feature) label on Dec 9, 2023
