[Feature] Generic input/output pipeline components #199

Open
eddiebergman opened this issue Dec 9, 2023 · 0 comments
Labels
feature A new feature

eddiebergman commented Dec 9, 2023

It could be useful to support pipelines where there is no concrete "thing" to build that pieces all the components together, as the sklearn_builder does. The reason this is not implemented is that, for an arbitrary component, we have no idea how to pass something into it and get something out of it to hand to the next step. sklearn solves this by assuming a fit and a transform/predict method on every step, which lets it chain components together.
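To make the input/output problem concrete, here is a minimal sketch of why a shared method convention makes chaining trivial, and what happens without one. The classes `Scale`, `Shift`, `Tokenizer` and the helper `fit_transform_chain` are toy stand-ins invented for illustration; they are not sklearn or amltk code.

```python
class Scale:
    """Toy step following the fit/transform convention."""

    def fit(self, X):
        self.max_ = max(X)
        return self

    def transform(self, X):
        return [x / self.max_ for x in X]


class Shift:
    """Another toy step following the same convention."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class Tokenizer:
    """A component with its own API: no fit, no transform."""

    def encode(self, text):
        return text.split()


def fit_transform_chain(steps, X):
    # Works only because every step is assumed to expose fit() and transform().
    for step in steps:
        X = step.fit(X).transform(X)
    return X


result = fit_transform_chain([Scale(), Shift()], [1.0, 2.0, 4.0])

# A component outside the convention cannot be chained generically:
try:
    fit_transform_chain([Tokenizer()], "a b c")
    chained = True
except AttributeError:
    chained = False
```

The runner never needs to know what each step does, only that the two methods exist. `Tokenizer` is a perfectly reasonable component, but a generic runner has no way to guess that `encode` is the method to call, which is the gap this issue is about.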

The current "workaround" is for users to pass their own builder= to Node.build(), which pieces together whatever object they would like. However, this is likely too complicated to expect of a new user, even for a simple linear sequence of components.


For a more general pipeline structure, we would likely need our own custom Pipeline, along with a user specification of how to handle the input/output problem.

Some possible approaches for this problem:

  • Introduce a Pipe(lambda prv_output, thing: thing.get_output(prv_output)) that can sit between components, allowing a Pipeline to know how to funnel data through the Pipeline.
    1. This will however run into issues where you have things like a Split or Join in which the signature might get complicated.
    2. This also means most of the functionality of a Pipeline would live outside of the actual definition when considering more complex pipe definitions. (I wish Python allowed multi-line anonymous functions, grrrr)
    3. This will arbitrarily lengthen the Pipeline definition, as a Pipe would necessarily have to sit between every component in the sequence.
    4. The operation might need extra context that is not available in that scope, which introduces another context-passing issue: some context would also have to be passed through the Pipe. For example, if you need to choose which method to call on the previous component based on a task type, it's not clear how the Pipe should be given the task type during the call through the Pipeline.
Sequential(
    Component(A),
    Pipe(lambda prv_output, a: a.do_something(prv_output)),
    Component(B),
    Pipe(lambda prv_output, b: b.do_something_else(prv_output)),
)
  • Another approach is to add attributes to the existing Node types themselves. This solves point 3 and partially solves point 1 (at least by enforcing some signature), but is likely still not adequate.
Sequential(
    Component(
        A,
        operation=lambda initial_data, a: a.do_something(initial_data),
    ),
    Component(
        B,
        operation=lambda prev_output, b: b.do_something(prev_output),
    ),
)
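As a rough illustration of how a runner could interpret both proposals, here is a self-contained sketch for the linear case only. Everything in it is hypothetical: `Component`, `Pipe`, `run`, and the toy classes `A` and `B` are stand-ins written for this example and do not reflect the existing Node types or any planned API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Component:
    """Hypothetical stand-in for a node wrapping a class to instantiate."""

    item: type
    # Second proposal: the node itself declares how data flows through it.
    operation: Optional[Callable[[Any, Any], Any]] = None


@dataclass
class Pipe:
    """First proposal: a separate step describing how to call the previous component."""

    fn: Callable[[Any, Any], Any]


def run(steps, data):
    """Funnel `data` through a linear sequence of Components and Pipes."""
    built = None  # most recently built component still waiting for a Pipe
    for step in steps:
        if isinstance(step, Pipe):
            if built is None:
                raise ValueError("a Pipe must directly follow a Component")
            data = step.fn(data, built)
            built = None
        else:
            built = step.item()
            if step.operation is not None:
                data = step.operation(data, built)
                built = None
    return data


class A:
    def do_something(self, x):
        return x + 1


class B:
    def do_something_else(self, x):
        return x * 2


with_pipes = [
    Component(A),
    Pipe(lambda prv_output, a: a.do_something(prv_output)),
    Component(B),
    Pipe(lambda prv_output, b: b.do_something_else(prv_output)),
]

with_operations = [
    Component(A, operation=lambda initial_data, a: a.do_something(initial_data)),
    Component(B, operation=lambda prev_output, b: b.do_something_else(prev_output)),
]

out_pipes = run(with_pipes, 1)
out_operations = run(with_operations, 1)
```

Both definitions compute the same thing here, but the sketch also shows the limits listed above: the callables only ever receive the previous output and the component, so there is nowhere to pass a task type, and nothing in `run` says what should happen at a Split or Join.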
@eddiebergman eddiebergman added the feature A new feature label Dec 9, 2023