It could be useful to consider pipelines where there is no concrete "thing" to build, instead piecing together all the components, as is done with the `sklearn_builder`. The reason this is not implemented is that, for a given component, we have no way of knowing how to pass data into it and get data out of it to hand to the next step. sklearn solves this by assuming a `fit` and `transform`/`predict` interface, which lets them chain components together.
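For reference, a minimal sketch of the convention sklearn relies on (this is standard sklearn API, nothing specific to this library):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)

# sklearn can chain arbitrary components because every step obeys the
# same contract: fit(X, y) to train, then transform(X) to hand data to
# the next step (or predict(X) if it is the final estimator).
pipeline = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])
pipeline.fit(X, y)
print(pipeline.predict(X[:5]))
```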
The current "workaround" is for users to define their own builder= to Node.build() which pieces together some object they would like, yet this is likely to complicated to expect of a new user for even a simple linear sequence of components.
For a more general pipeline structure, we would likely need our own custom `Pipeline`, along with a user specification of how to handle the input/output problem.
Some possible approaches for this problem:
- Introduce a `Pipe(lambda prv_output, thing: thing.get_output(prv_output))` that can sit between components, allowing a `Pipeline` to know how to funnel data through it (see the sketch after this list). This runs into a few issues:
  1. Components like a `Split` or `Join` would complicate the signature.
  2. Most of the functionality of a `Pipeline` would live outside of the actual definition once the pipe definitions get more complex. (I wish Python allowed multi-line anonymous functions, grrrr.)
  3. It arbitrarily lengthens the `Pipeline` definition, as a `Pipe` would necessarily have to sit between every pair of components in the sequence.
  4. The operation might need extra context that is simply not available in that scope, so we introduce another context-passing issue, in which some context would also have to be threaded through the pipe. For example, if you need to choose which method to call on the previous component based on a task type, it is not clear how this `Pipe` should be given the task type during the call through the `Pipeline`.
- Another approach is to add attributes to the existing `Node` types themselves. This solves issue 3 and partially solves issue 1 (at least by enforcing some signature), but is likely still not adequate on its own.
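To make the first approach concrete, here is a rough sketch of what a `Pipe` and its use might look like; all names and the `Pipeline` call shape here are hypothetical:

```python
from typing import Any, Callable

class Pipe:
    """Sits between two components and tells the Pipeline how to funnel
    the previous component's output into the next component."""

    def __init__(self, fn: Callable[[Any, Any], Any]):
        self.fn = fn

    def __call__(self, prv_output: Any, thing: Any) -> Any:
        return self.fn(prv_output, thing)

# Hypothetical usage inside a Pipeline definition:
#
#   Pipeline(
#       component_a,
#       Pipe(lambda prv_output, thing: thing.get_output(prv_output)),
#       component_b,
#   )
#
# Note that every gap between components needs its own Pipe (issue 3),
# and the lambda only ever sees (prv_output, thing), so there is no
# obvious way to hand it extra context like a task type (issue 4).
```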