Pipeline Parametrization Merged into Main Codebase #783
yoonspark announced in Announcements
We are happy to share that our work on pipeline parametrization has been merged into the main codebase! It has not shipped in a release yet, but you can give it a quick try now if you want.

## Overview
Oftentimes, data scientists and engineers need to run the same pipeline with different parameters. For instance, they may want to use a different data set for model training and/or prediction. To produce a parametrized pipeline, we can use the pipeline API's optional `input_parameters` argument.

As a concrete example, consider the following development code:
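The original code block is not preserved here, but a minimal sketch of such development code might look like the following. The URLs, data, and toy model are all illustrative assumptions, and the loader is stubbed so the sketch runs without network access (in real code it would just be `pd.read_csv(url)`):

```python
import pandas as pd
from io import StringIO

# Literal string assignments: this is the form that `input_parameters`
# can later pick up (see Limitations below).
url1 = "https://example.com/train.csv"    # training data source (assumption)
url2 = "https://example.com/predict.csv"  # prediction data source (assumption)

# Stand-in for the remote files so the sketch is self-contained.
_FAKE_REMOTE = {
    url1: "x,y\n1,2\n2,4\n3,6\n",
    url2: "x\n4\n5\n",
}

def fetch_csv(url: str) -> pd.DataFrame:
    # In real development code this would simply be pd.read_csv(url).
    return pd.read_csv(StringIO(_FAKE_REMOTE[url]))

train = fetch_csv(url1)
new = fetch_csv(url2)

# Toy "model": y is proportional to x; learn the slope from training data.
slope = (train["y"] / train["x"]).mean()
predictions = (new["x"] * slope).tolist()
```

In a LineaPy session, the resulting model and predictions would then be saved as artifacts (e.g., with `lineapy.save`) before generating a pipeline from them.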
Now, if we simply run the pipeline API as usual, we get an "inflexible" pipeline where the data sources are fixed rather than tunable.
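As a rough sketch of what "inflexible" means here (an assumption for illustration, not LineaPy's literal output), the generated pipeline module would have the data sources baked in:

```python
# Non-parametrized pipeline sketch: url1/url2 are hard-coded inside the
# function, so running against other data sources means editing the file.
def run_pipeline():
    url1 = "https://example.com/train.csv"    # fixed training data source
    url2 = "https://example.com/predict.csv"  # fixed prediction data source
    # ... the pipeline body would load, train, and predict from these ...
    return url1, url2
```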
Instead, we can run the pipeline API with `input_parameters` specified to get a parametrized pipeline.
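Again as a hedged sketch rather than LineaPy's exact generated code: with something like `input_parameters=["url1", "url2"]`, the former literal assignments become function parameters (and command-line arguments), with the original literals kept as defaults:

```python
import argparse

def run_pipeline(url1="https://example.com/train.csv",
                 url2="https://example.com/predict.csv"):
    """Parametrized pipeline sketch: data sources are now arguments."""
    # ... the pipeline body would load, train, and predict from these ...
    return url1, url2

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run the pipeline.")
    parser.add_argument("--url1", default="https://example.com/train.csv")
    parser.add_argument("--url2", default="https://example.com/predict.csv")
    # parse_known_args so the sketch also tolerates embedded/extra arguments
    args, _ = parser.parse_known_args()
    run_pipeline(url1=args.url1, url2=args.url2)
```

The same pipeline can then be rerun with different data sources, e.g. `python pipeline.py --url1 other_train.csv --url2 other_predict.csv`, without touching the code.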
As shown, we now have `url1` and `url2` factored out as easily tunable parameters of the pipeline, which allows us to run it with various data sources beyond those we started with (hence increasing the pipeline's utility).

## Limitations
Currently, `input_parameters` only accepts variables from literal assignments such as `a = "123"`. For each variable to be parametrized, there should be only one literal assignment across all artifact code for the pipeline. For instance, if both `a = "123"` and `a = "abc"` exist in the pipeline's artifact code, we cannot make `a` an input parameter since its reference is ambiguous, i.e., we are not sure which literal assignment `a` refers to.

## Reference(s)
Related PRs include (listing the latest first):