Bring your own runtime (schema) #1672
Replies: 7 comments 15 replies
-
After spending some time on a file-based POC, I think it might be better to pursue an entry-points approach to supporting custom schemas, at least for runtimes. A file-based approach has the following issues.
By adopting an entry_points approach, none of these are issues for us - making support tremendously easier.
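For illustration, a third-party package under this model might declare its processor and schema providers in its packaging metadata, something like the following (the package name and entry-point group names here are hypothetical, not Elyra's actual groups):

```toml
# pyproject.toml of a hypothetical third-party "elyra-flyte" package.
# Group names are illustrative only.
[project.entry-points."elyra.pipeline.processors"]
flyte = "elyra_flyte.processor:FlytePipelineProcessor"

[project.entry-points."elyra.metadata.runtimes.schemas"]
flyte = "elyra_flyte.schema:FlyteSchemaProvider"
```

With this shape, installing or uninstalling the package is the whole story; there are no files to copy into (or lose from) the elyra installation area.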
Thoughts?
-
I went ahead and perused the code to determine what kinds of things are required to bring a runtime to Elyra. Here's what I came up with.
Additional runtimes will also need to configure a
-
Taking a step back for a minute, what would the core requirements be that a BYOR (bring-your-own-runtime) would have to meet?
If support for generic components is viewed as an optional feature, it would have implications for the VPE and raise the need for a means to define the BYOR's capabilities in the schema.
-
@kevin-bates I think we also need to make the frontend more configurable. Right now it looks like I'd need to change the frontend code just to add a new icon for the editor in the launcher; the icons are still hardcoded.
-
Hi @duongnt - we've been curious how the Argo runtime efforts are going and what further refinements we can make to the BYO Runtime support. Thanks.
-
Now that a few separate (offline) attempts have been made at bringing a runtime, the necessary implementation points noted in this comment have been confirmed. After some discussion, we've identified additional considerations that could improve the process. These include (with a rough estimate of priority):
The immediate next step would be to expand on the discussion started by this comment, which starts to enumerate a potential list of capabilities that each runtime processor may or may not support. Determining a full list of capabilities will inform future design discussions.
Some difficulty may arise here in distinguishing between capabilities of a runtime type vs. those of a runtime processor (a relationship that can theoretically be 1:many right now). For example, the component and component-properties REST API endpoints are queried using the runtime type, whereas support for generic components may be determined by the runtime processor. @lresende @kevin-bates @ajbozarth feel free to edit my comment or add additional points below based on our discussion yesterday.
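To make the capabilities discussion concrete, one purely hypothetical shape for this is an explicit capability declaration on each processor, which the VPE and REST layers could then query. The capability names and class names below are invented for illustration; the real list is what this discussion should produce:

```python
from enum import Enum, auto


class Capability(Enum):
    """Illustrative capability names only."""
    GENERIC_COMPONENTS = auto()  # support for notebook/script (generic) nodes
    CUSTOM_COMPONENTS = auto()   # support for runtime-specific components


class RuntimeProcessor:
    """Hypothetical base class: each processor declares what it supports."""
    capabilities: set = set()

    def supports(self, capability: Capability) -> bool:
        return capability in self.capabilities


class KfpProcessor(RuntimeProcessor):
    # Built-in runtimes support both kinds of components.
    capabilities = {Capability.GENERIC_COMPONENTS, Capability.CUSTOM_COMPONENTS}


class ByoProcessor(RuntimeProcessor):
    # A BYO runtime might opt out of generic-component support entirely.
    capabilities = {Capability.CUSTOM_COMPONENTS}
```

A declaration like this would let the frontend hide generic-node palettes for processors that opt out, rather than assuming every runtime supports them.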
-
After another offline discussion about the BYO runtime processor concept, we've identified some must-do's and next steps.

To BYO processor implementation today, there are several places where various items must be added (e.g. adding a schema and schema provider, adding an entrypoint for the processor, adding catalog types, etc.). These items are then separately discovered by Elyra via various mechanisms. This creates a disjoint feel and necessitates some hardcoding at times. In an ideal situation, we would rather have an inversion of this control. This would look something like injecting a processor into the registry, with the processor taking care of providing the schemas, capabilities toggling (see above comment), etc. in the right places.

Related to the above, we've also determined that it would be best to promote the 'local' execution mode into its own

The above would also put us on the path to making it possible to disable local execution. As raised in the comment thread above, there should be a mechanism to disable any processor implementation. We may decide later that each implementation should be made available as a separate package, but this is likely to occur as a secondary step after adding more binary support (toggling a processor implementation on/off).
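The inversion-of-control idea above could look roughly like the following sketch, where a single registered processor object supplies its own schema (and, eventually, capabilities) rather than Elyra discovering each piece separately. All class and attribute names here are hypothetical:

```python
class ProcessorRegistry:
    """Hypothetical registry: the processor brings everything with it."""

    def __init__(self):
        self._processors = {}

    def register(self, processor) -> None:
        if not processor.enabled:  # supports the on/off toggle discussed above
            return
        self._processors[processor.name] = processor

    def schemas(self):
        # Schemas come *from* the processors, not from a separate discovery pass.
        return [p.get_schema() for p in self._processors.values()]


class LocalProcessor:
    """'local' execution promoted to a first-class processor implementation."""
    name = "local"
    enabled = True

    def get_schema(self) -> dict:
        return {"name": "local", "display_name": "Local execution"}


registry = ProcessorRegistry()
registry.register(LocalProcessor())
```

Under this shape, disabling local execution is just `LocalProcessor.enabled = False` before registration; nothing else in Elyra needs to know.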
-
While working on #1668 it became clear that we don't really have a good story for those developing their own pipeline processors associated with a runtime outside of our built-in runtimes ('kfp' and 'airflow'). Where the story breaks down is where the runtime's schema corresponding to the new processor will be located, since it's (essentially) the schema from which the pipeline processor implementation is discovered.
### Issue
The issue with this approach is that all schema files are currently located in the elyra package installation area under `metadata/schemas`. As a result, third parties following the documented guidelines can successfully author and deploy their own pipeline processor implementations provided they also place their runtime's schema in the elyra package installation area under `metadata/schemas`. However, once the user upgrades elyra, there's a strong likelihood the third-party runtime schemas will be deleted, requiring a redeployment/setup of the third-party integration - which is not acceptable.

Although we have talked about a bring-your-own-schema model before, given our current set of namespaces ('runtimes', 'code-snippets', and 'runtime-images') I think it makes sense to only consider a bring-your-own-runtime capability for now. We can talk about adding other namespaces or extending our "factory" namespaces with additional schemas, but I believe we must first promote namespace to a first-class object (not just the string that it is today) in order to make bring-your-own-namespace viable.
### Approach - `sys.prefix/share/jupyter/metadata/runtimes/schemas`
To accommodate additional runtime schemas, we can easily extend our `SchemaManager` to support looking in multiple locations. This would allow us to retain the factory location as we do today and add other locations. I believe we just need one additional location, and that location should be similar to where we already install factory runtime-image instances. Because metadata persistence already treats the `sys.prefix` hierarchy as read-only, there should be little to no disruption to allowing the addition of third-party schema files to that hierarchy. The schema files would be isolated from any kind of instance data (although there should be none at this time) because they would be placed in a `schemas` sub-directory under the namespace-named parent directory.

For example, if `sys.prefix` was `/opt/anaconda/envs/elyra-dev` and a user wanted to introduce a runtime for Flyte, they would add a schema file (e.g., `flyte.json`) to `/opt/anaconda/envs/elyra-dev/share/jupyter/metadata/runtimes/schemas`. Upon a request to load schemas for a given namespace, the `SchemaManager` would collect schemas for the namespace, first from the "factory" location, then from `<sys.prefix>/share/jupyter/metadata/<namespace>/schemas`.
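A minimal sketch of that two-location lookup (the directory layout is as proposed above; the function itself is invented for illustration and is not the actual `SchemaManager` internals):

```python
import json
import sys
from pathlib import Path


def load_namespace_schemas(namespace: str, factory_dir: Path) -> dict:
    """Collect schemas for a namespace: the factory location first, then
    <sys.prefix>/share/jupyter/metadata/<namespace>/schemas."""
    schemas = {}
    search_path = [
        factory_dir,  # schemas shipped inside the elyra package
        Path(sys.prefix) / "share" / "jupyter" / "metadata" / namespace / "schemas",
    ]
    for location in search_path:
        if not location.is_dir():
            continue
        for schema_file in sorted(location.glob("*.json")):
            schema = json.loads(schema_file.read_text())
            schemas[schema_file.stem] = schema  # e.g. "flyte" from flyte.json
    return schemas
```

With this lookup in place, dropping `flyte.json` into the `sys.prefix` location would make a `flyte` schema appear alongside the factory `kfp` and `airflow` schemas on the next load, and it would survive an elyra upgrade.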
Given the presence of the `flyte.json` schema, runtime metadata instances can be created, containing values corresponding to the Flyte platform. In addition, a Flyte-aware runtime pipeline processor can already be registered via entry_points. Then when the Elyra pipeline service receives a pipeline payload indicating a runtime of `flyte`, it can discover and load the registered processor, and, also given the runtime configuration instance name, load the corresponding Flyte-specific metadata.

I don't think we should consider other file locations in the Jupyter hierarchy (like user HOME directories) for schemas since these kinds of files should absolutely span user configurations and should be sensitive to where the third-party pipeline processor implementations are installed (by virtue of the `sys.prefix` association).

### Approach - entry-points (`elyra.metadata.runtimes.schemas`)
Another mechanism for discovering third-party schema files would be to leverage entry-points - which would be similar to how pipeline processor implementations are discovered. In this case, we could define a group name like `elyra.metadata.runtimes.schemas`, but it's not clear to me how the `entry_point` load mechanisms would return JSON content indicating the schema, other than defining a well-known method on the registered object (e.g., `get_schema()`). This would prevent third parties from having to install their schema files outside their package's installation location. I think it might be worth looking into this as it would result in a cleaner experience. (I suppose we could extend the runtime pipeline processor definition to include a method returning the corresponding schema since developers already register the processor via entry_points - but this isn't a very general approach outside of pipeline processors.) Also, this approach might break down in the face of custom namespaces, unless those too are discovered via `entry_points` (e.g., `elyra.metadata.namespaces`).

At any rate, this is a discussion item. Sorry, it sounds more like a proposal, but I figured we need to start with some kind of strawman because we should address this soon.