Bring your own runtime (schema) #1672
Replies: 7 comments 15 replies
-
After spending some time on a file-based POC, I think it might be better to pursue an entry-points approach to supporting custom schemas, at least for runtimes. A file-based approach has the following issues.
By adopting an entry_points approach, none of these are issues for us - making support tremendously easier.
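For illustration, a third-party package under this model might declare its processor and schema providers in its packaging metadata, something like the following (the package name and entry-point group names here are hypothetical, not Elyra's actual groups):

```toml
# pyproject.toml of a hypothetical third-party "elyra-flyte" package.
# Group names are illustrative only.
[project.entry-points."elyra.pipeline.processors"]
flyte = "elyra_flyte.processor:FlytePipelineProcessor"

[project.entry-points."elyra.metadata.runtimes.schemas"]
flyte = "elyra_flyte.schema:FlyteSchemaProvider"
```

With this shape, installing or uninstalling the package is the whole story; there are no files to copy into (or lose from) the elyra installation area.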
Thoughts?
-
I went ahead and perused the code to determine what kinds of things are required to bring a runtime to Elyra. Here's what I came up with.
Additional runtimes will also need to configure a
-
Taking a step back for a minute, what would the core requirements be that a BYOR (bring-your-own-runtime) would have to meet?
If support for generic components is viewed as an optional feature, it would have implications for the VPE and raise the need for a means to define the BYOR's capabilities in the schema.
-
@kevin-bates I think we also need to make the frontend more configurable. Right now it looks like I'd need to change the frontend code just to add a new icon for the editor in the launcher; the icons are still hardcoded.
-
Hi @duongnt - we've been curious how the Argo runtime efforts are going and what further refinements we can make to the BYO Runtime support. Thanks.
-
Now that a few separate (offline) attempts have been made at bringing a runtime, the necessary implementation points noted in this comment have been confirmed. After some discussion, we've identified additional considerations that could improve the process. These include (with a rough estimate of priority):
The immediate next step would be to expand on the discussion started by this comment, which starts to enumerate a potential list of capabilities that each runtime processor may or may not support. Determining a full list of capabilities will inform future design discussions.
Some difficulty may arise here in distinguishing between capabilities of a runtime type vs. those of a runtime processor (a relationship that can theoretically be 1:many right now). For example, the component and component-properties REST API endpoints are queried using the runtime type, whereas support for generic components may be determined by the runtime processor. @lresende @kevin-bates @ajbozarth feel free to edit my comment or add additional points below based on our discussion yesterday.
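To make the capabilities discussion concrete, one purely hypothetical shape for this is an explicit capability declaration on each processor, which the VPE and REST layers could then query. The capability names and class names below are invented for illustration; the real list is what this discussion should produce:

```python
from enum import Enum, auto


class Capability(Enum):
    """Illustrative capability names only."""
    GENERIC_COMPONENTS = auto()  # support for notebook/script (generic) nodes
    CUSTOM_COMPONENTS = auto()   # support for runtime-specific components


class RuntimeProcessor:
    """Hypothetical base class: each processor declares what it supports."""
    capabilities: set = set()

    def supports(self, capability: Capability) -> bool:
        return capability in self.capabilities


class KfpProcessor(RuntimeProcessor):
    # Built-in runtimes support both kinds of components.
    capabilities = {Capability.GENERIC_COMPONENTS, Capability.CUSTOM_COMPONENTS}


class ByoProcessor(RuntimeProcessor):
    # A BYO runtime might opt out of generic-component support entirely.
    capabilities = {Capability.CUSTOM_COMPONENTS}
```

A declaration like this would let the frontend hide generic-node palettes for processors that opt out, rather than assuming every runtime supports them.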
-
After another offline discussion about the BYO runtime processor concept, we've identified some must-do's and next steps.

To BYO processor implementation today, there are several places where various items must be added (e.g. adding a schema and schema provider, adding an entrypoint for the processor, adding catalog types, etc.). These items are then separately discovered by Elyra via various mechanisms. This creates a disjoint feel and necessitates some hardcoding at times. In an ideal situation, we would rather have an inversion of this control. This would look something like injecting a processor into the registry, with the processor taking care of providing the schemas, capabilities toggling (see above comment), etc. in the right places.

Related to the above, we've also determined that it would be best to promote the 'local' execution mode into its own

The above would also put us on the path to making it possible to disable local execution. As raised in the comment thread above, there should be a mechanism to disable any processor implementation. We may decide later that each implementation should be made available as a separate package, but this is likely to occur as a secondary step after adding more binary support (toggling a processor implementation on/off).
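The inversion-of-control idea above could look roughly like the following sketch, where a single registered processor object supplies its own schema (and, eventually, capabilities) rather than Elyra discovering each piece separately. All class and attribute names here are hypothetical:

```python
class ProcessorRegistry:
    """Hypothetical registry: the processor brings everything with it."""

    def __init__(self):
        self._processors = {}

    def register(self, processor) -> None:
        if not processor.enabled:  # supports the on/off toggle discussed above
            return
        self._processors[processor.name] = processor

    def schemas(self):
        # Schemas come *from* the processors, not from a separate discovery pass.
        return [p.get_schema() for p in self._processors.values()]


class LocalProcessor:
    """'local' execution promoted to a first-class processor implementation."""
    name = "local"
    enabled = True

    def get_schema(self) -> dict:
        return {"name": "local", "display_name": "Local execution"}


registry = ProcessorRegistry()
registry.register(LocalProcessor())
```

Under this shape, disabling local execution is just `LocalProcessor.enabled = False` before registration; nothing else in Elyra needs to know.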
-
While working on #1668 it became clear that we don't really have a good story for those developing their own pipeline processors associated with a runtime outside of our built-in runtimes ('kfp' and 'airflow'). Where the story breaks down is where the runtime's schema corresponding to the new processor will be located, since it's (essentially) the schema from which the pipeline processor implementation is discovered.
### Issue
The issue with this approach is that all schema files are currently located in the elyra package installation area under `metadata/schemas`. As a result, third parties following the documented guidelines can successfully author and deploy their own pipeline processor implementations provided they also place their runtime's schema in the elyra package installation area under `metadata/schemas`. However, once the user upgrades elyra, there's a strong likelihood the third-party runtime schemas will be deleted, requiring a redeployment/setup of the third-party integration - which is not acceptable.

Although we have talked about a bring-your-own-schema model before, given our current set of namespaces ('runtimes', 'code-snippets', and 'runtime-images') I think it makes sense to only consider a bring-your-own-runtime capability for now. We can talk about adding other namespaces or extending our "factory" namespaces with additional schemas, but I believe we must first promote namespace to a first-class object (not just the string that it is today) in order to make bring-your-own-namespace viable.
### Approach - `sys.prefix/share/jupyter/metadata/runtimes/schemas`
To accommodate additional runtime schemas, we can easily extend our `SchemaManager` to support looking in multiple locations. This would allow us to retain the factory location as we do today and add other locations. I believe we just need one additional location, and that location should be similar to where we already install factory runtime-image instances. Because metadata persistence already treats the `sys.prefix` hierarchy as read-only, there should be little to no disruption to allowing the addition of third-party schema files to that hierarchy. The schema files would be isolated from any kind of instance data (although there should be none at this time) because they would be placed in a `schemas` sub-directory under the namespace-named parent directory.

For example, if `sys.prefix` was `/opt/anaconda/envs/elyra-dev` and a user wanted to introduce a runtime for Flyte, they would add a schema file (e.g., `flyte.json`) to `/opt/anaconda/envs/elyra-dev/share/jupyter/metadata/runtimes/schemas`. Upon a request to load schemas for a given namespace, the `SchemaManager` would collect schemas for the namespace, first from the "factory" location, then from `<sys.prefix>/share/jupyter/metadata/<namespace>/schemas`.
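A minimal sketch of that two-location lookup (the directory layout is as proposed above; the function itself is invented for illustration and is not the actual `SchemaManager` internals):

```python
import json
import sys
from pathlib import Path


def load_namespace_schemas(namespace: str, factory_dir: Path) -> dict:
    """Collect schemas for a namespace: the factory location first, then
    <sys.prefix>/share/jupyter/metadata/<namespace>/schemas."""
    schemas = {}
    search_path = [
        factory_dir,  # schemas shipped inside the elyra package
        Path(sys.prefix) / "share" / "jupyter" / "metadata" / namespace / "schemas",
    ]
    for location in search_path:
        if not location.is_dir():
            continue
        for schema_file in sorted(location.glob("*.json")):
            schema = json.loads(schema_file.read_text())
            schemas[schema_file.stem] = schema  # e.g. "flyte" from flyte.json
    return schemas
```

With this lookup in place, dropping `flyte.json` into the `sys.prefix` location would make a `flyte` schema appear alongside the factory `kfp` and `airflow` schemas on the next load, and it would survive an elyra upgrade.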
Given the presence of the `flyte.json` schema, runtime metadata instances can be created, containing values corresponding to the Flyte platform. In addition, a Flyte-aware runtime pipeline processor can already be registered via entry_points. Then when the Elyra pipeline service receives a pipeline payload indicating a runtime of `flyte`, it can discover and load the registered processor, and, also given the runtime configuration instance name, load the corresponding Flyte-specific metadata.

I don't think we should consider other file locations in the Jupyter hierarchy (like user HOME directories) for schemas since these kinds of files should absolutely span user configurations and should be sensitive to where the third-party pipeline processor implementations are installed (by virtue of the `sys.prefix` association).

### Approach - entry-points (`elyra.metadata.runtimes.schemas`)
Another mechanism for discovering third-party schema files would be to leverage entry-points - which would be similar to how pipeline processor implementations are discovered. In this case, we could define a group name like `elyra.metadata.runtimes.schemas`, but it's not clear to me how the `entry_point` load mechanisms would return JSON content indicating the schema, other than defining a well-known method on the registered object (e.g., `get_schema()`). This would prevent third parties from having to install their schema files outside their package's installation location. I think it might be worth looking into this as it would result in a cleaner experience. (I suppose we could extend the runtime pipeline processor definition to include a method returning the corresponding schema since developers already register the processor via entry_points - but this isn't a very general approach outside of pipeline processors.) Also, this approach might break down in the face of custom namespaces, unless those too are discovered via `entry_points` (e.g., `elyra.metadata.namespaces`).

At any rate, this is a discussion item. Sorry, it sounds more like a proposal, but I figured we need to start with some kind of strawman because we should address this soon.