Improve Node Grouping in Kedro Deployment #4319

Open
DimedS opened this issue Nov 11, 2024 · 2 comments

@DimedS
Contributor

DimedS commented Nov 11, 2024

Overview

Part of #4317. Users have expressed the need to merge multiple Kedro nodes into a single task on deployment platforms for better clarity and efficiency. Current plugins offer limited support for this, often requiring manual grouping, which complicates deployment and reduces performance.

User Insights and Challenges

  • "Combining nodes into single tasks improves overview, but we currently have to manually group them in Databricks."
  • "We can convert a single node to a Kubeflow Component, but deploying 400 nodes as separate containers adds complexity."
  • "Running each Kedro node in a separate container could make a small node execute in one or two seconds, but Argo’s longer pod startup time would make this inefficient."

Problem Statement

How can we design a flexible and efficient node grouping mechanism - using tags, namespaces, pipelines, or other methods - to maximise usefulness for users and streamline the deployment process?

Proposed Solution

  • Centralised Grouping Functionality: Instead of developing node grouping features separately for each plugin, centralise this functionality within the Kedro framework. This approach would standardise and simplify node grouping, making it easier to implement and maintain across different deployment platforms.
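
A minimal sketch of what centralised grouping could look like, assuming grouping by top-level namespace. The `Node` dataclass and both helper functions are hypothetical stand-ins for illustration; they are not part of the Kedro API, and a real implementation might group by tags or pipelines instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a pipeline node (not Kedro's Node class)."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    namespace: Optional[str] = None


def group_by_namespace(nodes: List[Node]) -> Dict[str, List[str]]:
    """Map each group to its node names; nodes without a namespace stay alone."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        # Assumption: the top-level namespace segment defines the group.
        key = node.namespace.split(".")[0] if node.namespace else node.name
        groups[key].append(node.name)
    return dict(groups)


def group_dependencies(
    nodes: List[Node], groups: Dict[str, List[str]]
) -> Dict[str, Set[str]]:
    """Derive group-level dependencies from dataset producer/consumer links."""
    group_of = {name: group for group, names in groups.items() for name in names}
    producer = {out: group_of[n.name] for n in nodes for out in n.outputs}
    deps: Dict[str, Set[str]] = {group: set() for group in groups}
    for n in nodes:
        for dataset in n.inputs:
            upstream = producer.get(dataset)
            # Skip free inputs and links that stay inside a single group.
            if upstream is not None and upstream != group_of[n.name]:
                deps[group_of[n.name]].add(upstream)
    return deps


nodes = [
    Node("load", ("raw",), ("clean",), namespace="prep"),
    Node("featurise", ("clean",), ("features",), namespace="prep"),
    Node("train", ("features",), ("model",), namespace="modelling"),
    Node("report", ("model",), ("summary",)),
]
groups = group_by_namespace(nodes)
deps = group_dependencies(nodes, groups)
```

Each key in `groups` would become one task on the deployment platform, and `deps` gives the task-level ordering a plugin would need. The sketch does not check that grouping keeps the task graph acyclic, which a real implementation would have to validate.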
@merelcht merelcht added this to the Deployment milestone Nov 11, 2024
@merelcht merelcht moved this to To Do in Kedro Framework Nov 11, 2024
@datajoely
Contributor

This is the most important problem for me. It's also tightly coupled with dependency management: the minute we make it easier to isolate different parts of the pipeline to run in different containers, you get into dependency isolation questions.

@datajoely
Contributor

Users today also tend towards tags because namespaces are a pain to use.
