Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flytepropeller] Watch agent metadata service dynamically #5460

Merged
merged 41 commits into from
Jun 28, 2024

Conversation

Future-Outlier
Copy link
Member

@Future-Outlier Future-Outlier commented Jun 8, 2024

Tracking issue

#3936

Why are the changes needed?

Users don't need to configure supportedTaskTypes for agent service after this PR merged.
We will connect to the agent server (or agent pod) to retrieve agent metadata every 10 seconds.

The config map for agent-service will be like:

  agent-service:
    supportedTaskTypes:
      # - chatgpt
      # - sensor  
      # - spark
    #   - default_task
    #   - custom_task
    #   - sensor
    #   - airflow
    # By default, all the request will be sent to the default agent.
    defaultAgent:
      endpoint: "localhost:8000"
      insecure: true
      timeouts:
        CreateTask: 100s
        GetTask: 100s
      defaultTimeout: 100s

Architecture Diagram

image

For example, the lifecycle in the propeller with agent will be like below

  1. call the watchAgent function in webapi/agent/plugin.go
  2. send request to localhost:8000 and localhost:8001
  3. update agentRegistry // map[taskTypeName][taskTypeVersion] => Agent
  4. update agentService *core.AgentService
    // Propeller will use it to route to the correct plugin. In our case, it’s agent-service.
  5. wait for 10 seconds p.cfg.PollInterval.Duration, and go back to 1
agent-service:
    # By default, all the request will be sent to the default agent.
    defaultAgent:
      endpoint: "localhost:8000"
      insecure: true
      timeouts:
        CreateTask: 100s
        GetTask: 100s
      defaultTimeout: 100s
    agents:
      custom_agent:
        endpoint: "localhost:8001"
        insecure: true
        timeouts:
          ExecuteTaskSync: 300s
          GetTask: 100s
        defaultTimeout: 300s
    agentForTaskTypes:
      # It will override the default agent for custom_task, which means propeller will send the request to this agent.
      - chatgpt: custom_agent
      - airflow: custom_agent

What changes were proposed in this pull request?

TBD

How was this patch tested?

single binary

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

#3936

Docs link

pingsutw and others added 23 commits March 5, 2024 17:51
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Copy link

codecov bot commented Jun 8, 2024

Codecov Report

Attention: Patch coverage is 56.89655% with 25 lines in your changes missing coverage. Please review.

Project coverage is 60.97%. Comparing base (5cc7f58) to head (297d575).
Report is 120 commits behind head on master.

Files with missing lines Patch % Lines
...yteplugins/go/tasks/plugins/webapi/agent/plugin.go 46.66% 16 Missing ⚠️
...yteplugins/go/tasks/plugins/webapi/agent/client.go 64.28% 4 Missing and 1 partial ⚠️
...lytepropeller/pkg/controller/nodes/task/handler.go 33.33% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5460      +/-   ##
==========================================
- Coverage   60.99%   60.97%   -0.02%     
==========================================
  Files         794      794              
  Lines       51475    51488      +13     
==========================================
- Hits        31398    31397       -1     
- Misses      17185    17199      +14     
  Partials     2892     2892              
Flag Coverage Δ
unittests-datacatalog 69.31% <ø> (ø)
unittests-flyteadmin 58.69% <ø> (-0.05%) ⬇️
unittests-flytecopilot 17.79% <ø> (ø)
unittests-flytectl 68.03% <ø> (ø)
unittests-flyteidl 79.04% <ø> (ø)
unittests-flyteplugins 61.85% <59.61%> (-0.01%) ⬇️
unittests-flytepropeller 57.29% <33.33%> (-0.02%) ⬇️
unittests-flytestdlib 65.59% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Future-Outlier
Copy link
Member Author

@honnix sorry for the late reply and thank you for your patient.
After this PR is merged, the watcher is done, and you can build the image to update your propeller.

Thank you so much!

@honnix
Copy link
Member

honnix commented Jun 8, 2024

Thank you for working on the watcher feature! Could you please write some description to help me understand what this achieves? Thank you.

@Future-Outlier
Copy link
Member Author

Thank you for working on the watcher feature! Could you please write some description to help me understand what this achieves? Thank you.

No problem, I will do it, thank you!

pingsutw and others added 6 commits June 10, 2024 12:09
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier
Copy link
Member Author

Future-Outlier commented Jun 11, 2024

Hi, @honnix, can you take a look at this PR?
The implementation might change, but could you help review it to check for potential issues?
Thank you!

Signed-off-by: Future-Outlier <[email protected]>
@honnix
Copy link
Member

honnix commented Jun 11, 2024

@Future-Outlier I didn't look into details of the change, but overall it's a great feature to be able to refresh agent capability at runtime without restarting propeller.

@Future-Outlier
Copy link
Member Author

Hello, @EngHabu.
Can you take a look? Kevin and I will fix it ASAP if any changes are needed. Thank you!

@Future-Outlier Future-Outlier changed the title Watch agent metadata service dynamically [flytepropeller] Watch agent metadata service dynamically Jun 13, 2024
@Future-Outlier
Copy link
Member Author

Future-Outlier commented Jun 14, 2024

Hi, @honnix
Before we merge this PR, can you help test this one in your scenario?
I can help you build the image you need, here are the actionable steps.

  1. checkout this branch
  2. Use this Dockerfile to build a new image, replace the flytepropeller's deployment and restart the flytepropeller
    https://github.com/flyteorg/flyte/blob/master/Dockerfile.flytepropeller

@honnix
Copy link
Member

honnix commented Jun 26, 2024

Hi, @honnix Before we merge this PR, can you help test this one in your scenario? I can help you build the image you need, here are the actionable steps.

  1. checkout this branch
  2. Use this Dockerfile to build a new image, replace the flytepropeller's deployment and restart the flytepropeller
    master/Dockerfile.flytepropeller

Sorry for replying late. I was on vacation.

I unfortunately don't have capacity to take on this. I can ask around in the team.

@Future-Outlier
Copy link
Member Author

Hi, @honnix Before we merge this PR, can you help test this one in your scenario? I can help you build the image you need, here are the actionable steps.

  1. checkout this branch
  2. Use this Dockerfile to build a new image, replace the flytepropeller's deployment and restart the flytepropeller
    master/Dockerfile.flytepropeller

Sorry for replying late. I was on vacation.

I unfortunately don't have capacity to take on this. I can ask around in the team.

No problem, I will be on vacation too!
Have a nice vacation!

@pingsutw pingsutw merged commit 5b0d787 into master Jun 28, 2024
48 of 50 checks passed
@pingsutw pingsutw deleted the watch-agent-dynamically branch June 28, 2024 21:54
robert-ulbrich-mercedes-benz pushed a commit to robert-ulbrich-mercedes-benz/flyte that referenced this pull request Jul 2, 2024
)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
vlibov pushed a commit to vlibov/flyte that referenced this pull request Aug 16, 2024
)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Signed-off-by: Vladyslav Libov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants