Redun caching logic for a master_task #98

Open
njbernstein opened this issue May 31, 2024 · 2 comments

njbernstein commented May 31, 2024

Hi there,

I have a master_task which kicks off a bunch of subtasks for a scatter-gather.

               task_a    task_b
master_task -> task_1 -> task_2
master_task -> task_3 -> task_4
master_task -> task_5 -> task_6

The master_task takes in the global parameters for the master task plus all the configurable inputs for task_a and task_b, e.g. master_task(global_input).

global_input is a class made up of the inputs to the subtasks, e.g. global_input.task_a_input_1 and global_input.task_b_input_1.

task_1, task_3, task_5 are task_a with different inputs.
task_2, task_4, task_6 are task_b with different inputs.

If, on a rerun, we change only the master_task inputs that are passed to task_b, we see that the whole task is rerun.

How can we have redun evaluate caching only at the subtask level, not at the master_task level?

Maybe turning off caching on the master task would work?

ctk3b (Member) commented May 31, 2024

Hmm, is it possible to share the task definitions with some concrete or dummy examples?

I'm wondering if the context feature may provide what you need: https://insitro.github.io/redun/config.html#context

mattrasmus (Collaborator) commented Jun 6, 2024

Hi @njbernstein, thanks for posting this.

I think I understand your question. If master_task() is just routing args from its inputs to its child tasks, then rerunning master_task() may not be a significant issue performance-wise. However, if master_task() does some heavy lifting itself, or if there are many layers of task calls before task_b (say, in a more realistic pipeline), then you may be interested in a new feature we call Context. It works similarly to React Context: it routes config to deeply nested tasks without needing to pass the config through all the higher-level tasks, which just increases the chance of unnecessary reruns.

For more info, see the docs:
https://insitro.github.io/redun/config.html#context

You can also check out an example here:
https://github.com/insitro/redun/blob/fd9479d13a8d94274fd8e1def14f7d30db1f9572/examples/context/workflow.py
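
To make the first point above concrete (rerunning a routing-only master_task can be cheap), here is a minimal plain-Python toy model. It is not redun's API or implementation; it only assumes that a result is reused when the task name and arguments match. The names `cached_task` and `executed`, the sample names, and the use of a dict for global_input (so it can be hashed as JSON) are all hypothetical.

```python
import functools
import hashlib
import json

cache = {}      # (task name, hash of args) -> result
executed = []   # names of task bodies that actually ran

def cached_task(func):
    """Toy memoizer (assumption: cache key = task name + argument hash)."""
    @functools.wraps(func)
    def wrapper(*args):
        digest = hashlib.sha256(
            json.dumps(args, sort_keys=True).encode()
        ).hexdigest()
        key = (func.__name__, digest)
        if key not in cache:
            executed.append(func.__name__)
            cache[key] = func(*args)
        return cache[key]
    return wrapper

@cached_task
def task_a(sample, a_input):
    return f"{sample}:a={a_input}"

@cached_task
def task_b(a_result, b_input):
    return f"{a_result}:b={b_input}"

@cached_task
def master_task(global_input):
    # Routing only: hand each subtask just the fields it needs.
    return [
        task_b(
            task_a(sample, global_input["task_a_input_1"]),
            global_input["task_b_input_1"],
        )
        for sample in global_input["samples"]
    ]

samples = ["s1", "s2", "s3"]
first = master_task(
    {"samples": samples, "task_a_input_1": 1, "task_b_input_1": 10}
)

executed.clear()
# Rerun with only the task_b input changed.
second = master_task(
    {"samples": samples, "task_a_input_1": 1, "task_b_input_1": 20}
)
print(executed)  # ['master_task', 'task_b', 'task_b', 'task_b']
```

In this model the body of master_task runs again because its argument changed, but every task_a call is a cache hit, so only the task_b calls do new work. If master_task did expensive work of its own, that work would be repeated, which is the case where Context becomes attractive.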
