Monte carlo simulations (non-ML use) and looping pipelines #2958

quantumtrope · 2023-08-22T13:32:29Z


I have a question about using Kedro in a non-ML setting. Specifically, I am trying to use Kedro in a data analysis (no learning) and statistical modeling/simulation pipeline. A simplified view of the use case is something like:

1. Set input parameters
2. Run 10,000 instances of a Monte Carlo simulation
3. Calculate statistics on that run
4. Save data

So far so good: Kedro defines these operations really nicely and keeps things tidy, along with visualizations and saving data for experiments. (Keep in mind that in reality, steps 2 and 3 are probably 6-8 nodes long, split across two or three pipelines in Kedro.)

Now, the problem is that I need to explore a large space of input parameters, for example sweeping one input parameter over 100 logarithmically spaced values from 1e-6 to 1e-4. So the (simplified) workflow now becomes:

1. For input param1 in logspace(1e-6,1e-4,100):
2. Set input param
3. Run 10,000 instances
4. Calculate statistics
5. Save data
6. Go to 1 until done
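
For reference, the sweep loop above can be sketched in plain Python, outside Kedro. This is only a minimal illustration under stated assumptions: `simulate_once` is a hypothetical stand-in for the real simulation (here a Gaussian draw whose width is the swept parameter), and `log_spaced`, `run_sweep_point` and `run_sweep` are names invented for this sketch. If NumPy is used instead, note that `numpy.logspace` takes exponents, so this range would be `numpy.logspace(-6, -4, 100)`.

```python
import math
import random
import statistics


def log_spaced(start, stop, num):
    """Return `num` values from `start` to `stop`, evenly spaced in log10.

    Assumes start > 0, stop > 0 and num >= 2.
    """
    lo, hi = math.log10(start), math.log10(stop)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]


def simulate_once(param1, rng):
    """Hypothetical stand-in for one Monte Carlo instance."""
    return rng.gauss(0.0, param1)


def run_sweep_point(param1, n_instances=10_000, seed=0):
    """Run the instances and compute statistics for one value of param1."""
    rng = random.Random(seed)
    samples = [simulate_once(param1, rng) for _ in range(n_instances)]
    return {
        "param1": param1,
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
    }


def run_sweep(values, n_instances=10_000):
    """Loop over the swept values, one seeded run per value."""
    # A distinct seed per point keeps each run reproducible.
    return [run_sweep_point(v, n_instances, seed=i) for i, v in enumerate(values)]
```

Calling `run_sweep(log_spaced(1e-6, 1e-4, 100))` returns one statistics dictionary per swept value; saving those results is left out here because it depends on the project's catalog.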

I know Kedro wasn't built for this, but I want to highlight that the Kedro way is very amenable to general statistical modeling and simulation efforts that don't include ML.

My question is: what's the "Kedro canonical" way to do that? From initial attempts I can see one of two options:

1. Instantiate 100 modular pipelines programmatically, then run through all of them in some way (ideally with a parallel runner).
2. Write my own for loop with threading, and pass a changed context, pipeline, or catalog to a SerialRunner within each thread or loop iteration.

(Keep in mind this is also a simplified example: I probably have two or three variables that I want to loop over in similar ways, bringing the total number of pipelines to run to something like 10,000+.)
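
To make the scale concrete, here is a rough, framework-agnostic sketch of the custom-loop option over several swept variables. It is not Kedro's API: `parameter_grid`, `run_pipeline_once` and `run_grid` are hypothetical names, and `run_pipeline_once` is a placeholder marking where a real pipeline run with overridden parameters would go. Two variables with 100 values each already give 100 × 100 = 10,000 runs.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def parameter_grid(axes):
    """Yield one dict per combination of the swept values.

    `axes` maps a parameter name to the values to sweep for it.
    """
    names = list(axes)
    for combo in itertools.product(*(axes[name] for name in names)):
        yield dict(zip(names, combo))


def run_pipeline_once(params):
    """Hypothetical placeholder for one full pipeline run.

    In a real project, the run with `params` as overrides would go here;
    this stub only echoes its input.
    """
    return {"params": params, "status": "done"}


def run_grid(axes, run_one=run_pipeline_once, max_workers=4):
    """Call `run_one` for every grid point; results come back in grid order."""
    points = list(parameter_grid(axes))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, points))
```

Threads mainly help when each run waits on I/O. For CPU-bound pure-Python simulations, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface, provided `run_one` and its arguments can be pickled.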

noklam · 2023-08-22T13:40:12Z

Collaborator

Thanks for the question! This is a very interesting use of Kedro. Any chance you could ask it on our Slack (slack.kedro.org) to see if the community has an answer?

3 replies

quantumtrope Aug 22, 2023
Author

I would love to! But all the Slack links I clicked on say the joining links have expired...

astrojuanlu Aug 22, 2023
Maintainer

Hey @quantumtrope, sorry about that - turns out all those invites had expired. I have fixed them for future reference. Can you try https://slack.kedro.org?

quantumtrope Aug 22, 2023
Author

OK, I'm in Slack now. I'll post my question there.

astrojuanlu · 2023-08-22T17:15:20Z

Maintainer

Hi @quantumtrope, this is a good question, and one for which there's no "Kedro canonical" answer. It's a frequently requested feature; there is more context in #1606.

In summary, either of the two options you proposed (modular pipelines or custom for loop) should work. Let us know which one you pick in the end.

