data management #87

Not sure if this issue belongs here or in the upstream libEnsemble. When doing manual parameter scans in the past, I encountered the need for a data management framework in order to store, organize and later post-process the output openPMD files. This led me to couple the fbpic code with the signac framework, as shown here (a minimal sketch of the pattern is below). Furthermore, the use of signac-flow allows one to couple multiple codes, like using the PIC output as input for Geant4, etc.

My question is: is this something you plan to have in the scope of optimas? My understanding is that libEnsemble's history array is much more limited in this regard.
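A minimal sketch of the signac pattern referenced above, assuming one signac job per parameter set, with the PIC diagnostics written into that job's workspace; the parameter name a0 and the directory layout are illustrative only:

```python
import signac

# Open (or create) a signac project in the current directory.
project = signac.init_project()

# One signac job per parameter set; each job's workspace directory
# then holds the openPMD output of the corresponding PIC run.
for a0 in [1.0, 2.0, 4.0]:
    job = project.open_job({"a0": a0})
    job.init()
    # ... run fbpic here, writing diagnostics under job.fn("diags") ...
```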
Comments
Hi @berceanu. In my mind, the libE history array is quite flexible as long as you know the size of the entry for each simulation output. If the collective outputs from the simulations are too large to consider storing in one place, I would recommend saving each output to a file (e.g., named with a UUID) and then putting the name of the file in the history array (see the sketch below). Is there some additional information that is desired in the libE history array that isn't currently available?
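A minimal sketch of the file-reference pattern described above, using a numpy structured array as a stand-in for the libE history array; the field names are made up for illustration:

```python
import uuid

import numpy as np

# Stand-in for the libE history array: a string field records the name
# of the file holding each simulation's bulky output.
H = np.zeros(10, dtype=[("x", float), ("f", float), ("output_file", "U48")])

def store_sim_output(sim_id, data):
    # Save the large output under a unique, UUID-based file name and
    # keep only that name in the history array.
    fname = f"sim_{uuid.uuid4().hex}.npz"
    np.savez(fname, **data)
    H[sim_id]["output_file"] = fname
```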
Hi @jmlarson1. Well, usually each PIC simulation code outputs hundreds of output files. The point is also that sometimes in long-running simulations you don't always know a priori all the things you want to analyze, so it can be useful to keep the data around.
Hi @berceanu. I certainly understand that. We've tried to design libEnsemble (and optimas) to allow all simulation output to be stored in a (relatively) easy-to-retrieve manner. For example, libEnsemble can be configured so that every simulation occurs in its own directory (with a name that includes the simulation ID); a sketch of post-processing on top of such a layout is below. Would that work for you?
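A hedged sketch of what post-processing on top of such per-simulation directories could look like; the ensemble directory name and file layout below are assumptions, not the exact conventions of libEnsemble/optimas:

```python
from pathlib import Path

# Walk the per-simulation directories (one per evaluation, named with
# the simulation ID) and count the hdf5 output files found in each.
ensemble_dir = Path("ensemble")
for sim_dir in sorted(ensemble_dir.glob("sim*")):
    diags = sorted(sim_dir.glob("diags/hdf5/*.h5"))
    print(f"{sim_dir.name}: {len(diags)} output files")
```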
I see. How about the workflow management part? For instance, what if one wants to chain two different codes, using the output of one code as input for another, but only launching the second code once the first one has finished successfully?
Hi @berceanu, as mentioned by @jmlarson1, each simulation carried out in optimas generates its own directory. All the output (e.g., the hdf5 diagnostics) will be stored there. You will typically define a function that analyzes this output to determine the value of your optimization objectives or any other diagnostics you are interested in. You can check an example of how to do this with FBPIC here. Regarding your last point, are you running a study where each simulation needs to run two codes (e.g., an FBPIC simulation, followed by another with ASTRA, Geant4, etc., where you use the beam produced by FBPIC)? If so, what you could do right now is create a single simulation template that runs both codes one after the other; a rough sketch is below. It's not an ideal solution, as by doing this there is no obvious way of running each simulation with different resources (e.g., MPI processes). It's a good point that we could consider for the future: basically, being able to have several "steps" per evaluation, each being a different simulation that might require different resources.
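A rough sketch of that single-template workaround: one template script that runs the PIC stage and then launches the second code itself. The helper, file names, and geant4 invocation are all made up for illustration:

```python
import subprocess

def run_fbpic_simulation():
    """Placeholder for the FBPIC stage; in a real template this would
    set up and run the FBPIC Simulation object with the parameters
    filled in by optimas."""
    ...

# Stage 1: run the PIC simulation.
run_fbpic_simulation()

# Stage 2: feed the beam produced by FBPIC to the second code. Both
# stages share the resources of a single evaluation, which is exactly
# the limitation mentioned above.
subprocess.run(["geant4", "beam_from_fbpic.mac"], check=True)
```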
A follow-on question for @berceanu: are the resource requirements for the two codes large but different? If one is "very cheap", then there should be no need to worry about resources. But if both codes need considerable compute resources, with one needing many CPUs (but no GPUs) and the other the opposite, then it's certainly an interesting use case to consider supporting.
@AngelFP and @jmlarson1, thank you both for the follow-up. Indeed, the scenario I have in mind consists of a GPU-intensive PIC code followed by a CPU-intensive radiation code, so the resource allocation would look quite different for the two.
Ok, sounds good. I think this is a feature we should implement. I'll have a look at how this could be done, but I can't promise any ETA at the moment.
@berceanu We've had a few discussions of this, and many questions were raised.
@jmlarson1 Probably all we would need is a way to chain the two codes within a single evaluation, each with its own resources.
Perhaps it would help if we made this a bit more concrete by picking some specific scenarios.
Any feedback on this?
I'm certainly interested in helping support this use case. Do you have an example script (or scripts) that performs these three calls?
Same here, I think we should support this type of workflow. This could probably be implemented by allowing the user to create several evaluators per exploration.
From the user side, we could support this workflow by allowing something like this:

```python
# Create one evaluator per simulation code.
ev_fbpic = TemplateEvaluator(
    sim_template='template_simulation_fbpic.py',
    analysis_func=analyze_simulation_fbpic,
    n_gpus=2
)
ev_g4 = TemplateEvaluator(
    sim_template='template_simulation_geant4.txt',
    analysis_func=analyze_simulation_g4,
    executable='geant4',
    n_procs=40
)

# Create exploration.
exp = Exploration(
    generator=gen,
    evaluator=[ev_fbpic, ev_g4],
    max_evals=100,
    sim_workers=4
)
```

The user would pass a list of evaluators to the exploration in the order in which they should run. Each evaluator can have its own resources and analysis function (e.g., in this example ev_fbpic requests 2 GPUs while ev_g4 runs with 40 processes); a sketch of such an analysis function is below. Any comments/ideas on this?
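For reference, a hedged sketch of what one of the analysis functions named above might look like, following the (simulation_directory, output_params) convention used in optimas examples; the openpmd-viewer calls and the 'electrons' species name are assumptions for illustration:

```python
def analyze_simulation_fbpic(simulation_directory, output_params):
    # Read the openPMD diagnostics written by FBPIC into this
    # simulation's directory and fill in the objective value.
    from openpmd_viewer import OpenPMDTimeSeries

    ts = OpenPMDTimeSeries(f"{simulation_directory}/diags/hdf5")
    uz, w = ts.get_particle(
        ["uz", "w"], species="electrons", iteration=ts.iterations[-1]
    )
    # Example objective: weighted mean longitudinal momentum.
    output_params["f"] = float((uz * w).sum() / w.sum())
    return output_params
```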
This seems reasonable to me. The underlying libE executor should be able to handle this (see the sketch below).
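For context, a hedged sketch of how the libEnsemble MPIExecutor can launch an application with a per-task resource request; the paths and arguments are illustrative:

```python
from libensemble.executors.mpi_executor import MPIExecutor

# Register the application once, at setup time.
exctr = MPIExecutor()
exctr.register_app(full_path="/path/to/geant4", app_name="geant4")

# Inside a simulation function, each task can then be submitted with
# its own resource request, e.g. 40 MPI processes for the second stage.
task = exctr.submit(app_name="geant4", num_procs=40, app_args="beam.mac")
task.wait()
```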
Hi @AngelFP, that is great, can't wait to test it out! It might take a while though, since I am currently on holiday.
Let us know if you find any issues once you get to try it. Enjoy the holidays!
By the way, speaking of combining multiple codes such as PIC + particle tracking, I recently wrote a small tool for converting the PIC output into input for the tracking code.
Thanks for sharing, that could indeed be useful in some cases. Did the solution with the list of evaluators work for you?
I will report back as soon as I try it out; I haven't had the chance yet.
Hi @berceanu, any updates on this? I've been using the new multi-evaluator setup myself in the meantime.