Avoiding redundant data download in parallel plus a nice way to download data prior to launching an MPI job #206

Closed
glwagner opened this issue Oct 31, 2024 · 1 comment
Labels
data wrangling We must feed the models so they don't get cranky

Comments

@glwagner
Member

If we try to run a simulation for the first time with mpiexec, I think we can end up redundantly downloading the same data from different MPI processes.

I'm not sure of the best way to handle this. We could download only from rank 0, or we could somehow distribute the downloading among the processes (even cooler).
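To make the two options concrete, here is a minimal sketch in Python. It is only an illustration of the pattern, not this package's API: `download_once`, `files_for_rank`, and the `fetch`, `rank`, `size`, and `barrier` arguments are all hypothetical names. In a real MPI job the rank, size, and barrier would come from the MPI library (with mpi4py, for instance, `MPI.COMM_WORLD.Get_rank()`, `Get_size()`, and `Barrier()`).

```python
import os


def download_once(path, fetch, rank, barrier):
    """Option 1: only rank 0 downloads `path`; all ranks wait until it is done.

    `fetch(dest)` is a hypothetical callable that writes the data to `dest`.
    `barrier()` is a hypothetical callable that blocks until every rank calls it.
    """
    if rank == 0 and not os.path.exists(path):
        # Download to a temporary name, then rename, so that no process
        # ever sees a partially written file at `path`.
        tmp = path + ".part"
        fetch(tmp)
        os.replace(tmp, path)
    barrier()  # every rank waits here until rank 0 has finished
    return path


def files_for_rank(files, rank, size):
    """Option 2: split the file list round-robin so each rank downloads a share."""
    return files[rank::size]
```

With the second option, each rank would download only `files_for_rank(files, rank, size)` and then call the barrier before any rank reads the data, so every file is fetched exactly once across the job.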

A related issue is that we don't have a simple way to download data in isolation before running a script. Right now all the data is downloaded automatically the first time a simulation is run, which is very nice of course. But there are probably situations where a user wants to get the downloading out of the way before launching a job, and it would be nice to make that simple.
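As a rough sketch of what a standalone pre-download step could look like (again in Python, purely for illustration): the function name `predownload`, the `files` mapping of file names to URLs, and the `cache_dir` argument are assumptions made for this example, not anything the package provides. The default `fetch` is the standard library's `urllib.request.urlretrieve(url, filename)`.

```python
import os
from urllib.request import urlretrieve


def predownload(files, cache_dir, fetch=urlretrieve):
    """Download every missing file in `files` into `cache_dir`.

    `files` is a hypothetical mapping of local file name -> URL.
    Returns the names that were actually downloaded on this call.
    """
    os.makedirs(cache_dir, exist_ok=True)
    fetched = []
    for name, url in files.items():
        dest = os.path.join(cache_dir, name)
        if os.path.exists(dest):
            continue  # already cached, nothing to do
        fetch(url, dest)
        fetched.append(name)
    return fetched
```

Something like this could be run once in a single process (on a login node, say) before `mpiexec`, so that the parallel job finds all of its data already in place.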

@glwagner glwagner added the data wrangling We must feed the models so they don't get cranky label Oct 31, 2024
@simone-silvestri
Collaborator

simone-silvestri commented Nov 28, 2024

should be closed by #208


2 participants