Workflow for downloading input forcing files without GPU node internet access #344
The files go to …

As you say, I think the way to run these cases is to initiate the simulation on CPU, but at a very coarse resolution and only running for a short amount of time. Do you think that is an acceptable workflow? Perhaps, rather than using a simulation, we could develop a utility that's something like …
The issue isn't file access - the GPU nodes are able to access the depot path. The issue is that any attempt to download data (using wget, curl, etc.) doesn't work because the GPU nodes don't have internet access. Yes, the ideal workflow would be to create a standalone function that can be run on the login node (for example, I have written a bash script with wget that downloads the JRA, ECCO and bathymetry files into the necessary folders directly from the login node). I may be overblowing the issue, but I think CPU/GPU nodes in many HPCs don't have internet access. So integrating a simple download script that can be run from the login node prior to run!(simulation) would help those running on such HPC environments. Hope that makes sense!
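For illustration, a minimal sketch of what such a login-node pre-download step could look like in Julia (rather than bash) is below. The URLs, filenames, and target directory are placeholders I made up, not ClimaOcean's actual defaults.

```julia
# Minimal sketch of a login-node pre-download step (run wherever internet access exists).
# The directory and URL below are placeholders, not ClimaOcean's actual defaults.
using Downloads

function predownload(files; dir = joinpath(homedir(), "climaocean_inputs"))
    mkpath(dir)
    for (url, filename) in files
        filepath = joinpath(dir, filename)
        isfile(filepath) && continue        # skip anything already downloaded
        @info "Downloading $filename to $dir"
        Downloads.download(url, filepath)
    end
    return dir
end

# Hypothetical entry; replace with the actual URLs/filenames your setup needs.
predownload(["https://example.org/some_forcing_file.nc" => "some_forcing_file.nc"])
```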
Why do you have to change the folders that the data is downloaded into?
I don't change the folders the data is downloaded into. I just manually download the data into the folders that the code would ordinarily download the data into (if it had internet access).
If I understand correctly, @taimoorsohail wrote a bash script to do what the proposed method would do.
Actually, a minor note: the CPU nodes on HPCs often don't have internet access either, so the issue is not GPU-specific.
If you run the same script on the login node using the CPU architecture (with a coarse resolution and, say, stop_iteration=1), does it achieve the desired effect?
Also, it'd be great to see the bash script! I am confused why bash is better than Julia, but I might be missing something. Possibly, if there is a bash script, we can simply translate the same commands into Julia.

It may also be possible to hand a function all of the metadata / other objects that may be associated with data. The challenge, I think, is that the data is not explicitly tied to the model. For example, we provide functionality for users to force their model with restoring to ECCO, but they need not set it up the same way every time: they could use a callback, or a forcing function. It is not rigid. So it may be hard to provide a function that is guaranteed to work. We can provide a function that makes many assumptions about a typical setup, looks for data in the typical place, etc., but at that point I am not sure we have made much progress.

A more robust strategy is to run the same script that we want to run, on the login node, perhaps at low resolution and for a short time. I think that should trigger all the downloads that would be needed for the simulation. It's robust because we directly use the same script that would be used for the simulation itself. It may not require much more manual intervention from the user either, since calling a utility function is about as much work as changing the architecture / problem size. Curious to hear thoughts.
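As a rough illustration of that strategy (a sketch under assumptions, not ClimaOcean's actual API): if the user's setup is wrapped in a function parameterized by architecture and resolution, the same code can be run once on the login node to trigger the downloads. `build_simulation` below is hypothetical and stands for whatever the real script does to construct the grid, ocean, atmosphere, and simulation.

```julia
using Oceananigans  # provides CPU(), run!, and the Simulation type

# Sketch only: `build_simulation` is a hypothetical user-defined function wrapping the
# real setup; the coarse grid size below is arbitrary.
function trigger_downloads(build_simulation)
    arch = CPU()                                               # login node has no GPU
    simulation = build_simulation(arch; Nx = 36, Ny = 18, Nz = 2)  # very coarse resolution
    simulation.stop_iteration = 1                              # one iteration is enough to fetch the data
    run!(simulation)
    return nothing
end
```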
Nobody claimed that the bash script is better than Julia (yet, right?).
Hm... I see @glwagner's point. For the bathymetry, it should be straightforward to download the raw data before any regridding etc. happens to it. But yes, it's not until the user constructs a coupled model with an atmosphere that the simulation has all the available information, right? Perhaps providing a keyword to the appropriate methods to use data from …
I agree with @glwagner that it makes sense to just run the same script with a coarse grid to download the necessary data. The issue, however, is that the login node has additional storage and walltime constraints. So, if I run …

GitHub isn't allowing uploading bash scripts, so I'll just paste it below.
Always prefer pasting scripts rather than linking to them.
I think ETOPO and ECCO are downloaded by …
PS: I think this issue should be brought up in parallel with the admins of the HPC center. Internet access on the compute nodes is likely not changeable, but login-node constraints probably are, right? 15 minutes is too short to download large files. From the info provided, it sounds like the HPC effectively requires one to use bash; I don't think using different Julia functions will solve the 15-minute issue.
Ok, so it looks to me like one does not have to get to ClimaOcean.jl/src/DataWrangling/JRA55.jl, line 416 (at 459d76d), since ClimaOcean.jl/src/DataWrangling/JRA55.jl, lines 674 to 682 (at 459d76d), are all within JRA55PrescribedAtmosphere(arch, time_indices; kw...) somewhere. As far as I can tell, it does not matter that …
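If that is right, a hedged sketch of a login-node pre-fetch could be as simple as constructing the prescribed atmosphere once on CPU. The module path below and the number of time indices needed to trigger all the downloads are assumptions; the constructor signature is the one quoted above.

```julia
# Sketch: construct the JRA55 prescribed atmosphere once on the login node so the data
# files land in the depot path that the compute nodes can read.
# The module path and the `1:2` time range are assumptions; adjust to your version.
using ClimaOcean
using Oceananigans: CPU

time_indices = 1:2
atmosphere = ClimaOcean.DataWrangling.JRA55.JRA55PrescribedAtmosphere(CPU(), time_indices)
```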
As for the ETOPO1 data, we currently call this within ClimaOcean.jl/src/Bathymetry.jl, lines 104 to 106 (at 459d76d), so we just need to isolate this in a function like

```julia
# Within src/Bathymetry.jl, where Downloads and download_progress are already available.
function download_bathymetry_data(;
    dir = ".",  # assumed default; the real code would presumably use its usual data directory
    url = "https://www.ngdc.noaa.gov/thredds/fileServer/global/ETOPO2022/60s/60s_surface_elev_netcdf",
    filename = "ETOPO_2022_v1_60s_N90W180_surface.nc",
    progress = download_progress)

    filepath = joinpath(dir, filename)
    fileurl = url * "/" * filename # joinpath on windows creates the wrong url
    Downloads.download(fileurl, filepath; progress)
    return nothing
end
```

Then users can call

```julia
using ClimaOcean.Bathymetry: download_bathymetry_data

download_bathymetry_data()
```

to download it.
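On a system like the one described in this issue, the idea would be to run that once from the login node before submitting the job, pointing it at the folder the simulation will later read from. The `dir` keyword here is the assumed addition from the sketch above, and the path is just an example:

```julia
using ClimaOcean.Bathymetry: download_bathymetry_data

# Run on the login node; `dir` (assumed keyword) should match where the model expects the file.
download_bathymetry_data(dir = joinpath(homedir(), "climaocean_inputs"))
```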
Finally, for …
Hi all,
I am trying to run ClimaOcean on the Gadi supercomputer in Australia, and only the login CPU node has internet access on the HPC (for security reasons).
This means that I can't run examples that require downloaded files without first manually downloading the input files, placing them in the necessary folders, and then submitting a job to the GPU or CPU nodes on the HPC. This is an OK workaround, but I realised that as others run this model, they may also be using HPC environments that don't have internet access outside of the login node.
I just wanted to flag this as a potential issue, and to discuss whether it may be worth developing a workflow that avoids the need to manually download input files prior to running the model. This may be unavoidable, but I figured I would flag it! Thanks