Download ECCO files using Downloads and .netrc files #281

Merged
merged 33 commits into from
Dec 10, 2024
Changes from 13 commits
Commits
81d93cc
this should work
simone-silvestri Nov 28, 2024
867da9c
better naming
simone-silvestri Nov 28, 2024
71f2b19
only one download
simone-silvestri Nov 28, 2024
b36f60f
add download test
simone-silvestri Nov 28, 2024
31fb896
joinpath does not work on windows
simone-silvestri Nov 28, 2024
5b229db
test also downloading the bathymetry
simone-silvestri Nov 28, 2024
3ee4eaa
test dowloading bathymetry
simone-silvestri Nov 28, 2024
cba1991
restore tests
simone-silvestri Nov 28, 2024
51aff06
gracefull downloading
simone-silvestri Nov 28, 2024
d22edc9
try it now
simone-silvestri Nov 28, 2024
ebaa07d
fix typo
simone-silvestri Nov 28, 2024
879d611
make sure we delete the previous data before testing the download
simone-silvestri Nov 28, 2024
cd54cdb
Merge branch 'main' into ss/download-everywhere
simone-silvestri Nov 28, 2024
0c38ce1
should work
simone-silvestri Dec 2, 2024
25216bc
Merge branch 'ss/download-everywhere' of github.com:CliMA/ClimaOcean.…
simone-silvestri Dec 2, 2024
c0892b9
test distributed downloading
simone-silvestri Dec 2, 2024
6d73e98
Update test_distributed_utils.jl
simone-silvestri Dec 2, 2024
3fc3bd5
fix the download
simone-silvestri Dec 2, 2024
f3dec5c
generalize the downloader
simone-silvestri Dec 2, 2024
3525b76
generalize more
simone-silvestri Dec 2, 2024
6a3fa7d
generalize filename
simone-silvestri Dec 2, 2024
91be188
download_progress is part of the downloading utilities
simone-silvestri Dec 2, 2024
ab2563d
better docstring
simone-silvestri Dec 2, 2024
25b42cd
better docstring
simone-silvestri Dec 2, 2024
a77cebb
change docstring
simone-silvestri Dec 2, 2024
28f597c
Merge branch 'main' into ss/download-everywhere
simone-silvestri Dec 3, 2024
c8b6e97
fix tests
simone-silvestri Dec 4, 2024
42da589
Merge branch 'ss/download-everywhere' of github.com:CliMA/ClimaOcean.…
simone-silvestri Dec 4, 2024
ad64272
distribute among tasks
simone-silvestri Dec 5, 2024
5c38029
whoops added wrong file
simone-silvestri Dec 5, 2024
926394f
correct looping
simone-silvestri Dec 9, 2024
a3254de
bugfix
simone-silvestri Dec 10, 2024
5b65403
Merge branch 'main' into ss/download-everywhere
simone-silvestri Dec 10, 2024
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -19,6 +19,9 @@ jobs:
arch:
- x64
include:
+ - os: windows-latest
+ arch: x86
+ version: '1.10'
- os: macOS-latest
arch: arm64
version: '1.10'
3 changes: 3 additions & 0 deletions .gitignore
@@ -28,6 +28,9 @@ docs/src/literated/
*.svg
*.gif

+ # Password files
+ *.netrc
+
# File generated by Pkg, the package manager, based on a corresponding Project.toml
# It records a fixed state of all packages used by the project. As such, it should not be
# committed for packages, but should be committed for applications that require a static
15 changes: 4 additions & 11 deletions src/Bathymetry.jl
@@ -93,17 +93,10 @@ function regrid_bathymetry(target_grid;
major_basins = Inf) # Allow an `Inf` number of ``lakes''

filepath = joinpath(dir, filename)
- fileurl = joinpath(url, filename)
-
- @root begin # perform all this only on rank 0, aka the "root" rank
- if !isfile(filepath)
- try
- Downloads.download(fileurl, filepath; progress=download_progress, verbose=true)
- catch
- cmd = `wget --no-check-certificate -O $filepath $fileurl`
- @root run(cmd)
- end
- end
+ fileurl = url * "/" * filename # joinpath on windows creates the wrong url
+
+ @root if !isfile(filepath) # perform all this only on rank 0, aka the "root" rank
+ Downloads.download(fileurl, filepath; progress=download_progress)
end

dataset = Dataset(filepath)
2 changes: 1 addition & 1 deletion src/DataWrangling/ECCO/ECCO.jl
@@ -6,7 +6,7 @@ export ECCORestoring, LinearlyTaperedPolarMask

using ClimaOcean
using ClimaOcean.DataWrangling
- using ClimaOcean.DataWrangling: inpaint_mask!, NearestNeighborInpainting
+ using ClimaOcean.DataWrangling: inpaint_mask!, NearestNeighborInpainting, download_progress
using ClimaOcean.InitialConditions: three_dimensional_regrid!, interpolate!

using Oceananigans
52 changes: 47 additions & 5 deletions src/DataWrangling/ECCO/ECCO_metadata.jl
@@ -5,6 +5,7 @@ using ClimaOcean.DataWrangling
import Dates: year, month, day

using Base: @propagate_inbounds
+ using Downloads

import Oceananigans.Fields: set!, location
import Base
@@ -144,12 +145,12 @@ short_name(data::ECCOMetadata{<:Any, <:ECCO2Daily}) = ECCO2_short_names[data.n
short_name(data::ECCOMetadata{<:Any, <:ECCO2Monthly}) = ECCO2_short_names[data.name]
short_name(data::ECCOMetadata{<:Any, <:ECCO4Monthly}) = ECCO4_short_names[data.name]

- metadata_url(prefix, m::ECCOMetadata{<:Any, <:ECCO2Daily}) = joinpath(prefix, short_name(m), metadata_filename(m))
- metadata_url(prefix, m::ECCOMetadata{<:Any, <:ECCO2Monthly}) = joinpath(prefix, short_name(m), metadata_filename(m))
+ metadata_url(prefix, m::ECCOMetadata{<:Any, <:ECCO2Daily}) = prefix * "/" * short_name(m) * "/" * metadata_filename(m)
+ metadata_url(prefix, m::ECCOMetadata{<:Any, <:ECCO2Monthly}) = prefix * "/" * short_name(m) * "/" * metadata_filename(m)

function metadata_url(prefix, m::ECCOMetadata{<:Any, <:ECCO4Monthly})
year = string(Dates.year(m.dates))
- return joinpath(prefix, short_name(m), year, metadata_filename(m))
+ return prefix * "/" * short_name(m) * "/" * year * "/" * metadata_filename(m)
end

location(data::ECCOMetadata) = ECCO_location[data.name]
@@ -218,6 +219,9 @@ function download_dataset(metadata::ECCOMetadata; url = urls(metadata))
password = get(ENV, "ECCO_PASSWORD", nothing)
dir = metadata.dir

+ # Write down the username and password in a .netrc file
+ downloader = ECCO_downloader(username, password, dir)
+
@distribute for metadatum in metadata # Distribute the download among ranks if MPI is initialized

fileurl = metadata_url(url, metadatum)
@@ -237,10 +241,48 @@ function download_dataset(metadata::ECCOMetadata; url = urls(metadata))
throw(ArgumentError(msg))
end

- cmd = `wget --http-user=$(username) --http-passwd=$(password) --directory-prefix=$dir $fileurl`
- run(cmd)
+ Downloads.download(fileurl, filepath; downloader, progress=download_progress)
end
end

+ # Remove the .netrc file after downloading to avoid storing the credentials
+ remove_netrc!(dir)
+
return nothing
end

+ # ECCO downloader
+ function ECCO_downloader(username, password, dir)
Member


What is specific about ECCO? This function looks general

Collaborator Author

simone-silvestri Dec 2, 2024


The netrc file writes down ECCO-specific information (the machine). We could easily generalize it if we think we might need this somewhere else.

Collaborator Author


I have generalized the function to accept a "machine" argument
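
For illustration, a minimal sketch of what a generalization along these lines might look like. The names `write_netrc!` and `netrc_downloader`, the `machine` keyword, and the file name `download.netrc` are hypothetical and are not taken from this PR; the `easy_hook` call is the same one used in the diff.

```julia
using Downloads

# Hypothetical helper: write a single netrc entry for `machine` into `dir`.
# Assumption: "w" mode is used so that a leftover file is overwritten rather than appended to.
function write_netrc!(username, password, dir; machine, filename = "download.netrc")
    filepath = joinpath(dir, filename)
    open(filepath, "w") do f
        write(f, "machine $machine login $username password $password\n")
    end
    return filepath
end

# Hypothetical helper: build a downloader that reads its credentials from that file.
function netrc_downloader(username, password, dir; machine)
    netrc_file = write_netrc!(username, password, dir; machine)
    downloader = Downloads.Downloader()
    # Point curl at the netrc file for every request made through this downloader
    downloader.easy_hook = (easy, _) -> Downloads.Curl.setopt(easy, Downloads.Curl.CURLOPT_NETRC_FILE, netrc_file)
    return downloader
end
```

Under these assumptions, the ECCO case would be `netrc_downloader(username, password, dir; machine = "ecco.jpl.nasa.gov")`, with the result passed to `Downloads.download` through the `downloader` keyword.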

+ netrc_file = ECCO_netrc!(username, password, dir)
+ downloader = Downloads.Downloader()
+ easy_hook = (easy, _) -> Downloads.Curl.setopt(easy, Downloads.Curl.CURLOPT_NETRC_FILE, netrc_file)
+
+ downloader.easy_hook = easy_hook
+ return downloader
+ end
+
+ # Code snippet adapted from https://github.com/evetion/SpaceLiDAR.jl/blob/master/src/utils.jl#L150
+ function ECCO_netrc!(username, password, dir)
+ if Sys.iswindows()
+ filepath = joinpath(dir, "ECCO_netrc")
+ else
+ filepath = joinpath(dir, "ECCO.netrc")
+ end
+
+ open(filepath, "a") do f
+ write(f, "\n")
+ write(f, "machine ecco.jpl.nasa.gov login $username password $password\n")
+ end
+
+ return filepath
+ end
+
+ function remove_netrc!(dir)
+ if Sys.iswindows()
+ filepath = joinpath(dir, "ECCO_netrc")
+ else
+ filepath = joinpath(dir, "ECCO.netrc")
+ end
+
+ rm(filepath; force = true)
+ end
2 changes: 1 addition & 1 deletion src/DataWrangling/JRA55.jl
@@ -391,7 +391,7 @@ function JRA55_field_time_series(variable_name;

# Note, we don't re-use existing jld2 files.
@root begin
- isfile(filepath) || download(url, filepath)
+ isfile(filepath) || download(url, filepath; progress=download_progress)
isfile(jld2_filepath) && rm(jld2_filepath)
end

6 changes: 3 additions & 3 deletions test/runtests.jl
@@ -13,9 +13,9 @@ if test_group == :init || test_group == :all
CUDA.set_runtime_version!(v"12.6"; local_toolkit = true)
CUDA.precompile_runtime()

- ####
- #### Download bathymetry data
- ####
+ ###
+ ### Download bathymetry data
+ ###

download_bathymetry()

7 changes: 5 additions & 2 deletions test/runtests_setup.jl
@@ -12,6 +12,7 @@ using Oceananigans.Architectures: architecture, on_architecture
using Oceananigans.OutputReaders: interpolate!

using ClimaOcean
+ using ClimaOcean.Bathymetry: download_bathymetry_cache
using CFTime
using Dates

@@ -28,13 +29,15 @@ temperature_metadata = ECCOMetadata(:temperature, dates)
salinity_metadata = ECCOMetadata(:salinity, dates)

# Fictitious grid that triggers bathymetry download
- function download_bathymetry()
+ function download_bathymetry(; dir = download_bathymetry_cache,
+ filename = "ETOPO_2022_v1_60s_N90W180_surface.nc")
+
grid = LatitudeLongitudeGrid(size = (10, 10, 1),
longitude = (0, 100),
latitude = (0, 50),
z = (-6000, 0))

- bottom = regrid_bathymetry(grid)
+ bottom = regrid_bathymetry(grid; dir, filename)

return nothing
end
22 changes: 22 additions & 0 deletions test/test_downloading.jl
@@ -1,8 +1,30 @@
include("runtests_setup.jl")

+ using ClimaOcean.ECCO: metadata_path
+
@testset "Availability of JRA55 data" begin
@info "Testing that we can download all the JRA55 data..."
for name in ClimaOcean.DataWrangling.JRA55.JRA55_variable_names

fts = ClimaOcean.JRA55.JRA55_field_time_series(name; time_indices=2:3)
end
end

@testset "Availability of ECCO data" begin
@info "Testing that we can download ECCO data..."
for variable in keys(ClimaOcean.ECCO.ECCO4_short_names)
metadata = ECCOMetadata(variable)
filepath = metadata_path(metadata)
isfile(filepath) && rm(filepath; force=true)
ClimaOcean.ECCO.download_dataset(metadata)
end
end

@testset "Availability of the Bathymetry" begin
@info "Testing that we can download the bathymetry..."
dir="./"
filename="ETOPO_2022_v1_60s_N90W180_surface.nc"
filepath=joinpath(dir, filename)
isfile(filepath) && rm(filepath; force=true)
download_bathymetry(; dir, filename)
end