-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download ECCO files using Downloads
and .netrc
files
#281
Conversation
It seems to pass on windows architectures |
@waywardpidgeon this should work on windows if you want to test it there |
should close #179 |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #281 +/- ##
=====================================
Coverage 0.00% 0.00%
=====================================
Files 34 34
Lines 1910 1921 +11
=====================================
- Misses 1910 1921 +11 ☔ View full report in Codecov by Sentry. |
return nothing | ||
end | ||
|
||
# ECCO downloader | ||
function ECCO_downloader(username, password, dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is specific about ECCO? This function looks general
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the netrc file writes down ECCO specific information (the machine). We could generalize it easily if we think we might need this somewhere else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have generalized the function to accept a "machine" argument
# Write down the username and password in a .netrc file | ||
downloader = netrc_downloader(username, password, "ecco.jpl.nasa.gov", tmp) | ||
|
||
@distribute for metadatum in metadata # Distribute the download among ranks if MPI is initialized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having multiple processes download the same file to the same location can lead to race conditions. It is also not the best of the etiquettes: if you run with 1000 MPI processes, this will send 1000 requests to the ECCO servers at the very same time, which would lead to high load on their end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the @distribute
macro distributes the download over the ranks, so in theory we should never have different ranks downloading the same file. The second comment is right though, if we have 1000 ranks and 1000 files to download we will have 1000 requests at the same time to the ECCO servers (different files though). Will this create problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the @distribute macro distributes the download over the ranks, so in theory we should never have different ranks downloading the same file.
Got it. Thanks
The second comment is right though, if we have 1000 ranks and 1000 files to download we will have 1000 requests at the same time to the ECCO servers (different files though). Will this create problems?
I don't think it's necessarily a problem, but it's something to be mindful about (mostly as a pattern to not abuse). If one is running on 1000s processes, it is probably best to pre-download the data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also a hack, why would we distributed among MPI processes? It might be better to instead distribute with tasks, but only from rank 0. That way the download is distributed and fast, but we limit the number of requests to something reasonable.
src/DataWrangling/DataWrangling.jl
Outdated
|
||
Create a downloader that uses a netrc file to authenticate with the given machine. | ||
This downlader writes down a netrc file with the username and password for the given machine in the directory `dir`. | ||
To avoid storing passwords in plain text, it is recommended to initialize the downloader in a temporary directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this avoid storing passwords in plain text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, well it does but then the temporary directory with the netrc file is deleted when the do tmp; ...; end
block is closed. So in theory for a finite amount of time the password is indeed stored in the filesystem. I can probably rewrite this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory? I don't understand. The password is definitely stored in plain text right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not merely "theoretical", the design explicitly stores it in text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, it is definitely stored in plain text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also is it a problem that .netrc
is written? Isn't this the recommended way to use .netrc
anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the environment variables route is much cleaner, but it looks like with .netrc we are forced to write down a file. I probably wouldn't like to have a stored file with a password automatically written so I think it is nice to raise this point in the docstring.
Following my previous comment, I removed all my packages and made a clean start. However the warning of a circular dependency when attempting to add ClimaOcean remained and the precompilation of about 80 packages was skipped. The same result occurred with Pkg.instantiate().
I am still using version 1.10.5 of Julia, but this does not look like a Julia problem.
…________________________________
From: Simone Silvestri ***@***.***>
Sent: Friday, November 29, 2024 7:15 AM
To: CliMA/ClimaOcean.jl ***@***.***>
Cc: Kevin Broughan ***@***.***>; Mention ***@***.***>
Subject: Re: [CliMA/ClimaOcean.jl] Download ECCO files using `Downloads` and `.netrc` files (PR #281)
@waywardpidgeon<https://github.com/waywardpidgeon> this should work on windows if you want to test it there
—
Reply to this email directly, view it on GitHub<#281 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AUPNYEQPEBHGPRVBFUJM5QT2C5MVNAVCNFSM6AAAAABSVA2TKCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMBWGIYDGNZQGQ>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
We've been informed that the circular dependency issue will be resolved with future Julia versions. But I don't think the warning itself stops any code from running. At least, I have not encountered this. One simply has to press forward, ignoring the warnings. |
No, those warnings were just a prelude to other problems. Here is an excerpt from the Julia output v1.10.5:
```
....
SciMLOperatorsSparseArraysExt [9985400b-97ec-5583-b534-4f70b643bcf7]
│ ClimaSeaIce [6ba0ff68-24e6-4315-936c-2e99227c95a4]
└ @ Pkg.API C:\Users\kab\.julia\juliaup\julia-1.10.5+0.x64.w64.mingw32\share\julia\stdlib\v1.10\Pkg\src\API.jl:1279
[ Info: Precompiling ClimaOcean [0376089a-ecfe-4b0e-a64f-9c555d74d754]
┌ Warning: Module CUDA with build ID ffffffff-ffff-ffff-0006-1b6894581c7d is missing from the cache.
│ This may mean CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] does not support precompilation but is imported by a module that does.
└ @ Base loading.jl:1948
┌ Error: Error during loading of extension AtomixCUDAExt of Atomix, use `Base.retry_load_extensions()` to retry.
│ exception =
│ 1-element ExceptionStack:
│ Declaring __precompile__(false) is not allowed in files that are being precompiled.
│ Stacktrace:
│ [1] _require(pkg::Base.PkgId, env::Nothing)
│ @ Base .\loading.jl:1999
│ [2] __require_prelocked(uuidkey::Base.PkgId, env::Nothing)
│ @ Base .\loading.jl:1812
│ [3] #invoke_in_world#3
│ @ .\essentials.jl:926 [inlined]
│ [4] invoke_in_world
│ @ .\essentials.jl:923 [inlined]
│ [5] _require_prelocked
│ @ .\loading.jl:1803 [inlined]
│ [6] _require_prelocked
│ @ .\loading.jl:1802 [inlined]
│ [7] run_extension_callbacks(extid::Base.ExtensionId)
│ @ Base .\loading.jl:1295
│ [8] run_extension_callbacks(pkgid::Base.PkgId)
│ @ Base .\loading.jl:1330
│ [9] run_package_callbacks(modkey::Base.PkgId)
│ @ Base .\loading.jl:1164
│ [10] _tryrequire_from_serialized(modkey::Base.PkgId, build_id::UInt128)
│ @ Base .\loading.jl:1451
│ [11] _tryrequire_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String)
│ @ Base .\loading.jl:1524
│ [12] _require(pkg::Base.PkgId, env::String)
│ @ Base .\loading.jl:1990
│ [13] __require_prelocked(uuidkey::Base.PkgId, env::String)
│ @ Base .\loading.jl:1812
│ [14] #invoke_in_world#3
│ @ .\essentials.jl:926 [inlined]
│ [15] invoke_in_world
│ @ .\essentials.jl:923 [inlined]
│ [16] _require_prelocked(uuidkey::Base.PkgId, env::String)
│ @ Base .\loading.jl:1803
│ [17] macro expansion
│ @ .\loading.jl:1790 [inlined]
│ [18] macro expansion
│ @ .\lock.jl:267 [inlined]
│ [19] __require(into::Module, mod::Symbol)
│ @ Base .\loading.jl:1753
│ [20] #invoke_in_world#3
│ @ .\essentials.jl:926 [inlined]
│ [21] invoke_in_world
│ @ .\essentials.jl:923 [inlined]
│ [22] require(into::Module, mod::Symbol)
│ @ Base .\loading.jl:1746
│ [23] include(mod::Module, _path::String)
│ @ Base .\Base.jl:495
│ [24] include(x::String)
│ @ CUDA C:\Users\kab\.julia\packages\CUDA\2kjXI\src\CUDA.jl:1
│ [25] top-level scope
│ @ C:\Users\kab\.julia\packages\CUDA\2kjXI\src\CUDA.jl:123
│ [26] include
│ @ .\Base.jl:495 [inlined]
│ [27] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
│ @ Base .\loading.jl:2222
│ [28] top-level scope
│ @ stdin:3
│ [29] eval
│ @ .\boot.jl:385 [inlined]
│ [30] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
│ @ Base .\loading.jl:2076
│ [31] include_string
│ @ .\loading.jl:2086 [inlined]
│ [32] exec_options(opts::Base.JLOptions)
│ @ Base .\client.jl:316
│ [33] _start()
│ @ Base .\client.jl:552
└ @ Base loading.jl:1301
WARNING: could not import Grids.coordinates into ImmersedBoundaries
┌ Warning: Circular dependency detected. Precompilation will be skipped for:
│ DiffEqBaseChainRulesCoreExt [b00db79b-61e3-50fb-b26f-2d35b2d9e4ed]
│ StaticArrayInterface [0d7ed370-da01-4f52-bd93-41d350b8b718]
│ KernelAbstractions [63c18a36-062a-441e-b654-da1e3ab1ce7c]
│ LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890]
....
When this type of circular dependencies were detetected in Unix C libraries (back in my youth!) it spelt disaster for linking. Sun Microsystems with Java fixed this problem by giving the libraries a good structure.
Yes, of course I ignore a few warnings, but these warnings + errors are hard to ignore. However inspect_JRA55_data.jl DOES appear to be working run at the power shell command line.
Running ecco_inspect_temperature_salinity.jl I get:
```
include("ecco_inspect_temperature_salinity.jl")
ERROR: LoadError: UndefVarError: `BuoyancyModels` not defined
Stacktrace:
[1] include(fname::String)
@ Base.MainInclude .\client.jl:489
[2] top-level scope
@ REPL[10]:1
in expression starting at C:\Users\kab\cc\ccex0\ClimaOcean.jl\examples\ecco_inspect_temperature_salinity.jl:35
```
My time has run out. I will get back to this in mid January but hope to find time to run some more tests.
Happy holidays!
…________________________________
From: Gregory L. Wagner ***@***.***>
Sent: Tuesday, December 17, 2024 1:40 PM
To: CliMA/ClimaOcean.jl ***@***.***>
Cc: Kevin Broughan ***@***.***>; Mention ***@***.***>
Subject: Re: [CliMA/ClimaOcean.jl] Download ECCO files using `Downloads` and `.netrc` files (PR #281)
We've been informed that the circular dependency issue will be resolved with future Julia versions. But I don't think the warning itself stops any code from running. At least, I have not encountered this. One simply has to press forward, ignoring the warnings.
—
Reply to this email directly, view it on GitHub<#281 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AUPNYEXRPKYZKY6PR7DYFAT2F5XJXAVCNFSM6AAAAABSVA2TKCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNBXGI2TQNRSGQ>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
This is fixed in the latest ClimaOcean.
Well, I totally agree that a circular dependency seems bad. However, we have been informed that the error is wrong, and there is not a real circular dependency here: CliMA/Oceananigans.jl#3964 It tentatively seems like the code will work and we can close this issue, but we can wait until after the holidays to confirm. Happy holidays to you as well! |
This should solve #272
I have added some windows tests, let's see if they pass.
supersedes #183