-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download ECCO files using Downloads
and .netrc
files
#281
Conversation
It seems to pass on windows architectures |
@waywardpidgeon this should work on windows if you want to test it there |
should close #179 |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #281 +/- ##
=====================================
Coverage 0.00% 0.00%
=====================================
Files 34 34
Lines 1910 1921 +11
=====================================
- Misses 1910 1921 +11 ☔ View full report in Codecov by Sentry. |
return nothing | ||
end | ||
|
||
# ECCO downloader | ||
function ECCO_downloader(username, password, dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is specific about ECCO? This function looks general
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the netrc file writes down ECCO specific information (the machine). We could generalize it easily if we think we might need this somewhere else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have generalized the function to accept a "machine" argument
# Write down the username and password in a .netrc file | ||
downloader = netrc_downloader(username, password, "ecco.jpl.nasa.gov", tmp) | ||
|
||
@distribute for metadatum in metadata # Distribute the download among ranks if MPI is initialized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having multiple processes download the same file to the same location can lead to race conditions. It is also not the best of the etiquettes: if you run with 1000 MPI processes, this will send 1000 requests to the ECCO servers at the very same time, which would lead to high load on their end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the @distribute
macro distributes the download over the ranks, so in theory we should never have different ranks downloading the same file. The second comment is right though, if we have 1000 ranks and 1000 files to download we will have 1000 requests at the same time to the ECCO servers (different files though). Will this create problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the @distribute macro distributes the download over the ranks, so in theory we should never have different ranks downloading the same file.
Got it. Thanks
The second comment is right though, if we have 1000 ranks and 1000 files to download we will have 1000 requests at the same time to the ECCO servers (different files though). Will this create problems?
I don't think it's necessarily a problem, but it's something to be mindful about (mostly as a pattern to not abuse). If one is running on 1000s processes, it is probably best to pre-download the data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also a hack, why would we distributed among MPI processes? It might be better to instead distribute with tasks, but only from rank 0. That way the download is distributed and fast, but we limit the number of requests to something reasonable.
src/DataWrangling/DataWrangling.jl
Outdated
|
||
Create a downloader that uses a netrc file to authenticate with the given machine. | ||
This downlader writes down a netrc file with the username and password for the given machine in the directory `dir`. | ||
To avoid storing passwords in plain text, it is recommended to initialize the downloader in a temporary directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this avoid storing passwords in plain text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, well it does but then the temporary directory with the netrc file is deleted when the do tmp; ...; end
block is closed. So in theory for a finite amount of time the password is indeed stored in the filesystem. I can probably rewrite this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory? I don't understand. The password is definitely stored in plain text right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not merely "theoretical", the design explicitly stores it in text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, it is definitely stored in plain text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also is it a problem that .netrc
is written? Isn't this the recommended way to use .netrc
anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the environment variables route is much cleaner, but it looks like with .netrc we are forced to write down a file. I probably wouldn't like to have a stored file with a password automatically written so I think it is nice to raise this point in the docstring.
This should solve #272
I have added some windows tests, let's see if they pass.
supersedes #183