
anmn_temp_gridded_product - possible future improvements #812

Open
mhidas opened this issue Feb 20, 2018 · 2 comments

Comments

@mhidas
Contributor

mhidas commented Feb 20, 2018

The temperature gridded product code (after #809) seems to work fine and it's reasonably clear what it does. However, I think it could be made clearer, simpler and possibly faster/more efficient by making use of some existing packages:

  • Use numpy arrays for all data handling, and avoid looping over arrays or through lists of arrays.
  • Better still, use the xarray package to handle both netCDF I/O and array arithmetic. For example, it has methods that could replace some of the binning code.
  • Use boto3 to get files from S3 directly, rather than via HTTP.
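As a rough illustration of the first two points, assume the binning step averages temperature samples into fixed depth bins (the actual binning in the gridded product code may differ). In that case the per-bin loop can be replaced by two vectorised numpy calls. `np.digitize` assigns each sample to a bin, and `np.bincount` with `weights` sums the samples in each bin. The function name `bin_mean` is hypothetical.

```python
import numpy as np


def bin_mean(values, depths, bin_edges):
    """Mean of `values` within each depth bin defined by `bin_edges`, without Python loops.

    Returns an array of length len(bin_edges) - 1; bins with no samples are NaN.
    Each sample goes into the bin where bin_edges[i] <= depth < bin_edges[i + 1].
    """
    values = np.asarray(values, dtype=float)
    depths = np.asarray(depths, dtype=float)
    nbins = len(bin_edges) - 1

    # Bin index for every sample: -1 below the first edge, nbins at or above the last edge
    idx = np.digitize(depths, bin_edges) - 1

    # Keep only finite samples that fall inside the binned range
    ok = np.isfinite(values) & np.isfinite(depths) & (idx >= 0) & (idx < nbins)

    # Per-bin sums and counts in one pass each
    sums = np.bincount(idx[ok], weights=values[ok], minlength=nbins)
    counts = np.bincount(idx[ok], minlength=nbins)

    # Empty bins give 0/0, which becomes NaN; silence the corresponding warning
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts
```

For example, `bin_mean([10, 12, 20, 22, 99], [1, 2, 6, 7, 12], [0, 5, 10])` gives `[11., 21.]`. The sample at depth 12 lies outside the edges and is dropped. In xarray, `groupby_bins` followed by `.mean()` expresses the same grouping on labelled data, e.g. `da.groupby_bins("DEPTH", bin_edges).mean()`, assuming a depth coordinate named `DEPTH`. Note that its bins are closed on the right by default, whereas `np.digitize` as used here puts a sample lying exactly on an edge into the bin above it.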

I don't know if we'll ever have time to work on these; I just wanted to make a note while I thought of it. To an extent, this also applies to the burst-averaging code.

@lbesnard
Contributor

I'm pretty sure using boto3 means you would have to develop and test the code only on an authorised machine (@lwgordonimos?). Also, if I'm correct, it means you could not share it with people outside the IMOS AODN organisation.

@ghost

ghost commented Feb 21, 2018

Not necessarily. If the data are in a public bucket, they can still be accessed anonymously while keeping the benefit of efficient S3 interaction. Refer: https://github.com/aodn/utilities/blob/master/jenkins/get_latest_artifact.py#L19 (note: that script is overly condensed, so disregard the rest; the key point is the UNSIGNED requests to S3).

One other benefit is that boto3 has efficient download code built in, which avoids having to chunk the downloads manually, as is currently done here: https://github.com/aodn/data-services/pull/809/files#diff-84abcef0df7fb618752454ed770a069aR157

The only drawbacks I could see are that:

  1. it becomes S3-only, instead of working with potentially any URL
  2. it creates a new dependency on boto3 (though I suspect this is not a problem, since boto3 is likely to be installed anywhere this code would run)
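Drawback 1 could be softened by dispatching on the URL scheme. `s3://` URLs would go to an S3 downloader, such as a boto3-based function, and anything else would fall back to a plain download through the standard library. This is a minimal sketch. `split_s3_url` and `fetch` are hypothetical names. The S3 downloader is passed in rather than imported, so callers that never touch S3 do not need boto3 at all.

```python
import shutil
import urllib.request
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/some/key.nc' into ('bucket', 'some/key.nc')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError("not an s3:// URL: %r" % url)
    return parsed.netloc, parsed.path.lstrip("/")


def fetch(url, local_path, s3_download=None):
    """Download `url` to `local_path`.

    s3:// URLs are handled by s3_download(bucket, key, local_path), e.g. a boto3-based
    function; any other URL is read with urllib and streamed to the local file.
    """
    if urlparse(url).scheme == "s3":
        if s3_download is None:
            raise RuntimeError("no S3 downloader supplied for %r" % url)
        bucket, key = split_s3_url(url)
        s3_download(bucket, key, local_path)
    else:
        # Generic fallback: stream the response body to disk without loading it all into memory
        with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
            shutil.copyfileobj(response, out)
```

This keeps the "any URL" behaviour of the current HTTP approach while letting S3-hosted files take the faster path.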


3 participants