Processing time series data in Python on CHS Cloud
Note: CHS requires access to the DOI Network. That means you will need to be on the TIC (or get approved by CHS to use Amazon WorkSpaces, once it becomes available for general use).
- Sign up for access to the Pangeo JupyterHub: fill out the Pangeo Service Request Form. For more info on the process, see the CHS Pangeo Support Page.
- Sign up for access to the Pangeo S3 Bucket: fill out the Pangeo S3 Access form.
- After you get approved (it might take a day or two), log in with your Active Directory credentials at http://pangeo.chs.usgs.gov. You will get a menu asking which environment you want to run. Choose the "default environment", which already contains stglib, xarray, pandas, hvplot, and lots of other useful libraries. It will take a few minutes to spin up, and once it is running you can verify the libraries are available with the quick check below.
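A minimal sanity check, run in a new notebook cell, that the default environment provides the libraries named above (treat this as a sketch, not an official CHS recipe):

```python
# Confirm the default environment provides the expected libraries.
import pandas as pd
import stglib
import xarray as xr
import hvplot.pandas  # noqa: F401 -- registers the .hvplot plotting accessor

print("pandas:", pd.__version__)
print("xarray:", xr.__version__)
```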
- On login, you get the standard Jupyter Notebook interface by default. To switch to the JupyterLab interface, edit the URL, replacing the trailing "tree" with "lab".
Example Jupyter Notebook interface: http://pangeo.chs.usgs.gov/user/[email protected]/tree
Example JupyterLab interface: http://pangeo.chs.usgs.gov/user/[email protected]/lab
- Explore your JupyterHub environment. Open a terminal in either Jupyter interface and you can run bash shell commands like `df -h` and `ls -lR` to examine your environment. Anything in /home/jovyan will be persisted (see the sketch below), and you can use git or the direct upload feature in JupyterHub (drag and drop) to bring code and small datasets into the environment. Some repos that work with the CHS Pangeo default environment are at https://code.chs.usgs.gov/earthmap/notebooks, including the Pangeo Tutorial. Feel free to clone them to your directory.
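A minimal sketch of using that persistence, writing a small time series to your home directory (the filename is hypothetical):

```python
import pandas as pd

# Files written under /home/jovyan survive server restarts;
# save a small hourly time series there as a demonstration.
ts = pd.Series(
    range(24),
    index=pd.date_range("2020-01-01", periods=24, freq="H"),
    name="example",
)
ts.to_csv("/home/jovyan/example_timeseries.csv")  # hypothetical filename
```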
- Working with Cloud object storage buckets (S3). The Pangeo data bucket is at s3://chs-pangeo-data-bucket and you have read/write access under your username. Before you can use the bucket, you need to open a terminal, install the aws CLI, and run `aws configure`, supplying the Amazon public and secret keys given to you by CHS. After this is done, you should be able to write data with code like this:

```python
import fsspec
import pandas as pd

# Read a public CSV from S3 with anonymous access.
infile = fsspec.open("s3://anaconda-public-datasets/iris/iris.csv", mode="rt", anon=True)
with infile as f:
    df = pd.read_csv(f)

# Write it to your area of the Pangeo data bucket using the 'default' AWS profile.
outfile = fsspec.open("s3://chs-pangeo-data-bucket/rsignell/testing/iris.csv", mode="wt", profile="default")
with outfile as f:
    df.to_csv(f)
```
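To verify the write, you can read the object back from your bucket path; a minimal sketch reusing the path above:

```python
import fsspec
import pandas as pd

# Read back the CSV just written to confirm the round trip.
with fsspec.open("s3://chs-pangeo-data-bucket/rsignell/testing/iris.csv", mode="rt", profile="default") as f:
    df_check = pd.read_csv(f)

print(df_check.head())
```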
- You can also move data using the AWS command line interface (CLI), either from the terminal on the Pangeo JupyterHub or from your local workstation (as long as you are on the DOI network). For example, this should work:

```bash
aws s3 cp 2561-A.nc s3://chs-pangeo-data-bucket/rsignell/ncfiles/2561-A.nc --profile default
```
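The same copy can also be done from Python; a minimal sketch using fsspec (the filename matches the CLI example above):

```python
import fsspec

# Connect to S3 with the same 'default' profile set up via `aws configure`.
fs = fsspec.filesystem("s3", profile="default")

# Upload the local NetCDF file to your area of the Pangeo data bucket.
fs.put("2561-A.nc", "chs-pangeo-data-bucket/rsignell/ncfiles/2561-A.nc")
```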