Automate ioos by the numbers generation #2

Open
MathewBiddle opened this issue Jul 8, 2022 · 9 comments · Fixed by #3
Labels
by the numbers Things related to calculating IOOS by the Numbers (BTN)

Comments

@MathewBiddle
Contributor

MathewBiddle commented Jul 8, 2022

Currently, the IOOS by the Numbers metrics are collected manually by running the Jupyter notebook. The process should run automatically using GitHub Actions, similar to the GTS metrics, on an annual or quarterly basis (frequency still to be decided).

@MathewBiddle MathewBiddle self-assigned this Jul 8, 2022
@MathewBiddle
Contributor Author

Steps:

  • Translate the notebook to a Python script
  • Determine the run frequency
  • Build the GitHub Action

@MathewBiddle
Contributor Author

@ocefpaf do you have an example of running a jupyter notebook in a GitHub Action?

@MathewBiddle MathewBiddle linked a pull request Jul 8, 2022 that will close this issue
@ocefpaf
Member

ocefpaf commented Jul 9, 2022

> @ocefpaf do you have an example of running a jupyter notebook in a GitHub Action?

There are many ways to do that, but I prefer a single call to nbconvert. I usually save the notebook without any output and "convert" it to a notebook with the outputs, i.e. nbconvert executes it and writes out the executed copy.
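
For reference, the single nbconvert call described above could be wrapped in Python roughly as follows. This is a minimal sketch, not the project's code: the helper names `nbconvert_command` and `run_notebook` and the `_executed` output suffix are assumptions made for illustration. Only the `--to notebook`, `--execute`, and `--output` flags come from nbconvert itself.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook, output=None):
    """Build the argument list for one executing nbconvert call."""
    cmd = ["jupyter", "nbconvert", "--to", "notebook", "--execute", str(notebook)]
    if output is None:
        # Hypothetical naming choice: write "<stem>_executed.ipynb" so the
        # output-free source notebook stays untouched.
        output = f"{Path(notebook).stem}_executed.ipynb"
    cmd += ["--output", str(output)]
    return cmd


def run_notebook(notebook, output=None):
    """Run nbconvert; raises CalledProcessError if it exits non-zero."""
    subprocess.run(nbconvert_command(notebook, output), check=True)
```

Calling `run_notebook("IOOS_BTN.ipynb")` would then leave the stored notebook clean and put the executed copy in a separate file, which sidesteps the question of overwriting the input.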

@MathewBiddle MathewBiddle reopened this Jul 11, 2022
@MathewBiddle
Contributor Author

I'd like to follow the single-call nbconvert approach to run the notebook and be done, and remove the standalone Python script. Here's the addition to the workflow that runs the notebook:

      - name: Setup Conda
        uses: s-weigand/setup-conda@v1
        with:
          activate-conda: false
          conda-channels: conda-forge
      # Create the conda environment described in environment.yml
      - name: Build environment
        shell: bash -l {0}
        run: |
          conda env create -f environment.yml
      # Execute the notebook and write the result back to the same file name
      - name: Execute Notebook
        run: |
          source activate ioos-btn
          jupyter nbconvert --to notebook --execute IOOS_BTN.ipynb --output=IOOS_BTN.ipynb

@ocefpaf
Member

ocefpaf commented Jul 11, 2022

I've never tried overwriting the same notebook, so that needs testing first. The rest looks good. BTW, are you saving some results and publishing? What is the format? You can probably create a table or a simple page and publish it as gh-pages to keep the notebook untouched.

@MathewBiddle
Contributor Author

> BTW, are you saving some results and publishing? What is the format?

Yeah, I'm saving the results as a CSV file, similar to what I've done for the GTS metrics [1], which are collected quarterly by my GH Action [2].

[1] https://github.com/MathewBiddle/ioos_by_the_numbers/tree/main/gts
[2] https://github.com/MathewBiddle/ioos_by_the_numbers/blob/main/.github/workflows/metrics.yml

We're also looking to do something similar for the NGDAC metrics.

So, if you have ideas on how to make this more efficient/prettier, I'm all ears.

We can chat on Thursday about it too.

@ocefpaf
Member

ocefpaf commented Jul 12, 2022

> So, if you have ideas on how to make this more efficient/prettier, I'm all ears.

More efficient? No, what you are doing is the best as far as I know. Prettier? Yes. We should keep the CSV, because it is more flexible, but also save an HTML table to post as gh-pages. That is a single line with pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_html.html.
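
A rough sketch of that idea, keeping the CSV and writing an HTML table next to it. The function name `export_metrics` and the file names `metrics.csv` and `index.html` are placeholders chosen for this example, not names used in the repository; `DataFrame.to_csv` and `DataFrame.to_html` are the only pandas calls involved.

```python
from pathlib import Path

import pandas as pd


def export_metrics(df, out_dir):
    """Write the metrics as CSV (the flexible record) and as an HTML table."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / "metrics.csv"   # placeholder file name
    html_path = out_dir / "index.html"   # placeholder file name
    df.to_csv(csv_path, index=False)
    # The "single line with pandas": render the frame as an HTML <table>.
    html_path.write_text(df.to_html(index=False), encoding="utf-8")
    return csv_path, html_path
```

The HTML file could then be published from a gh-pages branch or directory while the notebook itself stays unchanged.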

@MathewBiddle
Contributor Author

I've been considering moving IOOS_BTN.ipynb [1] to a series of functions, again, one for each of the metrics, and wrapping them all in a standalone Python script. That way we could call something like:

import ioos_metrics

df_btn = ioos_metrics.btn()  # by the numbers as a df
df_atn_gts = ioos_metrics.atn_gts()  # atn gts metrics as a df

# give back the number of glider days. Maybe expand to accept start/end
ngdac_glider_days = ioos_metrics.ngdac.glider_days()

The Jupyter Notebook is becoming a little unwieldy at this point IMO.

[1] - https://github.com/MathewBiddle/ioos_by_the_numbers/blob/main/IOOS_BTN.ipynb
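
To illustrate what one such per-metric function might look like, here is a hedged sketch of a `glider_days` helper that accepts an optional start/end window. Everything about it is an assumption for illustration: the input shape (pairs of deploy and recovery dates), the inclusive day counting, and the clipping rule are not taken from the notebook, and the real NGDAC calculation may differ.

```python
from datetime import date


def glider_days(deployments, start=None, end=None):
    """Sum deployment days, optionally clipped to the window [start, end].

    `deployments` is an iterable of (deploy_date, recover_date) pairs. Both
    this input shape and the inclusive day count are assumptions.
    """
    total = 0
    for deployed, recovered in deployments:
        first = max(deployed, start) if start else deployed
        last = min(recovered, end) if end else recovered
        if last >= first:  # skip deployments entirely outside the window
            total += (last - first).days + 1  # count both endpoints
    return total
```

With made-up dates, `glider_days([(date(2022, 1, 1), date(2022, 1, 10))])` counts ten days, and passing `start=date(2022, 1, 6)` reduces that to five.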

@MathewBiddle MathewBiddle added the by the numbers Things related to calculating IOOS by the Numbers (BTN) label Jan 25, 2024
2 participants