
220 - Define stage in, process, stage out tasks in entrypoint #240

Merged: LucaCinquini merged 20 commits into develop from 220-stage-in-task on Dec 20, 2024


@nikki-t (Collaborator) commented Nov 26, 2024

Purpose

  • Run the stage in, process, and stage out CWL tasks in the entrypoint file so that data is stored locally on EBS.
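Running each CWL task in the pod's local working directory (`/scratch` in the logs below) and capturing cwltool's JSON output could be sketched in Python roughly as follows. `run_cwl_step` is a hypothetical helper for illustration only; the `+ ...` trace lines in the logs suggest the actual entrypoint is a shell script and may do this differently.

```python
import json
import subprocess


def run_cwl_step(cmd, cwd="/scratch", runner=subprocess.run):
    """Run one cwltool invocation in the local working directory and parse its output.

    cwltool writes the final output object as JSON to stdout and its log messages
    to stderr, so only stdout is parsed. `runner` is injectable (hypothetical design
    choice) so the helper can be exercised without cwltool installed.
    """
    result = runner(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Calling it three times in sequence (stage in, process, stage out) mirrors the order in the logs, and `check=True` stops the chain at the first step that exits with a non-zero status.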

Notes to consider:

  • Used the example CWL found in this repo to get things working.
  • Stage out requires AWS credentials. I have temporarily stored short-term access credentials in manually created SSM parameters.
  • Reviewers: please check whether the input parameters are the ones we would expect users to want to change.
  • The stage in and stage out workflows are defined as constants at the top of the CWL DAG, which makes them easier to change because the CWL Docker container does not need to be rebuilt.
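Keeping the workflow URLs as module-level constants in the DAG file could look like the following minimal Python sketch. The constant names and the `cwltool_command` helper are hypothetical; only the URLs and the cwltool options are taken from the test logs in this PR.

```python
# Hypothetical constant names; the URLs are the ones that appear in the test logs.
STAGE_IN_WORKFLOW = (
    "https://raw.githubusercontent.com/mike-gangl/unity-OGC-example-application/"
    "refs/heads/main/stage_in.cwl"
)
STAGE_OUT_WORKFLOW = (
    "https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/"
    "refs/heads/219-process-task/demos/cwl_dag_stage_out.cwl"
)


def cwltool_command(workflow_url, job_args=None, outdir=None, **params):
    """Build a cwltool argument list for one workflow URL.

    Because the URL is just a string passed to the container, editing a
    constant above does not require rebuilding the CWL Docker image.
    """
    cmd = ["cwltool"]
    if outdir is not None:
        # Mirrors the "--outdir stage_in --copy-output" options in the stage-in log.
        cmd += ["--outdir", outdir, "--copy-output"]
    cmd.append(workflow_url)
    if job_args is not None:
        cmd.append(job_args)  # job order file (YAML/JSON) with the workflow inputs
    # Remaining keyword arguments become workflow inputs, e.g. --collection_id <value>.
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd
```

For example, `cwltool_command(STAGE_OUT_WORKFLOW, output_dir="/scratch/out", collection_id="abc")` yields the same shape of command as the stage-out line in the logs.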

Proposed Changes

  • [ADD] Modify and add stage in, process, and stage out tasks to CWL DAG entrypoint.

Issues

Testing

Deployed to unity-venue-dev for testing:

Stage-In, Processing, Stage-Out logs

[2024-11-26, 21:18:01 UTC] {pod_manager.py:472} INFO - [base] Executing the CWL workflow: https://raw.githubusercontent.com/mike-gangl/unity-OGC-example-application/refs/heads/main/stage_in.cwl with json arguments: https://raw.githubusercontent.com/mike-gangl/unity-OGC-example-application/refs/heads/main/test/ogc_app_package/stage_in.yml and working directory: /scratch
[2024-11-26, 21:18:01 UTC] {pod_manager.py:472} INFO - [base] Executing the CWL workflow: https://raw.githubusercontent.com/mike-gangl/unity-OGC-example-application/refs/heads/main/process.cwl with json arguments: ./job_args_process.json and working directory: /scratch
...
[2024-11-26, 21:18:10 UTC] {pod_manager.py:472} INFO - [base] + cwltool --outdir stage_in --copy-output https://raw.githubusercontent.com/mike-gangl/unity-OGC-example-application/refs/heads/main/stage_in.cwl https://raw.githubusercontent.com/mike-gangl/unity-OGC-example-application/refs/heads/main/test/ogc_app_package/stage_in.yml
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base] INFO Final process status is success
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base] + stage_in='{
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]     "stage_in_collection_file": {
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "location": "file:///scratch/stage_in/8wyg114f/stage-in-results.json",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "basename": "stage-in-results.json",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "class": "File",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "checksum": "sha1$be5051f72925ab11b9f5a21d41e90c309901aa9e",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "size": 3024,
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "path": "/scratch/stage_in/8wyg114f/stage-in-results.json"
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]     },
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]     "stage_in_download_dir": {
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "location": "file:///scratch/stage_in/8wyg114f",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "basename": "8wyg114f",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "class": "Directory",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "listing": [
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/stage_in/8wyg114f/SNDR.SS1330.CHIRP.20160822T0005.m06.g001.L1_AQ.std.v02_48.G.200425095850.nc",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "basename": "SNDR.SS1330.CHIRP.20160822T0005.m06.g001.L1_AQ.std.v02_48.G.200425095850.nc",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "size": 51875041,
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$34798d328d763432964b132ecb843a35f2046399",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/stage_in/8wyg114f/SNDR.SS1330.CHIRP.20160822T0005.m06.g001.L1_AQ.std.v02_48.G.200425095850.nc"
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             },
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/stage_in/8wyg114f/SNDR.SS1330.CHIRP.20160822T0011.m06.g002.L1_AQ.std.v02_48.G.200425095901.nc",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "basename": "SNDR.SS1330.CHIRP.20160822T0011.m06.g002.L1_AQ.std.v02_48.G.200425095901.nc",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "size": 52398321,
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$c84575ed7568c9c41f04cec7d3b58488edeec15d",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/stage_in/8wyg114f/SNDR.SS1330.CHIRP.20160822T0011.m06.g002.L1_AQ.std.v02_48.G.200425095901.nc"
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             },
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/stage_in/8wyg114f/G2040068613-GES_DISC.stac.json",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "basename": "G2040068613-GES_DISC.stac.json",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "size": 1972,
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$0a90d96ddaf8672043dd68d73bb0fd78bdb23697",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/stage_in/8wyg114f/G2040068613-GES_DISC.stac.json"
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             },
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/stage_in/8wyg114f/G2040068619-GES_DISC.stac.json",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "basename": "G2040068619-GES_DISC.stac.json",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "size": 1988,
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$755f72c08adf732260d0e1e6482bae37c0de7163",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/stage_in/8wyg114f/G2040068619-GES_DISC.stac.json"
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             },
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/stage_in/8wyg114f/catalog.json",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "basename": "catalog.json",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "size": 529,
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$6d7ad3ef736aa536c8881f72dc6da24c9238fe19",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/stage_in/8wyg114f/catalog.json"
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             },
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/stage_in/8wyg114f/stage-in-results.json",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "basename": "stage-in-results.json",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "size": 3024,
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$be5051f72925ab11b9f5a21d41e90c309901aa9e",
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/stage_in/8wyg114f/stage-in-results.json"
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]             }
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         ],
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]         "path": "/scratch/stage_in/8wyg114f"
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base]     }
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base] }'
...
[2024-11-26, 21:18:40 UTC] {pod_manager.py:472} INFO - [base] + cwltool https://raw.githubusercontent.com/mike-gangl/unity-OGC-example-application/refs/heads/main/process.cwl ./job_args_process.json
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base] INFO Final process status is success
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base] + process='{
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]     "output": {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]         "location": "file:///scratch/iu5di8f8",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]         "basename": "iu5di8f8",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]         "class": "Directory",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]         "listing": [
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/iu5di8f8/.jupyter-server-log.txt",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "basename": ".jupyter-server-log.txt",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "size": 4237,
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$8c41676e84d53043dda407294d503508a0cec4f4",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/iu5di8f8/.jupyter-server-log.txt"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             },
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/iu5di8f8/process_out.ipynb",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "basename": "process_out.ipynb",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "size": 20992,
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$c1ae4e28bcee8043bdb4b77ad6b1ca2362dcdc10",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/iu5di8f8/process_out.ipynb"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             },
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "class": "Directory",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/iu5di8f8/.ipython",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "basename": ".ipython",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "listing": [
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                     {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                         "class": "Directory",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                         "location": "file:///scratch/iu5di8f8/.ipython/profile_default",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                         "basename": "profile_default",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                         "listing": [
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "class": "Directory",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "location": "file:///scratch/iu5di8f8/.ipython/profile_default/security",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "basename": "security",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "listing": [],
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "path": "/scratch/iu5di8f8/.ipython/profile_default/security"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                             },
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "class": "Directory",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "location": "file:///scratch/iu5di8f8/.ipython/profile_default/log",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "basename": "log",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "listing": [],
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "path": "/scratch/iu5di8f8/.ipython/profile_default/log"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                             },
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "class": "Directory",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "location": "file:///scratch/iu5di8f8/.ipython/profile_default/startup",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "basename": "startup",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "listing": [
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                     {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                         "class": "File",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                         "location": "file:///scratch/iu5di8f8/.ipython/profile_default/startup/README",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                         "basename": "README",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                         "size": 371,
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                         "checksum": "sha1$375b32a1fcddcc54ad42c5181f0d339c425aa0ec",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                         "path": "/scratch/iu5di8f8/.ipython/profile_default/startup/README"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                     }
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 ],
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "path": "/scratch/iu5di8f8/.ipython/profile_default/startup"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                             },
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "class": "Directory",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "location": "file:///scratch/iu5di8f8/.ipython/profile_default/pid",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "basename": "pid",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "listing": [],
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "path": "/scratch/iu5di8f8/.ipython/profile_default/pid"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                             },
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "class": "Directory",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "location": "file:///scratch/iu5di8f8/.ipython/profile_default/db",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "basename": "db",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "listing": [],
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                                 "path": "/scratch/iu5di8f8/.ipython/profile_default/db"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                             }
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                         ],
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                         "path": "/scratch/iu5di8f8/.ipython/profile_default"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                     }
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 ],
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/iu5di8f8/.ipython"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             },
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/iu5di8f8/summary_table.txt",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "basename": "summary_table.txt",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "size": 970,
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$7d0e8f6e60bb22985b269dc9b40bfc6e9bb900cf",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/iu5di8f8/summary_table.txt"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             },
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/iu5di8f8/summary_table.txt.json",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "basename": "summary_table.txt.json",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "size": 1024,
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$a40df9c2583bd02f7d70d1a745b7058e784d9615",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/iu5di8f8/summary_table.txt.json"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             },
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             {
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "class": "File",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "location": "file:///scratch/iu5di8f8/catalog.json",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "basename": "catalog.json",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "size": 346,
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "checksum": "sha1$d35cb2009088feaa445a0ed4ecd6e5914e6f2c31",
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]                 "path": "/scratch/iu5di8f8/catalog.json"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]             }
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]         ],
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]         "path": "/scratch/iu5di8f8"
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base]     }
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base] }'
...
[2024-11-26, 21:19:15 UTC] {pod_manager.py:472} INFO - [base] + cwltool https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/refs/heads/219-process-task/demos/cwl_dag_stage_out.cwl --output_dir /scratch/iu5di8f8 --staging_bucket unity-dev-unity-unity-data --collection_id example-app-collection___3 --aws_access_key_id x --aws_secret_access_key x --aws_session_token x
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base] + stage_out='{
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]     "failed_features": {
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "location": "file:///scratch/failed_features.json",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "basename": "failed_features.json",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "class": "File",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "checksum": "sha1$e9b73c133ee6531ae67bea35e739b18c09efab9b",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "size": 45,
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "path": "/scratch/failed_features.json"
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]     },
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]     "stage_out_results": {
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "location": "file:///scratch/stage-out-results.json",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "basename": "stage-out-results.json",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "class": "File",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "checksum": "sha1$0f16c5b8ad6a6f3d76b056ec7a43281dd2e0eb69",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "size": 368,
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "path": "/scratch/stage-out-results.json"
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]     },
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]     "successful_features": {
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "location": "file:///scratch/successful_features.json",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "basename": "successful_features.json",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "class": "File",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "checksum": "sha1$c8a735ef8f21cd20aaaea7d5036081043b8d09b7",
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "size": 1094,
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]         "path": "/scratch/successful_features.json"
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base]     }
[2024-11-26, 21:19:43 UTC] {pod_manager.py:472} INFO - [base] }'
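Each cwltool run above prints its final output object as JSON on stdout, which the entrypoint appears to capture into the `stage_in`, `process`, and `stage_out` shell variables. The hand-off between steps could be sketched in Python as follows, assuming only the output shapes visible in these logs. All three helper names are hypothetical and are not taken from the entrypoint script.

```python
import json


def downloaded_files(stage_in_output: str, suffix: str = ".nc") -> list[str]:
    """List local paths of files in the stage-in download directory ending with `suffix`."""
    download_dir = json.loads(stage_in_output)["stage_in_download_dir"]
    # Each "listing" entry is a CWL File or Directory object.
    return [
        entry["path"]
        for entry in download_dir.get("listing", [])
        if entry.get("class") == "File" and entry["basename"].endswith(suffix)
    ]


def stage_out_args(process_output: str, staging_bucket: str, collection_id: str) -> list[str]:
    """Map the process step's output Directory onto the stage-out workflow inputs."""
    output_dir = json.loads(process_output)["output"]["path"]
    # In the log above this resolves to /scratch/iu5di8f8, matching --output_dir.
    # AWS credentials are deliberately left out of this sketch.
    return [
        "--output_dir", output_dir,
        "--staging_bucket", staging_bucket,
        "--collection_id", collection_id,
    ]


def stage_out_files(stage_out_output: str) -> dict[str, str]:
    """Return {output name: local path} for the stage-out results, failing if any is missing."""
    outputs = json.loads(stage_out_output)
    expected = ("successful_features", "failed_features", "stage_out_results")
    missing = [name for name in expected if name not in outputs]
    if missing:
        raise RuntimeError(f"stage out did not produce: {', '.join(missing)}")
    return {name: outputs[name]["path"] for name in expected}
```

One reason the sketch omits the `--aws_*` arguments: shell tracing with `set -x` echoes every command-line argument into the pod log, so credentials passed that way (shown as `x` above) end up in the task logs unless they are redacted.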

@nikki-t self-assigned this Nov 26, 2024

@nikki-t (Collaborator, Author) commented Dec 3, 2024

I updated the code to account for a new modular CWL DAG container image:

  • Kept a single Airflow container image, since it accommodates both the original CWL DAG and the new modular one.
  • Built a separate modular CWL Docker container image to support the modular utilities and entrypoint script.

Once we merge, I think we will want to:

  1. Build the Airflow Docker container image so that it includes the updated unity_sps_utils.py file.
  2. Build the modular Docker container image to test that the Docker build script works for the CWL DAG modular package.
  3. Update the container image referenced in any Airflow Terraform module tfvars files and redeploy as needed.

@LucaCinquini - I think this is ready to be merged, but we will need to coordinate the steps above. Let me know what you think!

- Restructure entrypoint to handle file i/o between tasks
- Update DAG to pass in stage out arguments and STAC JSON
- Remove entrypoint utility script
@nikki-t (Collaborator, Author) commented Dec 17, 2024

Updated for the following changes:

  • Updated the DS container image
  • Updated the references to the stage in and stage out workflow links (see PR#20)
  • Read project and venue from Airflow environment variables
  • Updated the reference to the STAC JSON, which now uses roles to indicate the data for stage in download
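The last two changes could look roughly like this Python sketch. The environment variable names `UNITY_PROJECT` and `UNITY_VENUE` are hypothetical placeholders, since the PR does not name them here. Asset selection follows the STAC convention of describing each asset's purpose in a `roles` array, assuming the role used for downloadable files is `data`.

```python
import os


def project_and_venue(env=None):
    """Read project and venue from the environment (variable names are hypothetical)."""
    env = os.environ if env is None else env
    return env["UNITY_PROJECT"], env["UNITY_VENUE"]


def data_asset_hrefs(stac_item: dict, role: str = "data") -> list[str]:
    """Return hrefs of the STAC item's assets whose "roles" list contains `role`."""
    return [
        asset["href"]
        for asset in stac_item.get("assets", {}).values()
        if role in asset.get("roles", [])  # assets without roles are skipped
    ]
```

Selecting by role rather than by file extension lets metadata assets (for example a `.stac.json` sidecar) travel with the item without being downloaded as input data.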

Tested in unity-venue-dev: the modular CWL DAG workflow ran successfully, and I was able to retrieve the summary.txt file produced by the process CWL from the stage out S3 bucket.

@LucaCinquini - I think this may be ready to merge, but please take a look and let me know if you have any feedback or want to see any changes. Thanks!

(We may still want to build the modular Docker container image to test that the Docker build script works for the CWL DAG modular package.)

@nikki-t (Collaborator, Author) commented Dec 17, 2024

Successfully tested after merging develop.

@LucaCinquini (Collaborator) left a comment

Hi @nikki-t: great work, but I do have a few questions:

  • Can the value of DS_S3_BUCKET_PARAM be replaced with something more general (i.e., remove "unity-nikki-1")? I think Galen and Nga have settled on an approved key for this SSM parameter.
  • Have you talked to Nga about the changes to the stage in and stage out workflows?
  • It also occurred to me that we could build a single image for the sps-docker container and override the entrypoint that is invoked by cwl_dag_modular.py.

I will approve and merge this PR for now, but we might want to consider implementing point 3 above after the holidays.
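For the first question, one hedged way to avoid a developer-specific value such as "unity-nikki-1" is to build the SSM key from the project and venue and look it up at run time. In this sketch the key layout is a hypothetical placeholder, not the approved key mentioned above. `ssm_client` is expected to be a boto3 SSM client (e.g. `boto3.client("ssm")`), whose `get_parameter` call returns the value under `["Parameter"]["Value"]`.

```python
# HYPOTHETICAL key layout for illustration only; substitute the approved Unity key.
BUCKET_PARAM_TEMPLATE = "/unity/{project}/{venue}/example/ds-s3-bucket"


def bucket_param_name(project: str, venue: str, template: str = BUCKET_PARAM_TEMPLATE) -> str:
    """Build the SSM parameter name from deployment-neutral values."""
    return template.format(project=project, venue=venue)


def get_ssm_parameter(ssm_client, name: str) -> str:
    """Fetch a (possibly SecureString) parameter value via the client's get_parameter call."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

Passing the client in, rather than creating it inside the function, keeps the lookup testable without AWS access.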

@LucaCinquini merged commit e4cc4e4 into develop Dec 20, 2024 (2 checks passed)
@LucaCinquini deleted the 220-stage-in-task branch December 20, 2024 11:18