Investigate logging and errors for CWL workflows #227
Comments
I created a nested workflow called cwl_dag_data.cwl, which calls cwl_dag_stage_in.cwl, cwl_dag_process.cwl, and cwl_dag_stage_out.cwl.
Test 1) Normal operations: I ran
Test 2) Normal operations: I ran
Sample logs:
[2024-10-15, 21:33:30 UTC] {pod_manager.py:472} INFO - [base] INFO [job stage_in] Data to stage in: data --> staged in
[2024-10-15, 21:33:30 UTC] {pod_manager.py:472} INFO - [base] INFO [job process] Data to process: data --> staged in
[2024-10-15, 21:33:31 UTC] {pod_manager.py:472} INFO - [base] INFO [job stage_out] Data to stage out: data --> staged in --> processed
Test 3) Error operations: I modified cwl_dag_stage_out.cwl so that it threw an error during execution.
Sample logs:
[2024-10-15, 21:44:08 UTC] {pod_manager.py:472} INFO - [base] ERROR [job stage_out] Job error:
[2024-10-15, 21:44:08 UTC] {pod_manager.py:472} INFO - [base] ("Error collecting output for parameter 'stage_out_file': https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/refs/heads/227-investigate-cwl/demos/cwl_dag_data_stage_out.cwl:27:7: Did not find output file with glob pattern: ['stage_out.txt'].", {})
[2024-10-15, 21:44:12 UTC] {taskinstance.py:3301} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 767, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 733, in _execute_callable
    return ExecutionCallableRunner(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/operator_helpers.py", line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/baseoperator.py", line 406, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 592, in execute
    return self.execute_sync(context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 642, in execute_sync
    self.cleanup(
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 912, in cleanup
    raise AirflowException(
airflow.exceptions.AirflowException: Pod cwl-task-pod-xo3tp7hf returned a failure.
@LucaCinquini - Is this what you had in mind? Should we test anything else?
@nikki-t : this is very useful. I think the result of this investigation can be summarized as follows - please correct me if I am wrong:
o CWL will either capture stdout and stderr in specific files if coded to do so, or it will echo them to the default Unix stdout and stderr streams. When running CWL through Airflow as we do, the stdout and stderr streams become the task logs, which is exactly what we want. So, in general, we should instruct users NOT to capture stdout and stderr as files.
o The job log files are permanently stored in S3, where users can retrieve them long term if they want (see exhibit #1).
o In case of error, the error message is indeed captured in the log (see exhibit #2), although it is mixed in with other CWL error messages.
In short, I think the current behavior of SPS/Airflow with respect to logs and errors is what we want. Thanks for taking the time to conduct this experiment.
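Since the CWL error ends up mixed in with other lines in the task log, a small filter can help pull it out. The following is only a sketch: the regular expression is inferred from the sample log lines in this thread, and the helper names `parse_cwl_line` and `cwl_errors` are hypothetical, not part of Airflow or cwltool. The exact log format may differ between versions.

```python
import re

# Pattern inferred from the sample log lines in this thread (an assumption);
# the real format may vary across Airflow and CWL runner versions.
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"          # [2024-10-15, 21:44:08 UTC]
    r"\{(?P<source>[^}]+)\}\s+"               # {pod_manager.py:472}
    r"(?P<airflow_level>[A-Z]+)\s+-\s+"       # INFO -
    r"\[base\]\s+"                            # container name
    r"(?P<cwl_level>INFO|WARNING|ERROR)\s+"   # level reported by the CWL runner
    r"\[job (?P<job>[^\]]+)\]\s*"             # [job stage_out]
    r"(?P<message>.*)$"
)


def parse_cwl_line(line):
    """Return a dict of fields for a CWL job line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def cwl_errors(lines):
    """Collect parsed lines whose CWL-side level is ERROR."""
    parsed = (parse_cwl_line(line) for line in lines)
    return [p for p in parsed if p and p["cwl_level"] == "ERROR"]
```

Note that in the sample logs the Airflow-side level is INFO even for the CWL ERROR line, so filtering on the Airflow level alone would miss the job error. That is why the sketch keys on the level that follows `[base]`.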
Create a simple CWL workflow composed of mock stage-in + process + stage-out steps.
Investigate:
o How to return standard output and error in the Airflow UI
o What happens if one of the steps fails - is all standard output and error lost?
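The second question can be illustrated outside of CWL and Airflow with a rough local analogy. The sketch below is an assumption-laden stand-in, not the actual workflow: the three mock steps are tiny Python child processes, `run_steps` is a hypothetical helper, and stderr is merged into stdout the way a container log stream would interleave them. It shows that output emitted before and during a failing step is still available to the caller.

```python
import subprocess
import sys

# Hypothetical mock steps: each is a tiny Python program run as a child process.
# The last one writes to stderr and exits non-zero to simulate a failure.
STEPS = [
    ("stage_in", "print('Data to stage in: data --> staged in')"),
    ("process", "print('Data to process: data --> staged in')"),
    ("stage_out", "import sys; print('about to fail', file=sys.stderr); sys.exit(1)"),
]


def run_steps(steps):
    """Run steps in order, stopping at the first failure.

    Returns (log_lines, failed_step), where failed_step is None on success.
    """
    log_lines = []
    for name, code in steps:
        result = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
        )
        # Keep everything the step printed, whether or not it succeeded.
        log_lines += [f"[job {name}] {line}" for line in result.stdout.splitlines()]
        if result.returncode != 0:
            return log_lines, name
    return log_lines, None


if __name__ == "__main__":
    lines, failed = run_steps(STEPS)
    print("\n".join(lines))
    print("failed step:", failed)
```

In this analogy the lines from stage_in and process survive the stage_out failure because they were collected as each step ran, rather than only after the whole run completed.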