dask: do not duplicate worker logs across workflow jobs #627

Open
tiborsimko opened this issue Jan 27, 2025 · 0 comments

Current behaviour

Consider the reana-demo-dask-coffea analysis example, extended with an additional serial step:

diff --git a/reana.yaml b/reana.yaml
index 340d6d0..7adec7c 100644
--- a/reana.yaml
+++ b/reana.yaml
@@ -12,6 +13,10 @@ workflow:
         environment: docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049
         commands:
           - python analysis.py
+      - name: sleep
+        environment: docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049
+        commands:
+          - sleep 10
 outputs:
   files:
     - histogram.png

That is, we have an extra step called "sleep" that does not touch Dask at all. However, when a user asks for the logs of this step, one gets the Dask worker logs from the previous step:

$ rcg logs -w dask-coffea-serial-kubernetes --filter step=sleep | tail -10
2025-01-27 13:08:50,599 - distributed._signals - INFO - Received signal SIGTERM (15)
2025-01-27 13:08:50,599 - distributed.nanny - INFO - Closing Nanny at 'tcp://10.244.0.194:35675'. Reason: nanny-close
2025-01-27 13:08:50,599 - distributed.nanny - INFO - Nanny asking worker to close. Reason: nanny-close
2025-01-27 13:08:50,599 - distributed._signals - INFO - Received signal SIGTERM (15)
2025-01-27 13:08:50,599 - distributed.nanny - INFO - Closing Nanny at 'tcp://10.244.0.193:38655'. Reason: nanny-close
2025-01-27 13:08:50,599 - distributed.nanny - INFO - Nanny asking worker to close. Reason: nanny-close
2025-01-27 13:08:50,600 - distributed.nanny - INFO - Worker closed
2025-01-27 13:08:50,821 - distributed.dask_worker - INFO - End worker
2025-01-27 13:08:51,341 - distributed.dask_worker - INFO - End worker

This duplicates the Dask log messages across the workflow's jobs:

$ rcg logs -w dask-coffea-serial-kubernetes.2 | grep 'End worker'
2025-01-27 13:08:50,821 - distributed.dask_worker - INFO - End worker
2025-01-27 13:08:51,341 - distributed.dask_worker - INFO - End worker
2025-01-27 13:08:50,821 - distributed.dask_worker - INFO - End worker
2025-01-27 13:08:51,341 - distributed.dask_worker - INFO - End worker

This behaviour is a consequence of how the Dask live logs are collected and exposed: they are "enriched" into the logs of each individual job.
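
To make the enrichment mechanism concrete, here is a minimal Python sketch of why appending the shared Dask cluster logs to every job duplicates them. All names below are hypothetical and for illustration only, not REANA's actual code:

    # Shared logs produced by the Dask cluster, which lives across all steps.
    dask_cluster_logs = [
        "2025-01-27 13:08:50,821 - distributed.dask_worker - INFO - End worker",
        "2025-01-27 13:08:51,341 - distributed.dask_worker - INFO - End worker",
    ]

    # Per-step logs; the "sleep" step never talks to the Dask cluster.
    job_logs = {
        "analysis": ["python analysis.py finished"],
        "sleep": [],
    }

    def enrich_job_logs(job_logs, cluster_logs):
        """Append the shared cluster logs to every job's logs."""
        return {job: lines + cluster_logs for job, lines in job_logs.items()}

    enriched = enrich_job_logs(job_logs, dask_cluster_logs)

    # Concatenating all job logs now shows each "End worker" line twice:
    all_lines = [line for lines in enriched.values() for line in lines]
    assert all_lines.count(dask_cluster_logs[0]) == 2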

This can be troublesome for very verbose Dask analyses whose workflows contain additional non-Dask steps, for example a Snakemake file doing some further pre- or post-processing.

Fortunately, this situation is rare, since analyses relying on Dask tend to use it for everything, but we should be ready to tackle this problem.

Expected behaviour

The logs from the various Dask workers should not be duplicated across jobs.

For example, we could try to avoid the duplication by storing the Dask logs with one job only.
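
Continuing the hypothetical sketch above (again, illustrative names only), that fix could look like attaching the shared cluster logs to exactly one designated job instead of all of them:

    def enrich_one_job(job_logs, cluster_logs, owner):
        """Attach the shared Dask cluster logs to exactly one job."""
        return {
            job: lines + (cluster_logs if job == owner else [])
            for job, lines in job_logs.items()
        }

    # Only the designated step carries the Dask messages; "sleep" stays clean.
    enriched_once = enrich_one_job(job_logs, dask_cluster_logs, owner="analysis")
    assert enriched_once["sleep"] == []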

However, ideally, since the Dask cluster is a resource that stays alive throughout the workflow execution, its logs should be collected (and exposed) separately from the job logs, so that we would have a triad: Workflow logs, Job logs, and Service logs.
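
As a rough sketch of what the triad could look like as a data model (the class and field names are assumptions for discussion, not REANA's actual API):

    from dataclasses import dataclass, field

    @dataclass
    class WorkflowLogBundle:
        # Engine-level messages about the workflow run itself.
        workflow_logs: list[str] = field(default_factory=list)
        # Per-step logs, keyed by step name ("analysis", "sleep", ...).
        job_logs: dict[str, list[str]] = field(default_factory=dict)
        # Logs of long-lived services, keyed by service name ("dask", ...).
        service_logs: dict[str, list[str]] = field(default_factory=dict)

With such a split, filtering logs by step could return only that step's own output, and the Dask worker logs could be requested explicitly from the service logs.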

Since this would require a larger architectural change, it has been postponed to a future refactoring.

This issue serves as an epic to remind us to do this in the future.

Notes

Stems from a past pull request review: #616 (comment)
