working on troubleshooting tutorial
tclose committed Feb 3, 2025
1 parent 30d0a7c commit 54dc092
Showing 1 changed file with 49 additions and 31 deletions.
80 changes: 49 additions & 31 deletions new-docs/source/tutorial/3-troubleshooting.ipynb
@@ -10,6 +10,24 @@
"avoid common pitfalls."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Things to check if Pydra gets stuck\n",
"\n",
"There are a number of common gotchas, related to running multi-process code, that can\n",
"cause Pydra workflows to get stuck and not execute correctly. If using the concurrent\n",
"futures worker (e.g. `worker=\"cf\"`), check these issues first before filing a bug report\n",
"or reaching out for help.\n",
"\n",
"### Applying `nest_asyncio` when running within a notebook\n",
"\n",
"When using the concurrent futures worker within a Jupyter notebook, you need to apply\n",
"`nest_asyncio` with the following lines:"
]
},
{
"cell_type": "code",
"execution_count": 3,
@@ -25,21 +43,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Enclosing multi-process code within `if __name__ == \"__main__\"`\n",
"\n",
"If running a script that executes a workflow with the concurrent futures worker\n",
"(i.e. `worker=\"cf\"`) on macOS or Windows, the submission/execution call needs to\n",
"be enclosed within an `if __name__ == \"__main__\"` block, e.g."
]
},
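The need for the `__main__` guard comes from Python's "spawn" start method rather than from Pydra itself, so it can be illustrated with a minimal standalone sketch using plain `multiprocessing` (no Pydra involved):

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # On macOS and Windows the default "spawn" start method re-imports this
    # script in each child process; without the guard, the Pool creation
    # below would run again in every child and the script would fail.
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # → [1, 4, 9]
```

The same reasoning applies to a Pydra submission call: anything that spawns worker processes must only run when the script is executed as the main module.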
{
@@ -63,7 +71,6 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Remove stray lockfiles\n",
"\n",
"During the execution of a task, a lockfile is generated to signify that a task is running.\n",
@@ -77,14 +84,27 @@
"If the `clean_stale_locks` flag is set (by default when using the *debug* worker), locks that\n",
"were created before the outer task was submitted are removed before the task is run.\n",
"However, since these locks could be created by separate submission processes, `clean_stale_locks`\n",
"is not switched on by default when using production workers (e.g. `cf`, `slurm`, etc.)."
]
},
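If a killed run has left a stray lockfile behind, it can be deleted by hand. The sketch below only simulates the situation in a throwaway directory; the lockfile name and cache layout used here are assumptions for illustration, so inspect your actual Pydra cache directory before deleting anything:

```python
import tempfile
from pathlib import Path

# Simulate a cache tree in a throwaway directory (names are illustrative;
# check your real cache directory for the actual lockfile names).
cache_root = Path(tempfile.mkdtemp())
task_dir = cache_root / "Task-12345"
task_dir.mkdir()
(task_dir / "_lock").touch()  # stand-in for a stray lockfile

# Find and remove every leftover lockfile under the cache root
removed = sorted(p.name for p in cache_root.glob("**/*lock*"))
for p in cache_root.glob("**/*lock*"):
    p.unlink()
print(removed)  # → ['_lock']
```

Only do this when you are sure no other submission process is still running, since a live lockfile means a task is genuinely in progress.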
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Finding errors\n",
"\n",
"### Running in *debug* mode\n",
"\n",
"By default, Pydra will run with the *debug* worker, which executes each task serially\n",
"within a single process without use of `async/await` blocks, to allow raised exceptions\n",
"to propagate gracefully to the calling code. If you are having trouble with a pipeline,\n",
"ensure that the *debug* worker (`worker=\"debug\"`, the default) is being used.\n",
"\n",
"### Reading error files\n",
"\n",
"When a task raises an error, it is captured and saved in a pickle file named `_error.pklz`\n",
"within the task's cache directory. For example, when calling the toy `UnsafeDivisionWorkflow`\n",
"with `denominator=0`, the task will fail."
]
},
{
@@ -93,18 +113,11 @@
"metadata": {},
"outputs": [],
"source": [
"from pydra.tasks.testing import UnsafeDivisionWorkflow\n",
"from pydra.engine.submitter import Submitter\n",
"import nest_asyncio\n",
"\n",
"# This is needed to run parallel workflows in Jupyter notebooks\n",
"nest_asyncio.apply()\n",
"\n",
"# This workflow will fail because we are trying to divide by 0\n",
"wf = UnsafeDivisionWorkflow(a=10, b=5).split(denominator=[3, 2, 0])\n",
"\n",
"with Submitter(worker=\"cf\") as sub:\n",
"    result = sub(wf)\n",
" \n",
"if result.errored:\n",
" print(\"Workflow failed with errors:\\n\" + str(result.errors))\n",
@@ -122,7 +135,12 @@
"the novel nature of scientific experiments and the known artefacts that can occur.\n",
"Therefore, it is always worth sanity-checking the results produced by workflows. When a\n",
"problem occurs in a multi-stage workflow it can be difficult to identify at which stage\n",
"the issue occurred.\n",
"\n",
"Currently in Pydra you need to step backwards through the tasks of the workflow, load\n",
"the saved task object and inspect its inputs to find the preceding nodes. If any of the\n",
"inputs that were generated by previous nodes are not ok, then you should check the\n",
"tasks that generated them in turn."
]
},
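To inspect what was captured in an `_error.pklz` file without re-running anything, you can unpickle it directly. The snippet below fabricates such a file so it is self-contained and runnable anywhere; the real files are written by Pydra, and the exact serialization (e.g. cloudpickle, compression) is an assumption here, so treat this as a sketch rather than the definitive loading recipe:

```python
import pickle
import tempfile
from pathlib import Path

# Fabricate an "_error.pklz"-style file so the example is self-contained;
# real files are produced by Pydra inside the task's cache directory.
cache_dir = Path(tempfile.mkdtemp())
error_file = cache_dir / "_error.pklz"
error_file.write_bytes(pickle.dumps({"error message": ["division by zero"]}))

# Load the file and inspect the captured error message
error = pickle.loads(error_file.read_bytes())
print(error["error message"])  # → ['division by zero']
```

The same pattern applies when walking backwards through a failed workflow: load the pickled objects from each task's cache directory and inspect their contents to find where bad values first appeared.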
{
