New examples for the updated documentation #495

Merged
merged 39 commits into from
Nov 20, 2024
Changes from all commits
39 commits
73dc0de
new examples
jan-janssen Nov 12, 2024
b5bf962
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
a31700b
Build notebooks as tests
jan-janssen Nov 12, 2024
362a25a
add executor bit
jan-janssen Nov 12, 2024
8cb4dd7
extend notebook environment
jan-janssen Nov 12, 2024
b2088a0
Update 3-hpc-allocation.ipynb
jan-janssen Nov 12, 2024
a35430e
Add key features
jan-janssen Nov 13, 2024
3ab770e
Merge remote-tracking branch 'origin/main' into examples
jan-janssen Nov 15, 2024
ee2a158
update key arguments
jan-janssen Nov 15, 2024
f688059
Merge remote-tracking branch 'origin/main' into examples
jan-janssen Nov 15, 2024
e8a9987
Work in progress for the readme
jan-janssen Nov 15, 2024
faa2c62
Update readme
jan-janssen Nov 15, 2024
9d688df
add new lines
jan-janssen Nov 15, 2024
936fa62
Merge remote-tracking branch 'origin/main' into examples
jan-janssen Nov 15, 2024
acede13
Change Backend Names
jan-janssen Nov 15, 2024
ee532a7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 15, 2024
a1359d4
Update __init__.py
jan-janssen Nov 15, 2024
74bee15
Merge remote-tracking branch 'origin/main' into examples
jan-janssen Nov 19, 2024
294db24
update readme
jan-janssen Nov 19, 2024
8fa5d7b
Update installation
jan-janssen Nov 19, 2024
307fb40
Fix init
jan-janssen Nov 19, 2024
52d9723
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 19, 2024
7a6b068
update local notebook
jan-janssen Nov 19, 2024
59f6b97
Merge remote-tracking branch 'origin/examples' into examples
jan-janssen Nov 19, 2024
948ea0e
Merge remote-tracking branch 'origin/main' into examples
jan-janssen Nov 19, 2024
4432c65
update local example notebook
jan-janssen Nov 19, 2024
e209d8c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 19, 2024
103ff06
Explain jupyter kernel installation
jan-janssen Nov 19, 2024
1481110
Merge remote-tracking branch 'origin/examples' into examples
jan-janssen Nov 19, 2024
bca3c32
copy existing kernel
jan-janssen Nov 19, 2024
b8ab12f
Add HPC submission
jan-janssen Nov 19, 2024
6fabda8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 19, 2024
a6ca0f5
execute HPC notebook once
jan-janssen Nov 20, 2024
21b7099
hpc allocation
jan-janssen Nov 20, 2024
ee386f7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2024
1632ba1
Update documentation
jan-janssen Nov 20, 2024
99e5126
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2024
0b239e2
replace HPC submission notebook
jan-janssen Nov 20, 2024
a11e378
Merge remote-tracking branch 'origin/examples' into examples
jan-janssen Nov 20, 2024
11 changes: 11 additions & 0 deletions .ci_support/build_notebooks.sh
@@ -0,0 +1,11 @@
#!/bin/bash
# execute notebooks
i=0;
for notebook in $(ls notebooks/*.ipynb); do
    papermill ${notebook} ${notebook%.*}-out.${notebook##*.} -k python3 || i=$((i+1));
done;
Comment on lines +4 to +6
⚠️ Potential issue

Fix fragile file iteration and improve error logging

The current implementation has several issues:

  1. Using ls for iteration is fragile and will break with spaces in filenames
  2. Failed notebook executions aren't logged
  3. No progress indication during execution

Apply this diff to make the script more robust:

-for notebook in $(ls notebooks/*.ipynb); do
-    papermill ${notebook} ${notebook%.*}-out.${notebook##*.} -k python3 || i=$((i+1));
-done;
+# Create a log directory
+mkdir -p logs
+
+# Use shell globbing instead of ls
+for notebook in notebooks/*.ipynb; do
+    if [ ! -f "$notebook" ]; then
+        continue  # Skip if no notebooks found
+    fi
+    
+    echo "Processing: $notebook"
+    base_name=$(basename "$notebook" .ipynb)
+    output_file="${notebook%.*}-out.${notebook##*.}"
+    log_file="logs/${base_name}.log"
+    
+    if ! papermill "$notebook" "$output_file" -k python3 2>&1 | tee "$log_file"; then
+        echo "Failed to execute: $notebook" >&2
+        i=$((i+1))
+    fi
+done
🧰 Tools
🪛 Shellcheck

[error] 4-4: Iterating over ls output is fragile. Use globs.

(SC2045)


# push error to next level
if [ $i -gt 0 ]; then
    exit 1;
fi;
Comment on lines +8 to +11
🛠️ Refactor suggestion

Enhance error reporting with execution summary

The current error handling doesn't provide enough information about the execution results.

Apply this diff to improve error reporting:

-# push error to next level
-if [ $i -gt 0 ]; then
-    exit 1;
-fi;
+# Print execution summary
+total=$(find notebooks -name "*.ipynb" | wc -l)
+successful=$((total - i))
+
+echo "Notebook Execution Summary:"
+echo "-------------------------"
+echo "Total notebooks: $total"
+echo "Successfully executed: $successful"
+echo "Failed executions: $i"
+
+if [ $i -gt 0 ]; then
+    echo "Error: $i notebook(s) failed execution. Check logs directory for details." >&2
+    exit 1
+fi
+
+echo "All notebooks executed successfully!"

2 changes: 1 addition & 1 deletion .github/workflows/notebooks.yml
@@ -34,4 +34,4 @@ jobs:
timeout-minutes: 5
run: >
flux start
-        papermill notebooks/examples.ipynb examples-out.ipynb -k "python3"
+        .ci_support/build_notebooks.sh
💡 Codebase verification

Increase the workflow timeout.

The current 5-minute timeout may be insufficient for executing all 5 notebooks. Consider increasing the timeout-minutes value to ensure successful completion.

🔗 Analysis chain

Consider increasing the workflow timeout.

The workflow is now executing multiple notebooks through the shell script, but still uses a 5-minute timeout. This might be insufficient when running all notebooks sequentially, especially those involving HPC and resource allocation examples.

Consider:

  1. Increasing the timeout-minutes value
  2. Adding timeouts per notebook in the shell script instead
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Count notebooks and estimate required time
echo "Number of notebooks to be executed:"
fd .ipynb$ notebooks/ | wc -l

Length of output: 535


Script:

#!/bin/bash
# Count the number of Jupyter notebooks
echo "Total number of notebooks:"
fd .ipynb$ notebooks/ | wc -l

Length of output: 98

197 changes: 104 additions & 93 deletions README.md
@@ -3,111 +3,122 @@
[![Coverage Status](https://coveralls.io/repos/github/pyiron/executorlib/badge.svg?branch=main)](https://coveralls.io/github/pyiron/executorlib?branch=main)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pyiron/executorlib/HEAD?labpath=notebooks%2Fexamples.ipynb)

## Challenges
In high performance computing (HPC) the Python programming language is commonly used as a high-level language to
orchestrate the coupling of scientific applications. Still, the efficient use of highly parallel HPC clusters remains
challenging, primarily in three aspects:

* **Communication**: Distributing Python function calls over hundreds of compute nodes and gathering the results on a
  shared file system is technically possible, but highly inefficient. A socket-based communication approach is
  preferable.
* **Resource Management**: Assigning Python functions to GPUs or executing Python functions on multiple CPUs using the
  message passing interface (MPI) requires major modifications to the Python workflow.
* **Integration**: Existing workflow libraries implement a secondary job management layer on the Python level rather
  than leveraging the existing infrastructure provided by the job scheduler of the HPC.

### executorlib is ...
In a given HPC allocation the `executorlib` library addresses these challenges by extending the Executor interface
of the standard Python library to support resource assignment in the HPC context. Computing resources can either be
assigned on a per function call basis or as a block allocation on a per Executor basis. The `executorlib` library
is built on top of the [flux-framework](https://flux-framework.org) to enable fine-grained resource assignment. In
addition, the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) is supported as an
alternative queuing system, and for workstation installations `executorlib` can be installed without a job scheduler.

### executorlib is not ...
The `executorlib` library is not designed to request an allocation from the job scheduler of an HPC. Instead, within a
given allocation from the job scheduler, the `executorlib` library can be employed to distribute a series of Python
function calls over the available computing resources to achieve maximum computing resource utilization.

Up-scale python functions for high performance computing (HPC) with executorlib.

## Key Features
* **Up-scale your Python functions beyond a single computer** - executorlib extends the [Executor interface](https://docs.python.org/3/library/concurrent.futures.html#executor-objects)
  from the Python standard library and combines it with job schedulers for high performance computing (HPC), including
  the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) and [flux](http://flux-framework.org).
  With this combination executorlib allows users to distribute their Python functions over multiple compute nodes.
* **Parallelize your Python program one function at a time** - executorlib allows users to assign dedicated computing
  resources like CPU cores, threads or GPUs to one Python function call at a time. So you can accelerate your Python
  code function by function.
* **Permanent caching of intermediate results to accelerate rapid prototyping** - To accelerate the development of
  machine learning pipelines and simulation workflows, executorlib provides optional caching of intermediate results for
  iterative development in interactive environments like Jupyter notebooks; see the sketch after this list.
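
The caching feature can be sketched as follows; the `cache_directory` parameter name is taken from the executorlib
documentation and should be treated as an assumption here:
```python
from executorlib import Executor


# assumption: cache_directory enables persistent caching of results, so
# resubmitting the same function with the same arguments reuses the cache
with Executor(backend="local", cache_directory="./cache") as exe:
    fs = exe.submit(sum, [1, 1])
    print(fs.result())  # computed once, then served from the cache
```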

## Examples
The Python standard library provides the [Executor interface](https://docs.python.org/3/library/concurrent.futures.html#executor-objects)
with the [ProcessPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) and the
[ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor) for parallel
execution of Python functions on a single computer. executorlib extends this functionality to distribute Python
functions over multiple computers within a high performance computing (HPC) cluster. This can be achieved either by
submitting each function as an individual job to the HPC job scheduler - [HPC Submission Mode]() - or by requesting a
compute allocation of multiple nodes and then distributing the Python functions within this allocation - [HPC Allocation Mode]().
Finally, to accelerate the development process executorlib also provides a [Local Mode]() to use the executorlib
functionality on a single workstation for testing. Starting with the [Local Mode](), enabled by setting the backend
parameter to local - `backend="local"`:
```python
from executorlib import Executor


with Executor(backend="local") as exe:
    future_lst = [exe.submit(sum, [i, i]) for i in range(1, 5)]
    print([f.result() for f in future_lst])
```
In the same way executorlib can also execute Python functions which use additional computing resources, like multiple
CPU cores, CPU threads or GPUs. For example, if the Python function internally uses the Message Passing Interface (MPI)
via the [mpi4py](https://mpi4py.readthedocs.io) Python library:
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


with Executor(backend="local") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```
This example can be executed using:
```
python example.py
```
Which returns:
```
>>> [(3, 2, 0), (3, 2, 1)]
```
The important part in this example is that [mpi4py](https://mpi4py.readthedocs.io) is only used in the `calc()`
function, not in the Python script itself. Consequently, it is not necessary to call the script with `mpiexec`; a call
with the regular Python interpreter is sufficient. This highlights how `executorlib` allows users to parallelize one
function at a time, without having to convert their whole workflow to [mpi4py](https://mpi4py.readthedocs.io).
The same code can also be executed directly inside a Jupyter notebook, which enables an interactive development process.

The interface of the standard [concurrent.futures.Executor](https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures)
is extended by adding the option `resource_dict={"cores": 2}` to assign two MPI ranks to each function call. To create
two workers, the maximum number of cores can be increased to `max_cores=4`. In this case each worker receives two cores,
resulting in a total of four CPU cores being utilized.

After submitting the function `calc()` with the corresponding parameter to the executor, `exe.submit(calc, 3)`, a Python
[`concurrent.futures.Future`](https://docs.python.org/3/library/concurrent.futures.html#future-objects) is returned.
Consequently, the `executorlib.Executor` can be used as a drop-in replacement for the
[`concurrent.futures.Executor`](https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures),
which allows the user to add parallelism to their workflow one function at a time.

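The following minimal sketch illustrates this drop-in property; it only assumes the `local` backend from the examples
above:
```python
import concurrent.futures

from executorlib import Executor


with Executor(backend="local") as exe:
    fs = exe.submit(sum, [2, 2])
    # the returned object is a standard concurrent.futures.Future
    print(isinstance(fs, concurrent.futures.Future))  # True
    concurrent.futures.wait([fs])
    print(fs.result())  # 4
```
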
The additional `resource_dict` parameter defines the computing resources allocated to the execution of the submitted
Python function. In addition to the number of compute cores `cores`, the resource dictionary can also define the
threads per core as `threads_per_core`, the GPUs per core as `gpus_per_core`, the working directory with `cwd`, the
option to use the OpenMPI oversubscribe feature with `openmpi_oversubscribe` and, finally, for the [Simple Linux Utility
for Resource Management (SLURM)](https://slurm.schedmd.com) queuing system, the option to provide additional command
line arguments with the `slurm_cmd_args` parameter - [resource dictionary]().
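
As a minimal sketch of how these keys combine, assuming the `calc()` function from the example above and a purely
hypothetical working directory path:
```python
from executorlib import Executor


with Executor(backend="local") as exe:
    fs = exe.submit(
        calc,  # the MPI-parallel function defined in the example above
        3,
        resource_dict={
            "cores": 2,  # two MPI ranks for this single function call
            "threads_per_core": 1,  # one thread per core
            "cwd": "/tmp/executorlib-demo",  # hypothetical working directory
        },
    )
    print(fs.result())
```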

This flexibility to assign computing resources on a per-function-call basis simplifies the up-scaling of Python
programs. Only the parts of a Python program which benefit from parallel execution are implemented as MPI-parallel
Python functions, while the rest of the program remains serial.

The same function can be submitted to the [SLURM](https://slurm.schedmd.com) queuing system by just changing the
`backend` parameter to `slurm_submission`. The rest of the example remains the same, which highlights how executorlib
accelerates the rapid prototyping and up-scaling of HPC Python programs.
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


with Executor(backend="slurm_submission") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```
In this case the [Python simple queuing system adapter (pysqa)](https://pysqa.readthedocs.io) is used to submit the
`calc()` function to the [SLURM](https://slurm.schedmd.com) job scheduler and to request an allocation with two CPU
cores for the execution of the function - [HPC Submission Mode](). In the background, the [sbatch](https://slurm.schedmd.com/sbatch.html)
command is used to request the allocation to execute the Python function.

Within a given [SLURM](https://slurm.schedmd.com) allocation, executorlib can also be used to assign a subset of the
available computing resources to execute a given Python function. In terms of [SLURM](https://slurm.schedmd.com)
commands, this functionality internally uses the [srun](https://slurm.schedmd.com/srun.html) command to request a
subset of the resources of a given queuing system allocation.
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


with Executor(backend="slurm_allocation") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```
In addition to the support for [SLURM](https://slurm.schedmd.com), executorlib also provides support for the
hierarchical [flux](http://flux-framework.org) job scheduler. The [flux](http://flux-framework.org) job scheduler is
developed at [Lawrence Livermore National Laboratory](https://computing.llnl.gov/projects/flux-building-framework-resource-management)
to address the needs of the upcoming generation of exascale computers. Still, even on traditional HPC clusters the
hierarchical approach of [flux](http://flux-framework.org) is beneficial for distributing hundreds of tasks within a
given allocation. Even when [SLURM](https://slurm.schedmd.com) is used as the primary job scheduler of your HPC, it is
recommended to use [SLURM with flux]() as the hierarchical job scheduler within the allocations.
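A minimal sketch of this mode, assuming a flux-enabled allocation and assuming `flux_allocation` matches the backend
names introduced in this pull request:
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


# assumption: "flux_allocation" is the backend name for running inside a
# flux-managed allocation; this requires a working flux installation
with Executor(backend="flux_allocation") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```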

## Disclaimer
While we try to develop a stable and reliable software library, the development remains an open-source project under
the BSD 3-Clause License without any warranties:
```
BSD 3-Clause License

Copyright (c) 2022, Jan Janssen
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
```

## Documentation
* [Installation](https://executorlib.readthedocs.io/en/latest/installation.html)
* [Compatible Job Schedulers](https://executorlib.readthedocs.io/en/latest/installation.html#compatible-job-schedulers)
* [executorlib with Flux Framework](https://executorlib.readthedocs.io/en/latest/installation.html#executorlib-with-flux-framework)
5 changes: 5 additions & 0 deletions binder/environment.yml
@@ -11,3 +11,8 @@ dependencies:
- flux-pmix =0.5.0
- versioneer =0.28
- h5py =3.12.1
- matplotlib =3.9.2
- networkx =3.4.2
- pygraphviz =1.14
- pysqa =0.2.2
- ipython =8.29.0
6 changes: 4 additions & 2 deletions docs/_toc.yml
@@ -2,7 +2,9 @@ format: jb-book
root: README
chapters:
- file: installation.md
- file: examples.ipynb
- file: development.md
- file: 1-local.ipynb
- file: 2-hpc-submission.ipynb
- file: 3-hpc-allocation.ipynb
- file: trouble_shooting.md
- file: 4-developer.ipynb
- file: api.rst