From 4e9317a2fd28308ad72973a1118000070f242c6a Mon Sep 17 00:00:00 2001
From: Tom White
Date: Fri, 19 Jul 2024 15:04:17 +0100
Subject: [PATCH] Document how to run examples on local machine using the `processes` executor

---
 docs/getting-started/demo.md  |  2 +-
 docs/user-guide/executors.md  | 10 +++++++---
 examples/README.md            | 36 ++++++++++++++++++++++++-----------
 examples/processes/cubed.yaml |  3 +++
 4 files changed, 36 insertions(+), 15 deletions(-)
 create mode 100644 examples/processes/cubed.yaml

diff --git a/docs/getting-started/demo.md b/docs/getting-started/demo.md
index b91b2980..38c2369a 100644
--- a/docs/getting-started/demo.md
+++ b/docs/getting-started/demo.md
@@ -32,4 +32,4 @@ array([[ 2, 3, 4],
  [ 8, 9, 10]])
 ```
 
-See the [examples README](https://github.com/tomwhite/cubed/tree/main/examples/README.md) for examples that run on cloud services.
+See the [examples README](https://github.com/cubed-dev/cubed/blob/main/examples/README.md) for more examples that run on a single multi-core machine, or in the cloud.
diff --git a/docs/user-guide/executors.md b/docs/user-guide/executors.md
index 103a2df6..aca02852 100644
--- a/docs/user-guide/executors.md
+++ b/docs/user-guide/executors.md
@@ -4,12 +4,16 @@ Cubed arrays are backed by Zarr arrays, and every chunk in the Zarr array is com
 
 Cubed provides a variety of executors for running the tasks in a computation, which are discussed below. Executors are also sometimes referred to as runtimes.
 
-## Local Python executor
+## Local single-machine executors
 
-If you don't specify an executor then the local in-process Python executor is used. This is a very simple, single-threaded executor (called {py:class}`PythonDagExecutor `) that is intended for testing on small amounts of data before running larger computations using a cloud service.
+If you don't specify an executor then the local in-process single-threaded Python executor is used. This is a very simple executor (called `single-threaded`) that is intended for testing on small amounts of data before running larger computations using the `processes` executor on a single machine, or a distributed executor in the cloud.
+
+The `processes` executor runs on a single machine, and uses all the cores on the machine. It doesn't require any set up so it is useful for quickly getting started and running on datasets that don't fit in memory, but can fit on a single machine's disk.
 
 ## Which cloud service executor should I use?
 
+When it comes to scaling out, there are a number of executors that work in the cloud.
+
 [**Lithops**](https://lithops-cloud.github.io/) is the executor we recommend for most users, since it has had the most testing so far (~1000 workers).
 If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GCS use Lithops with Google Cloud Functions. You have to build a runtime environment as a part of the setting up process.
 
@@ -33,4 +37,4 @@ spec = cubed.Spec(
 )
 ```
 
-A default spec may also be configured using a YAML file. The [examples](https://github.com/tomwhite/cubed/tree/main/examples/README.md) show this in more detail for all of the cloud services described above.
+A default spec may also be configured using a YAML file. The [examples](https://github.com/cubed-dev/cubed/blob/main/examples/README.md) show this in more detail for all of the executors described above.
diff --git a/examples/README.md b/examples/README.md
index bd26f3af..0b7b5a07 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -1,6 +1,12 @@
 # Examples
 
-## Which executor should I use?
+## Running on a local machine
+
+The `processes` executor is the recommended executor for running on a single machine, since it can use all the cores on the machine.
+
+## Which cloud service executor should I use?
+
+When it comes to scaling out, there are a number of executors that work in the cloud.
 
 [**Lithops**](https://lithops-cloud.github.io/) is the executor we recommend for most users, since it has had the most testing so far (~1000 workers).
 If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GCS use Lithops with Google Cloud Functions. You have to build a runtime environment as a part of the setting up process.
@@ -13,21 +19,29 @@ If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GC
 
 ## Set up
 
-Follow the instructions for setting up Cubed to run on your chosen cloud and executor runtime:
+Follow the instructions for setting up Cubed to run on your executor runtime:
 
-| Executor | Cloud  | Set up instructions                                          |
-|----------|--------|--------------------------------------------------------------|
-| Lithops  | AWS    | [lithops/aws/README.md](lithops/aws/README.md)               |
-|          | Google | [lithops/gcp/README.md](lithops/gcp/README.md)               |
-| Modal    | AWS    | [modal/aws/README.md](modal/aws/README.md)                   |
-|          | Google | [modal/gcp/README.md](modal/gcp/README.md)                   |
-| Coiled   | AWS    | [coiled/aws/README.md](coiled/aws/README.md)                 |
-| Beam     | Google | [dataflow/README.md](dataflow/README.md)                     |
+| Executor  | Cloud  | Set up instructions                            |
+|-----------|--------|------------------------------------------------|
+| Processes | N/A    | N/A                                            |
+| Lithops   | AWS    | [lithops/aws/README.md](lithops/aws/README.md) |
+|           | Google | [lithops/gcp/README.md](lithops/gcp/README.md) |
+| Modal     | AWS    | [modal/aws/README.md](modal/aws/README.md)     |
+|           | Google | [modal/gcp/README.md](modal/gcp/README.md)     |
+| Coiled    | AWS    | [coiled/aws/README.md](coiled/aws/README.md)   |
+| Beam      | Google | [dataflow/README.md](dataflow/README.md)       |
 
 ## Examples
 
 The `add-asarray.py` script is a small example that adds two small 4x4 arrays together, and is useful for checking that the runtime is working.
-Export `CUBED_CONFIG` as described in the set up instructions, then run the script. This is for Lithops on AWS:
+Export `CUBED_CONFIG` as described in the set up instructions, then run the script. This is for running on the local machine using the `processes` executor:
+
+```shell
+export CUBED_CONFIG=$(pwd)/processes
+python add-asarray.py
+```
+
+This is for Lithops on AWS:
 
 ```shell
 export CUBED_CONFIG=$(pwd)/lithops/aws
diff --git a/examples/processes/cubed.yaml b/examples/processes/cubed.yaml
new file mode 100644
index 00000000..59508815
--- /dev/null
+++ b/examples/processes/cubed.yaml
@@ -0,0 +1,3 @@
+spec:
+  allowed_mem: "2GB"
+  executor_name: "processes"