Document how to run examples on local machine using processes #507

Merged
merged 1 commit into from
Jul 22, 2024
2 changes: 1 addition & 1 deletion docs/getting-started/demo.md
@@ -32,4 +32,4 @@ array([[ 2,  3,  4],
[ 8, 9, 10]])
```

See the [examples README](https://github.com/tomwhite/cubed/tree/main/examples/README.md) for examples that run on cloud services.
See the [examples README](https://github.com/cubed-dev/cubed/blob/main/examples/README.md) for more examples that run on a single multi-core machine, or in the cloud.
10 changes: 7 additions & 3 deletions docs/user-guide/executors.md
@@ -4,12 +4,16 @@ Cubed arrays are backed by Zarr arrays, and every chunk in the Zarr array is com

Cubed provides a variety of executors for running the tasks in a computation, which are discussed below. Executors are also sometimes referred to as runtimes.

## Local Python executor
## Local single-machine executors

If you don't specify an executor then the local in-process Python executor is used. This is a very simple, single-threaded executor (called {py:class}`PythonDagExecutor <cubed.runtime.executors.python.PythonDagExecutor>`) that is intended for testing on small amounts of data before running larger computations using a cloud service.
If you don't specify an executor, the local in-process, single-threaded Python executor is used. This is a very simple executor (called `single-threaded`) intended for testing on small amounts of data before running larger computations using the `processes` executor on a single machine, or a distributed executor in the cloud.

The `processes` executor runs on a single machine and uses all of its cores. It doesn't require any setup, so it is useful for getting started quickly and for running on datasets that don't fit in memory but do fit on a single machine's disk.

## Which cloud service executor should I use?

When it comes to scaling out, there are a number of executors that work in the cloud.

[**Lithops**](https://lithops-cloud.github.io/) is the executor we recommend for most users, since it has had the most testing so far (~1000 workers).
If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GCS use Lithops with Google Cloud Functions. You have to build a runtime environment as part of the setup process.

@@ -33,4 +37,4 @@ spec = cubed.Spec(
)
```

A default spec may also be configured using a YAML file. The [examples](https://github.com/tomwhite/cubed/tree/main/examples/README.md) show this in more detail for all of the cloud services described above.
A default spec may also be configured using a YAML file. The [examples](https://github.com/cubed-dev/cubed/blob/main/examples/README.md) show this in more detail for all of the executors described above.
36 changes: 25 additions & 11 deletions examples/README.md
@@ -1,6 +1,12 @@
# Examples

## Which executor should I use?
## Running on a local machine

The `processes` executor is recommended for running on a single machine, since it can use all of the machine's cores.

## Which cloud service executor should I use?

When it comes to scaling out, there are a number of executors that work in the cloud.

[**Lithops**](https://lithops-cloud.github.io/) is the executor we recommend for most users, since it has had the most testing so far (~1000 workers).
If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GCS use Lithops with Google Cloud Functions. You have to build a runtime environment as part of the setup process.
@@ -13,21 +19,29 @@ If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GC

## Set up

Follow the instructions for setting up Cubed to run on your chosen cloud and executor runtime:
Follow the instructions for setting up Cubed to run on your executor runtime:

| Executor | Cloud | Set up instructions |
|----------|--------|--------------------------------------------------------------|
| Lithops | AWS | [lithops/aws/README.md](lithops/aws/README.md) |
| | Google | [lithops/gcp/README.md](lithops/gcp/README.md) |
| Modal | AWS | [modal/aws/README.md](modal/aws/README.md) |
| | Google | [modal/gcp/README.md](modal/gcp/README.md) |
| Coiled | AWS | [coiled/aws/README.md](coiled/aws/README.md) |
| Beam | Google | [dataflow/README.md](dataflow/README.md) |
| Executor | Cloud | Set up instructions |
|-----------|--------|------------------------------------------------|
| Processes | N/A | N/A |
| Lithops | AWS | [lithops/aws/README.md](lithops/aws/README.md) |
| | Google | [lithops/gcp/README.md](lithops/gcp/README.md) |
| Modal | AWS | [modal/aws/README.md](modal/aws/README.md) |
| | Google | [modal/gcp/README.md](modal/gcp/README.md) |
| Coiled | AWS | [coiled/aws/README.md](coiled/aws/README.md) |
| Beam | Google | [dataflow/README.md](dataflow/README.md) |

## Examples

The `add-asarray.py` script is a small example that adds two 4x4 arrays together, and is useful for checking that the runtime is working.
Export `CUBED_CONFIG` as described in the set up instructions, then run the script. This is for Lithops on AWS:
Export `CUBED_CONFIG` as described in the setup instructions, then run the script. This is for running on the local machine using the `processes` executor:

```shell
export CUBED_CONFIG=$(pwd)/processes
python add-asarray.py
```

This is for Lithops on AWS:

```shell
export CUBED_CONFIG=$(pwd)/lithops/aws
3 changes: 3 additions & 0 deletions examples/processes/cubed.yaml
@@ -0,0 +1,3 @@
spec:
allowed_mem: "2GB"
executor_name: "processes"