Needs for Ramble for Kubernetes #339

vsoch · 2023-12-05T21:57:23Z

@alecbcs and I chatted today about needs for Ramble / Benchpark in the context of cloud (Kubernetes), and since we don't have a CLA and cannot contribute (yet!), we want to include some of our notes here.

Workflow Tool or Something Else?

And how ramble might build containers

We talked about "What is ramble?" and we like the idea / use case of being able to run ramble from your local machine and have it submit jobs to Kubernetes. Part of this might include building containers (e.g., with a spack or other base) and then also submitting Jobs for them. However, what we can't do is submit something to Kubernetes, with the cluster already running, and then wait for any kind of build. So we are proposing the following idea:

Stage 1: Prepare Container Bases

# init the experiment, assuming one experiment == one container
# This will set up the workspace by way of building a container, with some default base set in settings (ramble.yaml)
ramble build container ghcr.io/dinosaur-is-the-best/my-experiment:latest

Note that this could be run in CI, meaning that we don't have the result / output files for the various configs. The reason we don't use ramble workspace setup here is that we aren't going to be saving the experiment yaml config files. ramble build would share logic with ramble workspace setup, and a user could do both steps by way of:

ramble workspace setup --build

Also note that we are doing ramble build instead of ramble workspace build because potentially we could build other things.
Potential output:

=> You have successfully built my-experiment into ghcr.io/dinosaur-is-the-best/my-experiment:latest

# Then I (as the user) push to a registry. Note that this could be done in a CI workflow
docker push ghcr.io/dinosaur-is-the-best/my-experiment:latest

Stage 2: Run Workflow on Kubernetes

First do the steps to get your cluster. Once you have your cluster nodes, still from your local machine:

# Here we've already built our containers in CI, so we don't need to add `--build` or run `ramble build`.
# This would generate configs / some logic TBA for Kubernetes
# Possibly there should be a flag to ask to generate a specific running / executor (e.g., flux or slurm or kubernetes)
ramble workspace setup --container ghcr.io/dinosaur-is-the-best/my-experiment:latest 

# Then ramble on should be able to also take a specific configuration file to run!
# This UI could look like many things; this is just one way.
ramble on --executor gke

Ramble would then submit Kubernetes jobs (or other abstractions like operators) associated with each experiment. The containers would be deployed, generate some result, and that would be saved to a mounted RWX storage or some other artifact cache.
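As a rough illustration of what "one Kubernetes Job per experiment, writing to mounted storage" could look like, here is a minimal sketch that builds a `batch/v1` Job manifest as a plain dict. This is not anything Ramble does today: the function name `experiment_job`, the claim name `ramble-results`, the mount path, and the container command are all hypothetical. The ReadWriteMany (RWX) access mode would be set on the PersistentVolumeClaim itself, which is assumed to already exist.

```python
import json


def experiment_job(name, image, command, pvc_name, mount_path="/results"):
    """Build a batch/v1 Job manifest (as a dict) for a single experiment.

    Hypothetical helper: the results volume is assumed to be an existing
    PersistentVolumeClaim (ideally ReadWriteMany) named `pvc_name`.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not automatically retry a failed benchmark run.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "experiment",
                            "image": image,
                            "command": list(command),
                            "volumeMounts": [
                                {"name": "results", "mountPath": mount_path}
                            ],
                        }
                    ],
                    "volumes": [
                        {
                            "name": "results",
                            "persistentVolumeClaim": {"claimName": pvc_name},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    job = experiment_job(
        name="my-experiment-1",
        image="ghcr.io/dinosaur-is-the-best/my-experiment:latest",
        # Placeholder command; the real entrypoint is still to be decided.
        command=["/bin/sh", "-c", "echo run experiment > /results/out.log"],
        pvc_name="ramble-results",
    )
    print(json.dumps(job, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and submitted with `kubectl apply -f`.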

Pulling Results

After an experiment has run (that I've submitted from my laptop), how do I get results? If the workflow above has a "push" action to some namespaced registry or storage, then ramble could have an equivalent "pull" or some derivative of "ramble analyze" to get the results and then analyze them.
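To make the push / pull symmetry concrete, here is a small sketch that uses a local directory as a stand-in for the namespaced storage. The layout `<store>/<namespace>/<experiment>/` and the function names `push_results` and `pull_results` are assumptions for illustration only; a real backend might be an OCI registry or an object store instead.

```python
import shutil
from pathlib import Path


def push_results(run_dir, store, namespace, experiment):
    """Copy one experiment's output files to <store>/<namespace>/<experiment>/."""
    dest = Path(store) / namespace / experiment
    shutil.copytree(run_dir, dest, dirs_exist_ok=True)
    return dest


def pull_results(store, namespace, local_dir):
    """Copy every experiment's results under a namespace to a local directory.

    Returns the sorted list of experiment names that were pulled.
    """
    src = Path(store) / namespace
    pulled = []
    for exp_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        shutil.copytree(exp_dir, Path(local_dir) / exp_dir.name, dirs_exist_ok=True)
        pulled.append(exp_dir.name)
    return pulled
```

The point of the namespace is that the laptop only needs to know one name (for example, the workspace name) to find everything the cluster produced.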

To be clear, in the above:

ramble on is not run in an application container
we can build a pre-cache of ramble application containers, and provide configs for them on different platforms
it could work for the compute engine use case with a custom script / logic to submit

The complexity of the above is really the number of different compute APIs (from VMs to Kubernetes) that warrant being submitted to - it's not always just a simple script.
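One way to contain that complexity might be a small registry that maps an executor name to a function producing the submit command. The sketch below is only an assumption about how such a layer could look. The names `EXECUTORS`, `executor`, and `submit_command` are hypothetical, and the command lines use only the basic forms of `kubectl apply -f`, `sbatch`, and `flux batch`.

```python
from typing import Callable, Dict, List

# Maps an executor name (e.g. the value of a hypothetical `--executor` flag)
# to a function that builds the submit command for one experiment.
EXECUTORS: Dict[str, Callable[[str, str], List[str]]] = {}


def executor(name):
    """Decorator that registers a submit-command builder under `name`."""
    def register(fn):
        EXECUTORS[name] = fn
        return fn
    return register


@executor("kubernetes")
def kubernetes_submit(experiment, artifact):
    # `artifact` is assumed to be a rendered Job manifest file.
    return ["kubectl", "apply", "-f", artifact]


@executor("slurm")
def slurm_submit(experiment, artifact):
    # `artifact` is assumed to be a batch script.
    return ["sbatch", f"--job-name={experiment}", artifact]


@executor("flux")
def flux_submit(experiment, artifact):
    # `artifact` is assumed to be a batch script.
    return ["flux", "batch", artifact]


def submit_command(executor_name, experiment, artifact):
    """Look up the executor and return the argv list that would submit the job."""
    try:
        build = EXECUTORS[executor_name]
    except KeyError:
        known = ", ".join(sorted(EXECUTORS))
        raise ValueError(f"unknown executor {executor_name!r}; known: {known}") from None
    return build(experiment, artifact)
```

APIs that are not a single command (for example, creating VMs first) would need a richer interface than "return an argv list", which is exactly the complexity described above.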

@douglasjacobsen and @pearce8 let us know what you think! We can't contribute directly but if there is an indirect way we can try / test that would be great.

douglasjacobsen · 2023-12-11T22:40:10Z

Maintainer

@vsoch and @alecbcs: Thanks for putting this together.

I have a few questions:

What is in the container? This looks like you almost build one primary container that contains all of the software for all of the experiments you want, but maybe you are creating a single container per experiment that only has the software needed for that specific experiment? At which point, what about the inputs?
Why are you proposing that ramble on not be run inside the container?
Do you need to cache the application containers? And how is that different from caching the binaries and having the container call:

ramble workspace setup --where '{experiment_index} == <my_index>'
ramble on --where '{experiment_index} == <my_index>'

?
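For reference, the per-index variant in the last question could be wired to a Kubernetes Indexed Job (`completionMode: Indexed`), where each pod receives its index in the `JOB_COMPLETION_INDEX` environment variable. The sketch below builds the two commands quoted above from that variable. The helper names are hypothetical, and because completion indexes start at 0 while the thread does not say where `experiment_index` starts, the offset is left as a parameter.

```python
import os


def where_clause(env=None, offset=0):
    """Build the --where filter for the experiment this pod should run.

    Reads JOB_COMPLETION_INDEX, which Kubernetes sets in each pod of an
    Indexed Job. `offset` adjusts for a non-zero-based experiment_index.
    """
    env = os.environ if env is None else env
    index = int(env["JOB_COMPLETION_INDEX"]) + offset
    return "{experiment_index} == " + str(index)


def ramble_commands(env=None, offset=0):
    """Return the two ramble invocations from the question as argv lists."""
    clause = where_clause(env, offset)
    return [
        ["ramble", "workspace", "setup", "--where", clause],
        ["ramble", "on", "--where", clause],
    ]
```

With this approach a single Job with `completions: N` would fan out over N experiments from one shared container image.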
