Load balancing multiple Flux instances #3656
Replies: 2 comments 3 replies
-
Hm, I don't have a good answer. You could calculate the stats you are talking about more quickly, but without knowing the time limits of the active jobs it would be difficult to make accurate predictions. I wonder if we could do some groundwork to expose a busyness score (perhaps provided by the scheduler), maybe something like a Unix system load average?
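The load-average idea could be sketched as an exponentially damped moving average of the number of runnable jobs, the same scheme Unix uses for its 1/5/15-minute load averages. The function name and sampling parameters below are hypothetical, not part of any Flux API:

```python
import math

def update_load_avg(load_avg, runnable_jobs, interval_s=5.0, window_s=60.0):
    """One step of a Unix-style exponentially damped load average.

    load_avg: previous average value
    runnable_jobs: current count of running + pending jobs
    interval_s: sampling period
    window_s: averaging window (60s ~ the classic 1-minute average)
    """
    alpha = math.exp(-interval_s / window_s)
    return load_avg * alpha + runnable_jobs * (1.0 - alpha)

# A steady backlog of 10 runnable jobs pulls the average toward 10.
avg = 0.0
for _ in range(100):
    avg = update_load_avg(avg, 10)
print(round(avg, 2))  # converges close to 10.0
```

An instance would sample its own job list on a timer and expose the smoothed score; a load balancer then sends new work to the instance with the lowest score.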
-
I agree that if you had time limits, it would then just be a matter of keeping the number of pending node-hours uniform across instances:

```python
pending_node_hour_reqs = sum(
    job.nnodes * (job.timelimit.hours - job.elapsed.hours) for job in active_jobs
)
```

If you don't have time limits on the jobs, another thought would be to try to keep the number of nodes either in use or requested by jobs in each instance roughly the same. Pseudo-code:

```python
pending_node_reqs = sum(job.nnodes for job in active_jobs)
```

Another thought: could Parsl use a "pull" model (where Flux instances notify Parsl when they can handle another unit of work) as opposed to a "push" model (where Parsl has to send a job somewhere the moment it is created)?
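A pull model along these lines can be sketched with a shared task queue that per-instance feeder threads drain as capacity allows. This is a toy simulation, not Parsl or Flux code; the names and the semaphore-based capacity model are made up for illustration:

```python
import queue
import threading

# One shared queue of tasks; each "Flux instance" has a feeder thread
# that pulls a task only when it has capacity (simulated by a semaphore;
# a real feeder would wait on a notification from the instance).
tasks = queue.Queue()
for i in range(6):
    tasks.put(f"job-{i}")

results = []
lock = threading.Lock()

def feeder(name, capacity):
    sem = threading.Semaphore(capacity)
    while True:
        sem.acquire()  # wait until this instance can take another job
        try:
            job = tasks.get_nowait()
        except queue.Empty:
            return  # no more work to pull
        with lock:
            results.append((name, job))
        sem.release()  # pretend the job finished immediately

threads = [
    threading.Thread(target=feeder, args=(f"flux-{k}", 2)) for k in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # all 6 jobs were claimed by some instance
```

The appeal of the pull model is that the balancer never needs a busyness estimate at all: each instance self-reports readiness, so uneven instance sizes are handled automatically.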
-
While working on Flux/Parsl integration (Parsl is a popular parallel scripting library for Python), the Parsl team expressed interest in taking jobs from a single stream and distributing them across multiple Flux instances. If all jobs and all Flux instances were equal, this would be pretty straightforward: for N Flux instances, distribute job K to Flux instance K % N. But I don't think there is any guarantee of either of those things.
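For reference, the equal-instances case really is a one-liner. A minimal round-robin sketch (the instance names are hypothetical):

```python
# Round-robin dispatch: job K goes to instance K % N.
instances = ["flux-0", "flux-1", "flux-2"]  # N = 3 hypothetical instances

def assign(k, instances):
    return instances[k % len(instances)]

print([assign(k, instances) for k in range(5)])
# -> ['flux-0', 'flux-1', 'flux-2', 'flux-0', 'flux-1']
```

The discussion below is about what to do when this assumption of uniform jobs and uniform instances breaks down.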
The use-case is that when Parsl is feeling overwhelmed it will request more nodes from a resource manager and launch itself (and a new Flux instance) within those nodes. So you might end up with a bunch of different allocations and Flux instances of various sizes.
How could someone go about load-balancing multiple Flux instances in an intelligent way? I think the most useful information would be a) whether a Flux instance has a backlog of jobs and b) how long it would take (worst case) to work through all of the current jobs. But I don't think b) can be known since many jobs don't have time limits.
One hacky way of load balancing would be to calculate, for each instance, the fraction of incomplete jobs that are actually running:

```python
len([job for job in jobs if job.state == RUNNING]) / len(
    [job for job in jobs if job.state in (RUNNING, PENDING, NEW)]
)
```

But that could get very inaccurate in certain cases.
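As a sketch, that ratio can be turned into a "pick the least-backlogged instance" rule. The `Job` class and instance names below are stand-ins; a real implementation would query each Flux instance's job list instead:

```python
from dataclasses import dataclass

RUNNING, PENDING, NEW = "RUNNING", "PENDING", "NEW"

@dataclass
class Job:
    state: str

def busyness(jobs):
    """Fraction of incomplete jobs NOT yet running (higher = more backlog)."""
    incomplete = [j for j in jobs if j.state in (RUNNING, PENDING, NEW)]
    if not incomplete:
        return 0.0  # an idle instance has no backlog at all
    running = sum(1 for j in incomplete if j.state == RUNNING)
    return 1.0 - running / len(incomplete)

def least_loaded(instances):
    """Pick the instance whose incomplete jobs are mostly already running."""
    return min(instances, key=lambda name: busyness(instances[name]))

instances = {
    "flux-0": [Job(RUNNING), Job(PENDING), Job(PENDING)],  # busyness ~0.67
    "flux-1": [Job(RUNNING), Job(RUNNING), Job(PENDING)],  # busyness ~0.33
}
print(least_loaded(instances))  # -> flux-1
```

Note the inaccuracy flagged above: an instance running three quick jobs scores worse than one running a single week-long job, because the metric ignores job duration entirely.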