Consider running a separate qmanager for each queue #1258

Open
trws opened this issue Jul 29, 2024 · 5 comments

Comments

@trws
Member

trws commented Jul 29, 2024

@garlick had a really interesting idea to run separate instances of our modules to handle separate queues. I had originally thought about this in terms of the resource module, because it would greatly reduce the complexity there, but it has a severe downside there: queues could never share resources. Running multiple instances of qmanager, on the other hand, would simplify things somewhat less, but it would mean all the queues could process requests in parallel. There's no shared state required between queues, so as far as I can tell all it would cost is a (pretty small) amount of RAM and some TIDs. Would love thoughts on this. If we do the protocol tweaking we've discussed, we could even treat each queue as an instance of a service, which would make it easy to load others side-by-side.

@grondo
Contributor

grondo commented Jul 30, 2024

@ryanday36 might want to comment. I think we're using overlapping queues in production, but I don't know if both are ever active at the same time, so maybe it doesn't matter.

@ryanday36

The overlapping queues are ideally never active at the same time, though if this were to be implemented it would be good to also put some controls in flux-core to ensure that a queue can't be activated if it would overlap with an already active queue.
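The kind of activation control described above could look roughly like the following. This is a minimal sketch, not flux-core code: `QueueRegistry`, `activate`, and `deactivate` are hypothetical names, and queues are assumed to be described simply as sets of node names.

```python
from typing import Dict, Set


class QueueRegistry:
    """Hypothetical bookkeeping for queue activation; not a flux-core API."""

    def __init__(self, queue_nodes: Dict[str, Set[str]]) -> None:
        # Assumption: each queue is configured as a set of node names.
        self.queue_nodes = queue_nodes
        self.active: Set[str] = set()

    def activate(self, name: str) -> None:
        """Activate a queue unless it shares nodes with an active queue."""
        wanted = self.queue_nodes[name]
        for other in self.active:
            if other == name:
                continue
            shared = wanted & self.queue_nodes[other]
            if shared:
                raise ValueError(
                    f"queue {name!r} overlaps active queue {other!r} "
                    f"on {sorted(shared)}"
                )
        self.active.add(name)

    def deactivate(self, name: str) -> None:
        """Deactivate a queue; a no-op if it is not active."""
        self.active.discard(name)
```

Overlapping queues can still both be configured here; only simultaneous activation is rejected, which matches the "ideally never active at the same time" usage.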

The main door that this would close would be the ability to use overlapping queues to implement different policies (limits, priorities, preemption, etc.; the things needed for 'standby', 'exempt', and 'expedite') for different sets of users on the same nodes, à la LSF. Slurm keeps those concerns largely separate (via QoS), and an approach like that probably makes sense for how Flux is architected. Policy things like limits and priorities should probably be wholly contained in flux-accounting, with sched just concerned with matching jobs to resources.

That said, it does still make me a little bit uncomfortable to close off paths before the design work for things like flux-framework/flux-core#5739, flux-framework/flux-core#5205, and flux-framework/flux-core#4306 has been done. It seems like those things should be doable in flux-core and flux-accounting with non-overlapping queues, but I'd feel better about it if we had a more concrete idea of what that will end up looking like.

@ryanday36

A couple more small but important questions. As @grondo mentioned above, we don't have overlapping queues active at the same time, but they can be enabled at the same time so that users can submit jobs to inactive queues that will be run later. Would running separate qmanagers still allow us to do this? Also, when we switch active queues, it doesn't currently affect running jobs (they continue to run until they complete or hit their time limit). Would you still be able to allow jobs in one queue to continue if you switched to a separate queue with a different qmanager?

@garlick
Member

garlick commented Jul 30, 2024

Great points @ryanday36 and I agree we should not be too hasty here and should revisit those use cases.

It is sort of appealing to have a queue be directly associated (1:1) with a scheduler instance though.

One thought regarding overlapping queues is maybe we could have each scheduler instance "acquire" the full resource set but only mark the queue's configured set "up" in each scheduler instance/queue. Then when the queue configuration changes dynamically, mark them "down" in the donor, but not "up" in the recipient until they are no longer allocated.
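The up/down handoff described above can be sketched as follows. This is an illustration under stated assumptions, not how flux-sched or flux-core implement resource acquisition: `SchedulerInstance`, `Handoff`, and `settle` are hypothetical names, and resources are modeled as plain node names.

```python
from typing import Iterable, Set


class SchedulerInstance:
    """Hypothetical per-queue scheduler that sees every node."""

    def __init__(self, name: str, all_nodes: Iterable[str],
                 configured: Iterable[str]) -> None:
        self.name = name
        self.all_nodes: Set[str] = set(all_nodes)  # full set is "acquired"
        self.up: Set[str] = set(configured)        # only these are schedulable
        self.allocated: Set[str] = set()           # nodes running this queue's jobs

    def allocate(self, node: str) -> None:
        if node not in self.up or node in self.allocated:
            raise ValueError(f"{node} is not available in {self.name}")
        self.allocated.add(node)

    def free(self, node: str) -> None:
        self.allocated.discard(node)


class Handoff:
    """Move nodes from a donor queue to a recipient queue."""

    def __init__(self, donor: SchedulerInstance,
                 recipient: SchedulerInstance, nodes: Iterable[str]) -> None:
        self.donor = donor
        self.recipient = recipient
        self.pending: Set[str] = set(nodes)
        # Mark the nodes down in the donor right away; jobs already
        # running there are left alone.
        donor.up -= self.pending

    def settle(self) -> Set[str]:
        """Mark up in the recipient every pending node the donor has freed."""
        ready = self.pending - self.donor.allocated
        self.recipient.up |= ready
        self.pending -= ready
        return ready
```

In this sketch `settle` would be called whenever the donor frees a node, so running jobs drain naturally and no node is ever schedulable in both instances at once.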

@trws
Member Author

trws commented Jul 30, 2024

There's a good bit to unpack and explain here, sorry if this ends up being
verbose. There are two different but related suggestions going on here. We
currently have a system like this:

flowchart LR
  subgraph qmanager-module
    queue1
    queue2
    queue3
  end
  subgraph resource-module
    subgraph properties
    q1
    q2
    q3
    end
  end
  queue1-->q1
  queue2-->q2
  queue3-->q3

The original suggestion, which would make dealing with queue constraints
completely go away in resource, was to split resources between different
instances of the resource module so it would look like this:

flowchart LR
  subgraph qmanager-module
    queue1
    queue2
    queue3
  end
  subgraph resource-module-1
    subgraph properties-1
    q1
    end
  end
  subgraph resource-module-2
    subgraph properties-2
    q2
    end
  end
  subgraph resource-module-3
    subgraph properties-3
    q3
    end
  end
  queue1-->q1
  queue2-->q2
  queue3-->q3


This means there is no shared knowledge of resource status between queues, and
would force us to only allow a resource to exist in one queue as a result.
It would be a significant benefit in matching complexity, but might not be worth
it.

What I meant to be proposing here was this split instead, or at least as a first
step:

flowchart LR
  subgraph qmanager-module-1
    queue1
  end
  subgraph qmanager-module-2
    queue2
  end
  subgraph qmanager-module-3
    queue3
  end
  subgraph resource-module
    subgraph properties
    q1
    q2
    q3
    end
  end
  queue1-->q1
  queue2-->q2
  queue3-->q3

The multi-qmanager version still has a single source of truth for whether a resource is allocated or not, so there is no need to worry about resources being shared between queues (well, there are still some problems there but no fundamental problems). As it is, the queues share no state, so this should be a matter of checking and possibly tweaking the protocol to make sure each queue module gets its messages and that's about it. It wouldn't prevent us from having overlapping queues, though it might prevent us from enforcing some constraints on ordering or fairness in terms of which of the overlapping queues gets to free resources first.
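A toy model of the multi-qmanager layout may help make the "single source of truth" point concrete. It is only a sketch under assumptions: `ResourceModule`, `QueueManager`, `match`, and `schedule` are hypothetical stand-ins rather than the real module interfaces, each job takes exactly one node, and queue membership is modeled as a set of queue names per node (the "properties" in the diagrams).

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set


class ResourceModule:
    """Hypothetical single owner of allocation state."""

    def __init__(self, node_queues: Dict[str, Set[str]]) -> None:
        self.node_queues = node_queues      # node -> queues it belongs to
        self.owner: Dict[str, str] = {}     # node -> job currently holding it

    def match(self, queue: str, jobid: str) -> Optional[str]:
        """Allocate the first free node carrying this queue's property."""
        for node in sorted(self.node_queues):
            if queue in self.node_queues[node] and node not in self.owner:
                self.owner[node] = jobid
                return node
        return None

    def free(self, node: str) -> None:
        self.owner.pop(node, None)


class QueueManager:
    """Hypothetical per-queue manager; holds no state shared with others."""

    def __init__(self, queue: str, resource: ResourceModule) -> None:
        self.queue = queue
        self.resource = resource
        self.pending: Deque[str] = deque()
        self.running: Dict[str, str] = {}   # jobid -> node

    def submit(self, jobid: str) -> None:
        self.pending.append(jobid)

    def schedule(self) -> List[str]:
        """Start pending jobs in FIFO order until a match fails."""
        started = []
        while self.pending:
            node = self.resource.match(self.queue, self.pending[0])
            if node is None:
                break
            jobid = self.pending.popleft()
            self.running[jobid] = node
            started.append(jobid)
        return started

    def finish(self, jobid: str) -> None:
        self.resource.free(self.running.pop(jobid))
```

Because every manager goes through the same `ResourceModule`, a node that belongs to two queues cannot be handed out twice. What the sketch does not provide is any ordering between managers: whichever one calls `schedule` first after a free gets the node, which is the fairness gap mentioned above.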

On the other hand, the resource split would rule out everything you're worried about losing, which is why I'm not really sure it's a good idea. @garlick's last suggestion would actually help with that a lot, so that might make it feasible, but I don't think we would want to go that far, at least not until we have a better handle on the consequences.
