
Flux Emulator Revival Questions #6466

washwor1 opened this issue Dec 2, 2024 · 11 comments

washwor1 commented Dec 2, 2024

Hello, I am currently working on getting the Flux emulator (originally the simulator) from @SteVwonder (#2561) up and running with the latest version of Flux core. I got the code working with the old Flux core version and am now merging it into the newest version of core. While I am doing that, I figured I would ask a few questions. I discussed these issues with @grondo and @wihobbs at the Flux coffee hour on Nov 22, and they suggested opening an issue so @garlick and others could weigh in. Here are the questions:

  1. Looking at the code from [WIP] Flux simulator #2561, are there any sections that look like they will need to be completely rewritten because of changes in Flux since the original development? From what I can tell, most things are fairly decoupled, and only minor modifications should be needed. However, I am fairly new to Flux, so maybe I am missing something.

  2. The original code is missing the ability to handle jobs that are unsatisfiable (line 259 in flux_simulator.py hangs). What would the recommended method/tool for implementing this be? At the coffee hour, it was suggested that I use either a jobtap plugin or waiting on cancel exceptions through RPC (or a combination); see the sketch after this list for the second option.

  3. Are there any other important features that would be useful to implement for users of the finished emulator? I currently want to expand the post-sim analysis a bit and make sure that job timeouts work properly.
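To make the second option from question 2 concrete, here is a minimal sketch of what I have in mind, using flux.job.event_watch from the Python bindings; the exact exception-context fields to check are my assumption:

```python
# Sketch only: watch a job's eventlog and return when it either starts
# or is killed by a fatal exception (e.g. cancelled as unsatisfiable).
# The context fields checked here are an assumption on my part.
import flux
import flux.job

def wait_for_start_or_exception(handle, jobid):
    for event in flux.job.event_watch(handle, jobid):
        if event.name == "start":
            return True  # job is running; safe to advance the clock
        if event.name == "exception" and event.context.get("severity") == 0:
            return False  # fatal exception; drop the job from the eventlist
    return False  # eventlog ended without a start event
```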

Thank you for the advice.

garlick (Member) commented Dec 2, 2024

Great that you're trying to push this forward! The simulator has a lot of exciting possibilities for enabling scheduling research (the Slurm simulator seems to make regular appearances) and for helping us understand and improve Flux's scheduler, write test cases, etc.

Have you gotten the old branch working, and are you able to run simulations? Doing this, and maybe taking a stab at a draft description of how it currently works, might be a helpful starting point for reviewing the approach in the context of today's Flux.

It will probably be a bit annoying to forward-port after 5 years of Flux development, but I'm not sure anything substantial has really changed in the interfaces between the job manager, the exec system, and the scheduler. There will be lots of little changes, though.

trws (Member) commented Dec 3, 2024

One thing I recall from when this was getting started is that it came after the work to port the simulator to the new exec system, so forward-porting may not be too bad. I imagine a lot of it is going to be things like handling unsatisfiable jobs or other states that didn't exist yet but need to be factored in.

As for 3, I'd say probably yes, but don't worry about that yet. We'll have to see how the whole thing ties together to get an idea of what "just works" because of how it's implemented, and what we'll want to be able to tweak.

trws (Member) commented Dec 3, 2024

Looking it over, if the calls from the job manager can become a jobtap plugin (not sure, but it looks like it at first glance), that would definitely help. The main thing that might need some thought is how to define "busy" and "quiescent" callbacks in Fluxion now that everything has gone asynchronous. It's not quite as easy as it used to be (we used to process everything in order, so when the callback ran, it was time); now it will have to detect when the sched loop has no further work to do. That's actually possible (it puts the event loop to sleep in that case until something comes in), but it's a bit more work. There are also some strange states we didn't really have before, like jobs that are satisfiable but not reservable.

washwor1 (Author) commented Dec 5, 2024

Thank you for the responses.

@garlick I was able to get the old branch up and running with the original sharness tests as test cases:

  1. malformed input
  2. single node w/ 10 jobs
  3. 10 nodes w/ 10 jobs

All of these seem to produce the intended behavior. And yes, that matches what I've observed while working through the code: it doesn't seem as if there are any major surprises in updating this code as-is 🎉. I'm also currently working on materials describing the function of the emulator, which, as you mentioned, will be a very helpful resource. Once that is done, I can send it to you for review if you would like.

@trws Taking that into consideration, my plan is to first update the emulator with its functionality as-is. Then I will design the changes required to support the missing functionality, such as unsatisfiable jobs and other new states. In parallel, I will work on reproducing old work done with the previous emulator design (PRIONN/CanarIO) to see if there are any gaps in the functionality we are missing. With that information, we can take an informed approach to major changes (jobtap/etc.) and see how those work out. The emulator also does not currently work with Fluxion at all, so that will be something to get running; currently it uses sched-simple.

As far as my development timeline goes, it's a bit stalled right now because of SC, the Thanksgiving holiday, and class finals, but once my classes are finished, I'll be able to get some significant work done and report back on how everything is going.

washwor1 (Author) commented Jan 8, 2025

Hello @garlick @trws. I hope you both (and anyone else reading) had a great holiday. I wanted to let you both know how things are going and ask a couple of questions about issues I am having getting the emulator to 1:1 functionality.

So to start, I've been able to get the emulator to run properly with a single-node configuration. Aside from many little changes to naming conventions and the like, the major changes I made were:

  • Changed the idle and busy callbacks in libschedutil to use the new "ops" member
  • Rewrote the logic that fakes resources, because of the introduction of the resource module
  • Changed the job-completion reentry into the advance loop to use the job event journal instead of the original pub/sub method, which no longer exists (see the sketch after this list)
  • Added a dummy event upon job completion, due to a synchronization issue, to keep the emulator from exiting when one job ends and the next hasn't started yet
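For the journal change, the hook is roughly shaped like this (a sketch assuming the JournalConsumer helper from recent flux-core Python bindings; older releases expose the journal only as a streaming RPC, and the real emulator code is more involved):

```python
# Sketch: detect job completion from the job manager's event journal.
# JournalConsumer and its poll() API are assumed from recent flux-core
# Python bindings; field names (.name, .jobid) may differ by version.
import flux
from flux.job import JournalConsumer

handle = flux.Flux()
consumer = JournalConsumer(handle).start()
while True:
    event = consumer.poll(timeout=-1)
    if event is not None and event.name == "clean":
        # terminal job state reached: re-enter the emulator's advance loop
        print(f"job {event.jobid} reached clean at {event.timestamp}")
        break
consumer.stop()
```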

As of now, there is one more change needed in the post-sim stage, regarding the collection of job event logs. I have a couple of questions that I will put in another comment.

washwor1 (Author) commented Jan 8, 2025

The first question/issue that I am a bit stuck on concerns the resource specification. I have rewritten it to use Rlist from the Python bindings to put R into the KVS, and then I reload the resource module and the scheduler so they pick up the fake resources. This works just fine for a single node, but not when I try to add multiple nodes. I have tried several different things, but this is the code I am working with right now:


```python
import json

import flux
import flux.kvs
from flux.resource import Rlist


def insert_resource_data(flux_handle, num_ranks, cores_per_rank,
                         hostname_pattern="node{rank}"):
    if num_ranks <= 0 or cores_per_rank <= 0:
        raise ValueError("number of ranks and cores per rank must be positive")

    # build a fake resource set with one rank per node
    rlist = Rlist()
    for rank in range(num_ranks):
        core_range = f"0-{cores_per_rank - 1}" if cores_per_rank > 1 else "0"
        hostname = hostname_pattern.format(rank=rank)
        rlist.add_rank(rank, hostname=hostname, cores=core_range)

    rlist_json = json.loads(rlist.encode())
    print(rlist_json)

    # write R into the KVS; put() and commit() raise an exception on error
    flux.kvs.put(flux_handle, "resource.R", rlist_json)
    flux.kvs.commit(flux_handle)
```
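For reference, this is how I invoke it for the 30-node case below (flux.Flux() connects to the enclosing instance):

```python
handle = flux.Flux()
insert_resource_data(handle, num_ranks=30, cores_per_rank=16)
```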

On a configuration with 30 nodes of 16 cores each, I get this output for R:
{"resources":{"version":1,"execution":{"R_lite":[{"rank":"0-29","children":{"core":"0-15"}}],"starttime":0.0,"expiration":0.0,"nodelist":["node[0-29]"]}},"up":"0"}

As far as I can tell, this looks right, but the broker log says only 16 of the 480 cores (just rank 0) are available:

```
Jan 07 20:00:40.880788 UTC sched-simple.debug[0]: ready: 16 of 480 cores: rank0/core[0-15]
```

Does anybody have a clue what the problem is here?

washwor1 (Author) commented Jan 8, 2025

The second question is regarding the "dummy" event I mentioned. Essentially, after changing the job-completion callback to use the job event journal, I ran into an issue where the emulator would exit prematurely whenever one job finished and the next job needed to be started. The advance() function would be called before the next job started, so the job would not be in the eventlist, and advance() would conclude there was nothing left to do and exit.

A solution that I found to work was to insert a "dummy" job into the eventlist that did nothing at time + 1e9 s. This forces the emulator to query the scheduler for quiescence again and wait until the job is running.

My question is whether this is acceptable, as it seems like a band-aid solution. I was thinking it would be alright for now, especially if we decide to rewrite the emulator to interact with Flux as a jobtap plugin, since we would get better control over the job lifecycle at that point.

Regarding that, I had a conversation before the holidays with @cmoussa1 about jobtap, and I think it would be a great idea to make the emulator a jobtap plugin, as it seems we would get much simpler control over the job lifecycle. This would make the design simpler and also likely decouple the emulator entirely from the rest of flux-core (assuming there isn't some aspect of the functionality that I am forgetting).

washwor1 (Author) commented Jan 8, 2025

One last thing. I am working out of this branch: https://github.com/TauferLab/flux-core-gclab/blob/emulator-jay/src/cmd/flux-emulator.py

grondo (Contributor) commented Jan 8, 2025

> I have rewritten it to use Rlist from the Python bindings to put R into the KVS, and then I reload the resource module and the scheduler so they pick up the fake resources.

When reloading the core resource module, you may have to supply the monitor-force-up argument to the module to force ranks other than 0 online.
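
For example (a sketch shelling out to the CLI; module arguments pass straight through flux module reload, and sched-simple is just what the emulator currently loads):

```python
# Sketch: reload the resource module with monitor-force-up so ranks
# other than 0 are marked online, then reload the scheduler so it
# picks up the new R written to the KVS.
import subprocess

subprocess.run(
    ["flux", "module", "reload", "resource", "monitor-force-up"], check=True
)
subprocess.run(["flux", "module", "reload", "sched-simple"], check=True)
```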

washwor1 (Author) commented Jan 8, 2025

@grondo That did the trick. Thanks!

grondo (Contributor) commented Jan 15, 2025

> A solution that I found to work was to insert a "dummy" job into the eventlist that did nothing at time + 1e9 s. This forces the emulator to query the scheduler for quiescence again and wait until the job is running.
>
> My question is whether this is acceptable, as it seems like a band-aid solution. I was thinking it would be alright for now, especially if we decide to rewrite the emulator to interact with Flux as a jobtap plugin, since we would get better control over the job lifecycle at that point.

Not knowing the internals of the emulator at all, this solution sounds fine to me, assuming the eventlist is internal to the emulator code itself. Perhaps, to make it seem less like a kludge, you could come up with some abstraction that isn't a "job" but satisfies the same use case (i.e., a sentinel or some other placeholder with the same effect)? Of course, my suggestion may make no sense because I'm not familiar with the code in question. (Perhaps @trws or @garlick or someone else will have more informed opinions.)
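For example (purely illustrative; the names below are hypothetical and not from the emulator code):

```python
# Illustrative only: a sentinel eventlist entry, distinct from a job,
# whose sole effect is to force another quiescence check. All names
# here are hypothetical.
import heapq
import itertools

QUIESCENCE_CHECK = "quiescence-check"  # sentinel marker, not a job
_counter = itertools.count()  # tie-breaker so heap entries never compare payloads

def arm_quiescence_check(eventlist, now, delay=1e9):
    # keep the advance loop alive until the pending job is seen to start
    heapq.heappush(eventlist, (now + delay, next(_counter), QUIESCENCE_CHECK))
```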
