
Flux Emulator Revival Questions #6466

washwor1 opened this issue Dec 2, 2024 · 11 comments

washwor1 commented Dec 2, 2024

Hello, I am currently working on getting the Flux emulator (originally the simulator) from @SteVwonder (#2561) up and running with the latest version of Flux core. I got the code working with the old Flux core version and am now merging it into the newest version of core. While I am doing that, I figured I would ask a few questions. I discussed these issues with @grondo and @wihobbs at the Flux coffee hour on Nov 22, and they suggested opening an issue so @garlick and others could weigh in. Here are the questions:

  1. Looking at the code from [WIP] Flux simulator #2561, are there any sections that look like they will need to be completely rewritten because of changes in Flux since the original development? From what I can tell, most things are fairly decoupled, and only minor modifications should be needed. However, I am fairly new to Flux, so maybe I am missing something.

  2. The original code is missing the ability to handle jobs that are unsatisfiable (line 259 in flux_simulator.py hangs). What would the recommended method/tool for implementing this be? At the coffee hour, it was suggested that I use either a jobtap plugin or waiting on cancel exceptions through RPC (or a combination); see the sketch after this list for the second option.

  3. Are there any other important features that would be useful to implement for users of the finished emulator? I currently want to expand the post-sim analysis a bit and make sure that job timeouts work properly.
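To make the second option from question 2 concrete, here is a minimal sketch of what I have in mind, using flux.job.event_watch from the Python bindings; the exact exception-context fields to check are my assumption:

```python
# Sketch only: watch a job's eventlog and return when it either starts
# or is killed by a fatal exception (e.g. cancelled as unsatisfiable).
# The context fields checked here are an assumption on my part.
import flux
import flux.job

def wait_for_start_or_exception(handle, jobid):
    for event in flux.job.event_watch(handle, jobid):
        if event.name == "start":
            return True  # job is running; safe to advance the clock
        if event.name == "exception" and event.context.get("severity") == 0:
            return False  # fatal exception; drop the job from the eventlist
    return False  # eventlog ended without a start event
```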

Thank you for the advice.

garlick (Member) commented Dec 2, 2024

Great that you're trying to push this forward! The simulator has a lot of exciting possibilities for enabling scheduling research (the Slurm simulator seems to make regular appearances) and for helping us understand and improve Flux's scheduler, write test cases, etc.

Have you gotten the old branch working, and are you able to run simulations? Doing this, and maybe taking a stab at a draft description of how it currently works, might be a helpful starting point for reviewing the approach in the context of today's Flux.

It will probably be a bit annoying to forward-port after 5 years of Flux development, but I'm not sure anything substantial has really changed in the interfaces between the job manager, the exec system, and the scheduler. There will be lots of little changes, though.

trws (Member) commented Dec 3, 2024

One thing I recall from when this was getting started is that it came after the work to port the simulator to the new exec system, so forward-porting may not be too bad. I imagine a lot of it is going to be things like handling unsatisfiable jobs or other states that didn't exist yet but need to be factored in.

As for 3, I'd say probably yes, but don't worry about that yet. We'll have to see how the whole thing ties together to get an idea of what "just works" because of how it's implemented, and what we'll want to be able to tweak.

trws (Member) commented Dec 3, 2024

Looking it over, if the calls from the job manager can become a jobtap plugin (not sure, but it looks like it at first glance), that would definitely help. The main thing that might need some thought is how to define "busy" and "quiescent" callbacks in Fluxion now that everything has gone asynchronous. It's not quite as easy as it used to be (we used to process everything in order, so when the callback ran, it was time); now it will have to detect when the sched loop has no further work to do. That's actually possible (it puts the event loop to sleep in that case until something comes in), but it's a bit more work. There are also some strange states we didn't really have before, like jobs that are satisfiable but not reservable.

washwor1 (Author) commented Dec 5, 2024

Thank you for the responses.

@garlick I was able to get the old branch up and running with the original sharness tests as test cases:

  1. malformed input
  2. single node w/ 10 jobs
  3. 10 nodes w/ 10 jobs

All of these seem to produce the intended behavior. And yes, that matches what I've observed while working through the code: it doesn't seem as if there are any major surprises in updating this code as-is 🎉. I'm also currently working on materials describing the function of the emulator, which, as you mentioned, will be a very helpful resource. Once that is done, I can send it to you for review if you would like.

@trws Taking that into consideration, my plan is to first update the emulator with its functionality as-is. Then I will design the changes required to support the missing functionality, such as unsatisfiable jobs and other new states. In parallel, I will work on reproducing old work done with the previous emulator design (PRIONN/CanarIO) to see if there are any gaps in the functionality we are missing. With that information, we can take an informed approach to major changes (jobtap/etc.) and see how those work out. The emulator also does not currently work with Fluxion at all, so that will be something to get running; currently it uses sched-simple.

As far as my development timeline goes, it's a bit stalled right now because of SC, the Thanksgiving holiday, and class finals, but once my classes are finished, I'll be able to get some significant work done and report back on how everything is going.

washwor1 (Author) commented Jan 8, 2025

Hello @garlick @trws. I hope you both (and anyone else reading) had a great holiday. I wanted to let you both know how things are going and ask a couple of questions about issues I am having getting the emulator to 1:1 functionality.

So to start, I've been able to get the emulator to run properly with a single-node configuration. Aside from many little changes to naming conventions and the like, the major changes I made were:

  • Changed the idle and busy callbacks in libschedutil to use the new "ops" member
  • Rewrote the logic that fakes resources, because of the introduction of the resource module
  • Changed the job-completion reentry into the advance loop to use the job event journal instead of the original pub/sub method, which no longer exists (see the sketch after this list)
  • Added a dummy event upon job completion, due to a synchronization issue, to keep the emulator from exiting when one job ends and the next hasn't started yet
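For the journal change, the hook is roughly shaped like this (a sketch assuming the JournalConsumer helper from recent flux-core Python bindings; older releases expose the journal only as a streaming RPC, and the real emulator code is more involved):

```python
# Sketch: detect job completion from the job manager's event journal.
# JournalConsumer and its poll() API are assumed from recent flux-core
# Python bindings; field names (.name, .jobid) may differ by version.
import flux
from flux.job import JournalConsumer

handle = flux.Flux()
consumer = JournalConsumer(handle).start()
while True:
    event = consumer.poll(timeout=-1)
    if event is not None and event.name == "clean":
        # terminal job state reached: re-enter the emulator's advance loop
        print(f"job {event.jobid} reached clean at {event.timestamp}")
        break
consumer.stop()
```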

As of now, there is one more change needed in the post-sim stage, regarding the collection of job event logs. I have a couple of questions that I will put in another comment.

washwor1 (Author) commented Jan 8, 2025

The first question/issue that I am a bit stuck on concerns the resource specification. I have rewritten it to use Rlist from the Python bindings to put R into the KVS, and then I reload the resource module and the scheduler so they pick up the fake resources. This works just fine for a single node, but not when I try to add multiple nodes. I have tried several different things, but this is the code I am working with right now:


```python
import json

import flux
import flux.kvs
from flux.resource import Rlist


def insert_resource_data(flux_handle, num_ranks, cores_per_rank,
                         hostname_pattern="node{rank}"):
    if num_ranks <= 0 or cores_per_rank <= 0:
        raise ValueError("number of ranks and cores per rank must be positive")

    # build a fake resource set with one rank per node
    rlist = Rlist()
    for rank in range(num_ranks):
        core_range = f"0-{cores_per_rank - 1}" if cores_per_rank > 1 else "0"
        hostname = hostname_pattern.format(rank=rank)
        rlist.add_rank(rank, hostname=hostname, cores=core_range)

    rlist_json = json.loads(rlist.encode())
    print(rlist_json)

    # write R into the KVS; put() and commit() raise an exception on error
    flux.kvs.put(flux_handle, "resource.R", rlist_json)
    flux.kvs.commit(flux_handle)
```
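For reference, this is how I invoke it for the 30-node case below (flux.Flux() connects to the enclosing instance):

```python
handle = flux.Flux()
insert_resource_data(handle, num_ranks=30, cores_per_rank=16)
```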

On a configuration with 30 nodes of 16 cores each, I get this output for R:
{"resources":{"version":1,"execution":{"R_lite":[{"rank":"0-29","children":{"core":"0-15"}}],"starttime":0.0,"expiration":0.0,"nodelist":["node[0-29]"]}},"up":"0"}

As far as I can tell, this looks right, but the broker log says only 16 of the 480 cores (just rank 0) are available:

```
Jan 07 20:00:40.880788 UTC sched-simple.debug[0]: ready: 16 of 480 cores: rank0/core[0-15]
```

Does anybody have a clue what the problem is here?

washwor1 (Author) commented Jan 8, 2025

The second question is regarding the "dummy" event I mentioned. Essentially, after changing the job-completion callback to use the job event journal, I ran into an issue where the emulator would exit prematurely whenever one job finished and the next job needed to be started. The advance() function would be called before the next job started, so the job would not be in the eventlist, and advance() would conclude there was nothing left to do and exit.

A solution that I found to work was to insert a "dummy" job into the eventlist that did nothing at time + 1e9 s. This forces the emulator to query the scheduler for quiescence again and wait until the job is running.

My question is whether this is acceptable, as it seems like a band-aid solution. I was thinking it would be alright for now, especially if we decide to rewrite the emulator to interact with Flux as a jobtap plugin, since we would get better control over the job lifecycle at that point.

Regarding that, I had a conversation before the holidays with @cmoussa1 about jobtap, and I think it would be a great idea to make the emulator a jobtap plugin, as it seems we would get much simpler control over the job lifecycle. This would make the design simpler and also likely decouple the emulator entirely from the rest of flux-core (assuming there isn't some aspect of the functionality that I am forgetting).

washwor1 (Author) commented Jan 8, 2025

One last thing. I am working out of this branch: https://github.com/TauferLab/flux-core-gclab/blob/emulator-jay/src/cmd/flux-emulator.py

grondo (Contributor) commented Jan 8, 2025

> I have rewritten it to use Rlist from the Python bindings to put R into the KVS, and then I reload the resource module and the scheduler so they pick up the fake resources.

When reloading the core resource module, you may have to supply the monitor-force-up argument to the module to force ranks other than 0 online.
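
For example (a sketch shelling out to the CLI; module arguments pass straight through flux module reload, and sched-simple is just what the emulator currently loads):

```python
# Sketch: reload the resource module with monitor-force-up so ranks
# other than 0 are marked online, then reload the scheduler so it
# picks up the new R written to the KVS.
import subprocess

subprocess.run(
    ["flux", "module", "reload", "resource", "monitor-force-up"], check=True
)
subprocess.run(["flux", "module", "reload", "sched-simple"], check=True)
```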

washwor1 (Author) commented Jan 8, 2025

@grondo That did the trick. Thanks!

grondo (Contributor) commented Jan 15, 2025

> A solution that I found to work was to insert a "dummy" job into the eventlist that did nothing at time + 1e9 s. This forces the emulator to query the scheduler for quiescence again and wait until the job is running.
>
> My question is whether this is acceptable, as it seems like a band-aid solution. I was thinking it would be alright for now, especially if we decide to rewrite the emulator to interact with Flux as a jobtap plugin, since we would get better control over the job lifecycle at that point.

Not knowing the internals of the emulator at all, this solution sounds fine to me, assuming the eventlist is internal to the emulator code itself. Perhaps, to make it seem less like a kludge, you could come up with some abstraction that isn't a "job" but satisfies the same use case (i.e., a sentinel or some other placeholder with the same effect)? Of course, my suggestion may make no sense because I'm not familiar with the code in question. (Perhaps @trws or @garlick or someone else will have more informed opinions.)
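For example (purely illustrative; the names below are hypothetical and not from the emulator code):

```python
# Illustrative only: a sentinel eventlist entry, distinct from a job,
# whose sole effect is to force another quiescence check. All names
# here are hypothetical.
import heapq
import itertools

QUIESCENCE_CHECK = "quiescence-check"  # sentinel marker, not a job
_counter = itertools.count()  # tie-breaker so heap entries never compare payloads

def arm_quiescence_check(eventlist, now, delay=1e9):
    # keep the advance loop alive until the pending job is seen to start
    heapq.heappush(eventlist, (now + delay, next(_counter), QUIESCENCE_CHECK))
```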
