sched: Support POSIX's SCHED_RR scheduling policy #1338
base: master
Conversation
Previously, pull request cloudius-systems#1223 added support for the SCHED_FIFO policy but didn't implement SCHED_RR. This PR attempts to follow up on that by proposing an implementation of SCHED_RR. Time-slicing support, as mandated by SCHED_RR, is implemented through the `set_realtime_time_slice(duration)` API added in the aforementioned pull request. Within the scheduler, the amount of time the thread has run so far is tracked; if it exceeds the set duration, the thread is preempted (if there is a runnable thread of the same or higher priority). The thread's run time is reset on preemption.

Signed-off-by: Sören Tempel <[email protected]>
```cpp
// p is no longer running, if it has a realtime slice reset it.
if (p->_realtime.has_slice()) {
    p->_realtime.reset_slice();
}
```
It is not entirely clear to me if the slice should be reset if the thread is no longer runnable (e.g. because of blocking I/O). POSIX does not explicitly describe when the slice should be reset.
Yes, I'm also not sure, but I think this `if` is right. I think the idea of the time slice is to make sure that a single thread in its priority group never runs more than 1ms (for example) without letting other threads in its group run. But if the thread blocks or yields voluntarily (I believe this `if` covers both cases, right?), then it gives some other thread a chance to run, and it too has a chance to run for a whole time slice, so it's only fair that this thread's time slice is reset to zero. I think.
I tried searching if anybody mentions this question, and couldn't find such a discussion.
Your code change looks good. I did ask some questions though.
```cpp
// If the threads have the same realtime priority, then only reschedule
// if the currently executed thread has exceeded its time slice (if any).
if (t._realtime._priority == p->_realtime._priority &&
    ((!p->_realtime.has_slice() || p->_realtime.has_remaining()))) {
```
So all this means the current thread `p` should stay running if `_time_slice` is 0 (run 'forever' until it yields or waits) OR there is still time left to run per its `_time_slice`, right?
> […] should stay running if `_time_slice` is 0 (run 'forever' until yields or waits) OR there is still time left to run per its `_time_slice`, right?

Yes, `p->_realtime.has_slice()` checks if it has a time slice (i.e. `_time_slice != 0`), and `p->_realtime.has_remaining()` checks if there is still time remaining on the slice (if it has one).
Note that, even if the thread has exceeded its time slice it may still be selected to run again if there is no thread with a higher priority. Hence, the priority comparison in the if condition.
```cpp
    enqueue(*p);
    p->_realtime.reset_slice();
} else {
    // POSIX requires that if a real-time thread doesn't yield but
```
So this means that if the current thread `p`'s `_time_slice` is 0 OR `p` still has some remaining time to run, we will call `enqueue_first_equal()`. Is this correct?
Yes, I think it's correct. If we got here it means p was preempted. If it still has remaining time, it means it was preempted by a higher-priority realtime thread but when that higher-priority thread doesn't want to run, this thread p should continue running and continue its current time slice. The documentation says: "A SCHED_RR thread that has been preempted by a higher priority thread and subsequently resumes execution as a running thread will complete the unexpired portion of its round-robin time quantum.". It should be the first one in its priority group to run (and therefore enqueue_first_equal()) just like when no time slices existed.
@nyh What do you think?
Looks very nice to me, and appears correct although I'm a bit worried that I'm rusty in this code and might have missed something. I only left a few minor comments/requests.
I just realized that we never actually implemented the POSIX API for these features ;-) I think we have such patches in #386 and maybe it will be nice to revive them.
core/sched.cc (outdated)

```diff
@@ -276,6 +276,10 @@ void cpu::reschedule_from_interrupt(bool called_from_yield,
     }
     thread* p = thread::current();

     if (p->_realtime.has_slice()) {
```
I think that today it is possible to set the realtime time slice without setting realtime priority yet, and I think in this case you don't want to increment _run_time. So maybe you also need to check if realtime.priority is > 0?
> I think that today it is possible to set the realtime time slice without setting realtime priority yet […]

Do we want to support setting a time slice without providing a realtime priority? If so, I can also adjust the code accordingly.

> So maybe you also need to check if realtime.priority is > 0?

Added this for now in 8db3444
```cpp
    // rather is preempted by a higher-priority thread, it must be
    // reinserted into the runqueue first, not last, among its equals.
    enqueue_first_equal(*p);

if (p->_realtime.has_slice() && !p->_realtime.has_remaining()) {
```
Again, maybe we need to check if realtime.priority>0 because maybe has_slice() just records some old setting and it's not in use now?
Or, maybe, for simplicity, we should just ensure that if the real-time priority is ever set to 0, then slice is also set to 0?
```cpp
long prev_switches = -1;
for (int i = 0; i < num_threads; i++) {
    long switches = threads[i]->stat_switches.get();
    if (prev_switches != -1 && prev_switches != switches) {
```
Am I correct that you want all threads to have exactly the same number of context switches? How can we be confident of this - can't one happen to have one more than the others because of some inaccuracy or something?
> Am I correct that you want all threads to have exactly the same number of context switches?

Yes. My thinking was: if we assign each thread a time slice of size `N` and then wait `N * NUM_THREADS * EXPECTED_SWITCHES` time units, then (on a single-core machine under a realtime scheduling policy) we would expect each thread to be preempted after `N` time units. As such, each thread should have `EXPECTED_SWITCHES` context switches.

Maybe I am missing something obvious, but I haven't seen this test fail yet. However, since this is not a hard-realtime operating system, I assume we could see delays here and there in the scheduler? We can also make the comparison fuzzy, allowing the number of expected context switches to be off by 1 or 2.
```cpp
// Since both threads are pinned to the CPU and the higher priority
// thread is always runnable, the lower priority thread should starve.
bool ok = high_prio->thread_clock().count() > 0 &&
          low_prio->thread_clock().count() == 0;
```
I'm a bit worried that it's theoretically possible that although you start()ed the high prio thread first and only then the low_prio one, maybe the low prio one got to run for a microsecond before the high prio one so its thread clock will not be exactly zero. But maybe in practice it doesn't happen...
This test can also have the opposite problem. If I understand correctly your TIME_SLICE is absolutely tiny, 0.1 milliseconds, and after starting high_prio and low_prio you only sleep 3 times that, i.e., 0.3 milliseconds, so it is theoretically possible that the test will pass even without any realtime priorities or anything, just because we let the first-ran highprio thread run for 0.3 milliseconds straight.
> If I understand correctly your TIME_SLICE is absolutely tiny, 0.1 milliseconds

I increased it further in 3c27113.

> maybe the low prio one got to run for a microsecond before the high prio one

Note: they are both pinned to the same CPU, so the higher-prio one should also be first in the CPU runqueue and should always be runnable, so I am not sure under which scenario the lower-prio one would get the CPU. However, I believe you have more expertise with the scheduling code. We can also discard this test case.

What test cases would you like to see instead for `SCHED_RR`?
Thanks a lot to both of you for the detailed comments and feedback! I made some minor changes and left further comments above. I think the main two things that remain to be sorted out are:
This is my first OSv pull request. I added some basic tests which I can expand further. Additionally, I checked that the tests added in #1223 still pass. Further testing is definitely needed, hence this is a draft PR for now.

Let me know what you think.