Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schedule: add Tasks With Budget scheduler type #9075

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

abonislawski
Copy link
Member

@abonislawski abonislawski commented Apr 23, 2024

Tasks With Budget Scheduler

TWB scheduler is a scheduler that creates a separate preemptible Zephyr thread for each SOF task that has pre-allocated MCPS budget renewed with every system tick. When the task is ready to run, then depending on the budget left in the current system tick, either MEDIUM_PRIORITY or LOW_PRIORITY is assigned to task thread. The latter allows for opportunistic execution if there is no other ready task with a higher priority while the budget is already spent.

Examples of tasks with budget: Ipc Task, Idc Task.

Task with Budget (TWB) has two key parameters assigned:

  • cycles granted: the budget per system tick,
  • cycles consumed: number of cycles consumed in a given system_tick for task execution

The number of cycles consumed is being reset to 0 at the beginning of each system_tick, renewing TWB budget. When the number of cycles consumed exceed cycles granted, the task is switched from MEDIUM to LOW priority. When the task with budget thread is created the MPP Scheduling is responsible to set thread time slice equal to task budget along with setting callback on time slice timeout. Thread time slicing guarantee that Zephyr scheduler will interrupt execution when the budget is spent, so MPP Scheduling timeout callback can re-evaluate task priority.

If there is a budget left in some system tick (task spent less time or started executing close to the system tick that preempts execution), it is reset and not carried over to the next tick.

Details: https://thesofproject.github.io/latest/architectures/firmware/sof-zephyr/mpp_layer/mpp_scheduling.html

Zephyr required patch:

Todo:

  • remove fixed numbers
  • switch IPC task to TWB

Copy link
Contributor

@marcinszkudlinski marcinszkudlinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First part of review

@abonislawski abonislawski force-pushed the twb branch 4 times, most recently from ffff647 to e8ff738 Compare April 23, 2024 16:03
@@ -210,6 +210,10 @@ if (CONFIG_SOC_SERIES_INTEL_ADSP_ACE)
${SOF_SRC_PATH}/schedule/zephyr_dp_schedule.c
)

zephyr_library_sources_ifdef(CONFIG_ZEPHYR_TWB_SCHEDULER
${SOF_SRC_PATH}/schedule/zephyr_twb_schedule.c
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this down to src/schedule/CMakeLists.txt, see recent example in 0419b7c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonislawski we need to make this move since this Makefile will go away.

Copy link
Member Author

@abonislawski abonislawski Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its done for kconfig but for CMake it requires more general work unrelated to this PR because src/schedule/CMakeLists.txt is not compatible with zephyr build

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, we can get to this later.

@plbossart
Copy link
Member

plbossart commented Apr 23, 2024

Examples of tasks with budget: Ipc Task, Idc Task.

@abonislawski can you explain how one would set a "budget" for things that are dependent on the host interaction.
IPC are notoriously difficult to plan. IIRC in past solutions the "budget" was an arbitrary 10% or something allocated to IPC, because we couldn't figure things in more details.

And second, where is this budget stored? In the host driver, topology, something else?

@marc-hb marc-hb marked this pull request as draft April 23, 2024 18:57
@abonislawski
Copy link
Member Author

abonislawski commented Apr 24, 2024

@plbossart my understanding is this is "general budget" for all IPC's, not for a specific IPC (like create pipeline), let's say we don't want to block other tasks (like DP) for more than 10% of systick time so there is no need to calculate it in great detail.
"Out of budget" task is still running, the only thing changing is thread priority for the rest of systick time, so it's more like warranty for other tasks and their opportunity to run.

And second, where is this budget stored? In the host driver, topology, something else?

I didn't switched IPC task to TWB yet but based on the above (single general budget) Im expecting single define somewhere in FW in the first version

@mwasko please comment from the architecture perspective if my understanding is wrong

@mwasko
Copy link
Contributor

mwasko commented Apr 24, 2024

@plbossart my understanding is this is "general budget" for all IPC's, not for a specific IPC (like create pipeline), let's say we don't want to block other tasks (like DP) for more than 10% of systick time so there is no need to calculate it in great detail. "Out of budget" task is still running, the only thing changing is thread priority for the rest of systick time, so it's more like warranty for other tasks and their opportunity to run.

And second, where is this budget stored? In the host driver, topology, something else?

I didn't switched IPC task to TWB yet but based on the above (single general budget) Im expecting single define somewhere in FW in the first version

@mwasko please comment from the architecture perspective if my understanding is wrong

That is correct. Today in SOF we must decide if IPCs have higher priority then data processing modules and risk that sequence of host IPCs will cause audio glitches or the other way around and risk IPC timeouts due to heavy DSP load. Task with Budget plays a role of a guard that guarantee in each processing cycle a budget of MCPS that can be used for IPC processing in high priority and if it is exceeded the priority is lowered to let the critical audio processing to complete.

@abonislawski
Copy link
Member Author

Pushed small update with:

  • thread priorities (ll, twb, dp, edf) organized in kconfig.threads_prio
  • added check in schedule_task to determine if it needs to start thread first time or just resume it

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, it sounds like we need a test pipeline ? @abonislawski what do you have in mind for this ?

@abonislawski
Copy link
Member Author

abonislawski commented May 6, 2024

@lgirdwood I will switch IPC or IDC task to TWB in this PR so it should be heavily tested in current CI

There is also one timeslice change required in Zephyr, already merged here (waiting for west update):

@lgirdwood
Copy link
Member

@lgirdwood I will switch IPC or IDC task to TWB in this PR so it should be heavily tested in current CI

There is also one timeslice change required in Zephyr, already merged here (waiting for west update):

Ok, IPC should be a good test. One thing to check for IDC is that this change wont impact timing around Zephyr mutex() APIs that use IDC on multicore configs.

Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need an explanation why Zephyr slicing isn't enough


config LL_THREAD_PRIORITY
int "LL thread cooperative high priority"
default -16
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds a risk that if someone changes CONFIG_NUM_COOP_PRIORITIES we might forget to change this one. Can we at least add an assert somewhere? In fact maybe even you can use arithmetics here directly and put -NUM_COOP_PRIORITIES? I'm not sure if this would work but at least I did find one example in SOF

config MULTICORE
	bool
	default CORE_COUNT > 1

and one in Zephyr:

config BUILD_OUTPUT_ADJUST_LMA
	depends on SECOND_CORE_MCUX && SOC_LPC54114_M0
	default "-0x20010000+\
	  $(dt_chosen_reg_addr_hex,$(DT_CHOSEN_Z_CODE_CPU1_PARTITION))"

and an even more elaborate calculation in Zephyr:

FLASH_CHOSEN := zephyr,flash
FLASH_BASE := $(dt_chosen_reg_addr_hex,$(FLASH_CHOSEN))
FLEXSPI_BASE := $(dt_node_reg_addr_hex,/soc/spi@402a8000,1)
config BUILD_OUTPUT_ADJUST_LMA
	default "$(FLEXSPI_BASE) - $(FLASH_BASE)" if NXP_IMX_RT_ROM_RAMLOADER

So, maybe that would work! But at the very least we need asserts!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to do it this way but it failed when I added minus sign to default value.

But based on your examples it should work so I will check again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonislawski just want to confirm we still have a thread level with higher priority than LL ? i.e. in case we need some special task in the future that is needed to preempt LL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood no, currently we have LL on -CONFIG_NUM_COOP_PRIORITIES so highest possible priority but we can easily change that

task->state != SOF_TASK_STATE_COMPLETED) {
scheduler_twb_unlock(lock_key);
return -EINVAL;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put the validity check at the top as an if, not as an else if and you don't need to lock yet, only after line 252, when you start accessing global data

Copy link
Collaborator

@marc-hb marc-hb May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Early return"

  if(bad)
     return -EINVAL; // or goto fail:

  // Entering main, non-indented, "safe" code
  do_real_work()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reversed the logic but not sure about the lock @lyakh ? In DP we set lock at the very beginning so just before checking task->state

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonislawski I think the Linux kernel provides a good example of a healthy approach to locking. There every single lock must have a well defined purpose. They have a check (is it by checkpatch?) that verifies, that there's a comment accompanying every single lock definition. I.e. whenever a new lock is added it mush define what it is for. And specifically it must say what data it's protecting. And it should be clearly understood what contexts can raise for access to that data.
Same here - it should be well defined and understood what data this lock is protecting. Since this is a scheduler level lock, I assume, that it protects some scheduler global data, not every task state?

if (pdata->cycles_granted) {
// Temporary fixed numbers, will be removed when PR ready to merge
if (pdata->cycles_consumed < pdata->cycles_granted*3200) {
budget_left = (pdata->cycles_granted*3200 - pdata->cycles_consumed)*0.0003125;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, could you explain? I read both the TWB and the Zephyr thread and scheduler slicing documentation and I don't understand where these calculations come into play in them? Could you maybe give an example why the simple native Zephyr slicing isn't enough? You say, that IPC and IDC are two examples of tasks, that need TWB. So, I assume, you want to handle cases when an IPC task blocks IDCs from being handled on this core or the other way round? Wouldn't just direct Zephyr thread time slicing solve this problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could do this with simple time slicing (with priority flipping in cb) but this PR is about implementing architecture described here for TWB:
https://thesofproject.github.io/latest/architectures/firmware/sof-zephyr/mpp_layer/mpp_scheduling.html
and this clearly is much more advanced logic which requires cycles calculations and some task running per LL tick.
Please take a look at figure 46, 47 and 48 which shows TWB flow, where we need to update cycles and what to do in LL tick.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then maybe that should be explained instead of using IDC and IPC as examples?

if (thread_priority < 0) {
tr_err(&twb_tr, "scheduler_twb_task_init(): non preemptible priority");
return -EINVAL;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you also check that TWB hasn't been initialised on this core yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should but I checked other schedulers and they are not checking this so probably because scheduler init fail should block core init

@abonislawski
Copy link
Member Author

CI failed on MTL because it is using debug build and assert failed:

ASSERTION FAIL [!arch_is_in_isr()] @ ZEPHYR_BASE/kernel/thread.c:662
	Threads may not be created in ISRs

@nashif how hard is this limitation? Because this code actually works in normal build (without asserts)

@abonislawski
Copy link
Member Author

abonislawski commented Aug 5, 2024

For debug build problems:

Zephyr fix provided to unblock priority setting in ISR:
zephyrproject-rtos/zephyr#76522

The remaining problem for Threads may not be created in ISRs is creating LL task which creates another thread in ISR and actually this LL task triggers the assert.

@lgirdwood
Copy link
Member

For debug build problems:

Zephyr fix provided to unblock priority setting in ISR: zephyrproject-rtos/zephyr#76522

The remaining problem for Threads may not be created in ISRs is creating LL task which creates another thread in ISR and actually this LL task triggers the assert.

@abonislawski did you get a chance to resolve the issue ?

@marc-hb marc-hb dismissed their stale review September 12, 2024 18:30

Leaving SOF

@abonislawski abonislawski force-pushed the twb branch 2 times, most recently from 4962e0c to 7bb3c21 Compare February 10, 2025 07:15
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonislawski good stuff, this is looking on track to be ready in next release, still a few opens.


config LL_THREAD_PRIORITY
int "LL thread cooperative high priority"
default -16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonislawski just want to confirm we still have a thread level with higher priority than LL ? i.e. in case we need some special task in the future that is needed to preempt LL

This will allow to organize thread priorities

Signed-off-by: Adrian Bonislawski <[email protected]>
@abonislawski abonislawski force-pushed the twb branch 2 times, most recently from 8d23266 to ad67a4e Compare February 24, 2025 11:53
@abonislawski abonislawski changed the title [DNM] schedule: add Tasks With Budget scheduler type schedule: add Tasks With Budget scheduler type Feb 24, 2025
@abonislawski abonislawski marked this pull request as ready for review February 24, 2025 12:46
@abonislawski
Copy link
Member Author

Removed hardcoded numbers + new kconfig CONFIG_TWB_IPC_TASK to select scheduler type for IPC so we can now merge TWB scheduler with base functionality

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few minor things from me.
@marcinszkudlinski can you review new version. Thanks !

@@ -210,6 +210,10 @@ if (CONFIG_SOC_SERIES_INTEL_ADSP_ACE)
${SOF_SRC_PATH}/schedule/zephyr_dp_schedule.c
)

zephyr_library_sources_ifdef(CONFIG_ZEPHYR_TWB_SCHEDULER
${SOF_SRC_PATH}/schedule/zephyr_twb_schedule.c
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonislawski we need to make this move since this Makefile will go away.

TwB for medium priority processing (e.g., IPC/IDC message handling),
- each TwB is scheduled as a separate preemptive thread,
- TwB has assigned budget for processing that is
  refreshed in each sys tick,
- TwB priority is dropped to low when budget is consumed

More details:
https://thesofproject.github.io/latest/architectures/firmware/sof-zephyr/mpp_layer/mpp_scheduling.html

Signed-off-by: Adrian Bonislawski <[email protected]>
Enable TWB for ACE1.5

Signed-off-by: Adrian Bonislawski <[email protected]>
This will add kconfig option to switch scheduler type for IPC task

Signed-off-by: Adrian Bonislawski <[email protected]>
@@ -210,6 +210,10 @@ if (CONFIG_SOC_SERIES_INTEL_ADSP_ACE)
${SOF_SRC_PATH}/schedule/zephyr_dp_schedule.c
)

zephyr_library_sources_ifdef(CONFIG_ZEPHYR_TWB_SCHEDULER
${SOF_SRC_PATH}/schedule/zephyr_twb_schedule.c
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, we can get to this later.


default:
/* illegal state, serious defect, won't happen */
k_panic();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a panic with disabled interrupts is rather nasty - no panic dump in the log. Please drop the lock before that

}

/* must be called on the same core the task will be binded to */
assert(cpu_get_id() == core);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this an error?

@lgirdwood
Copy link
Member

@abonislawski I think we are almost there, one other thing that may help (due to current CI stability) would be to enable the TWB IPC on MTL and LNL too, this way we will get wider coverage in the CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants