Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spin_lock_irqsave+sched_lock #14578

Merged
merged 3 commits into from
Jan 23, 2025
Merged

spin_lock_irqsave+sched_lock #14578

merged 3 commits into from
Jan 23, 2025

Conversation

hujun260
Copy link
Contributor

@hujun260 hujun260 commented Oct 31, 2024

Summary

1 Accelerated the implementation of sched_lock, remove enter_critical_section in sched_lock and
only enter_critical_section when task scheduling is required.
2 we add sched_lock_wo_note/sched_unlock_wo_note and it does not perform instrumentation logic
3 We aim to replace big locks with smaller ones. So we will use spin_lock_irqsave extensively to
replace enter_critical_section in the subsequent process. We imitate the implementation of Linux
by adding sched_lock to spin_lock_irqsave in order to address scenarios where sem_post occurs
within spin_lock_irqsave, which can lead to spinlock failures and deadlocks.

The entire implementation process includes:

1 spin_lock_irqsave + sched_lock
2 spin_lock/rw/spin_trylock + sched_lock
3 enter_critical_section + sched_lock
We are currently implementing the first step.

Impact

spinlock and sched_lock

Testing

Build Host:

  • OS: Ubuntu 20.04
  • CPU: x86_64
  • Compiler: GCC 9.4.0

Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic
-machine virt,virtualization=on,gic-version=3
-net none -chardev stdio,id=con,mux=on -serial chardev:con
-mon chardev=con,mode=readline -kernel ./nuttx

@github-actions github-actions bot added Area: OS Components OS Components issues Size: L The size of the change in this PR is large labels Oct 31, 2024
@github-actions github-actions bot added the Arch: xtensa Issues related to the Xtensa architecture label Oct 31, 2024
include/nuttx/spinlock.h Outdated Show resolved Hide resolved
sched/sched/sched.h Outdated Show resolved Hide resolved
include/sched.h Outdated Show resolved Hide resolved
sched/sched/sched.h Outdated Show resolved Hide resolved
sched/sched/sched_lock.c Outdated Show resolved Hide resolved
sched/sched/sched_lock.c Outdated Show resolved Hide resolved
sched/sched/sched_lock.c Outdated Show resolved Hide resolved
sched/sched/sched_unlock.c Outdated Show resolved Hide resolved
sched/sched/sched_unlock.c Outdated Show resolved Hide resolved
sched/sched/sched_unlock.c Outdated Show resolved Hide resolved
include/nuttx/spinlock.h Outdated Show resolved Hide resolved
@xiaoxiang781216 xiaoxiang781216 linked an issue Nov 18, 2024 that may be closed by this pull request
@xiaoxiang781216
Copy link
Contributor

@patacongo please review this patch which fix the long issue about sched lock.

sched/sched/sched_lock.c Outdated Show resolved Hide resolved
sched/sched/sched_lock.c Outdated Show resolved Hide resolved
sched/sched/sched_lock.c Outdated Show resolved Hide resolved
@hujun260 hujun260 force-pushed the apache_3 branch 2 times, most recently from 87f9b88 to cde599d Compare November 20, 2024 01:38
@github-actions github-actions bot added the Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture label Nov 20, 2024
include/nuttx/spinlock.h Outdated Show resolved Hide resolved
sched/sched/sched_lock.c Outdated Show resolved Hide resolved
sched/sched/sched_lock.c Show resolved Hide resolved
sched/sched/sched_lock.c Outdated Show resolved Hide resolved
sched/sched/sched_lock.c Show resolved Hide resolved
sched/sched/sched_lock.c Show resolved Hide resolved
sched/sched/sched_lock.c Show resolved Hide resolved
sched/sched/sched_unlock.c Outdated Show resolved Hide resolved
sched/sched/sched_unlock.c Show resolved Hide resolved
sched/sched/sched_unlock.c Outdated Show resolved Hide resolved
sched/sched/sched_unlock.c Show resolved Hide resolved
sched/sched/sched_unlock.c Outdated Show resolved Hide resolved
@hujun260 hujun260 force-pushed the apache_3 branch 2 times, most recently from 5e196c5 to 9cf3e7f Compare January 23, 2025 02:35
sched/sched/sched_unlock.c Show resolved Hide resolved
sched/sched/sched_unlock.c Outdated Show resolved Hide resolved
sched/sched/sched_unlock.c Show resolved Hide resolved
sched/sched/sched_unlock.c Outdated Show resolved Hide resolved
sched/sched/sched_unlock.c Show resolved Hide resolved
…ed_[un]lock

reason:
1 Accelerated the implementation of sched_lock, remove enter_critical_section in sched_lock and
only enter_critical_section when task scheduling is required.
2 we add sched_lock_wo_note/sched_unlock_wo_note and it does not perform instrumentation logic

Signed-off-by: hujun5 <[email protected]>
reason:
We aim to replace big locks with smaller ones. So we will use spin_lock_irqsave extensively to
replace enter_critical_section in the subsequent process. We imitate the implementation of Linux
by adding sched_lock to spin_lock_irqsave in order to address scenarios where sem_post occurs
within spin_lock_irqsave, which can lead to spinlock failures and deadlocks.

Signed-off-by: hujun5 <[email protected]>
@xiaoxiang781216 xiaoxiang781216 merged commit b69111d into apache:master Jan 23, 2025
39 checks passed
@anchao
Copy link
Contributor

anchao commented Jan 23, 2025

@hujun260 Great work, I think we can prepare a report showing the benefits of SMP mode after all spinlock changes

@hujun260 hujun260 deleted the apache_3 branch January 24, 2025 02:05
anchao added a commit to anchao/nuttx that referenced this pull request Jan 24, 2025
after below change merge to kernel, spin_lock() will turn off preemption by default,
but this change is not applicable to all scenarios. The locations in the kernel that
use spin_lock() extensively only require short critical sections and do not trigger
scheduling, which leads to serious performance degradation of NuttX in AMP mode.

In this PR, I try to expose similar problems and hope that each subsystem will carefully check the code coverage

apache#14578
|commit b69111d
|Author: hujun5 <[email protected]>
|Date:   Thu Jan 23 16:14:18 2025 +0800
|
|    spinlock: add sched_lock to spin_lock_irqsave
|
|    reason:
|    We aim to replace big locks with smaller ones. So we will use spin_lock_irqsave extensively to
|    replace enter_critical_section in the subsequent process. We imitate the implementation of Linux
|    by adding sched_lock to spin_lock_irqsave in order to address scenarios where sem_post occurs
|    within spin_lock_irqsave, which can lead to spinlock failures and deadlocks.
|
|    Signed-off-by: hujun5 <[email protected]>

Signed-off-by: chao an <[email protected]>
xiaoxiang781216 pushed a commit that referenced this pull request Jan 24, 2025
after below change merge to kernel, spin_lock() will turn off preemption by default,
but this change is not applicable to all scenarios. The locations in the kernel that
use spin_lock() extensively only require short critical sections and do not trigger
scheduling, which leads to serious performance degradation of NuttX in AMP mode.

In this PR, I try to expose similar problems and hope that each subsystem will carefully check the code coverage

#14578
|commit b69111d
|Author: hujun5 <[email protected]>
|Date:   Thu Jan 23 16:14:18 2025 +0800
|
|    spinlock: add sched_lock to spin_lock_irqsave
|
|    reason:
|    We aim to replace big locks with smaller ones. So we will use spin_lock_irqsave extensively to
|    replace enter_critical_section in the subsequent process. We imitate the implementation of Linux
|    by adding sched_lock to spin_lock_irqsave in order to address scenarios where sem_post occurs
|    within spin_lock_irqsave, which can lead to spinlock failures and deadlocks.
|
|    Signed-off-by: hujun5 <[email protected]>

Signed-off-by: chao an <[email protected]>
@eren-terzioglu
Copy link
Contributor

Hi,

Seems this PR is crashing most of our devices. My local tests and internal tests seems failing due to b69111d16a2a330fa272af8175c832e08881844b commit. Could you have a look it?

Here is the open issue #15688

@tmedicci
Copy link
Contributor

Hi,

Seems this PR is crashing most of our devices. My local tests and internal tests seems failing due to b69111d16a2a330fa272af8175c832e08881844b commit. Could you have a look it?

Here is the open issue #15688

Is this affecting all Espressif devices, @eren-terzioglu ?

Can you please take a look as soon as possible, @hujun260 ?

@eren-terzioglu
Copy link
Contributor

eren-terzioglu commented Jan 24, 2025

Hi,
Seems this PR is crashing most of our devices. My local tests and internal tests seems failing due to b69111d16a2a330fa272af8175c832e08881844b commit. Could you have a look it?
Here is the open issue #15688

Is this affecting all Espressif devices, @eren-terzioglu ?

Can you please take a look as soon as possible, @hujun260 ?

As far as I saw failing defconfigs are:

Xtensa

  • esp32 - Somehow ethernet-kit defconfigs seems fine. For devkitc defconfigs: ble, psram, smp and some other defconfigs are failing
  • esp32s2 - Except mcuboot_nsh defconfig, all failing.
  • esp32s3 - Except mcuboot_nsh defconfig, all failing.

Risc-V

  • esp32c3 - Except mcuboot_nsh defconfig, all failing.
  • esp32c6 - Except mcuboot_nsh defconfig, all failing.
  • esp32h2 - Except mcuboot_nsh defconfig, all failing.

@hujun260
Copy link
Contributor Author

ok, i will look into this issue

hujun260 added a commit to hujun260/nuttx that referenced this pull request Jan 25, 2025
…69111d of apache#14578

reason:
Due to the addition of sched_lock in the spinlock, using a spinlock in the *cpustart file during the boot phase
is quite special. CPU0 waits for CPU1 to start up, using a spinlock as a multi-core synchronization strategy.
However, the matching calls are not made within the same task,
resulting in a mismatch in the scheduler lock count and preventing the system from booting.
The sequence is:
CPU0 spin_lock, spin_lock, spin_unlock;
CPU1 spin_unlock.
CPU0 and CPU1 are running different tasks.

Signed-off-by: hujun5 <[email protected]>
hujun260 added a commit to hujun260/nuttx that referenced this pull request Jan 25, 2025
…69111d of apache#14578

reason:
Due to the addition of sched_lock in the spinlock, using a spinlock in the *cpustart file during the boot phase
is quite special. CPU0 waits for CPU1 to start up, using a spinlock as a multi-core synchronization strategy.
However, the matching calls are not made within the same task,
resulting in a mismatch in the scheduler lock count and preventing the system from booting.
The sequence is:
CPU0 spin_lock, spin_lock, spin_unlock;
CPU1 spin_unlock.
CPU0 and CPU1 are running different tasks.

Signed-off-by: hujun5 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arch: xtensa Issues related to the Xtensa architecture Area: Drivers Drivers issues Area: OS Components OS Components issues Size: L The size of the change in this PR is large
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Thread holding spinlock can be blocked. Thread can be unexpectedly suspended within a critical section.
7 participants