Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sched/semaphore: change semcount type to int #14625

Closed
wants to merge 1 commit into from

Conversation

zyfeier
Copy link
Contributor

@zyfeier zyfeier commented Nov 4, 2024

Summary

Fix the issue introduced by bug #14465.
Some memory on the ESP32 requires aligned access,
so the semcount type has been changed to int.

Impact

semaphore

Testing

bes board test pass

@github-actions github-actions bot added Area: OS Components OS Components issues Size: XS The size of the change in this PR is very small labels Nov 4, 2024
Some memory on the ESP32 requires aligned access,
so the semcount type has been changed to int.

Signed-off-by: zhangyuan29 <[email protected]>
@github-actions github-actions bot added Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Arch: xtensa Issues related to the Xtensa architecture Size: S The size of the change in this PR is small and removed Size: XS The size of the change in this PR is very small labels Nov 4, 2024
@nuttxpr
Copy link

nuttxpr commented Nov 4, 2024

[Experimental Bot, please feedback here]

No, this PR does not fully meet the NuttX requirements. While it provides some information, it lacks crucial details. Here's a breakdown:

  • Insufficient Summary: While it mentions the bug fix and the affected code, it's too brief. It needs to explain how changing semcount to int resolves the alignment issue. What was the original type? Why did that type cause a problem on the ESP32? What specific memory areas require aligned access? A link to bug sem: change sem wait to atomic operation #14465 is needed.

  • Incomplete Impact Assessment: Simply stating "semaphore" is not enough. Think about the broader implications:

    • Is new feature added? Is existing feature changed?: Existing feature changed (semaphore implementation).
    • Impact on user: Potentially YES if user code relies on the previous size of semcount. This needs investigation and explanation.
    • Impact on build: Potentially YES if the change affects structure packing or memory layout. This needs to be addressed.
    • Impact on hardware: YES, specifically ESP32.
    • Impact on documentation: Potentially YES if the API changes or behavior is modified.
    • Impact on security: Potentially YES if changing the size of semcount leads to overflows or other vulnerabilities. This needs careful consideration.
    • Impact on compatibility: Potentially YES, backward compatibility might be affected. Needs investigation and explicit mention.
  • Inadequate Testing Information: "bes board test pass" is too vague.

    • Build Host(s): Missing. Provide details about the development environment.
    • Target(s): While "bes board" is mentioned, the specific configuration is missing. Provide the full board:config string.
    • Testing logs before change: Missing entirely. Include logs demonstrating the issue caused by bug sem: change sem wait to atomic operation #14465.
    • Testing logs after change: Missing any meaningful detail. Show logs demonstrating that the problem is resolved and that the semaphore functionality still works correctly.

In short, the PR needs significantly more detail and evidence to meet the NuttX requirements. It needs to thoroughly explain the problem, the solution, and the impact of the change, supported by clear testing evidence.

@tmedicci
Copy link
Contributor

tmedicci commented Nov 5, 2024

Testing...

@tmedicci
Copy link
Contributor

tmedicci commented Nov 5, 2024

Testing...

Unfortunately, the tests' results are the same... still failing to boot...

@zyfeier
Copy link
Contributor Author

zyfeier commented Nov 6, 2024

Testing...

Unfortunately, the tests' results are the same... still failing to boot...

Hi, @tmedicci, we only have the ESP32-S3 boards. Can we reproduce this issue using the ESP32S3? Just to confirm, this issue only occurs when using the IRAM heap, correct?

@tmedicci
Copy link
Contributor

tmedicci commented Nov 6, 2024

Hi @zyfeier ,

Just to confirm, this issue only occurs when using the IRAM heap, correct?
Yes, exactly.

Hi, @tmedicci, we only have the ESP32-S3 boards. Can we reproduce this issue using the ESP32S3?

Unfortunately not. ESP32-S3 uses SRAM1 (which is accessible using either the instruction and data bus): there isn't a separate heap for loading apps (we just need to access the heap using its data address when copying data to avoid non-aligned access).

For ESP32, the text heap is used from SRAM0 (here). This memory bank is only accessible using the instruction bus and any non-aligned word would cause an exception. In fact, we capture this exception and repeat the operation using a aligned access. Check esp32_user.c: something in this PR broke this logic...

@tmedicci
Copy link
Contributor

Hi @zyfeier ,

Just to confirm, this issue only occurs when using the IRAM heap, correct?
Yes, exactly.

Hi, @tmedicci, we only have the ESP32-S3 boards. Can we reproduce this issue using the ESP32S3?

Unfortunately not. ESP32-S3 uses SRAM1 (which is accessible using either the instruction and data bus): there isn't a separate heap for loading apps (we just need to access the heap using its data address when copying data to avoid non-aligned access).

For ESP32, the text heap is used from SRAM0 (here). This memory bank is only accessible using the instruction bus and any non-aligned word would cause an exception. In fact, we capture this exception and repeat the operation using a aligned access. Check esp32_user.c: something in this PR broke this logic...

Hi @zyfeier , were you able to verify anything further?

@zyfeier
Copy link
Contributor Author

zyfeier commented Nov 13, 2024

aligned

Hi @zyfeier ,

Just to confirm, this issue only occurs when using the IRAM heap, correct?
Yes, exactly.

Hi, @tmedicci, we only have the ESP32-S3 boards. Can we reproduce this issue using the ESP32S3?

Unfortunately not. ESP32-S3 uses SRAM1 (which is accessible using either the instruction and data bus): there isn't a separate heap for loading apps (we just need to access the heap using its data address when copying data to avoid non-aligned access).
For ESP32, the text heap is used from SRAM0 (here). This memory bank is only accessible using the instruction bus and any non-aligned word would cause an exception. In fact, we capture this exception and repeat the operation using a aligned access. Check esp32_user.c: something in this PR broke this logic...

Hi @zyfeier , were you able to verify anything further?

Hi, @tmedicci , plz help to know, after this PR, is the address of semcount in the semaphore 4-byte aligned?

@yamt
Copy link
Contributor

yamt commented Nov 13, 2024

see also #14755

@zyfeier
Copy link
Contributor Author

zyfeier commented Nov 14, 2024

Hi, @tmedicci We get esp32 board, and will check soon.

@chirping78
Copy link
Contributor

As indicated in #14465 (comment) ,
this is not an alignment issue, but the iram and s32c1i compatibility issue.

@xiaoxiang781216
Copy link
Contributor

As indicated in #14465 (comment) , this is not an alignment issue, but the iram and s32c1i compatibility issue.

@tmedicci two solution we can come up:

  1. Switch to the irq disable/enable implementation like arch_atomic : Introduce CONFIG_LIBC_ARCH_ATOMIC #14803
  2. move mm_heap_s out of the special memory region to avoid the atomic operation

which one do you like?

yamt added a commit to yamt/incubator-nuttx that referenced this pull request Nov 15, 2024
This reverts commit befe298.

Because a few regressions have been reported and
it likely will take some time to fix them:

* for some configurations, semaphore can be used on the special
  memory region, where atomic access is not available.
  cf. apache#14625

* include/nuttx/lib/stdatomic.h is not compatible with
  the C11 semantics, which the change in question relies on.
  cf. apache#14755
@xiaoxiang781216
Copy link
Contributor

here is the patch for the first solution: #14625

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Nov 15, 2024

let's close this pr since it can't fix the problem.

xiaoxiang781216 pushed a commit that referenced this pull request Nov 20, 2024
This reverts commit befe298.

Because a few regressions have been reported and
it likely will take some time to fix them:

* for some configurations, semaphore can be used on the special
  memory region, where atomic access is not available.
  cf. #14625

* include/nuttx/lib/stdatomic.h is not compatible with
  the C11 semantics, which the change in question relies on.
  cf. #14755
JaeheeKwon pushed a commit to JaeheeKwon/nuttx that referenced this pull request Nov 28, 2024
This reverts commit befe298.

Because a few regressions have been reported and
it likely will take some time to fix them:

* for some configurations, semaphore can be used on the special
  memory region, where atomic access is not available.
  cf. apache#14625

* include/nuttx/lib/stdatomic.h is not compatible with
  the C11 semantics, which the change in question relies on.
  cf. apache#14755
linguini1 pushed a commit to CarletonURocketry/nuttx that referenced this pull request Jan 15, 2025
This reverts commit befe298.

Because a few regressions have been reported and
it likely will take some time to fix them:

* for some configurations, semaphore can be used on the special
  memory region, where atomic access is not available.
  cf. apache#14625

* include/nuttx/lib/stdatomic.h is not compatible with
  the C11 semantics, which the change in question relies on.
  cf. apache#14755
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Arch: xtensa Issues related to the Xtensa architecture Area: OS Components OS Components issues Size: S The size of the change in this PR is small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants