Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement core1 #112

Open
wants to merge 27 commits into
base: main
Choose a base branch
from
Open

Implement core1 #112

wants to merge 27 commits into from

Conversation

mingpepe
Copy link
Contributor

@mingpepe mingpepe commented Dec 12, 2022

#109

Test ok with pico-example code/multicore

Interpolator/Divider/Spinlock are not tested or implemented since I have no suitable code to verify.

@mingpepe mingpepe changed the title Implement core1 #109 Implement core1 Dec 12, 2022
@urish
Copy link
Contributor

urish commented Dec 12, 2022

Thank you!

These is a pretty list big of changes, so it'll probably take me some time to go over everything.

Regarding Interpolator/Divider - as far as I remember, the only thing that we need is to have different one per core, so if two cores are using the divider / interpolator at the same time, they don't interfere with eachother:

image

So ideally, we'd want a set of Interpolator per core. It might make more sense to also move the divider logic into its own object, and then it'd be easier to create a dedicated copy of it for each core.

@mingpepe
Copy link
Contributor Author

mingpepe commented Dec 13, 2022

If interpolator and divider work properly for single core, I will work on it later to let each core has individual function.

But I do not have multi-core code about them to test, only single core(divider, interp).

@urish
Copy link
Contributor

urish commented Jan 1, 2023

Thanks again! Is this ready for review or not yet?

@mingpepe
Copy link
Contributor Author

mingpepe commented Jan 1, 2023

Yes, it's ready.

@urish
Copy link
Contributor

urish commented Jan 16, 2023

Now I just need to find a few hours to go through it :)

src/rp2040.ts Outdated Show resolved Hide resolved
@mingpepe
Copy link
Contributor Author

mingpepe commented Feb 6, 2023

@urish Do you have time to check this PR?

@urish
Copy link
Contributor

urish commented Feb 11, 2023

I added it to my todo list, I will make an effort to allocate some time this week or the next week.

Ideally, I'd also love to have the multicore test running as part of the CI. I've set up a GitHub action to run MicroPython tests two weeks ago, and hope to get some pico-sdk examples to run as part of the CI too.

}

stop() {
this.stopped = true;
this.core0.stopped = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, stopped is a bit of a kludge for controlling the emulation machinery (that's running via setTimeout()) from outside code; it's not a feature of the actual RP2040 hardware. Since the cores are not running as callbacks individually, there's probably no need to give each core a stopped attribute.

That said, I'd just remove the setTimeout() stuff from rp2040js altogether; it seems to be some leftover from Wokwi. It'd probably be better to just have a rp2040.step() function and leave everything else to the calling code. (but this might be outside the scope of this PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stopped flag for each core is necessary. Take pico-debug as an example, it use one core to debug another core.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pico-debug project uses the SYSCFG DBGFORCE register, basically rerouting the SWD functionality built into the (real, hardware) Cortex cores via USB. This is unrelated to stopped, and SWD functionality isn't implemented at all in rp2040js.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I want to say is the 2 cores may be 1 running and 1 stopped. And you are right, pico-debug use SWD which is not related to our stopped flag. But you can check gdb server in this project which only stop core0 while debugging.

if (!this.core0.stopped && !this.core0.waiting) {
idle = false;
this.isCore0Running = true;
this.core0.executeInstruction();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This eventually gets the cores out of sync. Imagine core0 executing a number of 5 cycle instructions and core 1 executing a number of 1 cycle instructions, let things run a while, then look at the cycles of each core (which will have diverged a lot).

You'll need something like https://github.com/c1570/rp2040js/blob/3a754941c169d7215d8d97112c01a9d1f9d4e2c9/demo/dual-mcu.ts#L102 here.

Copy link
Contributor Author

@mingpepe mingpepe Mar 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the two cores execution cycle may diverge if they run a while and the stopped or waiting flag raised. Currently I do not have good solution for it. But what you did may not be helpful(if I understand correctly).

If one core is stopped or waiting, these state will not changed until the next setTimeout call since there is only 1 thread in our execution environment. Then your code to call core.step actually do nothing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this is needed in cortex-m0-core.ts:

  executeInstruction() {
    if (this.waiting) {
      this.cycles++;
      return;
    }
    ...

...and something like this in rp2040.ts (this is my local version with #117 already merged):

  step() {
    let core0StartCycles = this.core0.cycles;
    this.isCore0Running = true;
    this.core0.executeInstruction();
    this.isCore0Running = false;
    while(this.core1.cycles < this.core0.cycles) {
      this.core1.executeInstruction();
    }
    for(let cycle = core0StartCycles; cycle < this.core0.cycles; cycle++) {
      this.pio[0].step();
      this.pio[1].step();
    }
  }

@@ -48,7 +49,8 @@ export class RP2040 {
readonly usbDPRAM = new Uint8Array(4 * KB);
readonly usbDPRAMView = new DataView(this.usbDPRAM.buffer);

readonly core = new CortexM0Core(this);
readonly core0 = new CortexM0Core(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using an array/list for the cores? Seems that would make quite a few places of the code simpler, and RP4040 might be just around the corner anyways. ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the beginning, I use array of cores, but found so may cases need to handle for each core(like notify another core interrupt or operations no SIO) which makes the benefit of array less and make the code not easy to read(compare to use 2 variable). It's just my opionion.

@c1570
Copy link
Contributor

c1570 commented Mar 12, 2023

Edit:

  • cmake -DPICO_COPY_TO_RAM=1 .. will break core1, even for programs using only core0. Not sure why yet.

@c1570
Copy link
Contributor

c1570 commented Mar 14, 2023

If I understand correctly, the bootloader leaves core1 in waiting state, but emulator-run.ts jumps into crt0 immediately, bypassing the bootloader. This makes core1 reach the code at https://github.com/raspberrypi/pico-sdk/blob/f396d05f8252d4670d4ea05c8b7ac938ef0cd381/src/rp2_common/pico_standard_link/crt0.S#L315 where things go wrong (possibly because core0 is still copying things to RAM at that point).

Adding mcu.core1.waiting = true; to emulator-run seems to fix the issue. :)

@urish
Copy link
Contributor

urish commented Mar 14, 2023

Thanks for reviewing this @c1570! Unfortunately, I hadn't had a chance yet to review it myself.

I got the Pico SDK CI to work: https://github.com/wokwi/rp2040js/blob/master/.github/workflows/ci-pico-sdk.yml, which can be a good starting point for adding a multicore test to the CI.

@c1570
Copy link
Contributor

c1570 commented Mar 14, 2023

@urish How about just adding the needed example .hex files and use those? This would make tests easier/faster to run for users and allow testing things like -DPICO_COPY_TO_RAM more easily.

@mingpepe
Copy link
Contributor Author

mingpepe commented Mar 18, 2023

@1570 Where is the code to bypass bootloader in emulator-run.ts? I did not ever use -DPICO_COPY_TO_RAM before.

@c1570
Copy link
Contributor

c1570 commented Mar 18, 2023

@1570 Where is the code to bypass bootloader in emulator-run.ts? I did not ever use -DPICO_COPY_TO_RAM before.

mcu.core.PC = 0x10000000;
jumps to 0x10000000 (the hex file contents) but the bootrom is at 0x0 basically (if I understand correctly).

@mingpepe
Copy link
Contributor Author

@1570 Where is the code to bypass bootloader in emulator-run.ts? I did not ever use -DPICO_COPY_TO_RAM before.

mcu.core.PC = 0x10000000;

jumps to 0x10000000 (the hex file contents) but the bootrom is at 0x0 basically (if I understand correctly).

Thanks.

@urish urish self-requested a review March 20, 2023 08:18
@urish urish self-assigned this Mar 20, 2023
@urish urish added the enhancement New feature or request label Mar 20, 2023
Copy link
Contributor

@urish urish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for working on this!

Did a few tests, ran all the multicore examples, and they seem to work well. Also did a quick review of the code, left one minor comment.

Have you looked at adding support for debugging the 2nd core using GDB?

I believe the way to go is to expose each code as a different execution thread to GDB. This is essential so in the future we'll be able to debug issues involving both cores.

@@ -53,9 +54,9 @@ export class RPPPB extends BasePeripheral implements Peripheral {
systickReload = 0;
systickTimer: IClockTimer | null = null;

readUint32(offset: number) {
readUint32ViaCore(offset: number, _core: Core) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please call it coreIndex (or similar) instead of _core (same for writeUint32ViaCore)

@mingpepe
Copy link
Contributor Author

@urish. To support for 2nd core with GDB, I just slightly traced code. Here are some issues to deal with.

  1. Which gdb server configuration do we need?
    a) The official config use smp and hwthread which shows only 1 TCP port to gdb client
    b) Expose 2 TCP ports to gdb clients => Two instance of GDBTCPServer with different port in our case.
  2. Currently timers in RealTimeClock do not distinguish for which core owned. When one core is stopped, I'm not sure if it's timers should keep going or should be paused (like DMA).

For gdb server config a), when 1 break point hit, two cores are stopped (I do not know if it can config to let another core free run). I think in this way some gdb commands currently not implemented (but I do not know the details about gdb command yet). This may be complicated compare to config b) but let me debug my code with VSCode extension which is good for me. For config b), I'm not sure if VSCode extension can achieve connecting 2 gdb server.

@urish
Copy link
Contributor

urish commented Mar 21, 2023

Thanks for looking into this! Here are my thoughts:

  1. Which gdb server configuration do we need? single GDB port, multiple hw threads.
  2. From my experience, pausing everything (both cores & all timers) makes life much easier when debugging. You want to have everything as predictable as possible, so if you set a breakpoint and step over, you want to get consistent results regardless of how fast you're stepping over.

So config a), and stop everything when we hit a breakpoint (or debugger sends a break command).

@mingpepe
Copy link
Contributor Author

If 2 cores always stopped simultaneously, do we need multi-thread to implement this feature? As I understanding, one thread is enough.

@urish
Copy link
Contributor

urish commented Mar 21, 2023

You want some way to show GDB the state of each core - so it would show you the call stack of each core, and let you step the core that you are interested it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants