feat(sync): Sync refactor #495

Open
wants to merge 33 commits into main

Conversation

@cchudant (Member) commented Feb 14, 2025

Sync is the service that is responsible for importing blocks into the node. It is currently only run in full-node mode, as we do not need to sync any blocks when running in sequencer (block production) mode.

Motivation

Peer-to-peer

Peer-to-peer block sync is a feature that will be required for all starknet full nodes to implement. We are already very late to the party, as pathfinder, juno and papyrus have had p2p sync capabilities for more than 6 months now.
This is also of great interest to us at KasarLabs for our upcoming launch of Quaza, where we want to showcase peer-to-peer sync.

Our peer-to-peer implementation is still experimental, which is why we are not yet considering getting it merged upstream into Madara. Since it touches a big part of the core, we need to merge its architectural changes in a separate PR to keep the P2P feature merge-ready for the Madara Core codebase at any time. With that in mind, the Madara alliance has agreed on a flow for upstreaming stuff into the main branch: if there are any core modifications, we’ll try to merge them upstream. But this is a huge job because we want to minimize breakage, which means the new refactored code should work just like the old one and keep or even improve all the test coverage.

The details of that flow are still being debated, but this pull request could be the experiment that tells us how well that goes :)

As such, this pull request contains:

  • Removal of the mc-block-import crate and of mc-sync in its current form.
  • A new mc-sync crate.
  • Some changes around the database to allow storing partially downloaded blocks.
  • Light changes to the plumbing around block-production, plus new types in primitives to make the code cleaner.

Moreover, the other node teams (including us!) are currently also working on consensus (tendermint) integration, which has a lot of overlap with this work. In madara's case at least, we are integrating Informal Systems' Malachite implementation.

Reorg handling

This refactor is a very big step forward in terms of reorg handling, and has been designed for that from the start. The biggest win here is that the pipeline is now centrally controllable from a single point. This means that reorg-handling can be implemented on top of this pull request without madara becoming a big spaghetti mess. More on that in the Details section of this pull request description.

Performance

The current sync service has okay performance, but it can be improved a lot. One of the biggest slowdowns during sync is that class compilation stalls the whole pipeline. Class definitions don't happen at every block, but when they do, the pipeline has to wait for compilation to finish, and the compilation of a single class cannot be parallelized. This results in big spikes in block import time and a lot of CPU under-utilization.

This new sync implementation does a much better job of amortizing the cost of class compilation over many more blocks: it splits class sync into a separate pipeline, which is completely free to lag behind the other sync pipelines (up to a point).
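
To make "up to a point" concrete: the idea is simply a bounded distance between the class pipeline's tip and the blocks the other pipelines are allowed to process. Here is a tiny sketch of that check; the constant and the function name are made up for illustration and are not taken from this PR.

    // Illustrative sketch only: the bound and the names are hypothetical.
    const MAX_CLASS_LAG: u64 = 128;

    /// The other pipelines may run ahead of class compilation, but only by a
    /// bounded number of blocks, so compilation cost is amortized over many
    /// blocks instead of stalling the whole pipeline at every class declaration.
    fn other_pipelines_may_process(next_block_n: u64, classes_tip: u64) -> bool {
        next_block_n.saturating_sub(classes_tip) <= MAX_CLASS_LAG
    }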

Measured on my Apple M4 Pro: syncing to 60k blocks on Sepolia now takes 15m35s, i.e. ~64.17 blocks/s. That is an immense speedup. We are now limited by the rate-limiting of the starkware gateway (we have a gateway key, but we still hit some sort of limit, as syncing from a local gateway reaches much higher blocks/s).
We are also limited on some ranges of Sepolia blocks by the bonsai-trie global trie insertion, which I really want to optimize a lot more at some point. We are still not hitting 100% CPU on every core all of the time.

In detail

The current sync service works by having 3 tasks:
(diagram: the 3 tasks of the current sync service)

You can see this in action in mc-sync's l2.rs.

This simplistic approach has decent performance, but it is not suitable for peer-to-peer sync. P2p sync has a number of unique challenges:

  • Any peer can be lying to you. You need to check everything, maintain peer scores, and punish/disconnect the peers accordingly.
  • The p2p specs have a single-request multi-response design, where you send a request, and you get a stream of responses for a range of blocks.
  • The p2p request/stream response "endpoints" (let's call them that) are separated into: headers, transactions and receipts, state diffs, classes, and events.

Moreover, you cannot verify that, e.g., an event is correct unless you have already received the header for that block. Classes are different: they cannot be checked using the class commitment in the header, as there is no such field. Instead, you have to check them using the fields in the block's state diff, which in turn you can check using the state diff commitment in the header.

The dependency diagram for verification in the p2p specs looks like this:

(diagram: verification dependency graph in the p2p specs)

Note: backward sync is needed because, per the specs, the signature field is not guaranteed to be populated for old blocks that are not yet on L1. This has to do with the fact that the starkware feeder-gateway does not have a static pubkey.

Second note: when tendermint consensus is integrated, this signature verification step (forward sync) will check consensus signatures instead.

As you can see, our requirements for p2p sync are quite a bit more complex than the current madara sync, and they cannot fit into the existing structure. Thus, we decided to refactor all of it into nice abstractions that support both peer-to-peer sync as well as gateway sync.

In this refactor, a Pipeline refers to the combination of a parallel step (run over N blocks at a time) and a sequential step (run one block at a time).
Our peer-to-peer forward sync implementation (not present in this pull request) has a pipeline for each of the "p2p stream endpoints" mentioned earlier, plus an Apply State pipeline that only has a sequential step, which is where we update the global tries.

This PipelineController abstraction supports retries in the sequential step. While retries are not used by the Gateway sync implementation present in this pull request, they are very important for peer-to-peer, as there are cases (particularly in the headers pipeline) where we cannot know whether a peer has returned wrong data until the sequential step. In that case, we want to reschedule a parallel step for that block and start over with a new peer. This is the main reason this abstraction looks the way it does.
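
To make the shape of this abstraction concrete, here is a minimal sketch of a pipeline step interface with a retry signal coming out of the sequential step. The trait, enum and method names below are illustrative only and are not the actual PipelineController API in this PR.

    // Illustrative sketch only: names and signatures are hypothetical.
    use std::ops::Range;

    /// Outcome of the sequential (one-block-at-a-time) step.
    enum ApplyOutcome<Out> {
        /// The block was verified and applied; continue with the next block.
        Applied(Out),
        /// The data turned out to be invalid (e.g. a lying peer): reschedule the
        /// parallel step for this block and retry, typically with another peer.
        Retry,
    }

    trait PipelineSteps {
        type ParallelOutput: Send;
        type SequentialOutput: Send;

        /// Runs in parallel over a range of N blocks (fetching, hashing, ...).
        fn parallel_step(&self, block_range: Range<u64>) -> Self::ParallelOutput;

        /// Runs strictly in order, one block at a time (verification against the
        /// previous block, committing to the database, ...).
        fn sequential_step(
            &self,
            block_n: u64,
            input: Self::ParallelOutput,
        ) -> ApplyOutcome<Self::SequentialOutput>;
    }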

On top of that, we have a ForwardSync abstraction, which is responsible for following L1 and for running a "probe" - a small future that returns the highest known block on the network/gateway.

This is how ForwardSync is used by the new Gateway sync. As you can see, the abstractions fit together neatly, and it becomes easy to drive all of the pipelines. I believe this is a big deal, as it makes reorg support easy to add in the future: we have a single centralized place with direct control over all of the pipelines, and we can cleanly abort any fetch or task there at will.
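
As a rough illustration of what "driving all of the pipelines from a single place" means, here is a hedged sketch of such a loop. ForwardSync, its pipeline fields, run_until and probe_highest_block are all stand-ins invented for this example, not the actual types or methods in this PR.

    // Illustrative sketch only: stand-in types, not the PR's real ForwardSync.
    struct Pipeline;

    impl Pipeline {
        /// Drive this pipeline until it has imported blocks up to `_target`.
        async fn run_until(&mut self, _target: u64) -> anyhow::Result<()> {
            Ok(())
        }
    }

    struct ForwardSync {
        headers: Pipeline,
        transactions_receipts: Pipeline,
        state_diffs: Pipeline,
        classes: Pipeline,
        apply_state: Pipeline,
    }

    impl ForwardSync {
        /// The "probe": a small future returning the highest block known to the
        /// network/gateway, so the pipelines know how far they may fetch.
        async fn probe_highest_block(&self) -> anyhow::Result<u64> {
            Ok(0)
        }

        /// Everything is driven from this one place, which is what will make it
        /// easy to pause, abort, or (later on) reorg all the pipelines at once.
        async fn run(&mut self) -> anyhow::Result<()> {
            loop {
                let target = self.probe_highest_block().await?;
                tokio::try_join!(
                    self.headers.run_until(target),
                    self.transactions_receipts.run_until(target),
                    self.state_diffs.run_until(target),
                    self.classes.run_until(target),
                    self.apply_state.run_until(target),
                )?;
            }
        }
    }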

Other details include:

  • The backend now has a different sync tip (latest block_n counter) for every pipeline (see the sketch below).
  • The deletion of block-import implies that the global tries are updated elsewhere now. They are updated in mc-db - the code is cleaner now, as more of the logic is shared with block-production.
  • In addition, closing a block (going from a pending to a closed block) is now implemented in primitives. This makes block-production a lot cleaner as well. Moreover, we use PendingBlocks a lot more as a general construct for a "non-closed block".
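
To illustrate the first point, here is a minimal sketch of what per-pipeline sync tips can look like; the struct and field names are illustrative (the real head-status type in mc-db may differ), but the idea is that a block only counts as fully imported once every pipeline has reached it.

    // Illustrative sketch only: the real head-status type in mc-db may differ.
    #[derive(Default)]
    struct HeadStatus {
        headers: Option<u64>,
        transactions: Option<u64>,
        state_diffs: Option<u64>,
        classes: Option<u64>,
        global_trie: Option<u64>,
    }

    impl HeadStatus {
        /// The overall sync tip is the minimum of the per-pipeline tips.
        /// (`None` sorts below `Some(_)`, so a pipeline that has imported
        /// nothing yet makes the whole result `None`.)
        fn latest_full_block_n(&self) -> Option<u64> {
            [
                self.headers,
                self.transactions,
                self.state_diffs,
                self.classes,
                self.global_trie,
            ]
            .into_iter()
            .min()
            .flatten()
        }
    }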

Here is the implementation of ForwardSync in our p2p sync implementation (not in this PR), if you are curious.

TODO

A few things are still missing before this pull request can be merged, but it can already be reviewed.

  • More testing: re-add the feeder gateway mocking test.
  • Fix pre-v0.12.3 block hash verification.
  • Add the --n-blocks-to-sync -> --sync-stop-at breaking change in the CHANGELOG.md file

@Trantorian1 added the feature (Request for new feature or enhancement) label Feb 14, 2025
@Trantorian1 added the node (Related to the full node implementation) and research (Research and exploration required before implementation) labels Feb 14, 2025
@cchudant (Member Author) commented Feb 14, 2025

Breaking changes:

  • The DB schema is different.
  • I have removed the --n-blocks-to-sync <number of blocks> cli arg, because I believe it is better expressed as --sync-stop-at <block_n>, which is the objectively better version of that argument (see the sketch after this list). What do you think? I do not see any use-case for n-blocks-to-sync, but if anyone really wants it I can add it back and we can have both.
  • I have removed --no-sync-polling because it doesn't quite make sense in this new architecture.
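
As promised in the first bullet above, here is what the replacement flag could look like declared with clap. This is a hedged sketch assuming a clap-based CLI; the struct and its doc text are illustrative, not the actual Madara argument definitions.

    // Illustrative sketch only, assuming a clap-based CLI.
    use clap::Parser;

    #[derive(Parser)]
    struct SyncParams {
        /// Stop syncing once this block number has been imported (inclusive).
        /// Replaces the old `--n-blocks-to-sync <count>`: stopping at block 19
        /// means importing 20 blocks (0..=19), since block 0 is imported too.
        #[arg(long, value_name = "block_n")]
        sync_stop_at: Option<u64>,
    }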

@antiyro changed the title from "Sync refactor" to "feat(sync): Sync refactor" Feb 17, 2025
@cchudant marked this pull request as ready for review February 17, 2025 17:02
@Trantorian1 self-requested a review February 18, 2025 15:52
@@ -123,7 +110,8 @@ impl MadaraBackend {

     #[tracing::instrument(skip(self), fields(module = "BlockDB"))]
     pub fn get_latest_block_n(&self) -> Result<Option<u64>> {
-        get_latest_block_n(&self.db)
+        Ok(self.head_status().latest_full_block_n())
+        // get_latest_block_n(&self.db)
Collaborator

Suggested change: remove the leftover commented-out line
-        // get_latest_block_n(&self.db)

Comment on lines +10 to +13
pub enum RawDbBlockId {
    Pending,
    Number(u64),
}
Collaborator

Why not implement this as a wrapper or alias around DbBlockId?

Member Author

So that you can write both RawDbBlockId::Pending and DbBlockId::Pending.

Ok(Arc::new(backend))
}

pub async fn on_block(&self, block_n: u64) -> anyhow::Result<()> {
Collaborator

Maybe add a doc comment explaining what this function does in more detail? Seems pretty vague atm.

Comment on lines 73 to 80
impl fmt::Display for ClassType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Sierra => write!(f, "Sierra"),
            Self::Legacy => write!(f, "Legacy"),
        }
    }
}
Collaborator

Why not just rely on Debug?

Felt::from_bytes_checked(&unpacked)
}

fn from_bytes_checked(b: &[u8; 32]) -> Result<Felt, MalformatedFelt> {
Collaborator

Suggested change: rename the function
-fn from_bytes_checked(b: &[u8; 32]) -> Result<Felt, MalformatedFelt> {
+fn from_bytes_be_checked(b: &[u8; 32]) -> Result<Felt, MalformatedFelt> {

Collaborator

We should add tests for the slice and bytes conversions.

Member Author

Ah, this is only going to be used for p2p, I'll remove it from this PR I think.
But I am pretty sure I had tests for that - if they're not in that PR, I don't know where they went :)

    transaction_hash,
    actual_fee,
    messages_sent,
    events,
    execution_resources,
    execution_result,
    message_hash: message_hash.unwrap(), // it's a safe unwrap because it would've panicked earlier if it was Err
    // This should not panic unless blockifier gives a garbage receipt.
    // TODO: we should have a soft error here just in case.
Collaborator

@Trantorian1 commented Feb 19, 2025

We should probably make this a soft error as part of this PR then no?

Member Author

let's not, or it's going to get painful when charpa updates blockifier.

@@ -22,7 +22,7 @@ impl Default for RayonPool {
 impl RayonPool {
     pub fn new() -> Self {
         let n_cores = thread::available_parallelism().expect("Getting the number of cores").get();
-        let max_tasks = n_cores * 2;
+        let max_tasks = n_cores * 128;
Collaborator

Why the sudden change to this constant? It feels like there should be more info around this magic number.

Member Author

@cchudant commented Feb 19, 2025

mostly just vibes 🤷
i do not know why I changed that, to be perfectly honest. I think I wanted to test the change and did not find any evidence that it's faster or slower, and forgot to remove it
the only important thing here is that it's bounded, i don't think the actual number of the bound matters that much

Collaborator

fair enough!

Collaborator

I mean, then again, with that high a bound, performance might not have changed simply because no more than 2 * cores tasks were ever being processed at once, so we were still under the previous limit anyway.

It feels like 128 * cores leaves more room for DoS though. Especially since this didn't result in any noticeable change in performance, maybe we should revert this change. Wdyt?

Member Author

true

Collaborator

@Trantorian1 left a comment

I apologize in advance, most of this review is about documentation X) (among a few clarifying questions and some nitpicks).

My worry is that, since this PR introduces some pretty large and sweeping architectural changes, it will be easy for new and existing contributors alike to get lost in the codebase unless they have done a full review of this PR themselves. This is a lot of context to require, and it feels like we can make this easier by documenting the new sync architecture as part of the code. IMO this PR's opening comment should be added to the codebase, or at least linked, as it is an excellent overview of these changes.

The code seems good to me. It is hard to review the logic at a very granular level due to the size of this PR, but the changes seem consistent with what was already present in the previous sync model.

I definitely feel like this makes tests all the more important, and there are some very important areas of the codebase which I think need more of those (though I think we are testing all the sync pipelines? Still feels like some db methods around verify and update in particular should be tested in isolation + maybe test more failing cases?). Please let me know what you think.

Overall, this is definitely an amazing improvement to the current sync pipeline and some very impressive work :)

"--n-blocks-to-sync",
"20",
"--sync-stop-at",
"19",
Collaborator

Does this mean we still sync block 20 with --sync-stop-at 19 or that the previous version would actually stop at block 19?

Member Author

The previous version stops at 19 as well, since there are 20 blocks in 0..=19 - we have to import block 0 too.

Collaborator

Okay, this is much clearer then

@Trantorian1 (Collaborator)

Well, for some reason github running at 1fps decided to split my review into two, strange 🤷‍♂️
