Metrics and tracing for p2p stages #2050

Open
Mirko-von-Leipzig opened this issue Jun 3, 2024 · 0 comments
Mirko-von-Leipzig commented Jun 3, 2024

p2p sync is essentially a tree of processing tasks called stages, operating concurrently and connected via SPSC channels of some capacity. Adding metrics and tracing to stages will greatly simplify debugging and identifying bottlenecks.

What will (presumably) happen is that some stages will be slow, causing their input channels to fill up and block everything upstream of them. Knowing which stages are slow will show where to add parallelisation and/or increase the channel capacity.


My take on this

Disclaimer - this is just my opinion without having attempted this, you might come to a different conclusion.

We are interested in at least three pieces of information for each stage

  1. Which stage is this?
  2. Processing time
  3. Channel fullness

(1) - Use a &'static str to identify each stage. We could add this to the Stage trait, but that fails to uniquely ID a stage if there are duplicates involved. One alternative is to add it as an additional input parameter to the pipe function. This works, though it would also be nice if one could include some tree-like ID that would allow a system diagram UI to be drawn - but this is completely unnecessary, just nice for explaining visually what's going on. One could manually assign these IDs within stage names, but it should also be possible to do this at compile time if one adds functionality to the channel type (to pass on this type info somehow). But this is overkill.
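A rough sketch of the "name as a pipe parameter" option, to make the idea concrete. Everything here is an assumption for illustration: the `Stage` trait shape, the `pipe` signature and the `Double` stage are hypothetical, and it uses std threads and std bounded channels instead of the real async machinery so that it has no dependencies.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Simplified, hypothetical stand-in for the real `Stage` trait.
pub trait Stage: Send + 'static {
    type Input: Send + 'static;
    type Output: Send + 'static;

    fn map(&mut self, input: Self::Input) -> Self::Output;
}

/// Runs `stage` on its own thread, reading from `input` and writing to a new
/// bounded channel. `name` is supplied per call site, so two instances of the
/// same stage type can still be told apart.
pub fn pipe<S: Stage>(
    name: &'static str,
    mut stage: S,
    input: Receiver<S::Input>,
    capacity: usize,
) -> Receiver<S::Output> {
    let (tx, rx) = sync_channel(capacity);
    thread::Builder::new()
        .name(name.to_string())
        .spawn(move || {
            for item in input {
                let output = stage.map(item);
                // Placeholder for a real tracing event.
                eprintln!("stage={} item processed", name);
                if tx.send(output).is_err() {
                    break; // downstream hung up
                }
            }
        })
        .expect("failed to spawn stage thread");
    rx
}

/// Hypothetical example stage, used to show two instances of the same type.
pub struct Double;

impl Stage for Double {
    type Input = u64;
    type Output = u64;

    fn map(&mut self, input: u64) -> u64 {
        input * 2
    }
}
```

Calling `pipe("double/0", Double, rx, 4)` and then `pipe("double/1", Double, first, 4)` gives each instance its own ID. Path-like names such as these are one manual way to encode the tree position.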

(2) - this can just be a simple timer inside the pipe function which measures the execution time of the Stage in each iteration. The only issue is that some stages occur after try_buffer calls, which means they execute over a vector of items, making the processing times incomparable. It would be possible to account for this by creating a BufferedReceiver type, but now we're adding more "duplicate" types just so we can log a bit better. I would hesitate to do this until the sync framework has proven mature. We might need to add many such types. Or none at all. Or maybe it's trivial to perform this with a wrapper type and deref.
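A minimal sketch of the timer, assuming it wraps a single stage invocation. `timed` and `per_item` are hypothetical helper names. `per_item` shows one possible way to make buffered stages comparable with per-item stages, by dividing by the batch size, without committing to a BufferedReceiver type.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Average time per item for a stage that processed `items` items in `total`.
/// An empty batch is treated as a single item to avoid dividing by zero.
pub fn per_item(total: Duration, items: usize) -> Duration {
    let n = u32::try_from(items).unwrap_or(u32::MAX).max(1);
    total / n
}
```

Inside the pipe loop this would look roughly like `let (output, elapsed) = timed(|| stage.map(item));`, with `per_item(elapsed, batch.len())` used when the input is a buffered vector.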

(3) - A channel's "fullness" can be determined using the capacity and max_capacity methods.
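A small sketch of the arithmetic, assuming the channels are tokio mpsc channels, where `Sender::capacity()` reports the slots currently available and `Sender::max_capacity()` the total, so the number of queued (or reserved) items is the difference. The `Fullness` type is hypothetical and takes plain numbers so that it does not depend on any channel implementation.

```rust
use std::fmt;

/// Snapshot of how full a bounded channel is.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Fullness {
    pub queued: usize,
    pub max: usize,
}

impl Fullness {
    /// `available` is the number of free slots, `max` the total capacity.
    pub fn from_capacity(available: usize, max: usize) -> Self {
        Fullness {
            queued: max.saturating_sub(available),
            max,
        }
    }

    /// Fraction of the channel in use, in the range 0.0..=1.0.
    pub fn ratio(&self) -> f64 {
        if self.max == 0 {
            0.0
        } else {
            self.queued as f64 / self.max as f64
        }
    }
}

impl fmt::Display for Fullness {
    /// Formats as `queued/max`, e.g. `3/10`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}/{}", self.queued, self.max)
    }
}
```

With a tokio sender this would presumably be called as `Fullness::from_capacity(tx.capacity(), tx.max_capacity())`.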

I'm unsure about the trace level - probably debug? You might also want to be able to enable it for selected stages only.

DEBUG stage=block_hash_verification time=10ms in_queue=3/10 Item processed
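To pin down the field layout of the line above, here is a sketch that only builds the string. `stage_log_line` is a hypothetical helper. In practice the fields would presumably be emitted as structured fields on a debug-level tracing event and not formatted by hand.

```rust
use std::time::Duration;

/// Builds one log line with the three pieces of information per stage:
/// stage ID, processing time in milliseconds, and input channel fullness.
pub fn stage_log_line(stage: &str, elapsed: Duration, queued: usize, max: usize) -> String {
    format!(
        "DEBUG stage={} time={}ms in_queue={}/{} Item processed",
        stage,
        elapsed.as_millis(),
        queued,
        max
    )
}
```

Keeping the fields as `key=value` pairs means the same line can be split on whitespace to feed the charts.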

We should also create a template to display these stats, probably as three line charts (one per piece of information).

@Mirko-von-Leipzig Mirko-von-Leipzig added this to the P2P sync milestone Jun 3, 2024
@Mirko-von-Leipzig Mirko-von-Leipzig self-assigned this Jun 4, 2024
@Mirko-von-Leipzig Mirko-von-Leipzig removed their assignment Jun 14, 2024