Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

litep2p: Provide partial results to speedup GetRecord queries #7099

Merged
merged 9 commits into from
Jan 15, 2025

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Jan 9, 2025

This PR provides the partial results of the GetRecord kademlia query.

This significantly improves the authority discovery records, from ~37 minutes to ~2/3 minutes.
In contrast, libp2p discovers authority records in around ~10 minutes.

The authority discovery was slow because litep2p provided the records only after the Kademlia query was completed. A normal Kademlia query completes in around 40 seconds to a few minutes.
In this PR, partial records are provided as soon as they are discovered from the network.

Testing Done

Started a node in Kusama with --validator and litep2p backend.
The node discovered 996/1000 authority records in ~ 1 minute 45 seconds.

Screenshot 2025-01-09 at 12 26 08

Before & After

In this image, on the left side is libp2p, in the middle litep2p without this PR, on the right litep2p with this PR

Screenshot 2025-01-07 at 17 57 56

Closes: #7077

cc @paritytech/networking

@lexnv lexnv added T0-node This PR/Issue is related to the topic “node”. I5-enhancement An additional feature request. labels Jan 9, 2025
@lexnv lexnv self-assigned this Jan 9, 2025
Cargo.toml Outdated
@@ -850,7 +850,7 @@ linked-hash-map = { version = "0.5.4" }
linked_hash_set = { version = "0.1.4" }
linregress = { version = "0.5.1" }
lite-json = { version = "0.2.0", default-features = false }
litep2p = { version = "0.8.4", features = ["websocket"] }
litep2p = { git = "https://github.com/paritytech/litep2p.git", branch = "lexnv/kad-found-records", features = ["websocket"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will do one soon, wanted to double-check first on our Kusama validator that this indeed solves the problem 🙏

@lexnv
Copy link
Contributor Author

lexnv commented Jan 10, 2025

CPU

Litep2p without this PR was running before ~7:15 (spike—left side). On the right side is Litep2p with this PR.

The CPU consumption looks overall stable, with some fluctuations at ~10-minute intervals.
The peak CPU consumption has increased by ~6%, from 171% before to 182% current.

Considering the network's randomness and the ~18x speedup improvement in authority-discovery records, this small tradeoff (if any) appears justified.

Screenshot 2025-01-10 at 12 30 51

Triage Report

The Some network error occurred when fetching erasure chunk warning appears in the first 2 minutes of starting the node. In contrast, before this PR the warning appeared for ~30 minutes.

Repo Count Level Triage report
https://github.com/paritytech/polkadot-sdk/ 437 warn_if_frequent Some network error occurred when fetching erasure chunk
https://github.com/paritytech/polkadot-sdk/ 25 warn Recovery of available data failed.
https://github.com/paritytech/polkadot-sdk/ 25 warn Data unavailable for candidate .*
https://github.com/paritytech/polkadot-sdk/ 12 warn Report .: . to .. Reason: .. Banned, disconnecting. ( A collator provided a collation for the wrong para. Banned, disconnecting.)
https://github.com/paritytech/polkadot-sdk/ 4 warn fetch_pov_job
https://github.com/paritytech/polkadot-sdk/ 3 warn Importing locally an already known assignment
https://github.com/paritytech/polkadot-sdk/ 1 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Same block request multiple times. Banned, disconnecting.)

@lexnv lexnv requested a review from a team January 15, 2025 11:42
@lexnv lexnv added the A4-needs-backport Pull request must be backported to all maintained releases. label Jan 15, 2025
@lexnv lexnv enabled auto-merge January 15, 2025 16:18
@lexnv lexnv added this pull request to the merge queue Jan 15, 2025
Merged via the queue into master with commit 77c78e1 Jan 15, 2025
249 of 261 checks passed
@lexnv lexnv deleted the lexnv/kad-found-records branch January 15, 2025 17:27
@paritytech-cmd-bot-polkadot-sdk

Created backport PR for stable2407:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-7099-to-stable2407
git worktree add --checkout .worktree/backport-7099-to-stable2407 backport-7099-to-stable2407
cd .worktree/backport-7099-to-stable2407
git reset --hard HEAD^
git cherry-pick -x 77c78e1561bbe5ee0ecf414312bae82396ae6d11
git push --force-with-lease

@paritytech-cmd-bot-polkadot-sdk

Created backport PR for stable2409:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-7099-to-stable2409
git worktree add --checkout .worktree/backport-7099-to-stable2409 backport-7099-to-stable2409
cd .worktree/backport-7099-to-stable2409
git reset --hard HEAD^
git cherry-pick -x 77c78e1561bbe5ee0ecf414312bae82396ae6d11
git push --force-with-lease

@paritytech-cmd-bot-polkadot-sdk

Created backport PR for stable2412:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-7099-to-stable2412
git worktree add --checkout .worktree/backport-7099-to-stable2412 backport-7099-to-stable2412
cd .worktree/backport-7099-to-stable2412
git reset --hard HEAD^
git cherry-pick -x 77c78e1561bbe5ee0ecf414312bae82396ae6d11
git push --force-with-lease

ordian added a commit that referenced this pull request Jan 16, 2025
* master: (33 commits)
  Implement `pallet-asset-rewards` (#3926)
  [pallet-revive] Add host function `to_account_id` (#7091)
  [pallet-revive] Remove revive events (#7164)
  [pallet-revive] Remove debug buffer (#7163)
  litep2p: Provide partial results to speedup GetRecord queries (#7099)
  [pallet-revive] Bump asset-hub westend spec version (#7176)
  Remove 0 as a special case in gas/storage meters (#6890)
  [pallet-revive] Fix `caller_is_root` return value (#7086)
  req-resp/litep2p: Reject inbound requests from banned peers (#7158)
  Add "run to block" tools (#7109)
  Fix reversed error message in DispatchInfo (#7170)
  approval-voting: Make importing of duplicate assignment idempotent (#6971)
  Parachains: Use relay chain slot for velocity measurement (#6825)
  PRDOC: Document `validate: false` (#7117)
  xcm: convert properly assets in xcmpayment apis (#7134)
  CI: Only format umbrella crate during umbrella check (#7139)
  approval-voting: Fix sending of assignments after restart (#6973)
  Retry approval on availability failure if the check is still needed (#6807)
  [pallet-revive-eth-rpc] persist eth transaction hash (#6836)
  litep2p: Sufix litep2p to the identify agent version for visibility (#7133)
  ...
EgorPopelyaev added a commit that referenced this pull request Jan 17, 2025
…ecord queries (#7192)

Backport #7099 into `stable2412` from lexnv.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Alexandru Vasile <[email protected]>
Co-authored-by: Alexandru Vasile <[email protected]>
Co-authored-by: Egor_P <[email protected]>
Nathy-bajo pushed a commit to Nathy-bajo/polkadot-sdk that referenced this pull request Jan 21, 2025
…tech#7099)

This PR provides the partial results of the `GetRecord` kademlia query.

This significantly improves the authority discovery records, from ~37
minutes to ~2/3 minutes.
In contrast, libp2p discovers authority records in around ~10 minutes. 

The authority discovery was slow because litep2p provided the records
only after the Kademlia query was completed. A normal Kademlia query
completes in around 40 seconds to a few minutes.
In this PR, partial records are provided as soon as they are discovered
from the network.

### Testing Done

Started a node in Kusama with `--validator` and litep2p backend.
The node discovered 996/1000 authority records in ~ 1 minute 45 seconds.

![Screenshot 2025-01-09 at 12 26
08](https://github.com/user-attachments/assets/b618bf7c-2bba-43a0-a021-4047e854c075)


### Before & After

In this image, on the left side is libp2p, in the middle litep2p without
this PR, on the right litep2p with this PR

![Screenshot 2025-01-07 at 17 57
56](https://github.com/user-attachments/assets/a8d467f7-8dc7-461c-bcff-163b94d01ae8)



Closes: paritytech#7077

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A4-needs-backport Pull request must be backported to all maintained releases. I5-enhancement An additional feature request. T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

litep2p authority discovery is very slow 5m vs 1h
3 participants