Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

authority-discovery: Ignore multi-addresses without IP from DHT records #6545

Open
wants to merge 41 commits into
base: master
Choose a base branch
from

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Nov 19, 2024

Kusama has several authority-discovery records published without a proper mutlti-address format.

authority="CzHcp6nEsT4NJuAbf4yCS2J8KKYpvuAe63bkWrPWhjaaT6w" 
..
addresses={.., /p2p/12D3KooWFegWJubWCeJCbWpDEAwukQmXKaWTyRgRRZ8NNdzmuxSr, ..}

In the previous example, an multiaddress of form /p2p/12D3KooWFegWJubWCeJCbWpDEAwukQmXKaWTyRgRRZ8NNdzmuxSr cannot establish a connection to the authority.

This PR changes the following:

  • publishing DHT records no longer includes multi addresses that are empty or just contain /p2p/[peer] format
  • received DHT records are filtered in a similar manner
  • litep2p and libp2p backends verify discovered external addresses before pushing them to other subcomponents

Next Steps

  • versi-net burnin with detailed logs
  • kusama nodes (litep2p + libp2p) to ensure the warnings are not excessive

Closes: #6518

cc @paritytech/networking

@lexnv lexnv added T0-node This PR/Issue is related to the topic “node”. I2-bug The node fails to follow expected behavior. labels Nov 19, 2024
@lexnv lexnv self-assigned this Nov 19, 2024
Signed-off-by: Alexandru Vasile <[email protected]>
lexnv added 14 commits November 20, 2024 19:04
Signed-off-by: Alexandru Vasile <[email protected]>
@lexnv
Copy link
Contributor Author

lexnv commented Nov 22, 2024

In kusama we also receive addresses of form:

/ip4/198.244.230.235/p2p/12D3KooWGJnJ3eA71NPV1bmMys2UWXQgoesVVD9bmXhi8aGKgjQz

Have added extra checks to accommodate this 🙏

if !address.iter().any(|protocol| std::matches!(protocol, Protocol::P2p(_))) {
address.push(Protocol::P2p(litep2p::PeerId::from(peer).into()));
if !address.is_external_address_valid(peer) {
log::warn!(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will downgrade to debug after more testing, here and below

Signed-off-by: Alexandru Vasile <[email protected]>
);
continue
};
if self.litep2p.add_known_address(peer.into(), iter::once(address.clone().into())) == 0usize {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR deployed, we receive the following warnings for adding addresses (only on Kusama validators):

sub-libp2p: couldn't add known address (/ip4/127.0.0.1/tcp/30333/p2p/12D3KooWRjoe4poRBkBtuJ5JJGYanS7PWTfPc8sN4zvQ3gFEfSP6) for PeerId("12D3KooWRjoe4poRBkBtuJ5JJGYanS7PWTfPc8sN4zvQ3gFEfSP6"), unsupported transport

All warnings coming from this are related to addresses pointing to the localhost.

Will let this deployment run for a bit and then I believe its safe to turn these into debug

@lexnv
Copy link
Contributor Author

lexnv commented Nov 25, 2024

Triage Report Versi-Network

https://github.com/paritytech/polkadot-sdk/ | 5919       | warn       | cannot query the runtime API version: .*
https://github.com/paritytech/polkadot-sdk/ | 141        | warn       | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Invalid justification. Banned, disconnecting.)
https://github.com/paritytech/polkadot-sdk/ | 31         | warn       | Re-finalized block #.* \(.*\) in the canonical chain, current best finalized is #.*
https://github.com/paritytech/polkadot-sdk/ | 16         | warn       | Trying to remove unknown reserved node .* from .*.
https://github.com/paritytech/polkadot-sdk/ | 5          | warn       | Timeout while trying to acquire a write lock for the shared trie cache
https://github.com/paritytech/polkadot-sdk/ | 1          | warn       | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Peer disconnected with inflight after backoffs. Banned, disconnecting.)
https://github.com/paritytech/polkadot-sdk/ | 1          | error      | 💔 Called `on_validated_block_announce` with a bad peer ID .*

Debug Logs

# Publish
2024-11-25 18:30:35.292 DEBUG tokio-runtime-worker sub-authority-discovery: Publishing authority DHT record peer_id='12D3KooWKew6C37uUVcTHKBTvHaCE8v5G2Kv5TzhVP2GdYgWDRBS' with addresses='["/ip4/../tcp/30333"]'    

# Fetching
2024-11-25 18:30:33.829 DEBUG tokio-runtime-worker sub-authority-discovery: Found addresses for authority Public(f8694fbdf03070992f07164cccf7298c42994add1d8d01e6bf785abd778ae060 (16ciAFQU...)): {"/ip4/../tcp/30333/p2p/12D3KooWD8SxxoLeeMh8iADpj3nQosizZBWpjZZ6m9nKnpe7SsHJ", "/ip4/../tcp/30333/p2p/12D3KooWD8SxxoLeeMh8iADpj3nQosizZBWpjZZ6m9nKnpe7SsHJ"}  

Would love another look over this 🙏

// Verify the external address is valid.
let multiaddr: sc_network_types::multiaddr::Multiaddr = addr.clone().into();
if multiaddr.is_external_address_valid() &&
multiaddr.ensure_peer_id(self.local_peer_id.into()).is_some()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just cross-checking if the following is intended:
ensure_peer_id modifies multiaddr, but the new value (which may contain p2p part) may get skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, unfortunately libp2p API forces us to pass the event with a reference to the multiaddr (&'a multiaddr). I think this should be fine as long as we validate the address here. We also re-add /p2p/[peer] in the authority-discovery where this matters in publishing records 🤔

@lexnv lexnv enabled auto-merge November 27, 2024 13:32
@lexnv
Copy link
Contributor Author

lexnv commented Nov 27, 2024

/cmd prdoc --audience node_dev --bump patch --force

@lexnv lexnv disabled auto-merge November 27, 2024 15:54
@paritytech-workflow-stopper
Copy link

All GitHub workflows were cancelled due to failure one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12471715250
Failed job name: test-linux-stable

@bkchr
Copy link
Member

bkchr commented Dec 24, 2024

Looks like the undying and adder collator integration tests are running forever. Maybe because they use in memory addresses and are rejected by this pr?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I2-bug The node fails to follow expected behavior. T0-node This PR/Issue is related to the topic “node”.
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

network: Authoriy discovery records published without IP addresses
5 participants