
Peer is flooded with new ingress streams #734

Open
geigerzaehler opened this issue Jul 26, 2021 · 28 comments

@geigerzaehler
Contributor

geigerzaehler commented Jul 26, 2021

When connecting to either setzling or sprout, we see the following log line every 200ms:

Jul 26 12:15:18.816  INFO accept{peer_id=hyb79bxwfz8mbmqarrok53fj9cnmmey6oujg1yfoysjjo1rmzojuhw}:
  disco:discovered{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc}:
  incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.141.175.140:12345}:
  librad::net::protocol::io::streams: new ingress stream

In addition, we see the following log line for a while until it goes away (maybe after replicating a first project):

Jul 26 12:07:56.240  WARN radicle_daemon::peer::run_state::running_waiting_room:
  waiting room error error=MissingUrn(Urn { id: Oid(1183cf230ec31058fa955e69b4172086161641de), path: Some(RefLike("rad/id")) })

This happens even if I start Upstream with a clean slate and I don’t replicate any projects.

I’ve noticed that once my peer is connected to multiple peers, I’m seeing this log line for every peer. This puts significant pressure on the CPU, resulting in 50% utilization. This is a high-severity issue for Upstream users, since the app and their machines become unresponsive.

Steps to reproduce

  1. Start the most recent version (0.2.9 or master) of Upstream from the command line with a fresh profile using RAD_HOME
  2. Create a profile
  3. Look at the command-line output
@geigerzaehler changed the title from "Seed is flooding peers with new ingress streams" to "Peer is flooded with new ingress streams" on Jul 26, 2021
@kim
Contributor

kim commented Jul 27, 2021

Is it the same remote_addr always?

@geigerzaehler
Contributor Author

Is it the same remote_addr always?

Yes it is. Once I’m connected to more peers this happens with every peer. Also the Oid for the waiting room warning is always the same.

@kim
Contributor

kim commented Jul 27, 2021

I don’t know of anything which would trigger packets at a 200ms interval, except perhaps an amplification effect of something malfunctioning with the “waiting room”.

@xla
Contributor

xla commented Jul 27, 2021

@geigerzaehler If you could conjure up debug logs from the seed that correspond to the upstream peer, that would help with narrowing down the interaction that is causing this.

@cloudhead
Contributor

I’ve noticed that once my peer is connected to multiple peers, I’m seeing this log line for every peer. This puts significant pressure on the CPU, resulting in 50% utilization. This is a high-severity issue for Upstream users, since the app and their machines become unresponsive.

Does this then rule out a potential issue in the seed node/library?

@geigerzaehler
Contributor Author

@geigerzaehler If you could conjure up debug logs from the seed that correspond to the upstream peer, that would help with narrowing down the interaction that is causing this.

Here’s the seed log just before the issue starts. Afterwards the log basically repeats itself.

seed.log

@geigerzaehler
Contributor Author

A couple of things I noticed while investigating this that may contribute to the issue:

  • We persist the waiting room whenever its state changes. This may happen quite frequently, for example on every Have message received by the protocol, even the ones that are uninteresting. Persistence is probably not cheap.
  • Are the calls to kv for persistence blocking on I/O? If so, this might be a problem because we don't spawn them on a separate thread (see the sketch after this list).
  • We're using tokio::task::spawn_blocking instead of unblocking throughout the daemon codebase.
  • daemon::peer::announcement::run reads from and writes to the store, but the calls are not synchronized. It is possible that two calls of run() execute concurrently and interfere with each other.
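
For illustration, here is a rough sketch of keeping a blocking kv write off the async executor with tokio::task::spawn_blocking. WaitingRoomState and save_waiting_room are hypothetical stand-ins, not the daemon's actual types:

    use tokio::task;

    // Hypothetical stand-in for the daemon's waiting-room state.
    struct WaitingRoomState {
        serialized: Vec<u8>,
    }

    // A blocking persistence call, e.g. a synchronous kv write to disk.
    fn save_waiting_room(state: &WaitingRoomState) -> std::io::Result<()> {
        std::fs::write("waiting_room.bin", &state.serialized)
    }

    // Move the blocking I/O onto tokio's blocking thread pool so it
    // cannot stall the async protocol event loop.
    async fn persist(state: WaitingRoomState) -> std::io::Result<()> {
        task::spawn_blocking(move || save_waiting_room(&state))
            .await
            .expect("persistence task panicked")
    }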

@cloudhead
Contributor

Seems like radicle-dev/radicle-client-services#5 is related, and it doesn't have a waiting room or complex behavior. It looks like peers are establishing large numbers of connections to other peers, something we've been seeing on the seed node for a while.

It could either be malicious behavior, or some kind of feedback loop.

In any case it might help to limit the number of ingress streams per PeerId, as a mitigation.
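
As a rough illustration of that mitigation (not librad's actual accept path), something like a per-peer counter could refuse streams past a threshold; the PeerId alias and the limit value below are placeholders:

    use std::collections::HashMap;

    // Placeholder for librad's PeerId.
    type PeerId = String;

    const MAX_STREAMS_PER_PEER: usize = 32;

    #[derive(Default)]
    struct IngressLimiter {
        open: HashMap<PeerId, usize>,
    }

    impl IngressLimiter {
        /// Returns true if a new ingress stream from `peer` should be accepted.
        fn try_accept(&mut self, peer: &PeerId) -> bool {
            let count = self.open.entry(peer.clone()).or_insert(0);
            if *count >= MAX_STREAMS_PER_PEER {
                return false;
            }
            *count += 1;
            true
        }

        /// Call when a stream from `peer` closes.
        fn release(&mut self, peer: &PeerId) {
            if let Some(count) = self.open.get_mut(peer) {
                *count = count.saturating_sub(1);
            }
        }
    }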

@FintanH
Contributor

FintanH commented Jul 29, 2021

Something I find weird about this log is that a peer keeps advertising the same Have repeatedly:

    24 Jul 28 10:00:52.008 DEBUG accept{peer_id=hybz9gfgtd9d4pd14a6r66j5hz6f77fed4jdu7pana4fxaxbt369kg}:incoming:incoming{remote_id=hyd7cb39wws6jz5wmifzwtjn8u57siw6knipqttk6nhc657r1fozaw remote_addr=188.120.235.3:12345}:tock{tock=AttemptSend { to: GenericPeerInfo { peer_id: hyd7cb39wws6jz5wmifzwtjn8u57siw6knipqttk6nhc657r1fozaw, advertised_info: PeerAdvertisement { listen_addrs: Within { inner: [188.120.235.3:12345], min: 0, max: 16 }, capabilities: {} }, seen_addrs: Within { inner: [188.120.235.3:12345], min: 0, max: 16 } }, message: Membership(Neighbour { info: PeerAdvertisement { listen_addrs: Within { inner: [10.156.15.198:12345, 127.0.0.1:12345], min: 0, max: 16 }, capabilities: {} }, need_friends: None }) }}:incoming{remote_id=hyd7cb39wws6jz5wmifzwtjn8u57siw6knipqttk6nhc657r1fozaw remote_addr=188.120.235.3:12345}:tock{tock=SendConnected { to: hybdj8d47ew35471nu4gs7pco8p9xib1iksdmtygy43ghyu5ypaumg, message: Gossip(Have { origin: GenericPeerInfo { peer_id: hydd5idud5sbtkdmb8cmuprpf3rsj4ajper3j4738f9efw7nunpgxe, advertised_info: PeerAdvertisement { listen_addrs: Within { inner: [127.0.0.1:37109, 192.168.4.41:37109], min: 0, max: 16 }, capabilities: {} }, seen_addrs: Within { inner: [], min: 0, max: 16 } }, val: Payload { urn: Urn { id: Oid(1183cf230ec31058fa955e69b4172086161641de), path: Some(RefLike("rad/id")) }, rev: Some(Git(b804589da7f790d471c8c1870d880f7ac6b471d9)), origin: None } }) }}: librad::net::protocol::tick: tock

Doing `rg b804589da7f790d471c8c1870d880f7ac6b471d9 seed.log | wc` will bring up 64 occurrences. They also happen very close to each other.

@FintanH
Contributor

FintanH commented Jul 29, 2021

In any case it might help to limit the number of ingress streams per PeerId, as a mitigation.

But radicle-link recently added rate limiting, so this is even more curious 🤔

@kim
Contributor

kim commented Jul 29, 2021

Rate limiting is limited due to the stream-per-message model. That was a bad idea; I'm changing it. At the end of the day, a peer flooding the network cannot really be defended against, unless we auto-ban them permanently.

@geigerzaehler
Contributor Author

Any update on this? This issue often makes Upstream unusable :(

@FintanH
Contributor

FintanH commented Aug 16, 2021

Any update on this? This issue often makes Upstream unusable :(

I thought the last thing we talked about was Upstream updating librad to this commit cb91ccd and seeing how that might improve usage?

@kim
Contributor

kim commented Aug 16, 2021 via email

@geigerzaehler
Contributor Author

I thought the last thing we talked about was Upstream updating librad to this commit cb91ccd and seeing how that might improve usage?

That didn’t help unfortunately. We’re still seeing this issue.

Also heard from @cloudhead on matrix that it was sprout misbehaving.

I was also seeing this issue with setzling. And this was after I restarted it and was the only peer connected to it.

@FintanH
Contributor

FintanH commented Aug 17, 2021

I was also seeing this issue with setzling. And this was after I restarted it and was the only peer connected to it.

And there was no sign of some peer repeatedly making requests?

@kim
Contributor

kim commented Aug 17, 2021 via email

@geigerzaehler
Contributor Author

I’m not able to reproduce this issue anymore if I’m the only one connected to a seed node. But I still see this issue if other peers are connected. After adding some logging statements, I see the following when I connect to sprout:

Aug 18 16:54:25.173  INFO librad::net::peer::storage: gossip received, has: Payload { urn: Urn { id: Oid(8cc143b5b061d9882726564702017b5f94bb0f1a), path: Some(RefLike("rad/id")) }, rev: Some(Git(f2949476d30de0ee650598ba16aee4a4601157da)), origin: None }, provider: hyde43uaaqz5bnh8kpqfwiwx7inxzdmmtiy1ndtf6zazfrcny3ossr
    at /home/thomas/code/radicle-link/librad/src/net/peer/storage.rs:189
    in librad::net::peer::storage::put with has: Payload { urn: Urn { id: Oid(8cc143b5b061d9882726564702017b5f94bb0f1a), path: Some(RefLike("rad/id")) }, rev: Some(Git(f2949476d30de0ee650598ba16aee4a4601157da)), origin: None }
    in librad::net::protocol::broadcast::apply with remote_id: hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc, Have { origin: GenericPeerInfo { peer_id: hyde43uaaqz5bnh8kpqfwiwx7inxzdmmtiy1ndtf6zazfrcny3ossr, advertised_info: PeerAdvertisement { listen_addrs: Within { inner: [127.0.0.1:57843, 192.168.1.108:57843], min: 0, max: 16 }, capabilities: {} }, seen_addrs: Within { inner: [], min: 0, max: 16 } }, val: Payload { urn: Urn { id: Oid(8cc143b5b061d9882726564702017b5f94bb0f1a), path: Some(RefLike("rad/id")) }, rev: Some(Git(f2949476d30de0ee650598ba16aee4a4601157da)), origin: None } }
    in librad::net::protocol::io::streams::incoming with remote_id: hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc, remote_addr: 35.187.60.160:12345
    in librad::net::protocol::io::discovered with remote_id: hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc
    in librad::net::protocol::accept::disco
    in librad::net::protocol::accept with peer_id: hyb9fda6y8tuwqn73wfnh1qf4m9cau714o5xf14t7wgwbkdhs138ay

This log line is repeated every couple of milliseconds. As you can see, the message arrives from the seed but originates from another node.

Could one way to limit the impact of this be to have the waiting room not react to uninteresting messages?

@FintanH
Contributor

FintanH commented Aug 18, 2021

Could one way to limit the impact of this be to have the waiting room not react to uninteresting messages?

The thing is that if you've requested an Urn, you're not guaranteed to know which PeerId it'll come from, so that counts as "uninteresting" in the gossip world. However, it's interesting to the waiting room because it found a (possibly) valid peer to replicate from.

Why do you think this could be the waiting room anyway? 🤔 Maybe I'm missing something you see in the logs there? 👀

@geigerzaehler
Contributor Author

Why do you think this could be the waiting room anyway?

I’m not saying the waiting room causes the issue. But it seems odd to me that the messages put so much pressure on the CPU. I was just thinking about limiting the impact of messages that don’t have an effect.
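
A rough sketch of that idea, for illustration: only hit the persistence path when applying a message actually changed the waiting-room state. The WaitingRoom, Applied, and GossipHave types below are made up, not the daemon's real interface:

    // Illustrative types only.
    struct WaitingRoom { /* ... */ }
    struct GossipHave { /* urn, rev, provider, ... */ }

    enum Applied {
        Changed,
        Unchanged,
    }

    impl WaitingRoom {
        fn apply(&mut self, _have: &GossipHave) -> Applied {
            // ... update internal state, report whether anything changed ...
            Applied::Unchanged
        }
    }

    fn on_gossip(room: &mut WaitingRoom, have: &GossipHave, persist: impl Fn(&WaitingRoom)) {
        // Only pay for (potentially expensive) persistence when the
        // message had an effect on the waiting room.
        if let Applied::Changed = room.apply(have) {
            persist(room);
        }
    }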

@cloudhead
Contributor

So, what are we seeing exactly? That seed nodes are flooding, or that upstream nodes are flooding as soon as they are connected to a seed? Is it perhaps that the seed should check if the project is already tracked here [0], because the call to track is idempotent?

If anything, this is probably the source of some unwanted traffic and we should definitely fix it.
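
A minimal sketch of that check, assuming a storage handle with an is_tracked-style query; all names below are illustrative rather than the actual seed code:

    // Illustrative types; the real seed goes through librad's storage
    // and tracking API, which this sketch does not reproduce.
    struct Storage;
    struct Urn(String);
    struct PeerId(String);

    impl Storage {
        fn is_tracked(&self, _urn: &Urn, _peer: &PeerId) -> bool {
            // ... query the tracking config ...
            false
        }

        fn track(&self, _urn: &Urn, _peer: &PeerId) {
            // ... idempotent tracking write ...
        }
    }

    /// Track a project, but only trigger follow-up work (replication,
    /// gossip) when the relationship is actually new.
    fn track_project(storage: &Storage, urn: &Urn, peer: &PeerId) -> bool {
        if storage.is_tracked(urn, peer) {
            // Already tracked: calling `track` again is harmless, but the
            // broadcast traffic it would trigger downstream is pure noise.
            return false;
        }
        storage.track(urn, peer);
        true
    }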

kim added a commit that referenced this issue Aug 19, 2021
When calling `Node::track_project`, it might very well be that the
project/peer is already tracked, which would result in unnecessary
broadcast traffic. Thus, don't emit any events in this case.

Also, change the order of `track` and `replicate` to ensure we
do actually fetch the newly-tracked peer's view. Additionally, we can
skip `replicate` if we already tracked the peer, as subsequent broadcast
messages won't be considered uninteresting.

Ref #734
Signed-off-by: Kim Altintop <[email protected]>
@kim
Contributor

kim commented Aug 19, 2021

I must say that this waiting room logic escapes me mostly. It is probably the case that the bandaid of screaming "have" into the forest should just be removed -- unfortunately, #653 is still stalled, because replication v3 is more important right now.

In the spirit of graceful centralisation, I could imagine the following workaround: since most clients (upstream clients for sure) bootstrap through a "central" seed, we could have some kind of no-forward flag on wants/haves, which would simply cause the seed to respond if it has the wanted data, but not forward the message.
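
Sketched at the message level only (these are not librad's real gossip types), such a flag could look roughly like this, with the seed consulting it before rebroadcasting:

    // Illustrative message shape; `no_forward` is the hypothetical flag.
    struct Payload {
        urn: String,
        rev: Option<String>,
    }

    struct Have {
        val: Payload,
        // "Answer me if you have it, but don't rebroadcast."
        no_forward: bool,
    }

    enum Action {
        Reply,
        Forward,
    }

    // What a seed does with an incoming Have.
    fn handle_have(have: &Have, we_have_it: bool) -> Vec<Action> {
        let mut actions = Vec::new();
        if we_have_it {
            actions.push(Action::Reply);
        }
        if !have.no_forward {
            actions.push(Action::Forward);
        }
        actions
    }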

@cloudhead
Contributor

That's kind of interesting - is this a soft-fork?

@kim
Contributor

kim commented Aug 19, 2021 via email

@cloudhead
Contributor

Hehe, I mean is it a backwards-compatible change?

@kim
Contributor

kim commented Aug 19, 2021 via email

@geigerzaehler
Contributor Author

Don’t we have some rate limiting in place that should prevent a peer being flooded by messages from another peer?

@kim
Contributor

kim commented Aug 19, 2021 via email
