WIP - Reduce the Load Induced by the Polling Service #211

nepet · 2023-07-20T15:41:10Z

We had some trouble on large nodes sending out all poll messages (see #185), maybe because we tried to send them out concurrently. This PR updates the polling service to use a sequential, time-ordered model to send out the poll messages.
Therefore we add a queue which elements are time ordered and by time we mean the timestamp at which we should send the next message to the peer.

Our main loop now is tick based and checks the following every other second:

Does the first peer in queue need a new poll msg sent out (is the timestamp past now)
If so -> send msg and re-queue with new timestamp
If not -> wait for next tick and check again

This way we reduce the stress that we put on the node and the network connections.
If this was the cause for the disappearing peers, this resolves #185

Signed-off-by: Peter Neuroth <[email protected]>

We use the queue to determine which peer is the next to be sent the poll message. Signed-off-by: Peter Neuroth <[email protected]>

tickDuration is now used to cycle through the poll queue. Signed-off-by: Peter Neuroth <[email protected]>

We remove the potentially heavy-load functions that ask to send out messages in parallel. Signed-off-by: Peter Neuroth <[email protected]>

We listen for peers to connect or disconnect to the node running peerswap and Add or Remove them from the poll service accordingly Signed-off-by: Peter Neuroth <[email protected]>

This is mainly to make the tests more stable as we don't have blocktime lag in regtest scenarios. But it does not hurt to have the extra check in real life also. Signed-off-by: Peter Neuroth <[email protected]>

We only want to add the peer to the queue if we do not know about it already. Signed-off-by: Peter Neuroth <[email protected]>

wtogami · 2023-07-20T19:45:48Z

#211 is running on Blockstream Store now. Please report if you continue to see it disappear from peerswap-listpeers. @grubles

For now don't run this yourself yet. Perhaps test this after 12 hours of success to see if it works both ways.

wtogami · 2023-07-20T20:02:29Z

Testing: I want to see what happens to the timing of polls if networking is delayed by 10 seconds. In that case does 10 polls happen simultaneously or does it happen one at a time separated by at least 1 second between each poll?

grubles · 2023-07-21T16:20:35Z

It finally disappeared about 5 minutes ago for me running 725ca2c. The patches at least seem to have extended the amount of time it takes for the node to disappear in some cases? In the past it would only take 3-4 hours to drop from peerswap-listpeers.

wtogami · 2023-07-23T09:22:50Z

Summary of discussion

This reduces the chance of polls conflicting so it happens less often, but it still can disappear.
The issue is custommessage is fire-and-forget with no delivery guarantees. Random network delays (like Tor) can mean multiple custommessage happen at the same time and somehow none are delivered due to a conflict.
Warren thinks we didn't have the disappearing peer problem prior to Silence noisy poll log #189 because polling happened a lot more often. In that case it mattered less that some of them wouldn't be delivered.
Short-term Fix: Poll more often (with redundant logging still silenced). Keep the WIP - Reduce the Load Induced by the Polling Service #211 approach but perhaps use a longer delay like 3 seconds?

wtogami · 2023-07-30T10:19:33Z

@grubles reported when he runs #211 on his node, Blockstream Store is not visible in peerswap-listpeers. It sounds like #211 as of d9d7d39 doesn't fix the original problem and somehow makes it worse when both sides are running it.

wtogami · 2023-07-30T13:25:45Z

Blockstream Store experienced another curious problem where an incoming swap-in attempt got stuck in state State_SwapInReceiver_AwaitTxConfirmation. The requesting node successfully ended in State_ClaimedCsv so no funds were lost. Unclear if this weird state is caused by this issue but we haven't seen this problem prior to now.

For the moment I recommend not testing #211. It needs work.

wtogami · 2023-08-04T06:29:32Z

CLN devs say custommsg delivery is guaranteed unless there is a disconnect so the original purpose of WIP - Reduce the Load Induced by the Polling Service #211 turned out to be untrue. We don't think there were disconnects so we are uncertain as to why delivery was not working.
In any case Fix Possible Polling Deadlock #213 fixed a regression introduced in Silence noisy poll log #189 solving this bug, so closing this PR.

nepet added 6 commits July 20, 2023 11:54

poll: Add concurrency safe queue for peers to be polled

c41b95a

Signed-off-by: Peter Neuroth <[email protected]>

poll: Add poll queue to poll service

229ded6

We use the queue to determine which peer is the next to be sent the poll message. Signed-off-by: Peter Neuroth <[email protected]>

cmd: Set new tick duration

c222d7a

tickDuration is now used to cycle through the poll queue. Signed-off-by: Peter Neuroth <[email protected]>

poll: Remove heavy-load functions

0b25c95

We remove the potentially heavy-load functions that ask to send out messages in parallel. Signed-off-by: Peter Neuroth <[email protected]>

cmd: Add peer listeners that call poll service

56d737e

We listen for peers to connect or disconnect to the node running peerswap and Add or Remove them from the poll service accordingly Signed-off-by: Peter Neuroth <[email protected]>

swap: Add check for current blockheight before pay

8f0f2ba

This is mainly to make the tests more stable as we don't have blocktime lag in regtest scenarios. But it does not hurt to have the extra check in real life also. Signed-off-by: Peter Neuroth <[email protected]>

nepet requested review from wtogami and grubles July 20, 2023 15:41

poll: Move enqueueing into conditional

d9d7d39

We only want to add the peer to the queue if we do not know about it already. Signed-off-by: Peter Neuroth <[email protected]>

wtogami marked this pull request as draft July 30, 2023 13:25

wtogami changed the title ~~Reduce the Load Induced by the Polling Service~~ WIP - Reduce the Load Induced by the Polling Service Jul 30, 2023

wtogami closed this Aug 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP - Reduce the Load Induced by the Polling Service #211

WIP - Reduce the Load Induced by the Polling Service #211

nepet commented Jul 20, 2023

wtogami commented Jul 20, 2023

wtogami commented Jul 20, 2023

grubles commented Jul 21, 2023 •

edited

Loading

wtogami commented Jul 23, 2023 •

edited

Loading

wtogami commented Jul 30, 2023

wtogami commented Jul 30, 2023

wtogami commented Aug 4, 2023

WIP - Reduce the Load Induced by the Polling Service #211

WIP - Reduce the Load Induced by the Polling Service #211

Conversation

nepet commented Jul 20, 2023

wtogami commented Jul 20, 2023

wtogami commented Jul 20, 2023

grubles commented Jul 21, 2023 • edited Loading

wtogami commented Jul 23, 2023 • edited Loading

Summary of discussion

wtogami commented Jul 30, 2023

wtogami commented Jul 30, 2023

wtogami commented Aug 4, 2023

grubles commented Jul 21, 2023 •

edited

Loading

wtogami commented Jul 23, 2023 •

edited

Loading