GossipSub v1.4: Message preamble + IMReceiving notification to considerably reduce bandwidth & latency for large messages #654

Open
wants to merge 1 commit into
base: master


@ufarooqstatus ufarooqstatus commented Dec 16, 2024

This extension considerably reduces bandwidth utilization and network-wide message dissemination time for large messages.

Problem with existing approach (GossipSub v1.2):

  1. Peers are unaware of the msgID during download and may generate many IWANT requests for the same message.
  2. Mesh members are not aware whether a peer is already receiving a message and may start sending the same message to that peer (IDONTWANT can only be transmitted after downloading the message).

Solution (Proposed extension):

  1. Prepend message preamble (carrying msgID + length) to large messages, to be processed immediately by the receiver.
  2. The receiver defers IWANT requests for messages it is already receiving.
  3. Limit outstanding IWANT requests for a large message to one (responding to IWANTs is mandatory).
  4. Receivers now use a new control message, called IMReceiving, to notify their mesh that they are in the process of receiving a message. So, the mesh peers defer sending that message.
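
The four steps above can be sketched as receiver-side bookkeeping. This is a minimal Python sketch under stated assumptions, not code from the PR: the class `LargeMessageTracker`, its method names, and the tuple-based "actions" it returns are all hypothetical, and no networking is involved.

```python
from dataclasses import dataclass, field


@dataclass
class LargeMessageTracker:
    """Receiver-side state for large messages (all names are hypothetical)."""
    mesh_peers: set
    seen: set = field(default_factory=set)         # msgIDs fully received
    receiving: dict = field(default_factory=dict)  # msgID -> announced length
    iwant_sent: dict = field(default_factory=dict) # msgID -> peer we asked
    deferred: dict = field(default_factory=dict)   # msgID -> peers whose IHAVE we parked

    def on_preamble(self, sender, msg_id, length):
        """Steps 1 and 4: note the in-flight download, notify the other mesh peers."""
        if msg_id in self.seen or msg_id in self.receiving:
            return []
        self.receiving[msg_id] = length
        others = sorted(self.mesh_peers - {sender})
        return [("IMRECEIVING", peer, msg_id) for peer in others]

    def on_ihave(self, sender, msg_id):
        """Steps 2 and 3: defer while receiving, allow one outstanding IWANT."""
        if msg_id in self.seen:
            return []
        if msg_id in self.receiving or msg_id in self.iwant_sent:
            self.deferred.setdefault(msg_id, []).append(sender)
            return []
        self.iwant_sent[msg_id] = sender
        return [("IWANT", sender, msg_id)]

    def on_message(self, msg_id):
        """The full message arrived: clear all per-message state."""
        self.seen.add(msg_id)
        self.receiving.pop(msg_id, None)
        self.iwant_sent.pop(msg_id, None)
        self.deferred.pop(msg_id, None)


tracker = LargeMessageTracker(mesh_peers={"A", "B", "C"})
notifications = tracker.on_preamble("A", "m1", 600_000)  # IMReceiving to B and C
late_ihave = tracker.on_ihave("D", "m1")                 # deferred, no IWANT sent
```

In this sketch a late IHAVE for `m1` produces no IWANT because the download is already in progress; the parked peers in `deferred` could be used as fallbacks if the download fails, which the sketch leaves out.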

More context available here

@ufarooqstatus
Author

Results from experiments in a 1500-peer network: bandwidth for each peer ranges from 50 to 150 Mbps, and latency for each link ranges from 40 to 130 ms. Bandwidth and latency are uniformly distributed in 5 stages. A total of 12 messages were transmitted. The IDONTWANT message is used as a preamble.

Average duplicates reduced to less than 2
Significant reduction in latency as well
[Figure: LatBW Graph]

@vyzo
Contributor

vyzo commented Dec 16, 2024

Shouldn't this be v1.3?

@ufarooqstatus
Author

Shouldn't this be v1.3?

Actually, there was an open PR with v1.3. The idea was to set an appropriate number once it's considered ready to merge.

@vyzo
Contributor

vyzo commented Dec 16, 2024

ok, fair enough.


The purpose of the preamble is to allow receivers to instantly learn about the incoming message.
The preamble must include the message ID and length,
providing receivers with immediate access to critical information about the incoming message.

@nisdas nisdas Dec 17, 2024

One issue is that, as the protobuf schema is currently designed, you would have to download the whole message in order to access the preamble. If you look at how control messages are represented in the RPC message:
https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.0.md#protobuf
https://github.com/libp2p/specs/tree/master/pubsub#the-rpc

It is numbered after our full published message. So you would have to download the whole message before you can access the preamble.

Nvm, I misunderstood this. The preamble is an RPC message sent separately beforehand.
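
To illustrate why a separately sent preamble can be read before the payload is downloaded, here is a small Python sketch. The framing (a 1-byte kind plus a 4-byte length header) and the helper names are assumptions made for illustration only; this is not the libp2p protobuf wire format.

```python
import io
import struct

PREAMBLE, PAYLOAD = 1, 2  # hypothetical frame kinds


def write_frame(stream, kind, body):
    """Write a 5-byte header (kind, body length) followed by the body."""
    stream.write(struct.pack(">BI", kind, len(body)))
    stream.write(body)


def read_frame(stream):
    """Read exactly one frame and return (kind, body)."""
    kind, size = struct.unpack(">BI", stream.read(5))
    return kind, stream.read(size)


def encode_preamble(msg_id, length):
    """Preamble body: 4-byte payload length, then the message ID bytes."""
    return struct.pack(">I", length) + msg_id


def decode_preamble(body):
    (length,) = struct.unpack(">I", body[:4])
    return body[4:], length


# Sender side: the small preamble frame goes out before the large payload frame.
wire = io.BytesIO()
payload = b"x" * 600_000
write_frame(wire, PREAMBLE, encode_preamble(b"msg-1", len(payload)))
write_frame(wire, PAYLOAD, payload)

# Receiver side: the first frame alone reveals the message ID and length.
wire.seek(0)
kind, body = read_frame(wire)
msg_id, announced = decode_preamble(body)
consumed = wire.tell()  # bytes read so far, far fewer than the payload size
```

The receiver knows `msg_id` and `announced` after reading only the first small frame, so it can react (for example, by notifying its mesh) while the payload is still in transit.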

### IMReceiving Message

The IMReceiving message serves a distinct purpose compared to the IDONTWANT message.
An IDONTWANT can only be transmitted after receiving the entire message.

While I understand the distinction here of IMReceiving compared with IDONTWANT and having this broadcast earlier, how effective would this be in practice? One issue we have seen is that an actual control message takes a while to be processed by the gossip router even after it has been received, due to head-of-line (HOL) blocking. So by the time you process the control message, the actual message might already have been sent by your mesh peers.

@kaiserd kaiserd Dec 18, 2024

For the scenarios we tested, sending IMReceiving significantly increases the probability of mesh peers being able to stop unnecessary message sends since enough IMReceiving go through in time.
Still, this is definitely something we will look out for in our experiments, checking for scenarios where HOL blocking might have a severe impact. We also have Ethereum-focused tests and analyses on the roadmap.

With QUIC as a transport and multiplexer, we can further reduce the HOL impact.

For the scenarios we tested, sending IMReceiving significantly increases the probability of mesh peers being able to stop unnecessary message sends since enough IMReceiving go through in time.

Is there more information on the scenarios tested? For example: how many different topics were nodes subscribed to, and how many messages were being published per second on these topics?

Author

While I understand the distinction here of IMReceiving compared with IDONTWANT and having this broadcast earlier, how effective would this be in practice? One issue we have seen is that an actual control message takes a while to be processed by the gossip router even after it has been received, due to head-of-line (HOL) blocking. So by the time you process the control message, the actual message might already have been sent by your mesh peers.

Yes, that is why we still see duplicates, averaging around 1.8 per peer in the network. Proper prioritization of preamble/IDONTWANTs should further lower the number of duplicates.
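
One way to picture the prioritization mentioned above is an outgoing queue in which control messages jump ahead of queued payloads. The following is a minimal Python sketch; the `SendQueue` class and the two priority levels are hypothetical and are not taken from any libp2p implementation.

```python
import heapq
import itertools

CONTROL, DATA = 0, 1  # lower value is sent first (hypothetical levels)


class SendQueue:
    """Per-peer outgoing queue that sends control messages before payloads."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # keeps FIFO order within one priority

    def push(self, priority, item):
        heapq.heappush(self._heap, (priority, next(self._seq), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = SendQueue()
queue.push(DATA, "payload m1")
queue.push(DATA, "payload m2")
queue.push(CONTROL, "IDONTWANT m3")  # enqueued last, dequeued first
order = [queue.pop() for _ in range(len(queue))]
```

This only reorders messages that are still queued. It does not help once a large write has already started on a single stream, which is where the transport-level HOL blocking discussed in this thread applies.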

Author

Is there more information on the scenarios tested? For example: how many different topics were nodes subscribed to, and how many messages were being published per second on these topics?

All (1500) peers were subscribed to a single topic. Twelve messages were introduced, each by a different publisher, with each publisher waiting 3 seconds before sending the next message. Messages larger than 600 KB take more time to reach all peers, building outgoing message queues at many peers.

Author

@ufarooqstatus ufarooqstatus Dec 19, 2024

In the results below, we consider 1500 peers (single topic) with an inter-message spacing of 50 ms, which is roughly 20 messages per second. The message size is 50 KB.
[Figure: S3 lat BW graph]

@cortze
Contributor

cortze commented Dec 17, 2024

Peers are unaware of the msgID during download and may generate many IWANT requests for the same message.

The problem with this is that there is no limit on the number of IWANTs you can send for the same message. Thus, you send an IWANT to each of the nodes that send an IHAVE with a message ID that you haven't received (yet).

This should be limited to an alpha parameter like in the Kademlia DHT. It is a simple configuration that can already remove a spike in bandwidth utilisation in some edge cases. We could even add a second configuration parameter for a wait time or a grace period for the number of milliseconds to wait before sending those IWANTs.
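
The alpha limit and grace period suggested here could look roughly like the sketch below. It is a Python illustration under assumptions: `IWantLimiter`, its fields, and the default values `alpha=2` and `grace_ms=200` are hypothetical, and the caller is assumed to invoke `due()` periodically (for example, from a timer).

```python
from dataclasses import dataclass, field


@dataclass
class IWantLimiter:
    alpha: int = 2       # max peers asked per message ID (hypothetical default)
    grace_ms: int = 200  # wait after the first IHAVE (hypothetical default)
    candidates: dict = field(default_factory=dict)  # msgID -> (first_seen_ms, [peers])
    asked: dict = field(default_factory=dict)       # msgID -> [peers already asked]

    def on_ihave(self, msg_id, peer, now_ms):
        """Remember who advertised the message; send nothing yet."""
        _, peers = self.candidates.setdefault(msg_id, (now_ms, []))
        if peer not in peers:
            peers.append(peer)

    def on_message(self, msg_id):
        """The message arrived (by any path): forget pending requests."""
        self.candidates.pop(msg_id, None)
        self.asked.pop(msg_id, None)

    def due(self, now_ms):
        """Return (peer, msgID) IWANTs to send now, at most alpha per message."""
        out = []
        for msg_id, (first_seen, peers) in self.candidates.items():
            if now_ms - first_seen < self.grace_ms:
                continue  # still inside the grace period
            asked = self.asked.setdefault(msg_id, [])
            for peer in peers:
                if len(asked) >= self.alpha:
                    break
                if peer not in asked:
                    asked.append(peer)
                    out.append((peer, msg_id))
        return out


limiter = IWantLimiter(alpha=2, grace_ms=200)
for t, peer in [(0, "A"), (10, "B"), (20, "C")]:
    limiter.on_ihave("m1", peer, t)
early = limiter.due(100)  # inside the grace period
later = limiter.due(200)  # grace period over, at most alpha requests
```

If the message arrives through the mesh during the grace period, `on_message` clears the entry and no IWANT is sent at all, which is the case the grace period is meant to cover.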

@kaiserd

kaiserd commented Dec 18, 2024

Let me suggest this alternative using IDONTWANT instead of introducing the new IMReceiving message
(just to open this for discussion):

  • Peer A sends a preamble for a large message to B
  • Peer B sends IDONTWANT (instead of IMReceiving) as soon as it receives this preamble, outside of the typical heartbeat interval
  • Mesh peers receiving IDONTWANT cannot defer message sending as they cannot semantically distinguish this IDONTWANT from heartbeat IDONTWANTs, so they will simply stop sending
  • In case B does not receive the message A promised, B will descore A and send an IWANT for the message

This requires another IWANT in case a message is not delivered.
This case should not happen too often though and can be handled by peer scoring.
It keeps the implementation simpler and does not introduce another message, but it adds to the semantics of IDONTWANT.
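
The alternative above can be sketched as promise tracking on peer B. This Python sketch rests on assumptions: `PreamblePromises`, the fixed `timeout_ms`, the integer penalty, and the caller-supplied `fallback_peer` for the IWANT are all hypothetical, since the comment does not specify a timeout, a scoring function, or which peer receives the IWANT.

```python
from dataclasses import dataclass, field


@dataclass
class PreamblePromises:
    timeout_ms: int = 1000  # hypothetical fixed deadline per promise
    pending: dict = field(default_factory=dict)  # msgID -> (sender, deadline_ms)
    scores: dict = field(default_factory=dict)   # peer -> accumulated penalty

    def on_preamble(self, sender, msg_id, now_ms, mesh_peers):
        """Treat the preamble as a promise and send IDONTWANT right away."""
        self.pending[msg_id] = (sender, now_ms + self.timeout_ms)
        others = sorted(set(mesh_peers) - {sender})
        return [("IDONTWANT", peer, msg_id) for peer in others]

    def on_message(self, msg_id):
        """The promised message arrived: the promise is fulfilled."""
        self.pending.pop(msg_id, None)

    def expire(self, now_ms, fallback_peer):
        """Penalize senders of broken promises and fall back to IWANT."""
        actions = []
        for msg_id, (sender, deadline) in list(self.pending.items()):
            if now_ms >= deadline:
                del self.pending[msg_id]
                self.scores[sender] = self.scores.get(sender, 0) - 1
                actions.append(("IWANT", fallback_peer, msg_id))
        return actions


promises = PreamblePromises(timeout_ms=1000)
sent = promises.on_preamble("A", "m1", 0, {"A", "C", "D"})
on_time = promises.expire(500, "C")   # promise still valid
broken = promises.expire(1000, "C")   # A never delivered
```

The extra IWANT only appears on the failure path, which matches the expectation in the comment that this case should be rare and handled by peer scoring.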

@ufarooqstatus
Author

The problem with this is that there is no limit on the number of IWANTs you can send for the same message. Thus, you send an IWANT to each of the nodes that send an IHAVE with a message ID that you haven't received (yet).

Yes, that is one big issue!

This should be limited to an alpha parameter like in the Kademlia DHT. It is a simple configuration that can already remove a spike in bandwidth utilisation in some edge cases. We could even add a second configuration parameter for a wait time or a grace period for the number of milliseconds to wait before sending those IWANTs.

Yes, this is part of the solution, but it also requires that replying to IWANT requests be made mandatory (at least for large messages), and preamble can further limit IWANT requests!

@cortze
Contributor

cortze commented Dec 19, 2024

Yes, this is part of the solution, but it also requires that replying to IWANT requests be made mandatory (at least for large messages)

It is already "mandatory": not replying to a received IWANT message penalizes your score.
I'm not against the proposed upgrades; I like the direction. I'm just trying to point out that the current implementation has some low-hanging upgrades that don't drastically change the protocol but can also reduce unnecessary duplicates.

This should be limited to an alpha parameter like in the Kademlia DHT.

I'd be keen to have some small upgrades like this one before jumping into something bigger.

@ufarooqstatus
Author

Let me suggest this alternative using IDONTWANT instead of introducing the new IMReceiving message (just to open this for discussion):

  • Peer A sends a preamble for a large message to B
  • Peer B sends IDONTWANT (instead of IMReceiving) as soon as it receives this preamble, outside of the typical heartbeat interval
  • Mesh peers receiving IDONTWANT cannot defer message sending as they cannot semantically distinguish this IDONTWANT from heartbeat IDONTWANTs, so they will simply stop sending
  • In case B does not receive the message A promised, B will descore A and send an IWANT for the message

This requires another IWANT in case a message is not delivered. This case should not happen too often though and can be handled by peer scoring. It keeps the implementation simpler and does not introduce another message, but it adds to the semantics of IDONTWANT.

The fundamental purpose of IDONTWANT messages is:
"Peer X, on receiving an IDONTWANT from Y, knows that Y has already received the message, so sending it to Y is unnecessary."

However, the use of IDONTWANT messages can be tailored to serve either of the following two purposes:

  1. As Message Preamble: On receiving an IDONTWANT from Y, X can assume that Y will immediately forward the message to it, eliminating the need for a preamble.

  2. As IMReceiving: On receiving a message preamble, we consider it a definite promise, so IDONTWANT can be issued immediately, serving as IMReceiving. However, in this case, mesh members cannot find out whether the message was successfully received.
