Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect usage of timeouts in gossip #70

Open
rahulghangas opened this issue Feb 27, 2021 · 4 comments
Open

Incorrect usage of timeouts in gossip #70

rahulghangas opened this issue Feb 27, 2021 · 4 comments

Comments

@rahulghangas
Copy link
Contributor

While gosspping, two different timeouts are used
- defined by context supplied by user
- defined by value of field in the GossipOptions struct

The first one defies the total time for a round of gossipping, while the second one defines the timeout for sending each message. When a sync message is received, the gossipper tries to propagate the new message by re-gossipping it. However, the gossipper now has no notion of the timeout supplied by the user (and that particular context has probably gone out scope!). Currently we use the second timeout to a call to Gossip, but the gossipping might fail if the first attempt to message a peer fails (since that uses the same timeout)

@jazg
Copy link
Member

jazg commented Mar 1, 2021

@rahulghangas Not sure I fully understand the issue/implication of the dual timeouts. Do you mind listing out a concrete example?

@rahulghangas
Copy link
Contributor Author

The call to gossip has the following signature

func (g *Gossiper) Gossip(ctx context.Context, contentID []byte, subnet *id.Hash)

where the context provided usually has a timeout of, say 5-10 seconds. On the other hand, if you look at didReceiveSync

func (g *Gossiper) didReceiveSync(from id.Signatory, msg wire.Msg) {
...
...
...
ctx, cancel := context.WithTimeout(context.Background(), g.opts.Timeout)
defer cancel()

g.Gossip(ctx, msg.Data, &subnet)

after receiving the sync message, we regossip the message, but using the internal timeout. The internal timeout is supposed to be used to define the time limit of sending a single message, not a whole round of gossipping.

@jazg
Copy link
Member

jazg commented Mar 1, 2021

@rahulghangas Got it. Could we send to the recipients in parallel inside the Gossip function to resolve this? I will check with @loongy to gauge the intended usage of the timeout.

@rahulghangas
Copy link
Contributor Author

That could potentially solve the issue, and the call to gossip wouldn't require a context as input since we'll use the internal timeout to define a context and use it on all messages (in parallel)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants