Attempt to connect to Leader via all known servers simultaneously #290

Closed · wants to merge 1 commit

Conversation

@cnnrznn cnnrznn commented Feb 23, 2024

Resolves #134

Inspecting the result of this change may be easier than reading the diff as a first pass.

Summary

  1. Spawn a goroutine to connect to each known, "one-hop" server.
  2. Each goroutine yields one of: (1) a connection to the leader, (2) the string address of a "two-hop" known leader, or (3) an error.
  3. "Protocols" (connections) are collected on the main thread; if one exists, we've found the leader.
  4. If the one-hop servers only report known leaders, try connecting to those addresses one at a time on the main thread (rough sketch after the design choices below).

Design choices

  • Connect to "two-hop" servers on the main thread. Without a network partition, it is likely that every server already knows who the current leader is, which I think is the average case. Following two-hop links inside the goroutines would be faster, but it opens up O(n-servers) connections to the leader.
  • Don't batch. We could batch requests to servers by status (voter, etc.), but I think this makes the code significantly less readable.
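
As a rough illustration of the flow described in the summary, here is a hypothetical sketch (not the actual dqlite connector code; the package name and the names `Protocol`, `connectOne`, and `findLeader` are placeholders):

```go
package connector

import (
	"context"
	"errors"
)

// Protocol is a placeholder for the real leader-connection type.
type Protocol struct{ Addr string }

// connectOne is a hypothetical stand-in for the existing per-server attempt:
// it returns a leader connection, or the leader address that the contacted
// server reports, or an error.
var connectOne func(ctx context.Context, addr string) (*Protocol, string, error)

type result struct {
	proto  *Protocol // non-nil if this server turned out to be the leader
	leader string    // "two-hop" leader address reported by the server, if any
	err    error
}

func findLeader(ctx context.Context, servers []string) (*Protocol, error) {
	// Buffered so that late goroutines never block, even if we return early.
	results := make(chan result, len(servers))

	// 1. One goroutine per known "one-hop" server.
	for _, addr := range servers {
		go func(addr string) {
			proto, leader, err := connectOne(ctx, addr)
			results <- result{proto: proto, leader: leader, err: err}
		}(addr)
	}

	// 2./3. Collect results on the main thread; a live connection wins.
	var twoHop []string
	for range servers {
		r := <-results
		if r.proto != nil {
			return r.proto, nil
		}
		if r.leader != "" {
			twoHop = append(twoHop, r.leader)
		}
	}

	// 4. Fall back to the reported "two-hop" leaders, one at a time.
	for _, addr := range twoHop {
		if proto, _, err := connectOne(ctx, addr); err == nil && proto != nil {
			return proto, nil
		}
	}
	return nil, errors.New("no leader found")
}
```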

cnnrznn commented Feb 23, 2024

@cole-miller @MathieuBordere This PR may pass the tests as-is, but I would love feedback on the importance of batching.

I will admit I have no idea how large these clusters can be. Reading through the comments on #134, it seems like I should either

  1. Re-use the existing code to sort the nodes by role first, or
  2. Do (1) and also funnel server connections through a goroutine pool.

@cole-miller cole-miller self-requested a review February 28, 2024 20:51
@cole-miller cole-miller self-assigned this Feb 28, 2024
cole-miller commented Feb 29, 2024

Sorry for the delay. Re: batching and prioritization, how would you feel about this:

  • Use a weighted semaphore (https://godocs.io/golang.org/x/sync/semaphore) to limit concurrency for connection attempts.
  • Spawn a goroutine for each candidate node that first Acquires the semaphore, then reads from a "candidates" channel and attempts to connect to that node (only), writing the protocol to an output channel if it got hold of the leader (rough sketch after this list).
  • Fill the candidates channel with the list of nodes in sorted order (voters, then standbys, then spares).
  • Read from the output channel to get a leader connection. Once that's done, cancel the remaining goroutines (wrapping the Context we already have with WithCancel).
  • Upgrade option: have two candidate channels. One (A) is filled with the original nodes in sorted order; the other (B) starts empty, and we add to it whenever some node reports that a different node is the leader. The goroutines should try to connect to an address from channel B if possible and fall back to channel A (I don't know exactly what concurrency pattern to use to achieve this).
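
A rough sketch of the non-upgraded version, reusing the hypothetical `connectOne` and `Protocol` placeholders from the sketch in the PR description (imports are context and golang.org/x/sync/semaphore; `sorted` and `maxConcurrent` are assumed inputs):

```go
// findLeaderPooled is a hypothetical sketch, not the real connector code.
func findLeaderPooled(parent context.Context, sorted []string, maxConcurrent int64) *Protocol {
	ctx, cancel := context.WithCancel(parent)
	defer cancel() // cancels any remaining attempts once we return

	sem := semaphore.NewWeighted(maxConcurrent)

	// Candidates in sorted order: voters, then standbys, then spares.
	candidates := make(chan string, len(sorted))
	for _, addr := range sorted {
		candidates <- addr
	}

	out := make(chan *Protocol, 1)
	for range sorted {
		go func() {
			if err := sem.Acquire(ctx, 1); err != nil {
				return // context cancelled while waiting for the semaphore
			}
			defer sem.Release(1)
			addr := <-candidates // each goroutine handles exactly one candidate
			if proto, _, err := connectOne(ctx, addr); err == nil && proto != nil {
				select {
				case out <- proto: // report the leader connection
				default: // another goroutine already reported it
				}
			}
		}()
	}

	// Note: for brevity this blocks forever if no node yields the leader.
	return <-out
}
```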

(@MathieuBordere does this look reasonable to you?)

cnnrznn commented Feb 29, 2024

@cole-miller I agree with most of this. I think it could be just a little simpler, and also more reflective of the original logic.

  1. Sort the list of servers by role.
  2. Spawn a goroutine for each server, in sorted order, with a cancel func per server.
  3. Each goroutine does the following:
    a. Grab a semaphore slot, and defer the release.
    b. Check whether it has been cancelled; if so, exit.
    c. Perform the original logic, writing the one- or two-hop connection to the output channel.
  4. The main thread reads from the output channel. If a connection is found, run all the cancel funcs (rough sketch below).
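
In the same spirit, and with the same hypothetical `connectOne`/`Protocol` placeholders as above, the per-server-cancel variant might look roughly like this:

```go
// findLeaderPerServer is a hypothetical sketch of the plan above: one
// goroutine and one cancel func per server, spawned in sorted order.
func findLeaderPerServer(parent context.Context, sorted []string, maxConcurrent int64) *Protocol {
	sem := semaphore.NewWeighted(maxConcurrent)
	out := make(chan *Protocol, len(sorted)) // buffered: writers never block

	var cancels []context.CancelFunc
	for _, addr := range sorted { // 1./2. spawn in sorted order, one cancel per server
		ctx, cancel := context.WithCancel(parent)
		cancels = append(cancels, cancel)
		go func(ctx context.Context, addr string) {
			if err := sem.Acquire(ctx, 1); err != nil {
				return // 3a./3b. exit if cancelled while waiting for the semaphore
			}
			defer sem.Release(1)
			// 3c. the original one-/two-hop logic, writing any leader connection to out.
			if proto, _, err := connectOne(ctx, addr); err == nil && proto != nil {
				out <- proto
			}
		}(ctx, addr)
	}

	proto := <-out // 4. the first leader connection wins...
	for _, cancel := range cancels {
		cancel() // ...and every other attempt is cancelled
	}
	return proto
}
```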

@cole-miller

Thanks, I think you're right and that's a better plan! It nicely sidesteps the issue of how to prioritize candidates that were generated by asking some server over those taken from the a priori sorted list.

cnnrznn commented Feb 29, 2024

@cole-miller this was great feedback, thanks for taking the time to write it up.

Looking forward to implementing these changes.

cnnrznn commented Mar 17, 2024

Will close in favor of #292

@cnnrznn cnnrznn closed this Mar 17, 2024
Linked issue: Parallellize connectAttemptAll (#134)