Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: reduce identification broadcast from one per farm to one per farmer, also for caches #2945

Conversation

tediou5
Copy link
Contributor

@tediou5 tediou5 commented Jul 24, 2024

This is the first step towards #2900.

Next I'll try to replace the request with a stream(if its still necessary), and I think we can optimise the ids in the farmer (cache), e.g. by incrementing the first id, so that when we send an identify we only need one id, and the rest can be calculated directly from the index. For very large farmers, this should make the identify message much smaller.

The first commit is some moving and renaming(hope I'm right this time.).

Code contributor checklist:

Copy link
Member

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I just had some questions about changing how much detail we log.

They are all optional changes.

crates/subspace-farmer/src/cluster/cache.rs Outdated Show resolved Hide resolved
crates/subspace-farmer/src/cluster/farmer.rs Outdated Show resolved Hide resolved
Copy link
Member

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for those changes, I'm going to leave this to a more experienced reviewer to do the final approval.

Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have tackled the description of the #2900 literally by reducing number of network messages, but that isn't quite the solution I meant.

We still have a similar amount of network bandwidth used and we still process every single identification message just like we did before, so this probably improves situation a bit, but not fundamentally. It also breaks wire messages for those who upgrade components one by one.

What I expected to happen instead is this:

  • farmer and cache application instances generate ephemeral IDs during startup and use them for identification
  • controller keeps track of ephemeral farmers/caches instances and their internal farms/caches
  • when new farmer/cache ephemeral instance is discovered, stream request is made to retrieve the details, but otherwise details are not sent over the wire at all, saving both bandwidth on the network and compute on the controller since controller needs to do a single check for the whole bunch of potential farms

Comment on lines 281 to 282
let ClusterFarmerIdentifyFarmBroadcast { details } = identify_message;
for detail in details {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd extract another function for the loop so the indentation is kept the same before and after for the actual logic

@tediou5
Copy link
Contributor Author

tediou5 commented Jul 29, 2024

I see you have tackled the description of the #2900 literally by reducing number of network messages, but that isn't quite the solution I meant.

We still have a similar amount of network bandwidth used and we still process every single identification message just like we did before, so this probably improves situation a bit, but not fundamentally. It also breaks wire messages for those who upgrade components one by one.

What I expected to happen instead is this:

  • farmer and cache application instances generate ephemeral IDs during startup and use them for identification
  • controller keeps track of ephemeral farmers/caches instances and their internal farms/caches
  • when new farmer/cache ephemeral instance is discovered, stream request is made to retrieve the details, but otherwise details are not sent over the wire at all, saving both bandwidth on the network and compute on the controller since controller needs to do a single check for the whole bunch of potential farms

@nazar-pc Oh, so you're saying that we're actually only checking their identity once by creating a stream, right? And then it keeps the stream connected, and pushes the new state when it changes.

@tediou5
Copy link
Contributor Author

tediou5 commented Jul 29, 2024

Yeah, it's a better way to do it, and it's still forward compatible. Let me turn to him.

@nazar-pc
Copy link
Member

@nazar-pc Oh, so you're saying that we're actually only checking their identity once by creating a stream, right? And then it keeps the stream connected, and pushes the new state when it changes.

Not really. The stream will only be used to send individual farm details. But as long as ephemeral farmer ID doesn't change, we know that farmer application didn't restart and don't need to worry about individual farms in it.

In fact if we derive ephemeral farmer ID from fingerprints of individual farms, we don't need to send individual fingerprints in farm details anymore and we'll be able to support quick farmer restarts without losing state on the controller.

So current farm notifications will be replaced with farmer notifications and farm details will be sent in a stream only if/when necessary.

@tediou5 tediou5 closed this Aug 3, 2024
@tediou5 tediou5 force-pushed the opt/optimize-farm-and-cache-identification branch from 4ac9713 to 5506663 Compare August 3, 2024 18:39
@tediou5
Copy link
Contributor Author

tediou5 commented Aug 4, 2024

In fact if we derive ephemeral farmer ID from fingerprints of individual farms, we don't need to send individual fingerprints in farm details anymore and we'll be able to support quick farmer restarts without losing state on the controller.

@nazar-pc I'm a bit confused here, do you mean that we include farm information (like some hash) in the farmer ID so that the derived id doesn't change every time the farmer is restarted if the configuration and machine don't change?

@nazar-pc
Copy link
Member

nazar-pc commented Aug 5, 2024

@nazar-pc I'm a bit confused here, do you mean that we include farm information (like some hash) in the farmer ID so that the derived id doesn't change every time the farmer is restarted if the configuration and machine don't change?

Exactly. See how fingerprint is derived for farms right now, something similar will need to be done for the whole farmer instead.

@tediou5 tediou5 deleted the opt/optimize-farm-and-cache-identification branch August 19, 2024 03:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants