Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p: smart discovery (dead/solo remedy subnets) #1949

Draft
wants to merge 40 commits into
base: stage
Choose a base branch
from

Conversation

iurii-ssv
Copy link
Contributor

@iurii-ssv iurii-ssv commented Dec 18, 2024

Replaces #1917 additionally improving a bunch of p2p-handling code, with the main contributing factor being "selective discovery"

Before merging:

  • remove/resolve TODOs used for testing
  • (when testing) make sure this PR doesn't introduce memory-leaks or super-high CPU consumption type of issues
  • do we want to set MinConnectivitySubnets to higher-than-0 value (say 3) ? There isn't really a good way to tell how it affects network connectivity (but it's almost certainly gonna improve it, not worsen it); we removed this feature out of this PR for now (for simplicity)

@iurii-ssv iurii-ssv force-pushed the testing/deploy/deadsub-remedy branch 2 times, most recently from 5dac0bc to a364c2c Compare December 18, 2024 13:31
@iurii-ssv
Copy link
Contributor Author

Another thing I've realized is happening - because we only trim once we've reached MaxPeers limit, at some point we might stop trimming completely - for example, we've got to max amount of the incoming connections we allow via 0.5 ratio (say, 30) + we've discovered some outgoing peers but not enough to reach MaxPeers limit,

in other words, we've discovered everything we could (and we keep discovering, just at a very slow rate - due to quite an aggressive filter there isn't many connections we are interested in) but that also means we stopped re-cycling(trimming) our incoming connections periodically because we don't trim incoming/outgoing connections separately

I've added a commit to address that - d230c4a - it's pretty "hacky/ad-hoc" way do the job - ideally (to simplify this) I think we probably want to have separate limits per incoming/outgoing connections and somewhat separate trimming logic, WDYT ?

@iurii-ssv iurii-ssv force-pushed the testing/deploy/deadsub-remedy branch 2 times, most recently from 88a752c to 44665d0 Compare January 9, 2025 13:58
Copy link

codecov bot commented Jan 9, 2025

@iurii-ssv iurii-ssv force-pushed the testing/deploy/deadsub-remedy branch 4 times, most recently from 04f69b3 to 846688f Compare January 17, 2025 16:41
@iurii-ssv iurii-ssv force-pushed the testing/deploy/deadsub-remedy branch from 846688f to 13b33d0 Compare January 27, 2025 10:02
@iurii-ssv iurii-ssv changed the title p2p: dead/solo subnets remedy (enhanced) p2p: smart discovery (dead/solo remedy subnets) Jan 27, 2025
@iurii-ssv iurii-ssv marked this pull request as draft February 26, 2025 13:31
Copy link
Contributor

@y0sher y0sher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small changes :)

@iurii-ssv iurii-ssv force-pushed the testing/deploy/deadsub-remedy branch from 6b34e3a to e4ae99e Compare February 26, 2025 18:36
// start go-routine to choose the best peer(s) from the pool of discovered peers to propose
// these as outbound connections, all discovered peers are kept in a pool so we can choose
// from that same pool across different runs of this interval-func
async.Interval(n.ctx, 15*time.Second, func() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe its time to extract this 200 lines func into its own method 😅

Copy link

@zippoxer zippoxer Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@y0sher @iurii-ssv this is an improvement, but i still think this is just hard to read compared to what it should do

Copy link

@zippoxer zippoxer Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1: there are too many comments and some are way longer than they could be

comments should be added when it's hard to infer what's happening from the code (non-obvious things), or for visual separation of steps (and in the latter case it should be a name/title, not an explanation)

comments are really useful for both cases (and likely more), but when there are too many less useful comments, then the reader just avoids them entirely as they become noise unfortunately :/

(applies to the rest of the PR as well)

Copy link
Contributor Author

@iurii-ssv iurii-ssv Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe its time to extract this 200 lines func into its own method 😅

I don't think "just splitting code up" does anything by itself, it should be a "logical unit" to be a separate meaningful function, do you have a specific suggestion what should be put in a separate func ?

Eg. main func is typically ~1000 lines long and its fine for it to be that long,

as for this function here - all it's doing is choosing/filtering discovered peers (it's 1 single task, there is nothing to split off I don't think)

comments should only be added where it's hard or impossible to infer what's happening from the code, or for visual separation of steps (and in the latter case it should be a name/title, not an explanation)

I try to leave only comments that have "useful" info in them, I think it's better to have more comments than almost no comments

but I agree it's hard to write concise and complete comments, do you have specific suggestions on what to remove/rephrase ?

Copy link
Contributor

@moshe-blox moshe-blox Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2: i might be wrong, but the scoring mechanism looks more complex than it has to be

@y0sher @iurii-ssv curious to hear arguments why it should be more complex than what we had in the simulation?

Copy link
Contributor

@moshe-blox moshe-blox Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think "just splitting code up" does anything by itself, it should be a "logical unit" to be a separate meaningful function, do you have a specific suggestion what should be put in a separate func ?

i agree, i also prefer everything under the same purpose to be in the same place (and often get questioned for my big functions)

however in this case we're under the p2p.Start method which does many things, not one, so it'd make sense to extract the connection-making task to its own method, such as the tasks below it already are:

ssv/network/p2p/p2p.go

Lines 475 to 482 in 4895ae1

async.Interval(n.ctx, peersTrimmingInterval, n.peersTrimming(logger))
async.Interval(n.ctx, peersReportingInterval, recordPeerCount(n.ctx, logger, n.host))
async.Interval(n.ctx, peerIdentitiesReportingInterval, recordPeerIdentities(n.ctx, n.host, n.idx))
async.Interval(n.ctx, topicsReportingInterval, recordPeerCountPerTopic(n.ctx, logger, n.topicsCtrl, 2))

so we're not splitting this logic in any sense

wdyt?

Copy link
Contributor Author

@iurii-ssv iurii-ssv Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the scoring mechanism looks more complex than it has to be

The func closest to what we had in simulation is this one -

func (n *p2pNetwork) score(peerID peer.ID, subnet int) float64 {

but it doesn't map in straightforward way onto this solution we've decided to go with (it only works with our own peers to decided if connected peer is good or not):

image

and so it's only used when trimming currently connected peers,

while scoreSubnetSum is the most straightforward implementation of the algo we want for discovery

none of the comments here are needed because they're obvious from reading the variable names

yeah, maybe it's a bit too much, removed redundant comments here - 81c8268

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree splitting out of the Start function could make both the Start function and the scoring function easier to read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also on the comments part, I partially agree with @moshe-blox , I think it might be worth to "sum up" some comments so instead of comment every line we can "prepare the reader" to what the upcoming code will do, and trust the the variable names will talk for themselves. unless of course the code is a bit more complex so it needs further explaining.

Copy link
Contributor Author

@iurii-ssv iurii-ssv Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however in this case we're under the p2p.Start method which does many things, not one, so it'd make sense to extract the connection-making task to its own method, such as the tasks below it already are:

ssv/network/p2p/p2p.go

Lines 475 to 482 in 4895ae1

async.Interval(n.ctx, peersTrimmingInterval, n.peersTrimming(logger))
async.Interval(n.ctx, peersReportingInterval, recordPeerCount(n.ctx, logger, n.host))
async.Interval(n.ctx, peerIdentitiesReportingInterval, recordPeerIdentities(n.ctx, n.host, n.idx))
async.Interval(n.ctx, topicsReportingInterval, recordPeerCountPerTopic(n.ctx, logger, n.topicsCtrl, 2))

so we're not splitting this logic in any sense

I wasn't thinking about Start func itself - it's a kind of refactoring I would usually prefer to do on top (later on in a separate PR), but if you think it simplifies (not complicates) review - maybe we can move discovery-related parts into separate func - let me see what I can do about that

update: split off discovery-related parts into separate method - 7b119d6 and also reviewed comments (I think whatever left is useful in case we need to get back to this code and understand it / update it in the future), happy to apply other suggestions if you have any

Comment on lines 368 to 392
const (
duoSubnetPriority = 1
soloSubnetPriority = 3
deadSubnetPriority = 9

maxPossibleScore = commons.SubnetsCount * deadSubnetPriority
)

deadSubnetCnt := 0
soloSubnetCnt := 0
duoSubnetCnt := 0
for i := 0; i < commons.SubnetsCount; i++ {
if s[i] == 0 {
deadSubnetCnt++
}
if s[i] == 1 {
soloSubnetCnt++
}
if s[i] == 2 {
duoSubnetCnt++
}
}
return float64(maxPossibleScore - (deadSubnetPriority*deadSubnetCnt + soloSubnetPriority*soloSubnetCnt + duoSubnetPriority*duoSubnetCnt))
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potentially we can change this func to accept suggested peer for score computation:

		scoreSubnetPeers := func(current SubnetPeers, added SubnetPeers) (score float64) {
			const (
				duoSubnetPriority  = 1
				soloSubnetPriority = 3
				deadSubnetPriority = 9
				maxPossibleScore   = commons.SubnetsCount * deadSubnetPriority
			)
			v := 0
			for i := 0; i < commons.SubnetsCount; i++ {
				if added[i] == 1 {
					switch s[i] {
					case 0:
						v += deadSubnetPriority
					case 1:
						v += soloSubnetPriority
					case 2:
						v += duoSubnetPriority
					}
				}
			}
			return float64(maxPossibleScore - v)
		}

so only if the peer adds to a dead/solo subnet, it would receive more score

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, what you suggest will make it a bit more readable, so I've changed the scoreSubnetPeers func to be a method of SubnetSum:

// scoreSubnetVector estimates the score of subnet-vector v (each byte in v is 0 or 1) based on 
// how many dead/solo/duo subnets it contributes to the current subnet-sum (higher score is better)
func (s *SubnetSum) scoreSubnetVector(v commons.Subnets) (score float64) {
	const (
		duoSubnetPriority  = 1
		soloSubnetPriority = 3
		deadSubnetPriority = 9
	)

	result := 0.0
	for i := 0; i < commons.SubnetsCount; i++ {
		if v[i] != 1 {
			continue
		}
		if s[i] == 0 {
			result += deadSubnetPriority
		}
		if s[i] == 1 {
			result += soloSubnetPriority
		}
		if s[i] == 2 {
			result += duoSubnetPriority
		}
	}
	return result
}

and also refactored the way we work with SubnetSum a bit - 690ee3e

let me know if that's in line with what you had in mind :)

@iurii-ssv iurii-ssv force-pushed the testing/deploy/deadsub-remedy branch from 808b302 to 690ee3e Compare February 28, 2025 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants