
hotfix(message/validation): optimize signer state memory usage #1874

Open · wants to merge 23 commits into base: stage
Conversation

@nkryuchkov (Contributor) commented Nov 22, 2024

Changes:

Calculations of signer state memory for 50K validators, for MessageCounts:

We store state for each validator for each of the 6 roles. For each role, we store the state of each signer (up to 13 signers). For each signer, we store state for 64 slots. Then for each slot, we store the signer state. So overall we have up to 50,000 * 6 * 13 * 64 ≈ 249,600,000 signer states. I'm counting the maximum here; the actual value should be lower because the average number of operators is less than 13.

So if MessageCounts is reduced from 48 bytes to 1 byte, the maximum theoretical memory consumption should drop from ~12 GB to ~250 MB.
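To make the 48-to-1 reduction concrete, here is a minimal Go sketch of one way such a packing can work. It assumes each message type only needs a seen/not-seen flag per signer state; the field and constant names are illustrative, not the actual SSV types:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Before: six 8-byte counters per signer state, 48 bytes in total.
type messageCountsOld struct {
	PreConsensus  int64
	Proposal      int64
	Prepare       int64
	Commit        int64
	RoundChange   int64
	PostConsensus int64
}

// After: one bit per message type, 1 byte in total.
type messageCountsNew uint8

const (
	seenPreConsensus messageCountsNew = 1 << iota
	seenProposal
	seenPrepare
	seenCommit
	seenRoundChange
	seenPostConsensus
)

func main() {
	fmt.Println(unsafe.Sizeof(messageCountsOld{}))  // 48
	fmt.Println(unsafe.Sizeof(messageCountsNew(0))) // 1

	// Upper bound from the description above:
	// 50,000 validators * 6 roles * 13 signers * 64 slots.
	states := 50_000 * 6 * 13 * 64
	fmt.Printf("%d states: ~%.0f GB at 48 B vs ~%.0f MB at 1 B\n",
		states, float64(states)*48/1e9, float64(states)/1e6)
}
```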

Changes in Pyroscope:

[screenshot: Pyroscope memory profile before/after]

@nkryuchkov added the optimization label Nov 22, 2024
@nkryuchkov changed the title from "message/validation: optimize signer state memory usage" to "fix(message/validation): optimize signer state memory usage" Nov 22, 2024
@nkryuchkov changed the title from "fix(message/validation): optimize signer state memory usage" back to "message/validation: optimize signer state memory usage" Nov 22, 2024
@iurii-ssv (Contributor) left a comment:

Looks good!

@nkryuchkov changed the title from "message/validation: optimize signer state memory usage" to "hotfix(message/validation): optimize signer state memory usage" Nov 26, 2024
@nkryuchkov added the critical label Nov 26, 2024
@MatheusFranco99 (Contributor) left a comment:

Great work!!
The issue I raised doesn't concern this PR, so approving :)

Merge commit resolved conflicts in:
  message/validation/consensus_validation.go
  message/validation/partial_validation.go
@@ -76,15 +69,15 @@ func (os *OperatorState) Set(slot phase0.Slot, epoch phase0.Epoch, state *Signer
 }
 
 func (os *OperatorState) MaxSlot() phase0.Slot {
-	os.mu.RLock()
-	defer os.mu.RUnlock()
+	os.mu.Lock()
@oleg-ssvlabs (Contributor) commented:

Was there a specific reason for replacing the RWMutex with a plain Mutex?

@nkryuchkov (Contributor, Author) replied:

@oleg-ssvlabs RWMutex consumes more memory, and OperatorState memory consumption seems to be a bottleneck in the exporter. I'd use RWMutex here only if we benchmarked it and saw a significant improvement.
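For reference, the raw size gap between the two lock types is easy to check. On 64-bit platforms with recent Go releases it is 8 vs 24 bytes (exact numbers can vary by Go version):

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

func main() {
	// RWMutex embeds a Mutex plus reader/writer bookkeeping fields,
	// so every struct holding one pays the extra 16 bytes.
	var mu sync.Mutex
	var rw sync.RWMutex
	fmt.Println(unsafe.Sizeof(mu)) // 8 bytes
	fmt.Println(unsafe.Sizeof(rw)) // 24 bytes
}
```

That 16-byte gap matches the 64-vs-80-byte OperatorState figures quoted later in the thread.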

@oleg-ssvlabs (Contributor) replied:

Ah, interesting. Do you happen to have any numbers for comparison? It would be really compelling to see the difference (specifically between Mutex and RWMutex).

@nkryuchkov (Contributor, Author) commented Feb 11, 2025:

It's 64 vs 80 bytes for each OperatorState IIRC. Not much of a difference, but I wanted to squeeze everything out of the state structure because we allocate a lot of them: one per validator per role per operator. So on mainnet the total difference would be a few tens of megabytes at most (~60K validators * 4 roles * ~5-6 average committee size * the 16 bytes between 64 and 80), which is not very much.

I agree that RWMutex would reduce lock wait time, but I think the difference wouldn't be very big. Generally it looks like a trade-off to me, and since we're currently fighting exporter memory issues, I tend to prefer reducing memory use.
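A quick back-of-envelope check of that estimate, using the approximate figures from the comment above (assumptions from the thread, not measured values):

```go
package main

import "fmt"

func main() {
	const (
		validators   = 60_000.0 // "~60K validators"
		roles        = 4.0      // roles per validator
		avgCommittee = 5.5      // "~5-6" average committee size
		bytesSaved   = 80 - 64  // per-OperatorState difference in bytes
	)
	total := validators * roles * avgCommittee * bytesSaved
	fmt.Printf("~%.0f MB saved in total\n", total/1e6) // ~21 MB
}
```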

@nkryuchkov (Contributor, Author) commented:

@oleg-ssvlabs Actually, perhaps we could remove this mutex entirely, as @moshe-blox suggested in #2034 (comment). We already lock validation by message ID, and message validation doesn't run concurrent checks, so we shouldn't have any data races in OperatorState and ValidatorState.
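For context, here is a hedged sketch of the kind of per-message-ID serialization that comment relies on. The type and method names are hypothetical (the real SSV code differs); the point is that if every message ID is validated by at most one goroutine at a time, state touched only inside that critical section needs no lock of its own:

```go
package main

import "sync"

// messageID is a placeholder; the real SSV MessageID type differs.
type messageID string

// validationLocks hands out one mutex per message ID so that all
// validation for a given ID runs serialized.
type validationLocks struct {
	mu    sync.Mutex
	locks map[messageID]*sync.Mutex
}

func (v *validationLocks) lockFor(id messageID) *sync.Mutex {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.locks == nil {
		v.locks = make(map[messageID]*sync.Mutex)
	}
	l, ok := v.locks[id]
	if !ok {
		l = &sync.Mutex{}
		v.locks[id] = l
	}
	return l
}

func main() {
	var vl validationLocks
	l := vl.lockFor("some-msg-id")
	l.Lock()
	// Validate the message and update OperatorState / ValidatorState here;
	// no other goroutine can be validating this message ID concurrently.
	l.Unlock()
}
```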

Merge commit resolved conflicts in:
  message/validation/common_checks.go
  message/validation/consensus_validation.go
  message/validation/const.go
  message/validation/validation.go
codecov bot commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 83.60% with 41 lines in your changes missing coverage. Please review.

Project coverage is 48.0%. Comparing base (1a5c07e) to head (93bc566).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| message/validation/seen_msg_types.go | 84.2% | 19 Missing ⚠️ |
| message/validation/quorum.go | 34.6% | 15 Missing and 2 partials ⚠️ |
| message/validation/partial_validation.go | 76.9% | 1 Missing and 2 partials ⚠️ |
| message/validation/consensus_validation.go | 94.7% | 0 Missing and 2 partials ⚠️ |

Labels
critical (Needs immediate attention), optimization (Something to make SSV run more efficiently)
7 participants