-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VoteAccount: remove OnceLock around VoteState #2659
Conversation
9f16085
to
3dac136
Compare
If this PR represents a change to the public RPC API:
Thank you for keeping the RPC clients in sync with the server API @alessandrod. |
771c631
to
a93bec3
Compare
The Firedancer team maintains a line-for-line reimplementation of the |
63b2da0
to
ece636d
Compare
This is good with FD |
d93b53f
to
3415feb
Compare
} else { | ||
account.vote_state() | ||
}; | ||
if let Ok(vote_state) = vote_state { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is a bit noisy because there are a lot of changes like this: vote_state() used to return Result, but votes that failed to be parsed were never inserted into VoteAccounts. So effectively the Err branches were never entered, this PR removes them, but there's no actual logic change.
3415feb
to
cca25f6
Compare
The lazy-init OnceLock logic wasn't really being used. Execution/perf wise this code doesn't change anything since we were already forcing early deserialization in StakesCache::check_and_store: - // Called to eagerly deserialize vote state - let _res = vote_account.vote_state(); For more context see https://discord.com/channels/428295358100013066/439194979856809985/1268759289531596872 https://discord.com/channels/428295358100013066/439194979856809985/1272661616399224892
cca25f6
to
f1561f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look solid, I just prefer other cleanup to be separate from the main PR if you don't mind.
@@ -34,7 +34,7 @@ pub enum Error { | |||
#[derive(Debug)] | |||
struct VoteAccountInner { | |||
account: AccountSharedData, | |||
vote_state: OnceLock<Result<VoteState, Error>>, | |||
vote_state: VoteState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In contrast to the PR description, this change is not just removing the OnceLock
but also dropping Result
from the type.
These two changes should definitely have gone in separate PRs.
We really do not gain anything from mixing several independent changes into one PR, other than making it more likely that something subtle is missed out in the review or testing.
Dropping Result
is indeed a functional change because now VoteAccount::try_from(account_shared_data)
can fail in the ways that the previous code was not and this could have downstream consequences.
Is this functional change fine? I don't know, and having the change left out of the PR description and mixed in by other independent code changes does not help with scrutinizing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two changes should definitely have gone in separate PRs.
Once again - we're at 3 PRs out of 3 - a dismissive and condescending initial comment that makes me wonder if you've spent more than 10 seconds looking at the code. I invite you to think about what you're suggesting would have looked like:
- 1 PR with a ~5 line change removing OnceLock from the type declaration and the corresponding OnceLock::new() to initialize
- A second PR with all the changes in this PR minus the 5 lines above
So are you saying that you're not able to follow the bulk of the changes because of 5 lines?
Also it would be great if you could avoid saying "definitely". To me it makes absolutely zero sense to split that way. The whole reason for having Result
is that OnceLock forced us to have Result
. If OnceLock
goes, Result
has absolutely no reason for being there anymore. Just removing OnceLock
but not Result
makes no sense. I'm not doing this change in a vacuum - I could care less about OnceLock
- I need to put VoteState
inside VoteAccountInner
so I can deserializa in place. Maybe I should have added that code in this PR too to communicate the intent better and make reviewing easier! (In fact I did initially, but then I anticipated this conversation with you).
I understand that you seem to have a preference for micro PRs. I don't. In fact I hate when PRs aren't atomic and contain all the logical changes that a changeset is trying to make, because then using the history to understand what some code does, which is something I all day every day, becomes harder.
Dropping Result is indeed a functional change because now VoteAccount::try_from(account_shared_data) can fail in the ways that the previous code was not and this could have downstream consequences.
Is this functional change fine? I don't know
Did you read this? #2659 (review) and this #2271 (comment) ?
You don't know because you're not familiar with the code, or you're not convinced by the claim made above?
Reviews are hard work. I'd love for you to review this and the other PRs I tagged you in. If you don't have time or will to do it that's fine too, but then don't ask me to tag you for review.
@@ -303,8 +320,8 @@ impl TryFrom<AccountSharedData> for VoteAccountInner { | |||
return Err(Error::InvalidOwner(*account.owner())); | |||
} | |||
Ok(Self { | |||
vote_state: VoteState::deserialize(account.data()).map_err(Error::InstructionError)?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behzadnouri brings up a great point that this deserialization could potentially fail in one of the call-sites (Stakes cache I feel fine with but have to dig into rewards code more carefully). I agree with him that it would be best to have this PR store the Result
and follow up with another PR that we can carefully review all the call-sites of the TryFrom
in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you decide to not split this up (honestly fine with me), I'll do another pass tomorrow after another more careful review. I do think the OnceLock
approach caused us to introduce the Result
in the first place so I can see why you coupled that change into this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main concern I had during review was around deserializing VoteAccounts
(done when deserializing bank stakes or epoch stakes) because after this PR we would now reject snapshots that contain vote accounts that aren't deserializable. This is fine as long as only deserializable vote accounts were ever stored in the stakes cache / epoch stakes. In the current code it's not possible to store a vote account that's not deserializable because the stakes cache wouldn't insert it. But let's say though that in the past we did store a vote account which was not deserializable, it looks like it would have only been removed by handle_invalid_keys
(called downstream of update_rewards_with_thread_pool
) if there was a stake account delegated to it because _load_vote_and_stake_accounts
looks for any vote keys for stake delegations that correspond to invalid vote accounts. But if there are no stake delegations to an invalid vote account in the stakes cache, it doesn't look like we would ever remove it.
In order to merge this I think we should probably try loading snapshots with this branch on each cluster to see if there are some ancient invalid vote accounts in there. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this makes sense. As discussed I've added a custom deserializer that discards invalid vote accounts 2dba7d1
For extra peace of mind I've already checked that there are no invalid accounts in mnb, and I'm going to test the other clusters too.
@@ -2280,12 +2280,8 @@ impl Bank { | |||
invalid_vote_keys.insert(vote_pubkey, InvalidCacheEntryReason::WrongOwner); | |||
return None; | |||
} | |||
let Ok(vote_state) = vote_account.vote_state().cloned() else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in the get_vote_account
closure above we now have a redundant check for vote owner and deserializability since that's exactly what VoteAccount::try_from
is doing now. There are some other redundant checks scattered around for things like vote owner as well. Can be cleaned up in a follow up.
let account = self.get_account_with_fixed_root(vote_pubkey)?;
if account.owner() == &solana_vote_program
&& VoteState::deserialize(account.data()).is_ok()
{
vote_accounts_cache_miss_count.fetch_add(1, Relaxed);
}
VoteAccount::try_from(account).ok()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code has been dead for a long time I think. We check that VoteAccounts contains all vote accounts at snapshot load time. We have the vote_accounts_cache_miss_count
metric that's never been non-zero anytime I've looked since I started this cursed work.
@@ -101,20 +101,6 @@ impl EpochStakes { | |||
.iter() | |||
.filter_map(|(key, (stake, account))| { | |||
let vote_state = account.vote_state(); | |||
let vote_state = match vote_state.as_ref() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few lines above we would have accumulated total_stake
for invalid vote accounts if they were in the epoch vote accounts map. After this change, invalid vote accounts can't be in the epoch vote accounts map because wouldn't deserialize them from a snapshot and we would never insert them into the stakes cache. So this change might be fixing a bug if there actually was an invalid vote account in the epoch vote accounts.
@@ -303,8 +320,8 @@ impl TryFrom<AccountSharedData> for VoteAccountInner { | |||
return Err(Error::InvalidOwner(*account.owner())); | |||
} | |||
Ok(Self { | |||
vote_state: VoteState::deserialize(account.data()).map_err(Error::InstructionError)?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main concern I had during review was around deserializing VoteAccounts
(done when deserializing bank stakes or epoch stakes) because after this PR we would now reject snapshots that contain vote accounts that aren't deserializable. This is fine as long as only deserializable vote accounts were ever stored in the stakes cache / epoch stakes. In the current code it's not possible to store a vote account that's not deserializable because the stakes cache wouldn't insert it. But let's say though that in the past we did store a vote account which was not deserializable, it looks like it would have only been removed by handle_invalid_keys
(called downstream of update_rewards_with_thread_pool
) if there was a stake account delegated to it because _load_vote_and_stake_accounts
looks for any vote keys for stake delegations that correspond to invalid vote accounts. But if there are no stake delegations to an invalid vote account in the stakes cache, it doesn't look like we would ever remove it.
In order to merge this I think we should probably try loading snapshots with this branch on each cluster to see if there are some ancient invalid vote accounts in there. Does that make sense?
List of all VoteAccount::try_from() callers StakesCache
These are guarded by VoteVersions::is_correct_size_and_initialized. This PR doesn't change behavior since VoteAccount::try_from doesn't get called at all for invalid vote accounts. Bank
|
Another data point:
I've grepped for the warnings printed (and removed by this PR) in case of invalid votes on a validator that's been running for ~3 weeks, and I get 0 hits. |
1e1a56b
to
9c9e2cc
Compare
9c9e2cc
to
1db912c
Compare
Before switching to eager vote account deserialization we were (accidentally) able to load snapshots with invalid vote accounts (because account parsing was deferred). This change ensures that we're still able to parse such snapshots.
1db912c
to
2dba7d1
Compare
VoteAccount is never deserialized individually, but only as part of VoteAccounts, which has a custom deser that doesn't require VoteAccount to implement Deserialize
testnet and devnet have no invalid accounts either |
This should be a follow up. Note that this metric wasn't checking for initialized so it could have false positives when a stake is delegated to a zeroed vote-program owned account. |
* VoteAccount: remove OnceLock around VoteState The lazy-init OnceLock logic wasn't really being used. Execution/perf wise this code doesn't change anything since we were already forcing early deserialization in StakesCache::check_and_store: - // Called to eagerly deserialize vote state - let _res = vote_account.vote_state(); For more context see https://discord.com/channels/428295358100013066/439194979856809985/1268759289531596872 https://discord.com/channels/428295358100013066/439194979856809985/1272661616399224892 * VoteAccounts: discard invalid accounts on deserialization Before switching to eager vote account deserialization we were (accidentally) able to load snapshots with invalid vote accounts (because account parsing was deferred). This change ensures that we're still able to parse such snapshots. * VoteAccount: remove Deserialize impl VoteAccount is never deserialized individually, but only as part of VoteAccounts, which has a custom deser that doesn't require VoteAccount to implement Deserialize
The lazy-init OnceLock logic wasn't really being used. Execution/perf wise this code doesn't change anything since we were already forcing early deserialization in StakesCache::check_and_store:
For more context see
https://discord.com/channels/428295358100013066/439194979856809985/1268759289531596872 https://discord.com/channels/428295358100013066/439194979856809985/1272661616399224892