VoteAccount: remove OnceLock around VoteState #2659

alessandrod · 2024-08-19T16:10:03Z

The lazy-init OnceLock logic wasn't really being used. Execution/perf wise this code doesn't change anything since we were already forcing early deserialization in StakesCache::check_and_store:

-                            // Called to eagerly deserialize vote state
-                            let _res = vote_account.vote_state();

For more context see
https://discord.com/channels/428295358100013066/439194979856809985/1268759289531596872 https://discord.com/channels/428295358100013066/439194979856809985/1272661616399224892

mergify · 2024-08-19T16:46:30Z

If this PR represents a change to the public RPC API:

Make sure it includes a complementary update to rpc-client/ (example)
Open a follow-up PR to update the JavaScript client @solana/web3.js (example)

Thank you for keeping the RPC clients in sync with the server API @alessandrod.

mergify · 2024-08-20T08:34:26Z

The Firedancer team maintains a line-for-line reimplementation of the
native programs, and until native programs are moved to BPF, those
implementations must exactly match their Agave counterparts.
If this PR represents a change to a native program implementation (not
tests), please include a reviewer from the Firedancer team. And please
keep refactors to a minimum.

jumpsiegel · 2024-08-20T10:40:43Z

This is good with FD

alessandrod · 2024-08-21T15:44:30Z

core/src/commitment_service.rs

            } else {
                account.vote_state()
            };
-            if let Ok(vote_state) = vote_state {


This PR is a bit noisy because there are a lot of changes like this: vote_state() used to return Result, but votes that failed to be parsed were never inserted into VoteAccounts. So effectively the Err branches were never entered, this PR removes them, but there's no actual logic change.

The lazy-init OnceLock logic wasn't really being used. Execution/perf wise this code doesn't change anything since we were already forcing early deserialization in StakesCache::check_and_store: - // Called to eagerly deserialize vote state - let _res = vote_account.vote_state(); For more context see https://discord.com/channels/428295358100013066/439194979856809985/1268759289531596872 https://discord.com/channels/428295358100013066/439194979856809985/1272661616399224892

jstarry

Changes look solid, I just prefer other cleanup to be separate from the main PR if you don't mind.

runtime/src/stakes.rs

vote/src/vote_account.rs

behzadnouri · 2024-08-22T14:09:22Z

vote/src/vote_account.rs

@@ -34,7 +34,7 @@ pub enum Error {
 #[derive(Debug)]
 struct VoteAccountInner {
    account: AccountSharedData,
-    vote_state: OnceLock<Result<VoteState, Error>>,
+    vote_state: VoteState,


In contrast to the PR description, this change is not just removing the OnceLock but also dropping Result from the type.

These two changes should definitely have gone in separate PRs.
We really do not gain anything from mixing several independent changes into one PR, other than making it more likely that something subtle is missed out in the review or testing.

Dropping Result is indeed a functional change because now VoteAccount::try_from(account_shared_data) can fail in the ways that the previous code was not and this could have downstream consequences.
Is this functional change fine? I don't know, and having the change left out of the PR description and mixed in by other independent code changes does not help with scrutinizing that.

These two changes should definitely have gone in separate PRs.

Once again - we're at 3 PRs out of 3 - a dismissive and condescending initial comment that makes me wonder if you've spent more than 10 seconds looking at the code. I invite you to think about what you're suggesting would have looked like:

1 PR with a ~5 line change removing OnceLock from the type declaration and the corresponding OnceLock::new() to initialize

A second PR with all the changes in this PR minus the 5 lines above

So are you saying that you're not able to follow the bulk of the changes because of 5 lines?

Also it would be great if you could avoid saying "definitely". To me it makes absolutely zero sense to split that way. The whole reason for having Result is that OnceLock forced us to have Result. If OnceLock goes, Result has absolutely no reason for being there anymore. Just removing OnceLock but not Result makes no sense. I'm not doing this change in a vacuum - I could care less about OnceLock - I need to put VoteState inside VoteAccountInner so I can deserializa in place. Maybe I should have added that code in this PR too to communicate the intent better and make reviewing easier! (In fact I did initially, but then I anticipated this conversation with you).

I understand that you seem to have a preference for micro PRs. I don't. In fact I hate when PRs aren't atomic and contain all the logical changes that a changeset is trying to make, because then using the history to understand what some code does, which is something I all day every day, becomes harder.

Dropping Result is indeed a functional change because now VoteAccount::try_from(account_shared_data) can fail in the ways that the previous code was not and this could have downstream consequences.
Is this functional change fine? I don't know

Did you read this? #2659 (review) and this #2271 (comment) ?

You don't know because you're not familiar with the code, or you're not convinced by the claim made above?

Reviews are hard work. I'd love for you to review this and the other PRs I tagged you in. If you don't have time or will to do it that's fine too, but then don't ask me to tag you for review.

jstarry · 2024-08-22T14:36:15Z

vote/src/vote_account.rs

@@ -303,8 +320,8 @@ impl TryFrom<AccountSharedData> for VoteAccountInner {
            return Err(Error::InvalidOwner(*account.owner()));
        }
        Ok(Self {
+            vote_state: VoteState::deserialize(account.data()).map_err(Error::InstructionError)?,


@behzadnouri brings up a great point that this deserialization could potentially fail in one of the call-sites (Stakes cache I feel fine with but have to dig into rewards code more carefully). I agree with him that it would be best to have this PR store the Result and follow up with another PR that we can carefully review all the call-sites of the TryFrom in.

If you decide to not split this up (honestly fine with me), I'll do another pass tomorrow after another more careful review. I do think the OnceLock approach caused us to introduce the Result in the first place so I can see why you coupled that change into this PR.

Main concern I had during review was around deserializing VoteAccounts (done when deserializing bank stakes or epoch stakes) because after this PR we would now reject snapshots that contain vote accounts that aren't deserializable. This is fine as long as only deserializable vote accounts were ever stored in the stakes cache / epoch stakes. In the current code it's not possible to store a vote account that's not deserializable because the stakes cache wouldn't insert it. But let's say though that in the past we did store a vote account which was not deserializable, it looks like it would have only been removed by handle_invalid_keys (called downstream of update_rewards_with_thread_pool) if there was a stake account delegated to it because _load_vote_and_stake_accounts looks for any vote keys for stake delegations that correspond to invalid vote accounts. But if there are no stake delegations to an invalid vote account in the stakes cache, it doesn't look like we would ever remove it.

In order to merge this I think we should probably try loading snapshots with this branch on each cluster to see if there are some ancient invalid vote accounts in there. Does that make sense?

Yep this makes sense. As discussed I've added a custom deserializer that discards invalid vote accounts 2dba7d1

For extra peace of mind I've already checked that there are no invalid accounts in mnb, and I'm going to test the other clusters too.

jstarry · 2024-08-23T02:08:25Z

runtime/src/bank.rs

@@ -2280,12 +2280,8 @@ impl Bank {
                invalid_vote_keys.insert(vote_pubkey, InvalidCacheEntryReason::WrongOwner);
                return None;
            }
-            let Ok(vote_state) = vote_account.vote_state().cloned() else {


nit: in the get_vote_account closure above we now have a redundant check for vote owner and deserializability since that's exactly what VoteAccount::try_from is doing now. There are some other redundant checks scattered around for things like vote owner as well. Can be cleaned up in a follow up.

let account = self.get_account_with_fixed_root(vote_pubkey)?; if account.owner() == &solana_vote_program && VoteState::deserialize(account.data()).is_ok() { vote_accounts_cache_miss_count.fetch_add(1, Relaxed); } VoteAccount::try_from(account).ok()

This code has been dead for a long time I think. We check that VoteAccounts contains all vote accounts at snapshot load time. We have the vote_accounts_cache_miss_count metric that's never been non-zero anytime I've looked since I started this cursed work.

jstarry · 2024-08-23T02:13:47Z

runtime/src/epoch_stakes.rs

@@ -101,20 +101,6 @@ impl EpochStakes {
            .iter()
            .filter_map(|(key, (stake, account))| {
                let vote_state = account.vote_state();
-                let vote_state = match vote_state.as_ref() {


A few lines above we would have accumulated total_stake for invalid vote accounts if they were in the epoch vote accounts map. After this change, invalid vote accounts can't be in the epoch vote accounts map because wouldn't deserialize them from a snapshot and we would never insert them into the stakes cache. So this change might be fixing a bug if there actually was an invalid vote account in the epoch vote accounts.

jstarry · 2024-08-23T02:35:34Z

vote/src/vote_account.rs

@@ -303,8 +320,8 @@ impl TryFrom<AccountSharedData> for VoteAccountInner {
            return Err(Error::InvalidOwner(*account.owner()));
        }
        Ok(Self {
+            vote_state: VoteState::deserialize(account.data()).map_err(Error::InstructionError)?,


Main concern I had during review was around deserializing VoteAccounts (done when deserializing bank stakes or epoch stakes) because after this PR we would now reject snapshots that contain vote accounts that aren't deserializable. This is fine as long as only deserializable vote accounts were ever stored in the stakes cache / epoch stakes. In the current code it's not possible to store a vote account that's not deserializable because the stakes cache wouldn't insert it. But let's say though that in the past we did store a vote account which was not deserializable, it looks like it would have only been removed by handle_invalid_keys (called downstream of update_rewards_with_thread_pool) if there was a stake account delegated to it because _load_vote_and_stake_accounts looks for any vote keys for stake delegations that correspond to invalid vote accounts. But if there are no stake delegations to an invalid vote account in the stakes cache, it doesn't look like we would ever remove it.

In order to merge this I think we should probably try loading snapshots with this branch on each cluster to see if there are some ancient invalid vote accounts in there. Does that make sense?

alessandrod · 2024-08-23T13:22:36Z

List of all VoteAccount::try_from() callers

StakesCache

agave/runtime/src/stakes.rs

Line 99 in f1561f4

match VoteAccount::try_from(account.to_account_shared_data()) {
agave/runtime/src/stakes.rs

Line 274 in f1561f4

&& VoteAccount::try_from(account.clone()).is_ok()

These are guarded by VoteVersions::is_correct_size_and_initialized. This PR doesn't change behavior since VoteAccount::try_from doesn't get called at all for invalid vote accounts.

Bank

agave/runtime/src/bank.rs

Line 2271 in f1561f4

VoteAccount::try_from(account).ok()

this was dead code. The metric incremented here is always 0. I do think that it's possible that this code is now un-dead since we don't put invalid votes in cached_vote_accounts, but in any case the result is that before and after the PR, if the vote is invalid make_vote_delegations_entry returns None. Maybe I should remove this as part of this PR?
agave/runtime/src/bank/partitioned_epoch_rewards/calculation.rs

Line 360 in f1561f4

VoteAccount::try_from(account).ok()

here we used to skip and return None in case of invalid account. Now we do the same.
agave/runtime/src/bank/partitioned_epoch_rewards/calculation.rs

Line 496 in f1561f4

VoteAccount::try_from(account).ok()

calculate_reward_points_partitioned used to return 0 on invalid vote, still returns 0
agave/vote/src/vote_account.rs

Line 100 in 356d779

match VoteAccount::try_from(account) {

this is new. If we see an invalid vote in a snapshot we skip it.

alessandrod · 2024-08-23T13:26:58Z

Another data point:

git show HEAD^ | grep warn | wc -l
7

I've grepped for the warnings printed (and removed by this PR) in case of invalid votes on a validator that's been running for ~3 weeks, and I get 0 hits.

Before switching to eager vote account deserialization we were (accidentally) able to load snapshots with invalid vote accounts (because account parsing was deferred). This change ensures that we're still able to parse such snapshots.

vote/src/vote_account.rs

VoteAccount is never deserialized individually, but only as part of VoteAccounts, which has a custom deser that doesn't require VoteAccount to implement Deserialize

alessandrod · 2024-08-24T04:13:21Z

For extra peace of mind I've already checked that there are no invalid accounts in mnb, and I'm going to test the other clusters too.

testnet and devnet have no invalid accounts either

jstarry · 2024-08-24T04:57:54Z

agave/runtime/src/bank.rs

Line 2271 in f1561f4

VoteAccount::try_from(account).ok()

this was dead code. The metric incremented here is always 0. I do think that it's possible that this code is now un-dead since we don't put invalid votes in cached_vote_accounts, but in any case the result is that before and after the PR, if the vote is invalid make_vote_delegations_entry returns None. Maybe I should remove this as part of this PR?

This should be a follow up. Note that this metric wasn't checking for initialized so it could have false positives when a stake is delegated to a zeroed vote-program owned account.

* VoteAccount: remove OnceLock around VoteState The lazy-init OnceLock logic wasn't really being used. Execution/perf wise this code doesn't change anything since we were already forcing early deserialization in StakesCache::check_and_store: - // Called to eagerly deserialize vote state - let _res = vote_account.vote_state(); For more context see https://discord.com/channels/428295358100013066/439194979856809985/1268759289531596872 https://discord.com/channels/428295358100013066/439194979856809985/1272661616399224892 * VoteAccounts: discard invalid accounts on deserialization Before switching to eager vote account deserialization we were (accidentally) able to load snapshots with invalid vote accounts (because account parsing was deferred). This change ensures that we're still able to parse such snapshots. * VoteAccount: remove Deserialize impl VoteAccount is never deserialized individually, but only as part of VoteAccounts, which has a custom deser that doesn't require VoteAccount to implement Deserialize

alessandrod force-pushed the vote-account-eager branch 2 times, most recently from 9f16085 to 3dac136 Compare August 19, 2024 16:45

alessandrod force-pushed the vote-account-eager branch 5 times, most recently from 771c631 to a93bec3 Compare August 20, 2024 08:33

alessandrod force-pushed the vote-account-eager branch 2 times, most recently from 63b2da0 to ece636d Compare August 20, 2024 09:55

alessandrod force-pushed the vote-account-eager branch 2 times, most recently from d93b53f to 3415feb Compare August 21, 2024 13:40

alessandrod commented Aug 21, 2024

View reviewed changes

alessandrod requested review from behzadnouri and jstarry August 21, 2024 15:45

alessandrod force-pushed the vote-account-eager branch from 3415feb to cca25f6 Compare August 21, 2024 15:55

alessandrod force-pushed the vote-account-eager branch from cca25f6 to f1561f4 Compare August 22, 2024 00:12

jstarry previously approved these changes Aug 22, 2024

View reviewed changes

runtime/src/stakes.rs Show resolved Hide resolved

vote/src/vote_account.rs Show resolved Hide resolved

behzadnouri reviewed Aug 22, 2024

View reviewed changes

jstarry reviewed Aug 22, 2024

View reviewed changes

jstarry reviewed Aug 23, 2024

View reviewed changes

alessandrod dismissed jstarry’s stale review via 356d779 August 23, 2024 12:32

alessandrod force-pushed the vote-account-eager branch 3 times, most recently from 1e1a56b to 9c9e2cc Compare August 23, 2024 14:22

alessandrod force-pushed the vote-account-eager branch from 9c9e2cc to 1db912c Compare August 23, 2024 14:43

alessandrod force-pushed the vote-account-eager branch from 1db912c to 2dba7d1 Compare August 23, 2024 16:02

jstarry reviewed Aug 24, 2024

View reviewed changes

vote/src/vote_account.rs Show resolved Hide resolved

VoteAccount: remove Deserialize impl

1978c05

VoteAccount is never deserialized individually, but only as part of VoteAccounts, which has a custom deser that doesn't require VoteAccount to implement Deserialize

jstarry approved these changes Aug 24, 2024

View reviewed changes

behzadnouri approved these changes Aug 24, 2024

View reviewed changes

alessandrod merged commit 42e72bf into anza-xyz:master Aug 24, 2024
51 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VoteAccount: remove OnceLock around VoteState #2659

VoteAccount: remove OnceLock around VoteState #2659

alessandrod commented Aug 19, 2024 •

edited

Loading

mergify bot commented Aug 19, 2024

mergify bot commented Aug 20, 2024

jumpsiegel commented Aug 20, 2024

alessandrod Aug 21, 2024 •

edited

Loading

jstarry left a comment

behzadnouri Aug 22, 2024

alessandrod Aug 22, 2024

jstarry Aug 22, 2024

jstarry Aug 22, 2024

jstarry Aug 23, 2024

alessandrod Aug 23, 2024 •

edited

Loading

jstarry Aug 23, 2024

alessandrod Aug 23, 2024 •

edited

Loading

jstarry Aug 23, 2024 •

edited

Loading

jstarry Aug 23, 2024

alessandrod commented Aug 23, 2024 •

edited

Loading

alessandrod commented Aug 23, 2024

alessandrod commented Aug 24, 2024 •

edited

Loading

jstarry commented Aug 24, 2024

VoteAccount: remove OnceLock around VoteState #2659

VoteAccount: remove OnceLock around VoteState #2659

Conversation

alessandrod commented Aug 19, 2024 • edited Loading

mergify bot commented Aug 19, 2024

mergify bot commented Aug 20, 2024

jumpsiegel commented Aug 20, 2024

alessandrod Aug 21, 2024 • edited Loading

Choose a reason for hiding this comment

jstarry left a comment

Choose a reason for hiding this comment

behzadnouri Aug 22, 2024

Choose a reason for hiding this comment

alessandrod Aug 22, 2024

Choose a reason for hiding this comment

jstarry Aug 22, 2024

Choose a reason for hiding this comment

jstarry Aug 22, 2024

Choose a reason for hiding this comment

jstarry Aug 23, 2024

Choose a reason for hiding this comment

alessandrod Aug 23, 2024 • edited Loading

Choose a reason for hiding this comment

jstarry Aug 23, 2024

Choose a reason for hiding this comment

alessandrod Aug 23, 2024 • edited Loading

Choose a reason for hiding this comment

jstarry Aug 23, 2024 • edited Loading

Choose a reason for hiding this comment

jstarry Aug 23, 2024

Choose a reason for hiding this comment

alessandrod commented Aug 23, 2024 • edited Loading

alessandrod commented Aug 23, 2024

alessandrod commented Aug 24, 2024 • edited Loading

jstarry commented Aug 24, 2024

alessandrod commented Aug 19, 2024 •

edited

Loading

alessandrod Aug 21, 2024 •

edited

Loading

alessandrod Aug 23, 2024 •

edited

Loading

alessandrod Aug 23, 2024 •

edited

Loading

jstarry Aug 23, 2024 •

edited

Loading

alessandrod commented Aug 23, 2024 •

edited

Loading

alessandrod commented Aug 24, 2024 •

edited

Loading