Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VoteAccount: remove OnceLock around VoteState #2659

Merged
merged 3 commits into from
Aug 24, 2024

Conversation

alessandrod
Copy link

@alessandrod alessandrod commented Aug 19, 2024

The lazy-init OnceLock logic wasn't really being used. Execution/perf wise this code doesn't change anything since we were already forcing early deserialization in StakesCache::check_and_store:

-                            // Called to eagerly deserialize vote state
-                            let _res = vote_account.vote_state();

For more context see
https://discord.com/channels/428295358100013066/439194979856809985/1268759289531596872 https://discord.com/channels/428295358100013066/439194979856809985/1272661616399224892

@alessandrod alessandrod force-pushed the vote-account-eager branch 2 times, most recently from 9f16085 to 3dac136 Compare August 19, 2024 16:45
Copy link

mergify bot commented Aug 19, 2024

If this PR represents a change to the public RPC API:

  1. Make sure it includes a complementary update to rpc-client/ (example)
  2. Open a follow-up PR to update the JavaScript client @solana/web3.js (example)

Thank you for keeping the RPC clients in sync with the server API @alessandrod.

@alessandrod alessandrod force-pushed the vote-account-eager branch 5 times, most recently from 771c631 to a93bec3 Compare August 20, 2024 08:33
Copy link

mergify bot commented Aug 20, 2024

The Firedancer team maintains a line-for-line reimplementation of the
native programs, and until native programs are moved to BPF, those
implementations must exactly match their Agave counterparts.
If this PR represents a change to a native program implementation (not
tests), please include a reviewer from the Firedancer team. And please
keep refactors to a minimum.

@alessandrod alessandrod force-pushed the vote-account-eager branch 2 times, most recently from 63b2da0 to ece636d Compare August 20, 2024 09:55
@jumpsiegel
Copy link

This is good with FD

@alessandrod alessandrod force-pushed the vote-account-eager branch 2 times, most recently from d93b53f to 3415feb Compare August 21, 2024 13:40
} else {
account.vote_state()
};
if let Ok(vote_state) = vote_state {
Copy link
Author

@alessandrod alessandrod Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is a bit noisy because there are a lot of changes like this: vote_state() used to return Result, but votes that failed to be parsed were never inserted into VoteAccounts. So effectively the Err branches were never entered, this PR removes them, but there's no actual logic change.

The lazy-init OnceLock logic wasn't really being used. Execution/perf
wise this code doesn't change anything since we were already forcing early
deserialization in StakesCache::check_and_store:

-                            // Called to eagerly deserialize vote state
-                            let _res = vote_account.vote_state();

For more context see
https://discord.com/channels/428295358100013066/439194979856809985/1268759289531596872
https://discord.com/channels/428295358100013066/439194979856809985/1272661616399224892
jstarry
jstarry previously approved these changes Aug 22, 2024
Copy link

@jstarry jstarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look solid, I just prefer other cleanup to be separate from the main PR if you don't mind.

runtime/src/stakes.rs Show resolved Hide resolved
vote/src/vote_account.rs Show resolved Hide resolved
@@ -34,7 +34,7 @@ pub enum Error {
#[derive(Debug)]
struct VoteAccountInner {
account: AccountSharedData,
vote_state: OnceLock<Result<VoteState, Error>>,
vote_state: VoteState,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In contrast to the PR description, this change is not just removing the OnceLock but also dropping Result from the type.

These two changes should definitely have gone in separate PRs.
We really do not gain anything from mixing several independent changes into one PR, other than making it more likely that something subtle is missed out in the review or testing.

Dropping Result is indeed a functional change because now VoteAccount::try_from(account_shared_data) can fail in the ways that the previous code was not and this could have downstream consequences.
Is this functional change fine? I don't know, and having the change left out of the PR description and mixed in by other independent code changes does not help with scrutinizing that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two changes should definitely have gone in separate PRs.

Once again - we're at 3 PRs out of 3 - a dismissive and condescending initial comment that makes me wonder if you've spent more than 10 seconds looking at the code. I invite you to think about what you're suggesting would have looked like:

  • 1 PR with a ~5 line change removing OnceLock from the type declaration and the corresponding OnceLock::new() to initialize
  • A second PR with all the changes in this PR minus the 5 lines above

So are you saying that you're not able to follow the bulk of the changes because of 5 lines?

Also it would be great if you could avoid saying "definitely". To me it makes absolutely zero sense to split that way. The whole reason for having Result is that OnceLock forced us to have Result. If OnceLock goes, Result has absolutely no reason for being there anymore. Just removing OnceLock but not Result makes no sense. I'm not doing this change in a vacuum - I could care less about OnceLock - I need to put VoteState inside VoteAccountInner so I can deserializa in place. Maybe I should have added that code in this PR too to communicate the intent better and make reviewing easier! (In fact I did initially, but then I anticipated this conversation with you).

I understand that you seem to have a preference for micro PRs. I don't. In fact I hate when PRs aren't atomic and contain all the logical changes that a changeset is trying to make, because then using the history to understand what some code does, which is something I all day every day, becomes harder.

Dropping Result is indeed a functional change because now VoteAccount::try_from(account_shared_data) can fail in the ways that the previous code was not and this could have downstream consequences.
Is this functional change fine? I don't know

Did you read this? #2659 (review) and this #2271 (comment) ?

You don't know because you're not familiar with the code, or you're not convinced by the claim made above?

Reviews are hard work. I'd love for you to review this and the other PRs I tagged you in. If you don't have time or will to do it that's fine too, but then don't ask me to tag you for review.

@@ -303,8 +320,8 @@ impl TryFrom<AccountSharedData> for VoteAccountInner {
return Err(Error::InvalidOwner(*account.owner()));
}
Ok(Self {
vote_state: VoteState::deserialize(account.data()).map_err(Error::InstructionError)?,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@behzadnouri brings up a great point that this deserialization could potentially fail in one of the call-sites (Stakes cache I feel fine with but have to dig into rewards code more carefully). I agree with him that it would be best to have this PR store the Result and follow up with another PR that we can carefully review all the call-sites of the TryFrom in.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you decide to not split this up (honestly fine with me), I'll do another pass tomorrow after another more careful review. I do think the OnceLock approach caused us to introduce the Result in the first place so I can see why you coupled that change into this PR.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main concern I had during review was around deserializing VoteAccounts (done when deserializing bank stakes or epoch stakes) because after this PR we would now reject snapshots that contain vote accounts that aren't deserializable. This is fine as long as only deserializable vote accounts were ever stored in the stakes cache / epoch stakes. In the current code it's not possible to store a vote account that's not deserializable because the stakes cache wouldn't insert it. But let's say though that in the past we did store a vote account which was not deserializable, it looks like it would have only been removed by handle_invalid_keys (called downstream of update_rewards_with_thread_pool) if there was a stake account delegated to it because _load_vote_and_stake_accounts looks for any vote keys for stake delegations that correspond to invalid vote accounts. But if there are no stake delegations to an invalid vote account in the stakes cache, it doesn't look like we would ever remove it.

In order to merge this I think we should probably try loading snapshots with this branch on each cluster to see if there are some ancient invalid vote accounts in there. Does that make sense?

Copy link
Author

@alessandrod alessandrod Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this makes sense. As discussed I've added a custom deserializer that discards invalid vote accounts 2dba7d1

For extra peace of mind I've already checked that there are no invalid accounts in mnb, and I'm going to test the other clusters too.

@@ -2280,12 +2280,8 @@ impl Bank {
invalid_vote_keys.insert(vote_pubkey, InvalidCacheEntryReason::WrongOwner);
return None;
}
let Ok(vote_state) = vote_account.vote_state().cloned() else {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in the get_vote_account closure above we now have a redundant check for vote owner and deserializability since that's exactly what VoteAccount::try_from is doing now. There are some other redundant checks scattered around for things like vote owner as well. Can be cleaned up in a follow up.

            let account = self.get_account_with_fixed_root(vote_pubkey)?;
            if account.owner() == &solana_vote_program
                && VoteState::deserialize(account.data()).is_ok()
            {
                vote_accounts_cache_miss_count.fetch_add(1, Relaxed);
            }
            VoteAccount::try_from(account).ok()

Copy link
Author

@alessandrod alessandrod Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has been dead for a long time I think. We check that VoteAccounts contains all vote accounts at snapshot load time. We have the vote_accounts_cache_miss_count metric that's never been non-zero anytime I've looked since I started this cursed work.

@@ -101,20 +101,6 @@ impl EpochStakes {
.iter()
.filter_map(|(key, (stake, account))| {
let vote_state = account.vote_state();
let vote_state = match vote_state.as_ref() {
Copy link

@jstarry jstarry Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lines above we would have accumulated total_stake for invalid vote accounts if they were in the epoch vote accounts map. After this change, invalid vote accounts can't be in the epoch vote accounts map because wouldn't deserialize them from a snapshot and we would never insert them into the stakes cache. So this change might be fixing a bug if there actually was an invalid vote account in the epoch vote accounts.

@@ -303,8 +320,8 @@ impl TryFrom<AccountSharedData> for VoteAccountInner {
return Err(Error::InvalidOwner(*account.owner()));
}
Ok(Self {
vote_state: VoteState::deserialize(account.data()).map_err(Error::InstructionError)?,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main concern I had during review was around deserializing VoteAccounts (done when deserializing bank stakes or epoch stakes) because after this PR we would now reject snapshots that contain vote accounts that aren't deserializable. This is fine as long as only deserializable vote accounts were ever stored in the stakes cache / epoch stakes. In the current code it's not possible to store a vote account that's not deserializable because the stakes cache wouldn't insert it. But let's say though that in the past we did store a vote account which was not deserializable, it looks like it would have only been removed by handle_invalid_keys (called downstream of update_rewards_with_thread_pool) if there was a stake account delegated to it because _load_vote_and_stake_accounts looks for any vote keys for stake delegations that correspond to invalid vote accounts. But if there are no stake delegations to an invalid vote account in the stakes cache, it doesn't look like we would ever remove it.

In order to merge this I think we should probably try loading snapshots with this branch on each cluster to see if there are some ancient invalid vote accounts in there. Does that make sense?

@alessandrod
Copy link
Author

alessandrod commented Aug 23, 2024

List of all VoteAccount::try_from() callers

StakesCache

These are guarded by VoteVersions::is_correct_size_and_initialized. This PR doesn't change behavior since VoteAccount::try_from doesn't get called at all for invalid vote accounts.

Bank

@alessandrod
Copy link
Author

Another data point:

git show HEAD^ | grep warn | wc -l
7

I've grepped for the warnings printed (and removed by this PR) in case of invalid votes on a validator that's been running for ~3 weeks, and I get 0 hits.

@alessandrod alessandrod force-pushed the vote-account-eager branch 3 times, most recently from 1e1a56b to 9c9e2cc Compare August 23, 2024 14:22
Before switching to eager vote account deserialization we were
(accidentally) able to load snapshots with invalid vote accounts
(because account parsing was deferred).

This change ensures that we're still able to parse such snapshots.
VoteAccount is never deserialized individually, but only as part of
VoteAccounts, which has a custom deser that doesn't require VoteAccount
to implement Deserialize
@alessandrod
Copy link
Author

alessandrod commented Aug 24, 2024

For extra peace of mind I've already checked that there are no invalid accounts in mnb, and I'm going to test the other clusters too.

testnet and devnet have no invalid accounts either

@jstarry
Copy link

jstarry commented Aug 24, 2024

  • VoteAccount::try_from(account).ok()
    this was dead code. The metric incremented here is always 0. I do think that it's possible that this code is now un-dead since we don't put invalid votes in cached_vote_accounts, but in any case the result is that before and after the PR, if the vote is invalid make_vote_delegations_entry returns None. Maybe I should remove this as part of this PR?

This should be a follow up. Note that this metric wasn't checking for initialized so it could have false positives when a stake is delegated to a zeroed vote-program owned account.

@alessandrod alessandrod merged commit 42e72bf into anza-xyz:master Aug 24, 2024
51 checks passed
ray-kast pushed a commit to abklabs/agave that referenced this pull request Nov 27, 2024
* VoteAccount: remove OnceLock around VoteState

The lazy-init OnceLock logic wasn't really being used. Execution/perf
wise this code doesn't change anything since we were already forcing early
deserialization in StakesCache::check_and_store:

-                            // Called to eagerly deserialize vote state
-                            let _res = vote_account.vote_state();

For more context see
https://discord.com/channels/428295358100013066/439194979856809985/1268759289531596872
https://discord.com/channels/428295358100013066/439194979856809985/1272661616399224892

* VoteAccounts: discard invalid accounts on deserialization

Before switching to eager vote account deserialization we were
(accidentally) able to load snapshots with invalid vote accounts
(because account parsing was deferred).

This change ensures that we're still able to parse such snapshots.

* VoteAccount: remove Deserialize impl

VoteAccount is never deserialized individually, but only as part of
VoteAccounts, which has a custom deser that doesn't require VoteAccount
to implement Deserialize
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants