Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce VoteState::deserialize_into_uninit #2272

Merged

Conversation

alessandrod
Copy link

@alessandrod alessandrod commented Jul 24, 2024

Deserializing into MaybeUninit<VoteState> saves the extra cost of initializing into a value initialized with VoteState::default().

I am planning to use this to speed up VoteAccount::try_from() in a followup PR.

@alessandrod alessandrod requested a review from 2501babe July 24, 2024 13:59
@alessandrod alessandrod marked this pull request as draft July 24, 2024 16:24
@alessandrod alessandrod changed the title VoteState::deserialize_into: take impl Into<*mut VoteState> VoteState::deserialize_into: take MaybeUninit<VoteState> Jul 27, 2024
@alessandrod alessandrod marked this pull request as ready for review July 27, 2024 03:28
@alessandrod alessandrod force-pushed the vote-state-deserialize-pointer branch 2 times, most recently from a716f0e to 460566b Compare July 27, 2024 03:32
@alessandrod
Copy link
Author

self reminder: add SAFETY markers to all unsafe blocks

@alessandrod
Copy link
Author

alessandrod commented Jul 27, 2024

This PR breaks the API (the breakage is warranted IMO), and now SPL tests are failing. But this is chicken and egg? How do I land this PR if the SPL tests fail, but how do I fix the SPL tests for the new API if I can't land this?

Maybe I could add a new VoteState::deserialize_into_uninit, land, deprecate VoteState::deserialize_into then rename deserialize_into_uninit to deserialize_into?

@alessandrod
Copy link
Author

Maybe I could add a new VoteState::deserialize_into_uninit

I've done this now

@alessandrod alessandrod changed the title VoteState::deserialize_into: take MaybeUninit<VoteState> Introduce VoteState::deserialize_into_uninit Jul 27, 2024
@alessandrod alessandrod force-pushed the vote-state-deserialize-pointer branch 2 times, most recently from 02b33bb to e00e03b Compare July 29, 2024 13:11
@2501babe
Copy link
Member

had to read the docs on MaybeUninit<T> and addr_of_mut, am i correct in thinking:

  • the unsafe writes are safe because the typechecker ensures anything passed as vote_state: &mut MaybeUninit<VoteState>, is properly sized (ie protected from out of bounds writes)?
  • furthermore all unsafe writes are to the Sized fields of the struct only. eg you allocate the dynamic vectors normally and merely write the pointer/capacity/length tripe into VoteState?
  • because all the addresses are given to write via field references, it doesnt matter that the structs are repr(Rust)?

just want to be sure i understand everything right since deserialize_into is meant to be used in bpf programs, but seems like this is a "safe as long as the caller behaves" and doesnt depend on the input buffer at all

@alessandrod
Copy link
Author

alessandrod commented Jul 29, 2024

had to read the docs on MaybeUninit<T> and addr_of_mut, am i correct in thinking:

* the `unsafe` writes are safe because the typechecker ensures anything passed as `vote_state: &mut MaybeUninit<VoteState>,` is properly sized (ie protected from out of bounds writes)?

Correct! Both in the deserialize_into() and deserialize_into_uninit() cases the type checker checks that the memory layout is correct; since we call deserialize_into_ptr from those, we can guarantee that the pointer passed is always correct.

* furthermore all `unsafe` writes are to the `Sized` fields of the `struct` only. eg you allocate the dynamic vectors normally and merely write the pointer/capacity/length tripe into `VoteState`?

Correct.

* because all the addresses are given to `write` via field references, it doesnt matter that the structs are `repr(Rust)`?

write() doesn't care about type layout, it only cares about the type. So if you have let ptr: *mut T you can always do let v = ..somehow create a value of type T...; ptr.write(v) as long as ptr is a valid pointer. What makes write() unique is that it doesn't drop the previous value, eg consider:

#[derive(Debug)]
struct A {
    msg: String
}

fn main() {
    let mut a: std::mem::MaybeUninit<A> = std::mem::MaybeUninit::uninit();
    let a = unsafe {
        let a_ptr = a.as_mut_ptr();
        // this is undefined behaviour and likely segfaults, since it drops the previous `msg` but it hasn't been allocated yet so it points to a random address
        // *std::ptr::addr_of_mut!((*a_ptr).msg) = String::from("hello");
        // this is correct
        std::ptr::addr_of_mut!((*a_ptr).msg).write(String::from("hello"));
        a.assume_init()
    };
    eprintln!("{a:?}")
}

just want to be sure i understand everything right since deserialize_into is meant to be used in bpf programs, but seems like this is a "safe as long as the caller behaves" and doesnt depend on the input buffer at all

Yes it's meant to always be safe as long as you use safe rust, independent from what's inside the buffer.

@alessandrod
Copy link
Author

just want to be sure i understand everything right since deserialize_into is meant to be used in bpf programs, but seems like this is a "safe as long as the caller behaves" and doesnt depend on the input buffer at all

also mind that I haven't added a test, but this fixes deserialization for the following case:

let mut vote_state = VoteState::default();
VoteState::deserialize_into(data, &mut vote_state).unwrap();
VoteState::deserialize_into(data, &mut vote_state).unwrap();

In master this will incorrectly append votes, authorized_voters etc twice. With this PR the 2nd deserialize_into resets the passed &mut VoteState before deserializing, so fields don't get "duplicated".

@alessandrod alessandrod force-pushed the vote-state-deserialize-pointer branch from e00e03b to d78c70b Compare July 29, 2024 14:01
@alessandrod alessandrod requested a review from ryoqun July 29, 2024 14:13
@alessandrod
Copy link
Author

@ryoqun just tagged you too for review to double check the unsafe bits if you can

Copy link

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still reviewing the rest, this was the first thing that caught my eye.

sdk/program/src/vote/state/mod.rs Show resolved Hide resolved
@alessandrod alessandrod force-pushed the vote-state-deserialize-pointer branch from f0c7d92 to 38be6b8 Compare July 30, 2024 15:24
@brooksprumo brooksprumo self-requested a review July 30, 2024 15:48
std::ptr::drop_in_place(vote_state);
}

VoteState::deserialize_into_ptr(input, vote_state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Self::deserialzie_into_ptr()? (i.e. can use Self).


match cursor.fill_buf() {
Ok(buf) if buf.len() >= PUBKEY_SIZE => {
// Safety: `buf` is guaranteed to be at least `PUBKEY_SIZE` bytes long
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how about commenting that this safety also rely on the fact align = 1 for Pubkey?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ryoqun
Copy link
Member

ryoqun commented Jul 31, 2024

fyi, it looks like miri passes: f9e974f (need to severely reduce the loop counts due to miri being too slow.. lol)

$ cargo miri test -p solana-program vote -- --skip compact
Preparing a sysroot for Miri (target: x86_64-unknown-linux-gnu)... done
   Compiling solana-program v2.1.0 (/home/ryoqun/work/solana/for-small-patches2/sdk/program)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.47s
     Running unittests src/lib.rs (target/miri/x86_64-unknown-linux-gnu/debug/deps/solana_program-e0bef1dfcfb6cd11)

running 22 tests
test vote::error::tests::test_custom_error_decode ... ok
test vote::program::test_id ... ok
test vote::state::tests::test_authorized_voter_is_locked_within_epoch ... ok
test vote::state::tests::test_circbuf_oob ... ok
test vote::state::tests::test_default_vote_state_is_uninitialized ... ok
test vote::state::tests::test_get_and_update_authorized_voter ... ok
test vote::state::tests::test_is_correct_size_and_initialized ... ok
test vote::state::tests::test_minimum_balance ... ok
test vote::state::tests::test_set_new_authorized_voter ... ok
test vote::state::tests::test_vote_deserialize_into_uninit ... ok
test vote::state::tests::test_vote_deserialize_into_uninit_ill_sized ... ok
test vote::state::tests::test_vote_deserialize_into_uninit_nopanic ... ok
test vote::state::tests::test_vote_process_timestamp ... ok
test vote::state::tests::test_vote_serialize ... ok
test vote::state::tests::test_vote_state_commission_split ... ok
test vote::state::tests::test_vote_state_epoch0_no_credits ... ok
test vote::state::tests::test_vote_state_epoch_credits ... ok
test vote::state::tests::test_vote_state_increment_credits ... ok
test vote::state::tests::test_vote_state_max_size ... ok
test vote::state::tests::test_vote_state_size_of ... ok
test vote::state::vote_state_0_23_5::tests::test_vote_deserialize_0_23_5 ... ok
test vote::state::vote_state_1_14_11::tests::test_vote_deserialize_1_14_11 ... ok

test result: ok. 22 passed; 0 failed; 0 ignored; 0 measured; 232 filtered out; finished in 164.07s

@alessandrod alessandrod force-pushed the vote-state-deserialize-pointer branch from 354d61f to 04cb1c3 Compare July 31, 2024 11:03
vote_state: &mut VoteState,
) -> Result<(), InstructionError> {
let epoch_credit_count = read_u64(cursor)?;
) -> Result<Vec<(u64, u64, u64)>, InstructionError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the first u64 is Epoch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

cursor: &mut Cursor<&[u8]>,
pubkey: *mut Pubkey,
) -> Result<(), InstructionError> {
const PUBKEY_SIZE: usize = mem::size_of::<Pubkey>();
Copy link
Member

@ryoqun ryoqun Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: strictly speaking, i think solana_program::pubkey::PUBKEY_BYTES is more appropriate than this because this code is dealing with reading N bytes from the buffer. however, mem::size_of::<Pubkey> == PUBKEY_BYTES (no padding or whatsoever). and i don't see any possibility of diverting the equality in foreseeable future. so quite minor thing ;)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 507 to 526
// Safety:
//
// Deserialize failed so at this point vote_state is uninitialized. We must write a
// new _valid_ value into it or after returning from this function the caller is
// left with an uninitialized `&mut VoteState`, which is UB (references must always
// be valid).
//
// This is always safe and doesn't leak memory because deserialize_into_ptr() writes
// into the fields that heap alloc only when it returns Ok().
unsafe {
vote_state.write(VoteState::default());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks okay.

nit: however, for completeness, similar cleaning is desired when panicking. that said, i couldn't think of panic arising from VoteState::deserialize_into_ptr(). so, just mentioning about no panic will be enough.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep nice catch, fixed the panic case too

@ryoqun
Copy link
Member

ryoqun commented Jul 31, 2024

deserialize_into is meant to be used in bpf programs

just curious but i wonder how the CUs is reduced after this patch to save kittens. :)

@alessandrod
Copy link
Author

deserialize_into is meant to be used in bpf programs

just curious but i wonder how the CUs is reduced after this patch to save kittens. :)

that'd be interesting, but it's not why I'm doing this. I'm doing this to fix VoteAccount::try_from:

before: test bench_vote_account ... bench:       5,214 ns/iter (+/- 1,102)
after: test bench_vote_account ... bench:         326 ns/iter (+/- 85)

Deserializing into MaybeUninit<VoteState> saves the extra cost of
initializing into a value initialized with VoteState::default()
Introduce new ::deserialize_into_uninit that takes
MaybeUninit<VoteState>. This way we don't break API, and we can less
abruptly deprecate ::deserialize_into and later on rename
::deserialize_into_uninit to ::deserialize_into.
…eserializing

This is needed to deallocate votes, authorized_voters and epoch_credits
- explain why addr_of_mut!() is needed
- minimize code inside unsafe blocks
- tweak type definition in pointer cast
Avoids a (small) memcpy per key
On failure we must ensure that `vote_state` is left in a valid state to
avoid UB.
@alessandrod alessandrod force-pushed the vote-state-deserialize-pointer branch from 6e3333c to 339ce8c Compare August 1, 2024 02:18
@alessandrod alessandrod force-pushed the vote-state-deserialize-pointer branch from 339ce8c to 5651ee3 Compare August 1, 2024 02:26
brooksprumo
brooksprumo previously approved these changes Aug 1, 2024
Copy link

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

ryoqun
ryoqun previously approved these changes Aug 1, 2024
Copy link
Member

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with nits.

the bench numbers looks promising. looking forward to see a follow-up pr for actual perf gains.

@alessandrod alessandrod dismissed stale reviews from ryoqun and brooksprumo via f51bc19 August 1, 2024 16:45
Copy link
Member

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with no nit (really thanks for addressing all the review comments)

@alessandrod alessandrod merged commit e32e126 into anza-xyz:master Aug 2, 2024
52 checks passed
@alessandrod
Copy link
Author

thanks a lot for the thorough reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants