SIMD-0186: Loaded Transaction Data Size Specification #186

2501babe · 2024-10-21T04:52:58Z

precisely define how transaction data sizes are calculated

apfitzge · 2024-10-21T13:37:17Z

proposals/0186-transaction-data-size-specification.md

+must be loaded, including any programs it may call. The amount of data a
+transaction is allowed to load is capped, and if it exceeds that limit, loading
+is aborted. This functionality is already implemented in the validator. The
+purpose of this SIMD is to explicitly define how transaction size is calculated.


I think this should be more precise here:
"transaction size" => "transaction account data size" or "transaction loaded data size"

apfitzge · 2024-10-21T13:44:48Z

proposals/0186-transaction-data-size-specification.md

+enshrine the current Agave behavior in the protocol. This is undesirable because
+the current behavior is highly idiosyncratic, and LoaderV3 program sizes are
+routinely undercounted.
+* Builtin programs are backed by accounts that only contain the program name as


it's also in the works to make (some/most) builtins just normal programs, so adding any special cases here would probably be nullified in the future anyway

apfitzge · 2024-10-21T13:45:47Z

proposals/0186-transaction-data-size-specification.md

+
+## Backwards Compatibility
+
+Transactions that call LoaderV3 programs via CPI and are extremely close to the


do you happen to know if there are any such programs on mnb?

i'll edit this to be clearer, but it's a property of the transaction, not the program

for instance, if i have a transaction that uses 63 1mb accounts and a 64th account which is a 1mb program used for cpi, right now (ignoring loader sizes as not relevant) the total transaction size would be 63mb + 36 bytes, because we incorrectly don't charge for the programdata. with this change, the calculated transaction size would become 63mb + 1mb + 36 bytes, which exceeds the 64MiB limit, and the transaction would fail
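A worked version of this example, as a sketch under stated assumptions: "1mb" is read as 1 MiB, the limit is the 64 MiB default, and loader account sizes are ignored as in the comment. The 36 bytes is the LoaderV3 program account itself (a 4-byte state tag plus the 32-byte programdata address). The function names are hypothetical and model only this one case, not the full current algorithm.

```rust
// Assumptions: "1mb" = 1 MiB, limit = 64 MiB default, loader sizes ignored.
const MIB: u64 = 1024 * 1024;
const LIMIT: u64 = 64 * MIB;
// A LoaderV3 program account holds a 4-byte state tag plus the 32-byte
// programdata address, hence 36 bytes.
const PROGRAM_ACCOUNT_SIZE: u64 = 36;

/// Current behavior in this case: a program used only via CPI is charged
/// just its 36-byte program account; its programdata is not counted.
fn current_size(data_accounts: u64, _programdata_len: u64) -> u64 {
    data_accounts * MIB + PROGRAM_ACCOUNT_SIZE
}

/// Proposed behavior: a LoaderV3 program always adds its programdata size.
fn proposed_size(data_accounts: u64, programdata_len: u64) -> u64 {
    data_accounts * MIB + PROGRAM_ACCOUNT_SIZE + programdata_len
}

fn main() {
    let current = current_size(63, MIB);
    let proposed = proposed_size(63, MIB);
    println!("current:  {current} bytes, within limit: {}", current <= LIMIT);
    println!("proposed: {proposed} bytes, within limit: {}", proposed <= LIMIT);
}
```

The difference between the two totals is exactly the uncounted programdata, which is what pushes the transaction 36 bytes over the limit.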

it seems extremely unreasonable to me that any legitimate transaction would actually use >60mb of data, but i don't know how we would be able to find that out

shoutout to tao for help with metrics. looking at max sizes over the past 30 days grouped by hour, we saw one at 110mb (?!), one at 65mb, one at 57mb, three in the 40s, and everything else 20s and below. the change seems safe to me and shouldn't break any real workflows. for comparison, jupiter v6 is 3mb, so a 30mb transaction would need to cpi into a dozen different jupiter-sized programs to even get near the limit

2501babe · 2024-10-24T21:24:58Z

@brooksprumo adding you as a second reviewer since you've done research on transaction sizes

2501babe · 2024-10-24T21:27:52Z

@topointon-jump i would also add you, but it doesn't seem like github lets me add people who aren't in the suggested list (unless this ping fixes that)

brooksprumo

Looks good to me. Please consider this approval as "soft", since I am not an expert in the whole runtime code. Defining and simplifying this counting of account data size is much appreciated.

brooksprumo · 2024-10-25T18:51:41Z

proposals/0186-transaction-data-size-specification.md

+* The size of a program owned by LoaderV3 may or may not include the size of its
+programdata depending on how the program account is used on the transaction.
+Programdata is also itself counted if included in the transaction accounts list.
+This means programdata may be counted zero, one, or two times per transaction.


Suggested change

This means programdata may be counted zero, one, or two times per transaction.

This means programdata may be counted zero, one, or two times per transaction.

🫠

2501babe · 2024-10-30T10:20:24Z

i added another existing issue in the current implementation to the problem description. the proposal itself is unchanged

apfitzge · 2024-10-30T18:47:41Z

proposals/0186-transaction-data-size-specification.md

+transaction is allowed to load is capped, and if it exceeds that limit, loading
+is aborted. This functionality is already implemented in the validator. The
+purpose of this SIMD is to explicitly define how loaded transaction data size is
+calculated.


The last 2 sentences here make it sound like the SIMD just documents the existing behavior.
Let's change the last sentence so it's clear there is a proposal for a new algorithm for calculating loaded account size.

apfitzge · 2024-10-30T18:54:55Z

proposals/0186-transaction-data-size-specification.md

+once and only once.
+2. A valid program owned by LoaderV3 also includes the size of its programdata.
+3. Other than point 2, no accounts are implicitly added to the total data size.
+


edge-case for consideration.

Does it matter if the account is used as a top-level or instruction account? Realistically we should not see these if people construct txs reasonably, but if an account is:

not the fee-payer

not passed to any instructions

not used as a program

we could reasonably choose to not load it at all?

With the proposed algorithm we would still need to load it in order to find the size.

as written it wouldn't matter; all accounts are treated the same

but that's an interesting idea. i wasn't thinking about account loading with this simd; i was only focused on how to arrive at a correct total given a set of accounts. but you're right, i can't think of a reason why we should load these kinds of accounts at all, other than the fact that we already do it now

if we do that (i'm in favor), there are several formulas that seem plausible to me:

Option 1a: sum the sizes of the unique set of instruction accounts + program ids + fee payer. any account in that set which is a LoaderV3 program account also includes its programdata

Option 1b: the same as above but programdata is only added to program size if the program is valid for execution in the current slot

Option 2: sum the sizes of the unique set of instruction accounts + fee payer, and add that to the sum of the sizes of the unique set of program ids, which include programdata sizes for LoaderV3 program ids

i think Option 1 is preferable. Option 2 fails to count cpi programs, but it sidesteps any issues with tombstoned programs because (after fee-only transactions... i say that a lot lately) the sizes don't matter: transactions that use them as program ids would fail anyway

Option 1a and 1b behave the same for closed programs since there is no programdata. they differ on handling programs which fail validation and programs that are changed in-slot. this is especially germane to simd83 since transactions will be able to mess with programs in the same batch

all the options can double-count LoaderV3-owned accounts but i think this is appropriate, the compiled program and the account data if accessed in an instruction are basically their own separate things
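To make the 1a vs 2 difference concrete, here is a minimal sketch on a made-up transaction where one program ("app") is invoked directly and another ("cpi_prog") is only passed as an instruction account so "app" can CPI into it. The `Acct`/`Ix` types, helper names, and all sizes are hypothetical; only the counting rules follow the option descriptions above, and validity checks (the 1a vs 1b question) are not modeled.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical account model: data length, plus the programdata key if the
/// account is a LoaderV3 program account.
struct Acct {
    data_len: u64,
    programdata: Option<&'static str>,
}

/// Hypothetical instruction model: program id plus instruction accounts.
struct Ix {
    program_id: &'static str,
    accounts: Vec<&'static str>,
}

type Db = HashMap<&'static str, Acct>;

/// An account's own data length plus its programdata, if any.
fn size_with_programdata(db: &Db, key: &str) -> u64 {
    let acct = &db[key];
    acct.data_len + acct.programdata.map_or(0, |pd| db[pd].data_len)
}

/// Option 1a: one unique set of instruction accounts, program ids and fee
/// payer; every LoaderV3 program in the set also adds its programdata.
fn option_1a(db: &Db, payer: &'static str, ixs: &[Ix]) -> u64 {
    let mut keys = BTreeSet::from([payer]);
    for ix in ixs {
        keys.insert(ix.program_id);
        keys.extend(ix.accounts.iter().copied());
    }
    keys.iter().map(|k| size_with_programdata(db, k)).sum()
}

/// Option 2: instruction accounts + fee payer at raw data size, plus program
/// ids with programdata. Programs reached only via CPI miss their programdata.
fn option_2(db: &Db, payer: &'static str, ixs: &[Ix]) -> u64 {
    let mut data_keys = BTreeSet::from([payer]);
    let mut program_keys = BTreeSet::new();
    for ix in ixs {
        program_keys.insert(ix.program_id);
        data_keys.extend(ix.accounts.iter().copied());
    }
    let data: u64 = data_keys.iter().map(|k| db[k].data_len).sum();
    let programs: u64 = program_keys.iter().map(|k| size_with_programdata(db, k)).sum();
    data + programs
}

/// Made-up example data: "cpi_prog" is only an instruction account.
fn example() -> (Db, Vec<Ix>) {
    let db = HashMap::from([
        ("payer", Acct { data_len: 0, programdata: None }),
        ("data", Acct { data_len: 500, programdata: None }),
        ("app", Acct { data_len: 36, programdata: Some("app_pd") }),
        ("app_pd", Acct { data_len: 1_000_000, programdata: None }),
        ("cpi_prog", Acct { data_len: 36, programdata: Some("cpi_pd") }),
        ("cpi_pd", Acct { data_len: 2_000_000, programdata: None }),
    ]);
    let ixs = vec![Ix { program_id: "app", accounts: vec!["data", "cpi_prog"] }];
    (db, ixs)
}

fn main() {
    let (db, ixs) = example();
    println!("option 1a: {}", option_1a(&db, "payer", &ixs));
    println!("option 2:  {}", option_2(&db, "payer", &ixs));
}
```

Here Option 2 comes out exactly 2,000,000 bytes lower than Option 1a: the programdata of the CPI-only program is never charged, which is the gap the comment points out.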

i'm undecided between 1a and 1b. do you have an opinion?

So the difference between a and b is just whether programdata is counted for LoaderV3 programs that aren't valid?

A bit of a detail, but can we determine whether it's valid without loading the programdata? In the general case, that is, assuming it's not already in the cache.

no, you need programdata because once the program account is deployed it never changes

1a is probably safer than 1b on reflection. i was thinking 1b is easy because we can just check program cache tombstones (and firedancer has to have something similar to conform), but the FailedVerification case means both our elf verifiers would need to produce identical results to maintain consensus

Yeah 1a seems better option to me in that case

Agreed on 1a. And I think I agree that it's best to double count programdata accounts if they are included in the ix account list since the SVM would effectively keep 2 copies in memory anyways.

jstarry · 2024-11-02T22:41:42Z

proposals/0186-transaction-data-size-specification.md

@@ -0,0 +1,152 @@
+---
+simd: '0186'
+title: Transaction Data Size Specification


This title reads as if the proposal specifies the size of a serialized transaction rather than size of the account data loaded for processing a transaction. How about Transaction Account Data Size Specification?

went with Loaded Transaction Data Size Specification

jstarry · 2024-11-02T22:54:04Z

proposals/0186-transaction-data-size-specification.md

+once and only once.
+2. A valid program owned by LoaderV3 also includes the size of its programdata.
+3. Other than point 2, no accounts are implicitly added to the total data size.
+


Agreed on 1a. And I think I agree that it's best to double count programdata accounts if they are included in the ix account list since the SVM would effectively keep 2 copies in memory anyways.

jstarry · 2024-11-04T15:53:20Z

proposals/0186-loaded-transaction-data-size-specification.md

+    * The fee-payer.
+2. Each account's size is determined solely by the byte length of its data prior
+to transaction execution.
+3. For any `LoaderV3` program account, add the size of the programdata account


nit: On the first read I missed the fact that any loaded account which happened to be a LoaderV3 program account would have its programdata account size counted as well. I assumed it was instruction programs that would be handled this way. Can you reword slightly to say something like:

For any loaded account which is identified as a LoaderV3 program account, add the size of the programdata account it references, regardless of whether it is directly invoked by a transaction instruction.

jstarry · 2024-11-04T15:56:24Z

proposals/0186-loaded-transaction-data-size-specification.md

+2. Each account's size is determined solely by the byte length of its data prior
+to transaction execution.


Should we include an overhead amount of bytes as well? There is still accounts db / svm overhead when loading a bunch of accounts with no data. Rent calculations use this constant:

pub const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;

but the size of AccountSharedData is 64 bytes, so that seems like a reasonable value to use

I feel like we should include everything that is serialized, not just data.
This is the first time I'm realizing the checks only count the data field's length.

@tao-stones any reason why we shouldn't just make this a loaded_size instead of loaded_data_size?

iirc, the original motivation was primarily the memory footprint and how that chunk of memory is used; loading from accounts db and serializing wasn't discussed much then, so it might make sense now

How about 96 bytes? From https://github.com/anza-xyz/agave/blob/4e7f7f76f453e126b171c800bbaca2cb28637535/programs/bpf_loader/src/serialization.rs#L417, I see we have to serialize 3 bytes for is_writable, is_signer, and is_executable, 4 bytes for original data length, 8 bytes each for data length, lamports, and rent_epoch, 32 bytes for the account_key, and 32 bytes for the owner and we adjust for alignment to the nearest multiple of 8. cc @Lichtso
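A quick check of the 96-byte figure, tallying exactly the fields listed in the comment above and rounding up to a multiple of 8. This is a sketch: the field list is taken from the comment rather than re-derived from the serializer, and the constant names are hypothetical.

```rust
// Per-account metadata bytes as enumerated in the comment above
// (account data and realloc padding are excluded).
const FLAGS: usize = 3; // is_writable, is_signer, is_executable
const ORIGINAL_DATA_LEN: usize = 4;
const DATA_LEN: usize = 8;
const LAMPORTS: usize = 8;
const RENT_EPOCH: usize = 8;
const KEY: usize = 32;
const OWNER: usize = 32;

/// Round `n` up to the next multiple of 8.
const fn align8(n: usize) -> usize {
    (n + 7) & !7
}

/// Sum of the listed fields (95 bytes), aligned up to 8.
const fn serialized_metadata_size() -> usize {
    align8(FLAGS + ORIGINAL_DATA_LEN + DATA_LEN + LAMPORTS + RENT_EPOCH + KEY + OWNER)
}

fn main() {
    println!("{} bytes", serialized_metadata_size());
}
```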

Yes that is the size of the serialized account metadata. But have you thought about the resize / realloc padding (MAX_PERMITTED_DATA_INCREASE)?

Yes, serialization will stay ABIv1 in loader-v4.
ABIv2 is its own SIMD.

realloc headroom is quite different conceptually from "loaded data" though, isn't it? memory is reserved in case it's needed, but no data is loaded

Sounds like we need a (loose?) definition for what loaded data is intended to mean. It could be the amount of bytes read from accounts db (in that case maybe programdata is only counted once?) or it could be roughly the amount of bytes loaded into memory before tx execution?

In either case I think it makes sense to include the account metadata overhead of approx. 64 bytes and exclude the overhead of the account key itself and the realloc buffer since they're not really loaded in any sense.

i'm equally fine counting programdata once or twice, and after this pr lets us refactor we should only have to actually load it once. my main concerns are that the algorithm is described unambiguously, is very simple to implement correctly, and always counts programdata if you use the program

maybe a new definition (counting programdata once) like:

1. Given a transaction, take the unique set of account keys which are used as:
    * An instruction account.
    * A program ID for an instruction.
    * The fee-payer.
2. For each LoaderV3 program account, add the key of the programdata account it references to the set of account keys, if it is not already present.
3. Each account's size is defined as the byte length of its data prior to transaction execution, plus 64 bytes to account for metadata. This is irrespective of how the account is used in the transaction.
4. The total transaction loaded account data size is the sum of these sizes.
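A minimal sketch of this once-counting definition, assuming the 64-byte overhead (still under discussion at this point) and a hypothetical in-memory account map; `Acct`, `loaded_data_size`, and the example data are illustrative only, not Agave or Firedancer code. Because step 2 works on a set, programdata that is also listed explicitly is counted exactly once.

```rust
use std::collections::{BTreeSet, HashMap};

// Per-account overhead floated in the proposal above; an assumption here.
const ACCOUNT_OVERHEAD: u64 = 64;

/// Hypothetical account model: data length, plus the programdata key if the
/// account is a LoaderV3 program account.
struct Acct {
    data_len: u64,
    programdata: Option<&'static str>,
}

type Db = HashMap<&'static str, Acct>;

/// `used_keys` holds every key used as an instruction account, a program id,
/// or the fee payer; duplicates are allowed and collapse in step 1.
fn loaded_data_size(db: &Db, used_keys: &[&'static str]) -> u64 {
    // Step 1: the unique set of used account keys.
    let mut keys: BTreeSet<&'static str> = used_keys.iter().copied().collect();
    // Step 2: add programdata referenced by LoaderV3 programs (set semantics).
    let programdata: Vec<&'static str> =
        keys.iter().filter_map(|k| db[k].programdata).collect();
    keys.extend(programdata);
    // Steps 3 and 4: pre-execution data length plus overhead, summed.
    keys.iter().map(|k| db[k].data_len + ACCOUNT_OVERHEAD).sum()
}

/// Made-up example data.
fn example_db() -> Db {
    HashMap::from([
        ("payer", Acct { data_len: 0, programdata: None }),
        ("data", Acct { data_len: 500, programdata: None }),
        ("app", Acct { data_len: 36, programdata: Some("app_pd") }),
        ("app_pd", Acct { data_len: 1_000_000, programdata: None }),
    ])
}

fn main() {
    let db = example_db();
    // Same total whether or not "app_pd" is also listed explicitly.
    println!("{}", loaded_data_size(&db, &["payer", "app", "data"]));
    println!("{}", loaded_data_size(&db, &["payer", "app", "data", "app_pd"]));
}
```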

I'm also fine either way, slight bias to counting it once now if we only need to load it once from accounts-db. Do we agree on 64 bytes overhead?

jstarry · 2024-11-04T15:58:26Z

proposals/0186-loaded-transaction-data-size-specification.md

+`ComputeBudgetInstruction::SetLoadedAccountsDataSizeLimit` instruction to define
+a data size limit for the transaction. Otherwise, the default limit is 64MiB


Might be worth calling out that the default limit of 64MiB is also the max limit. The limit can only be lowered right now.

Suggested change

-`ComputeBudgetInstruction::SetLoadedAccountsDataSizeLimit` instruction to define
-a data size limit for the transaction. Otherwise, the default limit is 64MiB
+`ComputeBudgetInstruction::SetLoadedAccountsDataSizeLimit` instruction to define
+a lower data size limit for the transaction. Otherwise, the default limit is 64MiB
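The "can only be lowered" behavior amounts to a clamp. A minimal sketch, assuming the requested value arrives as an optional `u32`; `effective_limit` and the constant name are hypothetical, and validation of the requested value itself is not modeled.

```rust
// 64 MiB, which per the comment above is both the default and the maximum.
const DEFAULT_AND_MAX_LIMIT: u32 = 64 * 1024 * 1024;

/// `requested` is the value from a `SetLoadedAccountsDataSizeLimit`
/// instruction, if the transaction has one. A request can lower the limit
/// but never raise it above the default.
fn effective_limit(requested: Option<u32>) -> u32 {
    requested.map_or(DEFAULT_AND_MAX_LIMIT, |n| n.min(DEFAULT_AND_MAX_LIMIT))
}

fn main() {
    println!("{}", effective_limit(None));
    println!("{}", effective_limit(Some(2 * 1024 * 1024)));
    println!("{}", effective_limit(Some(u32::MAX)));
}
```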

Just fyi, anza-xyz/agave#1355 attempted to introduce DEFAULT_LOADED_ACCOUNTS_DATA_SIZE_BYTES. The leader currently uses the "actual" loaded accounts size after execution in block packing, so a lower default is critical atm

i'm definitely on board with reducing the default, but it should be done separately from this simd. after we implement the new algorithm, we can run it against ledger history to collect new data about what transaction sizes would look like with it enabled. calculated sizes will increase nontrivially and we don't want to pick a number that will cause significant breakage

for example, in #1355 2MiB is suggested as a good number that only causes 5% breakage. however, the jupiter v4 program is 2.5MiB and jupiter v6 is close to 3MiB. this means they must be passing their program as an instruction account, which due to the current algorithm, means the program is only counted as 36 bytes. it also means under the new algorithm which properly counts programs, a 2MiB default limit would break 100% of jupiter transactions

once we fix the size calculations, we will be in a better position to judge what the real sizes of transactions actually are

jstarry · 2024-11-05T16:24:34Z

proposals/0186-loaded-transaction-data-size-specification.md

+2. Each account's size is determined solely by the byte length of its data prior
+to transaction execution, irrespective of it is used on the transaction.


Suggested change

-2. Each account's size is determined solely by the byte length of its data prior
-to transaction execution, irrespective of it is used on the transaction.
+2. Add each account's size, which is determined solely by the byte length of its data prior
+to transaction execution, irrespective of how it is used on the transaction.

in my head i imagine 1 as defining a list of accounts, 2 and 3 defining how each one maps to a size, and 4 says to sum them, to avoid any ambiguity over how many times programdata is counted

jstarry · 2024-11-05T16:27:03Z

proposals/0186-loaded-transaction-data-size-specification.md

+3. For any loaded account identified as a `LoaderV3` program account, add the
+size of the programdata account it references to its own size, irrespective of
+how the program account is used on the transaction.


Suggested change

-3. For any loaded account identified as a `LoaderV3` program account, add the
-size of the programdata account it references to its own size, irrespective of
-how the program account is used on the transaction.
+3. For any loaded account identified as a `LoaderV3` program account, add the
+size of the programdata account it references, irrespective of
+how the program account is used in the transaction.

jstarry · 2024-11-05T16:27:35Z

proposals/0186-loaded-transaction-data-size-specification.md

+3. For any loaded account identified as a `LoaderV3` program account, add the
+size of the programdata account it references to its own size, irrespective of
+how the program account is used on the transaction.
+4. The total transaction size is the sum of these sizes.


Suggested change

-4. The total transaction size is the sum of these sizes.
+4. The total transaction loaded account data size is the sum of these sizes.

jstarry · 2024-11-06T17:57:40Z

proposals/0186-loaded-transaction-data-size-specification.md

+1. Given a transaction, take the unique set of account keys which are used as:
+    * An instruction account.
+    * A program ID for an instruction.
+    * The fee-payer.


We may have to include every account in the transaction actually if this SIMD is accepted: #163 (cc @Lichtso)

Yes, it is generally bad to have extra conditions for TX accounts that depend on how they are used. That is how we ended up with the mess we have now: write lock demotion, and the loader-v3 programdata account only being counted for top-level instructions.

cool, i'll revert that change

also i love this simd. i was thinking the other week it would be so nice if you could specify accounts you might want to execute but don't need to see instruction data for

Nice catch!

2501babe · 2024-11-06T22:31:57Z

i've updated the language for the algorithm again, and i think i finally got the declarative rather than procedural description i've been aiming for

jstarry · 2024-11-12T17:03:13Z

proposals/0186-loaded-transaction-data-size-specification.md

+1. The set of accounts that determine loaded transaction data size is defined as
+the unique intersection of:
+    * The set of account keys explicitly specified on the transaction,
+irrespective of how they are used.
+    * The set of programdata accounts referenced by the LoaderV3 program
+accounts specified on the transaction.


Oh just realized we should probably also include ALT account sizes as well

Hmm but ALTs are not read or write locked by the transaction so they could be extended in parallel so we can't use the actual size. Maybe just use the max size of LOOKUP_TABLE_META_SIZE + LOOKUP_TABLE_MAX_ADDRESSES * 32?

is it possible to know whether an ALT was used at all in transaction processing? my understanding is that resolution happens too early and all the information is erased. i'm not sure if there is an easy solution to this, or if we should defer adding it to transaction data size until after the ALT async execution simd changes how that code works

It's possible, we could simply include that information in static meta and make it available to the account loader. The ALT SIMD changes will most likely only really cover failure cases so I don't think we need to defer this decision.

If we're just using the max size of a table, we don't need to store any additional information in static meta. We have the number of ALTs in the message itself.

and I (accidentally) have already exposed that information in the SVMMessage trait, so we should be able to get it easily in our specific implementation.

(based on discussion we are considering a flat 8192 byte cost for ALT usage, pending feedback from @tao-stones)

(based on discussion we are considering a flat 8192 byte cost for ALT usage ...

Should be 8248 byte cost because there is space in the ALT accounts for 56 bytes of metadata as well

8248 * 8 / 32000 = 2 CU, that doesn't sound bad at all.
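The numbers in this exchange can be reproduced directly: a maximally full table is 256 addresses at 32 bytes each (8192 bytes), plus 56 bytes of metadata, for 8248 bytes. This sketch assumes one flat charge per lookup table referenced by the message, as discussed, and reuses the commenter's back-of-envelope CU formula (integer division); it is not the runtime's exact cost model, and the helper names are hypothetical.

```rust
// Lookup table sizing as discussed: 56 bytes of metadata plus up to 256 keys.
const LOOKUP_TABLE_META_SIZE: u64 = 56;
const LOOKUP_TABLE_MAX_ADDRESSES: u64 = 256;
const PUBKEY_BYTES: u64 = 32;

/// Flat charge per lookup table: the size of a maximally full table.
const ALT_COST: u64 = LOOKUP_TABLE_META_SIZE + LOOKUP_TABLE_MAX_ADDRESSES * PUBKEY_BYTES;

/// Add the flat ALT charge to an already computed loaded account data size.
fn with_alt_cost(account_bytes: u64, num_lookup_tables: u64) -> u64 {
    account_bytes + num_lookup_tables * ALT_COST
}

/// Back-of-envelope CU estimate from the comment: bytes * 8 / 32000.
fn rough_cu(bytes: u64) -> u64 {
    bytes * 8 / 32_000
}

fn main() {
    println!("per-ALT cost: {ALT_COST} bytes, ~{} CU", rough_cu(ALT_COST));
    println!("1000 bytes + 2 ALTs = {}", with_alt_cost(1_000, 2));
}
```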

Cool, let's go forward with that in the proposal for now then

jstarry · 2024-11-12T17:05:37Z

proposals/0186-loaded-transaction-data-size-specification.md

+As a consequence of 1 and 3, for LoaderV3 programs, programdata is counted twice
+if a transaction explicitly references the program account and its programdata
+account. This is done partly for simplicity, and partly to account for the cost
+of maintaining the compiled program in addition to the actual bytes of
+the programdata account.


I think we settled on just counting them once right?

yes i forgot to delete this

jstarry · 2024-11-12T17:06:11Z

proposals/0186-loaded-transaction-data-size-specification.md

+* We considered loading and counting sizes for accounts on the transaction
+account list which are not used for any purpose. This is the current behavior,
+but there is no reason to load such accounts at all.


This section is no longer relevant

topointon-jump

Great to simplify things here! 🎉

topointon-jump · 2024-11-16T01:14:31Z

proposals/0186-loaded-transaction-data-size-specification.md

+If a transaction exceeds its data size limit, the transaction is failed. Fees
+will be charged once `enable_transaction_loading_failure_fees` is enabled.


Where will this be checked? Do transactions that fail this check make it into blocks? Do you mind clarifying this in the SIMD? 🙏

ok i updated the wording to:

If a transaction exceeds its data size limit, a loading failure occurs. This SIMD does not change any aspect of how such a failure is handled. At time of writing, such a transaction would be excluded from the ledger. When enable_transaction_loading_failure_fees is enabled, it will be written to the ledger and charged fees as a processed, failed transaction.

the idea is we aren't changing any of the logic about how loading works here. the existing flow stays the same: loading happens at the same time and in the same place in transaction processing, and what kind of error you get if you exceed the limit, how that error is handled, and how it's reflected in ledger history are all unchanged

the only thing that changes is what the number of bytes you arrive at will be and how you determine what that number is

2501babe · 2024-11-18T18:14:05Z

@jacobcreech i believe we have consensus

2501babe force-pushed the transaction-data-size branch from af85c16 to c565856 October 21, 2024 04:54

2501babe changed the title ~~SIMD-XXXX: Transaction Data Size Specification~~ SIMD-0186: Transaction Data Size Specification Oct 21, 2024

2501babe force-pushed the transaction-data-size branch from c565856 to a364ce7 October 21, 2024 04:57

2501babe self-assigned this Oct 21, 2024

SIMD-0186: Transaction Data Size Specification

092a67c

2501babe force-pushed the transaction-data-size branch from a364ce7 to 092a67c October 21, 2024 12:29

2501babe marked this pull request as ready for review October 21, 2024 12:30

2501babe requested a review from apfitzge October 21, 2024 12:30

apfitzge reviewed Oct 21, 2024

edits for clarity

04e283b

2501babe mentioned this pull request Oct 24, 2024

block writeable account data limit #184

Open

2501babe requested a review from brooksprumo October 24, 2024 21:23

brooksprumo approved these changes Oct 25, 2024

github-actions bot mentioned this pull request Oct 28, 2024

Upstream Updates - Mon Oct 28 00:14:59 UTC 2024 smartcontractkit/chainlink-solana#907

Closed

2501babe mentioned this pull request Oct 30, 2024

svm: allow conflicting transactions in entries anza-xyz/agave#3146

Closed

new language for another edge case

c694155

2501babe force-pushed the transaction-data-size branch from 25d0096 to c694155 October 30, 2024 10:23

apfitzge reviewed Oct 30, 2024

jstarry reviewed Nov 2, 2024

2501babe changed the title ~~SIMD-0186: Transaction Data Size Specification~~ SIMD-0186: Loaded Transaction Data Size Specification Nov 4, 2024

2501babe force-pushed the transaction-data-size branch from ac55c34 to 7816e67 November 4, 2024 11:40

update to new algorithm

078a6f9

2501babe force-pushed the transaction-data-size branch from 7816e67 to 078a6f9 November 4, 2024 11:42

jstarry reviewed Nov 4, 2024

minor edits from reviews

8c0603b

jstarry reviewed Nov 5, 2024

minor copyedits

a015ab7

jstarry reviewed Nov 6, 2024

add invisible accounts back

64db94f

jstarry reviewed Nov 12, 2024

jstarry mentioned this pull request Nov 12, 2024

Loader-v3 programdata account size is not counted towards limit if executed through CPI anza-xyz/agave#2274

Open

2501babe added 2 commits November 12, 2024 11:18

minor edits

ec59003

add lookup table cost

173f092

jstarry approved these changes Nov 13, 2024

trivial copyedits since we are enshrining a use for invisible accounts in another simd

194c751

topointon-jump approved these changes Nov 16, 2024

clarify wording

59bf637

2501babe mentioned this pull request Nov 20, 2024

svm: skip appending loaders to loaded accounts anza-xyz/agave#3631

Merged

