diff --git a/core/README.md b/core/README.md index 5777f561c..957245f1e 100644 --- a/core/README.md +++ b/core/README.md @@ -96,6 +96,7 @@ | [CAP-0060](cap-0060.md) | Update to Wasmi register machine| Graydon Hoare | Accepted | | [CAP-0061](cap-0061.md) | Smart Contract Standardized Asset (Stellar Asset Contract) Extension: Memo | Tomer Weller | Draft | | [CAP-0062](cap-0062.md) | Soroban Live State Prioritization | Garand Tyson | Draft | +| [CAP-0063](cap-0063.md) | Soroban In-memory Read Resource | Garand Tyson | Draft | ### Rejected Proposals | Number | Title | Author | Status | diff --git a/core/cap-0063.md b/core/cap-0063.md new file mode 100644 index 000000000..16bba2b07 --- /dev/null +++ b/core/cap-0063.md @@ -0,0 +1,447 @@ +``` +CAP: 0063 +Title: Soroban In-memory Read Resource +Working Group: + Owner: Garand Tyson <@SirTyson> + Authors: Garand Tyson <@SirTyson> + Consulted: Dmytro Kozhevin <@dmkozh>, Nicolas Barry <@MonsieurNicolas> +Status: Draft +Created: 2024-12-09 +Discussion: https://github.com/stellar/stellar-protocol/discussions/1585 +Protocol version: 23 +``` + +## Simple Summary + +This proposal introduces a new resource type for Soroban reads, distinguishing between in-memory and +disk reads. This also proposes automatic restoration for archived entries via `InvokeHostFunctionOp`. + +## Working Group + +As specified in the Preamble. + +## Motivation + +By distinguishing disk and in-memory reads, this proposal allows for significant increase in Soroban +read limits. This distinction also allows for safe automatic entry restoration, significantly improving UX. + +### Goals Alignment + +This change is aligned with the goal of lowering the cost and increasing the scale of the network. + +## Abstract + +[CAP-0062](cap-0062.md) introduces partial State Archival, where evicted and live Soroban state is stored in separate databases. +This separation allows live Soroban state to be efficiently cached entirely in memory, removing disc reads entirely for all +live Soroban state. However, classic state and evicted entries are still subject to disk reads. + +This CAP introduces a new resource type for reads, distinguishing between disk reads and in-memory reads. By making this +distinction at the protocol level, read limits for Soroban data can greatly increase. This also opens the door for other +optimizations, such as a complete module cache for all live contracts. + +Additionally, this CAP introduces automatic restoration via `InvokeHostFunctionOp`, where any archived key present in the +footprint is automatically restored. Initially in protocol 20, the state archival design was not solidified enough to +enable automatic restoration, so an explicit restore operation was required. While this was not technically required in +protocol 20, it was important for contract interfaces to reason properly about restoration such that when full state +archival was introduce, it would not break preexisting deployments. Given that the full State Archival design has been +mostly finalized in [CAP-0057](cap-0057.md), it is now appropriate to introduce automatic restoration. + +## Specification + +### XDR changes + +``` +--- a/Stellar-contract-config-setting.x ++++ b/Stellar-contract-config-setting.x +@@ -60,6 +60,46 @@ struct ConfigSettingContractLedgerCostV0 + uint32 bucketListWriteFeeGrowthFactor; + }; + ++// Ledger access settings for contracts. ++struct ConfigSettingContractLedgerCostV1 ++{ ++ // Maximum number of in-memory entry read operations per ledger ++ uint32 ledgerMaxInMemoryReadEntries; ++ // Maximum number of disk entry read operations per ledger ++ uint32 ledgerMaxDiskReadEntries; ++ // Maximum number of bytes of disk reads that can be performed per ledger ++ uint32 ledgerMaxDiskReadBytes; ++ // Maximum number of ledger entry write operations per ledger ++ uint32 ledgerMaxWriteLedgerEntries; ++ // Maximum number of bytes that can be written per ledger ++ uint32 ledgerMaxWriteBytes; ++ ++ // Maximum number of in-memory ledger entry read operations per transaction ++ uint32 txMaxInMemoryReadEntries; ++ // Maximum number of disk entry read operations per transaction ++ uint32 txMaxDiskReadEntries; ++ // Maximum number of bytes of disk reads that can be performed per transaction ++ uint32 txMaxDiskReadBytes; ++ // Maximum number of ledger entry write operations per transaction ++ uint32 txMaxWriteLedgerEntries; ++ // Maximum number of bytes that can be written per transaction ++ uint32 txMaxWriteBytes; ++ ++ int64 feeDiskReadLedgerEntry; // Fee per disk ledger entry read ++ int64 feeDiskRead1KB; // Fee for reading 1KB disk ++ int64 feeWriteLedgerEntry; // Fee per ledger entry write ++ ++ // The following parameters determine the write fee per 1KB. ++ // Write fee grows linearly until bucket list reaches this size ++ int64 bucketListTargetSizeBytes; ++ // Fee per 1KB write when the bucket list is empty ++ int64 writeFee1KBBucketListLow; ++ // Fee per 1KB write when the bucket list has reached `bucketListTargetSizeBytes` ++ int64 writeFee1KBBucketListHigh; ++ // Write fee multiplier for any additional data past the first `bucketListTargetSizeBytes` ++ uint32 bucketListWriteFeeGrowthFactor; ++}; ++ + // Historical data (pushed to core archives) settings for contracts. + struct ConfigSettingContractHistoricalDataV0 + { +@@ -302,7 +342,8 @@ enum ConfigSettingID + CONFIG_SETTING_STATE_ARCHIVAL = 10, + CONFIG_SETTING_CONTRACT_EXECUTION_LANES = 11, + CONFIG_SETTING_BUCKETLIST_SIZE_WINDOW = 12, +- CONFIG_SETTING_EVICTION_ITERATOR = 13 ++ CONFIG_SETTING_EVICTION_ITERATOR = 13, ++ CONFIG_SETTING_CONTRACT_LEDGER_COST_V1 = 14 + }; + + union ConfigSettingEntry switch (ConfigSettingID configSettingID) +@@ -335,5 +376,7 @@ case CONFIG_SETTING_BUCKETLIST_SIZE_WINDOW: + uint64 bucketListSizeWindow<>; + case CONFIG_SETTING_EVICTION_ITERATOR: + EvictionIterator evictionIterator; ++case CONFIG_SETTING_CONTRACT_LEDGER_COST_V1: ++ ConfigSettingContractLedgerCostV1 contractLedgerCost; + }; + } +diff --git a/Stellar-transaction.x b/Stellar-transaction.x +index 7d32481..f9bdc1f 100644 +--- a/Stellar-transaction.x ++++ b/Stellar-transaction.x +@@ -882,8 +882,8 @@ struct SorobanResources + // The maximum number of instructions this transaction can use + uint32 instructions; + +- // The maximum number of bytes this transaction can read from ledger +- uint32 readBytes; ++ // The maximum number of bytes this transaction can read from disk backed entries ++ uint32 diskReadBytes; + // The maximum number of bytes this transaction can write to ledger + uint32 writeBytes; + }; + ++struct SorobanResourcesExtV0 ++{ ++ // Vector of uint16_t indices representing what Soroban ++ // entries in the footprint are evicted, based on the ++ // order of keys provided in the readWrite footprint. ++ opaque<> evictedSorobanEntries; ++}; ++ + // The transaction extension for Soroban. + struct SorobanTransactionData + { +- ExtensionPoint ext; ++ union switch (int v) ++ { ++ case 0: ++ void; ++ case 1: ++ SorobanResourcesExtV0 resourceExt; ++ } ext; + SorobanResources resources; + // Amount of the transaction `fee` allocated to the Soroban resource fees. + // The fraction of `resourceFee` corresponding to `resources` specified +``` + +### Semantics + +#### In-memory vs Disk Read Resources + +Read resources are now separated between memory backed and disk backed state as follows: + +- in-memory entries: + - Live Soroban entries + - Archived entries that have not yet been evicted (i.e. still exist in the live BucketList) +- disk entries: + - Archived Soroban entries that have been evicted + - Classic entries + +In-memory entries are subject to the corresponding in-memory limits, but charge no entry or byte based +read fees. On disk entries are subject to the corresponding limits and also charge entry and byte based +read fees. + +Note that while read resources and fees have changed, write semantics have not. Writes to both entry +types have the same fee and limits as protocol 22. + +#### Initial Network Config Settings + +Initially, all disk read limits and fees should match exactly the limits and fees of reads today. Limits must not +decrease from current values in order to prevent bricking existing contracts. In the worst case, a contract can read +up to `txMaxReadEntries - 2` classic entries today (all classic entries + wasm + instance). Current limits have been +carefully measured assuming disk access only, so the addition of in-memory state should not negatively affect the network +in any way should we maintain these values via disk read limits. + +TODO: In-memory read entries must be measured in order to find a reasonable starting point. + +#### `evictedSorobanEntries` Vector + +For the purposes of block creation, the `LedgerKey` alone is sufficient to distinguish between classic and Soroban data. +However, a disk read would be required prior to tx application in order to determine if a Soroban entry was live or evicted. +This is necessary for determining what resources limits a Soroban key counts towards, but +exposes a significant DOS angle. To prevent this attack, the footprint must statically declare which entries are live vs. evicted. + +This is accomplished via the `evictedSorobanEntries` vector. If any evicted Soroban entry is present in a TX's footprint, +it must provide the `evictedSorobanEntries` vector containing the index of each evicted key based on the ordering of the +readWrite footprint. Because restores are a write operation, no evicted key should be in the readOnly footprint. This vector +should contain no classic key indexes, as classic keys are always considered disk reads. + +#### Changes to `InvokeHostFunctionOp` + +##### Automatic Entry Restoration + +Whenever `InvokeHostFunctionOp` is applied, any archived state is +automatically restored prior to host function invocation. This restored state is then accessible to the host function. + +Restored state is still subject to the same minimum rent and write fees that exist currently. + +If an archived Soroban entry has not yet been evicted and still exists in the live BucketList, it is metered and charged based +on in-memory read limits and fees. If an archived Soroban entry has been evicted, it is metered and charged based +on on-disk limits and fees. All evicted keys must be declared in the `evictedSorobanEntries` vector via `SorobanResourcesExtV0`. +If no evicted entries are present in the footprint, `SorobanResourcesExtV0` may be omitted. + +##### Incorrect Entries in `evictedSorobanEntries` Vector + +If a key is not marked as evicted, but the entry is actually evicted, then the TX fails. If a Soroban key is marked as evicted when +it is not, the transaction does not fail. The entry is treated as if it is evicted from a resource and fee perspective. +This is appropriate, as on disk reads are more expensive than in-memory. So long as the TX pays the disk based fees, +there is no issue with actually loading an in-memory entry. + +If a classic key is marked as evicted, the TX fails, as the footprint is malformed. + +##### Failure Codes + +If an evicted entry is included in the TX's footprint but is not specified via the `evictedSorobanEntries` vector, the +tx will fail with the `INVOKE_HOST_FUNCTION_ENTRY_ARCHIVED` code. If there are insufficient resources for entry restoration, +the tx will fail with the `INVOKE_HOST_FUNCTION_RESOURCE_LIMIT_EXCEEDED` code. + +If a classic entry is marked as evicted, the TX fails with `INVOKE_HOST_FUNCTION_MALFORMED` during apply time. + +##### Meta + +Whenever an entry is restored, it will be emitted as a `LedgerEntryChangeType` of type `LEDGER_ENTRY_RESTORE`. +This will include the complete LedgerEntry of the restored entry. It does not matter if the entry has been +evicted or not, the meta produced will be the same. + +If a restored entry is modified during host function invocation, another `LedgerEntryChangeType` will be +emitted corresponding to the entry change in addition to the `LEDGER_ENTRY_RESTORE` restore event. +This is similar to the current meta schema, where the starting value of a LedgerEntry is emitted as +`LEDGER_ENTRY_STATE` followed by the change. This is similar, except that `LEDGER_ENTRY_STATE` is replaced +by `LEDGER_ENTRY_RESTORE`. + +#### Deprecation of RestoreFootprintOp(?) + +With the addition of automatic restoration, there is currently no need for `RestoreFootprintOp`. Functionally, the same automatic +restoration behavior that exists in `InvokeHostFunctionOp` could be replicated in `ExtendFootprintTTLOp` to provide a single +"admin" op type for managing lifetimes and restoration. + +Currently, `ExtendFootprintTTLOp` is rarely called, as most rent management is performed via contract calls. With the addition of +automatic restoration, I expect that `RestoreFootprintOp` will also be rarely invoked. It may make sense from a maintenance and UX +perspective to deprecate `RestoreFootprintOp` in favor of a single op type. + +#### getledgerentry Endpoint + +In order to facilitate state queries for RPC preflight, captive-core will expose the following HTTP endpoint: + +`getledgerentry` + +Used to query both live and archived LedgerEntries. While `getledgerentryraw` does simple key-value lookup +on the live ledger, `getledgerentry` will query a given key in both the live BucketList and Hot Archive BucketList. +It will also report whether an entry is archived, evicted, or live, and return the entry's current TTL value. + +A POST request with the following body: + +```http +ledgerSeq=NUM&key=Base64&key=Base64... +``` + +- `ledgerSeq`: An optional parameter, specifying the ledger snapshot to base the query on. + If the specified ledger in not available, a 404 error will be returned. If this parameter + is not set, the current ledger is used. +- `key`: A series of Base64 encoded XDR strings specifying the `LedgerKey` to query. TTL keys + must not be queried and will return 400 if done so. + +A JSON payload is returned as follows: + +```json +{ +"entries": [ + {"e": "Base64-LedgerEntry", "state": "live", /*optional*/ "ttl": uint32}, + {"e": "Base64-LedgerKey", "state": "new"}, + {"e": "Base64-LedgerEntry", "state": "archived"}, + {"e": "Base64-LedgerEntry", "state": "evicted"} +], +"ledger": ledgerSeq +} +``` + +- `entries`: A list of entries for each queried LedgerKey. Every `key` queried is guaranteed to +have a corresponding entry returned. +- `e`: Either the `LedgerEntry` or `LedgerKey` for a given key encoded as a Base64 string. If a key +is live or archived, `e` contains the corresponding `LedgerEntry`. If a key does not exist +(including expired temporary entries) `e` contains the corresponding `LedgerKey`. +- `state`: One of the following values: + - `live`: Entry is live. + - `new`: Entry does not exist. Either the entry has never existed or is an expired temp entry. + - `archived`: Entry is archived, but not yet evicted, counts towards in-memory resources. + - `evicted`: Entry is archived and evicted, counts towards disk resources. +- `ttl`: An optional value, only returned for live Soroban entries. Contains +a uint32 value for the entry's `liveUntilLedgerSeq`. +- `ledgerSeq`: The ledger number on which the query was performed. + +Classic entries will always return a state of `live` or `new`. +If a classic entry does not exist, it will have a state of `new`. + +Similarly, temporary Soroban entries will always return a state of `live` or +`new`. If a temporary entry does not exist or has expired, it +will have a state of `new`. + +This endpoint will always give correct information for archived entries. Even +if an entry has been archived and evicted to the Hot Archive, this endpoint will +still the archived entry's full `LedgerEntry` as well as the proper state. + +## Design Rationale + +### Reasoning behind in-memory limits and no fees + +While there is a resource limit for the maximum number of in-memory entry reads, there is no limit on the +number of in-memory read bytes. Additionally, there are no read-specific fees for in-memory state. + +There does not seem to be a need for limits on in-memory bytes read. The cost associated with loading an +in-memory entry is searching an in-memory map and making a copy of the LedgerEntry. Given that there is a limit +on the maximum number of in-memory entries, and given that there is a limit on the maximum size for each entry, +it seems highly unlikely that an attacker would be able to DOS the network via in-memory entry copies. + +Given that in-memory state lookups are cheap, it also does not seem necessary to have an explicit fee for +in-memory reads. In-memory reads will already be implicitly charged fees due to instruction and memory costs +associated with interacting with the entry. An attacker could potentially exploit these implicit fees +by creating TXs with large readOnly footprints that don't actually interact with the loaded state. The +TX would consume little resources and would be cheap, but would load the maximum amount of in-memory state +and avoid these implicit fees. Due to the low cost of these loads, it seems unlikely this would negatively +affect the network. Even the TX size costs associated with the large footprint may make the attack economically +nonviable relative to the amount of stress on the network. However, if this were to become and issue, +this could be solved in implementation. For example, instead of passing copies of state to the VM, const +pointers to readOnly state could be used. + +### UX Expectations + +With the addition of the new read resource type, contract developers will need to reason about what types of data +they are accessing wrt limits. However, the system has been designed such that this should be straight forward. +From a correctness standpoint, all smart contracts can be guaranteed to only interact with live state. A contract +developer can assume that all Soroban state consumes only in-memory resources, since the `RestoreFootprintOp` +(or equivalent in the case of deprecation) can restore all required Soroban entries before host function invocation +in the worst case. This means that from a limits perspective, only Soroban state vs. classic state must be considered +for contract correctness. However, there may be issues with usability or efficiency, discussed in the section below. + +While contract developers only need to distinguish between Soroban and classic state for contract correctness, the +tx itself will still need to properly populate limits and the `evictedSorobanEntries` vector based on whether or not +entries are archived. Similar to today, these resources and footprints will be generated automatically via RPC. +Specifically, RPC will correctly generate the footprint, `evictedSorobanEntries` vector, in-memory read resources, +and disk read resources. Thus most of the complexity introduced by these changes should be abstracted away. + +Note that the RPC itself is not required to implement this behavior, but will call captive-core endpoints. + +### Host Functions that are too expensive for automatic restore + +Suppose an expensive contract is written, such that it uses the maximum in-memory resources and the maximum +disk resources when no entries are evicted. Should any Soroban entry be evicted, automatic +restoration would not be possible, as the restoration would exceed resource limits. In this case, it would be +necessary to manually issue a `RestoreFootprintOp` (or equivalent in the case of deprecation) prior to the +host function invocation. + +Ideally, this should be avoided if possible. The challenge will be communicating this to developers. This may be accomplished +via documentation. Additionally, RPC could provide a warning to developers when the submitted invocation would +exceed network limits should a sufficient amount of state restoration be required. This may help developers catch this +class of performance issue earlier in the development process. + +Despite warning and documentation, there will be expensive contracts that are not compatible with automatic restore +in all cases. For this reason, RPC preflight must be able to distinguish whether automatic restore is possible for a +given invocation. If it is not, preflight must return the appropriate `restorePreamble` and communicate that an +explicit restoration is required before function invocation. + +### Expectations for Downstream Systems + +RPC must make the following changes: + +1. Ingest the new `LEDGER_ENTRY_RESTORE` restore changes. This includes interpreting the new `LedgerEntryChangeType` +correctly, but also correctly handling the case where an entry is restored and modified in the same ledger. +2. Distinguish between live and evicted Soroban state during preflight in order to populate the new resource types +and populate the `evictedSorobanEntries` vector. +3. Notify users if a given preflighted transaction will not be able to be automatically restored due to resource limits, +and provide a restore preamble for manual restoration in these cases. +4. Allow user queries for both live and evicted state via the `getledgerentry` endpoint. + +Change 1 should be straight forward and is defined in the [InvokeHostFunctionOp Meta](#meta) section. + +For change 2, the construction of the footprint, resources, and evicted keys vector will be handled by the simulation +library. However, RPC will need to input a given LedgerEntry and its eviction state (whether or not it is evicted) +to the simulation library. RPC can accomplish this by utilizing the captive-core [`getledgerentry` HTTP endpoint](#getledgerentry-endpoint). + +Change 3 is largely an interface problem. The simulation library is responsible for detecting whether or not +automatic restoration will violate current network limits and if a manual restore op is required. It is then +the responsibility of RPC to forward this information to the user in an actionable way. It may also be useful +to warn a user if a given transaction could in the future require manual restoration, even if it is not the case +with the current ledger state. + +Change 4 allows wallets to effectively reason about ledger state. RPC integration is minimal, as internally +this endpoint can be implemented by calling the captive-core [`getledgerentry` HTTP endpoint](#getledgerentry-endpoint). + +### Concerns regarding Automatic Restoration with Full State Archival + +In protocol 20, there was no technical reason against enabling automatic restoration. However, the proof system +for full state archival was not finalized, and the explicit `RestoreFootprintOp` provided more future flexibility. +Additionally, it was important for developers to build preflighting transactions into their dApp flows. + +If Soroban had launched with automatic restoration, it is possible many developers would attempt to avoid preflight +by creating transaction footprint and resource declarations manually with reasonable fee and resource assumptions. +In a system where explicit entry restoration is not required, the burden of resource discoverability is low, so +developers are not incentivized or required to build preflight into their tx submission system. Long term this is +a significant issue, as the addition of full state archival with proofs makes manual resource and footprint +declaration largely impossible. Any system that did not use RPC preflight would be broken by full state archival. + +With this proposal, it is vital to ensure that automatic restore does not result in a situation where large numbers +of developers circumvent preflight. Two specific features of this proposal help ensure developers are incentivized +to build preflight into their flows now such that they are not broken later by archival proofs. Specifically, +no in-memory read fees incentivize developers to minimize fees by using preflight to detect the maximum +number of entries currently live. Additionally, the requirement of `evictedSorobanEntries` vectors makes manually +constructing footprints challenging, as the developer would need to be able to distinguish live vs evicted state +to minimize fees. Given that preflighting will result in lower on-chain fees, and given that resource footprints +are somewhat challenging to create without captive-core seems to indicate that developers will preflight transactions +prior to submission as intended. + +## Security Concerns + +### Memory Based DOS Attack + +By distinguishing between disk backed and memory backed reads at the protocol level, it is now required that +validators maintain all live Soroban state in memory. A significant increase in live Soroban state could +lead to an OOM based attack on the network. + +However, this is not a significant concern. The `sorobanLiveStateTargetSizeBytes` (see [CAP-0062](cap-0062.md)) +provides a soft cap on the amount of live Soroban state that can exist at any given time. This prevents any +sort of runaway memory issue. Additionally, because of the eviction process, there is a natural back pressure +applied to in-memory state, the rate at which can be controlled via network config settings. Finally, the rate +at which new state is created is also controlled via network configs, so there does not appear to be any +memory based DOS attack possible. + +## Test Cases + +TBD + +## Implementation + +TBD