Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow domain owner to change Private EVM contract creation allow list #3381

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

teor2345
Copy link
Member

@teor2345 teor2345 commented Feb 12, 2025

Why are we doing this?

We want to allow the domain owner to change the EVM contract creation allow list. So we need to add a call to the subspace runtime, which gets turned into an inherent for the EVM domain runtime.

What it does

This PR makes the following changes:

  • use the EthereumAccountId type in all code that deals with contract creation filters (called AccountId in EVM runtime code)
  • add a call to pallet-domains, and turn it into an inherent in pallet-evm-tracker
  • add a fraud proof for the new inherent
  • optional: improve UX by changing the pallet-domains call to take a list of accounts, rather than an encoded call

This is similar to pallet-domain-sudo, except that we're sending just one kind of call (earlier in the PR), or just the call argument (at the end of the PR).

Breaking Changes

This PR has some (potentially) breaking changes:

  • a new runtime API version
  • adding new fields to structs passed to and from host functions
  • adding new variants to enums passed to the stateless runtime, and new API impls on the stateless runtime
  • new fraud proof fields, storage key request variants, and other enum variants

What it doesn’t do

There are no tests yet, I'll do those in another PR, while we're testing on devnet.

Focus of Initial Reviews

The next step is testing on Devnet, and writing another PR with Rust tests. I don’t think we need to merge it to do that, but we should have any major changes done.

So here’s what I’d like to focus on in the review:

  • is the code good enough to deploy to Devnet?
  • are there any API changes needed? (ideally they should happen before Rust or Devnet tests)
  • do we want the call to take an encoded blob, or a list of accounts? There is a commit for each of these alternatives.
  • are there any migrations needed for Taurus?

Code contributor checklist:

@teor2345 teor2345 added improvement it is already working, but can be better breaking-runtime This PR introduces breaking changes to the runtime audit-P2 Medium audit priority labels Feb 12, 2025
@teor2345 teor2345 self-assigned this Feb 12, 2025
@teor2345 teor2345 enabled auto-merge February 12, 2025 04:12
Copy link
Member

@NingLin-P NingLin-P left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the code good enough to deploy to Devnet?

If you just want to test it before deploy to Taurus, you can set up a local dev network with --chain dev

are there any API changes needed? (ideally they should happen before Rust or Devnet tests)

Not sure what kind of API you mean

do we want the call to take an encoded blob, or a list of accounts? There is a commit for each of these alternatives.

I much prefer to take the call argument as input instead of a whole encoded call, the required fraud proof and host function change should be similar to the messenger channel allowlist update.

are there any migrations needed for Taurus?

Not storage migration needed but we need to careful with the runtime API and host function change

Comment on lines +1810 to +1822
let domain_runtime = Self::domain_runtime_code(domain_id).ok_or(
Error::<T>::DomainRegistry(DomainRegistryError::DomainNotFound),
)?;
ensure!(
domain_runtime_call(
domain_runtime,
StatelessDomainRuntimeCall::IsValidEvmDomainContractCreationAllowedByCall(
contract_creation_allowed_by.clone()
),
)
.unwrap_or(false),
Error::<T>::InvalidEvmDomainContractCreationAllowedByCall
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed as we take the call argument instead of the encoded call

Copy link
Member Author

@teor2345 teor2345 Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all argument values going to be valid for all EVM domains?
For example, what if a specific Private EVM domain wanted to ban Anyone as an argument?

Edited:

This is required so that older EVM domains can reject the inherent:
https://github.com/autonomys/subspace/pull/3381/files#r1953459735

I added comments to explain why it's needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, what if a specific Private EVM domain wanted to ban Anyone as an argument?

No sure I understand, the domain owner create the domain and update the allowlist, if they don't want to use Anyone no one can force the allowlist update to Anyone

If there are any validation needed for the arg in the future, that validation can be added to the inherent call (e.g. fail to execute if a wrong arg is provide), which is also in the domain runtime code but not like this in a runtime API and checked in the consensus runtime.

Comment on lines +417 to +421
<Self as EvmTrackerApi<Block>>::construct_evm_contract_creation_allowed_by_extrinsic(
self,
Default::default(),
inner_call,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to check runtime api version to see if this API exist in the domain runtime code

Copy link
Member Author

@teor2345 teor2345 Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new API, so there's no versioning here. What kind of check do you mean? Is there an example somewhere?

Edited:

I did it with a new API, because we only want EVM domains to implement that API. It doesn't make sense for other domains.

construct_domain_inherent_extrinsic only calls construct_evm_contract_creation_allowed_by_extrinsic if the fraud proof contains that EVM call. That call (and fraud proof) is only valid if the domain is an upgraded EVM domain.

If it's not EVM or not upgraded, then the existing code finds an unimplemented API method, and returns an error:

self.executor
.call(
&mut ext,
&runtime_code,
fn_name,
&input,
CallContext::Offchain,
)
.0
.map_err(|err| {
ApiError::Application(format!("Failed to invoke call to {fn_name}: {err}").into())
})

This eventually causes verify_invalid_domain_extrinsics_root_fraud_proof to return VerificationError::FailedToDeriveDomainInherentExtrinsic, causing validate_fraud_proof to return FraudProofError::InvalidExtrinsicRootFraudProof, which marks the transaction as invalid.

I think this is correct behaviour, because a fraud proof containing an EVM call for a non-EVM domain (or an EVM domain that hasn't been upgraded) is an invalid fraud proof. Is there some other way we should handle the error?

Actually, I think there's one way this could go wrong:

  • pallet-domains is upgraded and has the new call
  • but the EVM domain is not upgraded, so the call is invalid

This is why we need is_valid_evm_contract_creation_allowed_by_call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new API, so there's no versioning here. What kind of check do you mean? Is there an example somewhere?

See https://github.com/autonomys/polkadot-sdk/blob/819a5818284a96a5a5bd65ce67e69bab860d4534/substrate/primitives/api/src/lib.rs#L591-L594

Actually, I think there's one way this could go wrong:
pallet-domains is upgraded and has the new call
but the EVM domain is not upgraded, so the call is invalid

This usually happens because the consensus and domain runtime are not guaranteed to happen in the same block.

But I don't think we need is_valid_evm_contract_creation_allowed_by_call to prevent this, because if pallet-domains is upgraded but the domain runtime is not then:

  • During domain block production, even inherent data for the allowlist update is provided, the domain runtime doesn't have construct_evm_contract_creation_allowed_by_extrinsic to construct the inherent tx so no inherent tx added to the block
  • During FP verification, since the domain runtime code don't has construct_evm_contract_creation_allowed_by_extrinsic we simple return None to keep consistency with the domain block production process.

domains/pallets/evm-tracker/src/lib.rs Outdated Show resolved Hide resolved
domains/runtime/evm/src/lib.rs Outdated Show resolved Hide resolved
@@ -102,6 +103,8 @@ pub struct DomainInherentExtrinsicData {
pub consensus_transaction_byte_fee: Balance,
pub domain_chain_allowlist: DomainAllowlistUpdates,
pub maybe_sudo_runtime_call: Option<Vec<u8>>,
pub maybe_evm_domain_contract_creation_allowed_by_call:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes both the input and output of the host function, while both the input and output will SCALE encoded and if this value is None it should be fine, but I think it is better to have test to ensure it won't break anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are coming in the next PR, can you point me to an example of what this test looks like?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are coming in the next PR, can you point me to an example of what this test looks like?

I'm actually not comfortable pushing this without testing. Reason being any runtime upgrades before the tests and might need potential fix will break the fraud proofs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you point me to an example of what this test looks like?

It is hard to do that in code because a lot of infra will be needed, but you can set up a local dev network with an old release and then upgrade the node with binary built from this branch

Copy link
Member Author

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick review, I have a few follow-up questions!

Comment on lines +1810 to +1822
let domain_runtime = Self::domain_runtime_code(domain_id).ok_or(
Error::<T>::DomainRegistry(DomainRegistryError::DomainNotFound),
)?;
ensure!(
domain_runtime_call(
domain_runtime,
StatelessDomainRuntimeCall::IsValidEvmDomainContractCreationAllowedByCall(
contract_creation_allowed_by.clone()
),
)
.unwrap_or(false),
Error::<T>::InvalidEvmDomainContractCreationAllowedByCall
);
Copy link
Member Author

@teor2345 teor2345 Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all argument values going to be valid for all EVM domains?
For example, what if a specific Private EVM domain wanted to ban Anyone as an argument?

Edited:

This is required so that older EVM domains can reject the inherent:
https://github.com/autonomys/subspace/pull/3381/files#r1953459735

I added comments to explain why it's needed.

@@ -102,6 +103,8 @@ pub struct DomainInherentExtrinsicData {
pub consensus_transaction_byte_fee: Balance,
pub domain_chain_allowlist: DomainAllowlistUpdates,
pub maybe_sudo_runtime_call: Option<Vec<u8>>,
pub maybe_evm_domain_contract_creation_allowed_by_call:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are coming in the next PR, can you point me to an example of what this test looks like?

Comment on lines +417 to +421
<Self as EvmTrackerApi<Block>>::construct_evm_contract_creation_allowed_by_extrinsic(
self,
Default::default(),
inner_call,
)
Copy link
Member Author

@teor2345 teor2345 Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new API, so there's no versioning here. What kind of check do you mean? Is there an example somewhere?

Edited:

I did it with a new API, because we only want EVM domains to implement that API. It doesn't make sense for other domains.

construct_domain_inherent_extrinsic only calls construct_evm_contract_creation_allowed_by_extrinsic if the fraud proof contains that EVM call. That call (and fraud proof) is only valid if the domain is an upgraded EVM domain.

If it's not EVM or not upgraded, then the existing code finds an unimplemented API method, and returns an error:

self.executor
.call(
&mut ext,
&runtime_code,
fn_name,
&input,
CallContext::Offchain,
)
.0
.map_err(|err| {
ApiError::Application(format!("Failed to invoke call to {fn_name}: {err}").into())
})

This eventually causes verify_invalid_domain_extrinsics_root_fraud_proof to return VerificationError::FailedToDeriveDomainInherentExtrinsic, causing validate_fraud_proof to return FraudProofError::InvalidExtrinsicRootFraudProof, which marks the transaction as invalid.

I think this is correct behaviour, because a fraud proof containing an EVM call for a non-EVM domain (or an EVM domain that hasn't been upgraded) is an invalid fraud proof. Is there some other way we should handle the error?

Actually, I think there's one way this could go wrong:

  • pallet-domains is upgraded and has the new call
  • but the EVM domain is not upgraded, so the call is invalid

This is why we need is_valid_evm_contract_creation_allowed_by_call.

domains/runtime/evm/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Member Author

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added commits that inline the redundant trait, and explain why we need to check calls for validity.

are there any API changes needed? (ideally they should happen before Rust or Devnet tests)

Not sure what kind of API you mean

Any changes to functions or types that we'll use in Rust tests?
Or any significant behaviour changes that we'll be testing in Rust or Devnet tests?

EthereumAccountId,
>,
) -> DispatchResult {
let maybe_owner = ensure_signed(origin)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually prefix maybe_* if its optional type

Error::<T>::DomainRegistry(DomainRegistryError::DomainNotFound),
)?;
ensure!(
domain_runtime_call(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be simplified.
Only reason we want to check is if the specific call exists in the Domain Runtime since consensus runtime upgrade may end up before domain runtime upgrade. Once the upgrade is pushed to domains, this call is redundant and can be removed. Benefits from saving a host function call

@@ -102,6 +103,8 @@ pub struct DomainInherentExtrinsicData {
pub consensus_transaction_byte_fee: Balance,
pub domain_chain_allowlist: DomainAllowlistUpdates,
pub maybe_sudo_runtime_call: Option<Vec<u8>>,
pub maybe_evm_domain_contract_creation_allowed_by_call:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are coming in the next PR, can you point me to an example of what this test looks like?

I'm actually not comfortable pushing this without testing. Reason being any runtime upgrades before the tests and might need potential fix will break the fraud proofs

// - timestamp extrinsic
// - executive set_code extrinsic
// - messenger update_domain_allowlist extrinsic
// - block_fees transaction_byte_fee_extrinsic
// - domain_sudo extrinsic
// since we use `push_front` the extrinsic should be pushed in reversed order
// - evm_tracker contract_creation_allowed_by extrinsic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the order is incorrect here. EVM tracker as per the contruct_runtime of the EVM is after messenger

https://github.com/autonomys/subspace/blob/main/domains/runtime/evm/src/lib.rs#L840

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls also add a test case in test_bad_fraud_proof_is_rejected by adding a allowlist update extrinsic after this line:

alice
.construct_and_send_extrinsic(pallet_balances::Call::transfer_allow_death {
dest: Bob.to_account_id(),
value: 12334567890987654321,
})
.await
.expect("Failed to send extrinsic");

/// An inherent call to set ContractCreationAllowedBy.
#[pallet::call_index(1)]
#[pallet::weight((T::DbWeight::get().reads_writes(1, 1), DispatchClass::Mandatory))]
pub fn inherent_set_contract_creation_allowed_by(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inherent_* prefix would be redundant here

@@ -73,6 +73,7 @@ pub type EthereumSignature = EVMSignature;

/// Some way of identifying an account on the EVM chain. We intentionally make it equivalent
/// to the public key of the EVM transaction signing scheme.
// TODO: sometimes this type alias causes complex trait ambiguity / conflicting implementation errors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note instead of TODO is better.

TODO would be something we want to handle in future but in this case I'm not sure about the future change

@@ -911,6 +914,34 @@ fn construct_sudo_call_extrinsic(encoded_ext: Vec<u8>) -> <Block as BlockT>::Ext
)
}

/// Returns `true` if this is a valid pallet-evm-tracker "contract creation allowed by" inherent
/// call.
fn is_valid_evm_contract_creation_allowed_by_call(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can always adapt the code when we have such requirements. No need to future proof this right now. This just adds more noise right now

@vedhavyas
Copy link
Member

vedhavyas commented Feb 13, 2025

Thinking further on is_valid_evm_contract_creation_allowed_by_call api.
Ideally, we should NOT allow such a call in Public EVM as this could lead to misconfiguration by mistake. Only private EVM should allow such a call. So the call is useful but result of the call should be varied for type of EVM.

Here is what I'm thinking, we can allow users to pick the type of EVM they want

enum EVMType {
	#[default]
	Public,
	Private(ContractCreationAllowedBy)
}

This also enables us to know the EVM type of a given Domain. We currently use set_storage for stateless runtime call with genesis data as we require DomainID which is set at the time of instantiation and during the stateless call, we would know the DomainID of runtime and EVMChainInfo like evm_chain_id. Since we add evm type to the domain, this will be accessible and based on the type of EVM, we can either allow or disallow the contract_creation_allowed_by inherent from the consensus itself.

This is not a blocker for this PR but would need to go in later before we deploy.

I have also noticed another thing regarding the genesis hash of domain when it comes to taurus specifically this piece of code https://github.com/autonomys/subspace/blob/main/crates/pallet-domains/src/runtime_registry.rs#L156

When a new operator tries to sync from initial block, due to the above piece of code, currently running genesis hash would be different as contract_creation_allowed_by did not exist in the taurus domain.

So we would need to handle the migration in way that when requesting genesis storage using into_complete_raw_genesis, the contract_creation_allowed_by should not be set for existing domains on taurus. Unfortunately, this piece of code will have to exist for as long as Taurus network is running.

For mainnet this would not be an issue since there are no domains yet.

@NingLin-P
Copy link
Member

Any changes to functions or types that we'll use in Rust tests?
Or any significant behaviour changes that we'll be testing in Rust or Devnet tests?

Adding a test case in test_bad_fraud_proof_is_rejected would be helpful, Devnet test (the log run network test) are not implemented yet.

When a new operator tries to sync from initial block, due to the above piece of code, currently running genesis hash would be different as contract_creation_allowed_by did not exist in the taurus domain.

I have considered this when I reviewed the previous private EVM PR, but it is not an issue because when the node syncs from genesis it will use the block where the domain is instantiated to derive the genesis domain block instead of the latest block in the chain.

@vedhavyas
Copy link
Member

vedhavyas commented Feb 13, 2025

I have considered this when I reviewed the previous private EVM PR, but it is not an issue because when the node syncs from genesis it will use the block where the domain is instantiated to derive the genesis domain block instead of the latest block in the chain.

Okay that is fair. But in general problem still exists where I cannot use the latest block to actually derive the correct genesis hash. Ofcourse, we can always derive it from the block at which this was instantiated but Ideally at any block, shouldn't this always gives you same genesis for a given domain ?

Or maybe I'm overthinking this situation

@NingLin-P
Copy link
Member

But in general problem still exists where I cannot use the latest block to actually derive the correct genesis hash. Ofcourse, we can always derive it from the block at which this was instantiated but Ideally at any block, shouldn't this always gives you same genesis for a given domain ?

That is true, but we don't have use case of this currently, we can keep the genesis ER in the consensus chain never pruned if needed. Also, even without the allowlist state, the issue still exists since the domain runtime code in the consensus chain will also change after upgrade.

@vedhavyas
Copy link
Member

That is true, but we don't have use case of this currently, we can keep the genesis ER in the consensus chain never pruned if needed. Also, even without the allowlist state, the issue still exists since the domain runtime code in the consensus chain will also change after upgrade.

That is also correct. We should store the initial genesis state used at the time of domain instantiation in the domain object itself. This ensure, the genesis storage is untouched for the lifetime of the doman and will always gives the correct storage at any block

not a blocker for this PR but if agreed, should be handled in a different PR through a tracking github issue

@NingLin-P
Copy link
Member

We should store the initial genesis state used at the time of domain instantiation in the domain object itself. This ensure, the genesis storage is untouched for the lifetime of the doman and will always gives the correct storage at any block

I don't think we should do that, which will bloat up the consensus state and may end up storing all versions of domain runtime code in the worst case, and this is just for a non-exist use case.

Again, we already store the genesis_receipt_hash in the domain object, if needed we can store the whole ER, which may be useful for verification during domain sync.

@vedhavyas
Copy link
Member

Why would it store all the versions? It would store the genesis storage for each domain instance once.

While this maybe a non existent issue but nevertheless an issue if we ever need to reconstruct the genesis. I do not see any reason why we should not do that 🤔

@NingLin-P
Copy link
Member

Why would it store all the versions? It would store the genesis storage for each domain instance once.

If there are domains instantiated at each runtime version, we at least need to keep one copy of each version of domain runtime.

While this maybe a non existent issue but nevertheless an issue if we ever need to reconstruct the genesis.

Well, every domain node are able to reconstruct the genesis from consensus block history during sync. What you propose just reconstruct the domain genesis from any height of the consensus chain, as the genesis state is the stalest state I don't see why would anyone want to reconstruct and query from it.

do not see any reason why we should not do that

As said above, it bloats up the consensus state for no good reason.

@vedhavyas
Copy link
Member

Hmm I don't necessarily agree with state bloat.

Maybe let see what @teor2345 and @nazar-pc has to say

@vedhavyas
Copy link
Member

moving this genesis storage discussion to this issue - #3388

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
audit-P2 Medium audit priority breaking-runtime This PR introduces breaking changes to the runtime improvement it is already working, but can be better
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants