Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add single and mapping struct to integration test for generic extraction (Part 4) #397

Open
wants to merge 24 commits into
base: generic-extraction-row-id-update
Choose a base branch
from

Conversation

silathdiir
Copy link
Contributor

@silathdiir silathdiir commented Oct 28, 2024

No description provided.

@silathdiir silathdiir marked this pull request as draft October 28, 2024 09:14
@silathdiir silathdiir force-pushed the generic-extraction-integration-test branch 5 times, most recently from f453df0 to a38724c Compare October 31, 2024 09:35
@silathdiir silathdiir changed the base branch from generic-extraction-tree-creation to generic-extraction-row-id-update October 31, 2024 09:36
@silathdiir silathdiir force-pushed the generic-extraction-integration-test branch 3 times, most recently from da7559c to 1f979a4 Compare November 1, 2024 13:53
@silathdiir silathdiir force-pushed the generic-extraction-integration-test branch 3 times, most recently from 7c95a5b to 869a1ab Compare November 3, 2024 12:19
@silathdiir silathdiir changed the title [WIP] fix: add single and mapping struct to integration test for generic extraction (Part 3) fix: add single and mapping struct to integration test for generic extraction (Part 3) Nov 3, 2024
@silathdiir silathdiir marked this pull request as ready for review November 3, 2024 12:38
@silathdiir silathdiir force-pushed the generic-extraction-integration-test branch from 869a1ab to 93c80de Compare November 4, 2024 01:45
Copy link
Collaborator

@nikkolasg nikkolasg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall very nice work ! My comments are mostly related to the

  1. external API, how should it be, and I will probably make more comments down the line after this first review
  2. The genericity of the description of what are we extracting. We can not customize for each case that we support now we have to think about all the use case we want to support later as well and all the combination possible.

@@ -338,7 +339,7 @@ fn value_metadata<const MAX_COLUMNS: usize, const MAX_FIELD_PER_EVM: usize>(
}

/// Compute the table information for the value columns.
fn compute_table_info(
pub fn compute_table_info(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are doing big changes, let's take a shot at having a better unified API.
For example, the address chain_id fields could be directly put inside extra no ? Since we don't prove the computation of these identifiers, we dont need to have special ordering or whatever.
Then we could add if we want another API endpoint that call this one with explicit chain_id address genesis_block for example since those are the information we put. But we can do that on DQ side it's fine.

Copy link
Contributor Author

@silathdiir silathdiir Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, in commit c21fe13 I added the similar functions (*_raw) for this and other identifiers computation which could only pass the extra argument, the original functions (have contract_address and chain_id arguments) call these functions. If we set extra = (contract_address || chain_id || extra), the result of two similar functions should be same.

@@ -209,7 +210,7 @@ pub enum SlotInputs {
MappingWithLength(Vec<SlotInput>, u8),
}

#[derive(Debug)]
#[derive(Clone, Debug, Eq, PartialEq, Hash, Serialize, Deserialize)]
pub struct SlotInput {
/// Slot information of the variable
pub(crate) slot: u8,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant information, can we extract it and put it separately ?
i.e. a table column's set is represented by (slot, Vec<SlotInput>)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but I make this SlotInput Struct corresponding to ColumnInfo, and seems it makes the conversion complicated. Please correct me if I was wrong, thanks.

@@ -209,7 +210,7 @@ pub enum SlotInputs {
MappingWithLength(Vec<SlotInput>, u8),
}

#[derive(Debug)]
#[derive(Clone, Debug, Eq, PartialEq, Hash, Serialize, Deserialize)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SlotInput::new() <-- now that we now there is no bitpacking level, then we can get rid of the bit_offset argument and just set it to 0 for the moment, until we refactor the circuit to be simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove bit_offset in the SlotInput and the new function in commit a2c4cfc.

Comment on lines 295 to 299
MappingValues((MappingValuesExtractionArgs, Option<LengthExtractionArgs>)),
/// Test arguments for single struct extraction
SingleStruct(SingleStructExtractionArgs),
/// Test arguments for mapping struct extraction
MappingStruct((MappingStructExtractionArgs, Option<LengthExtractionArgs>)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to bind MappingValues and MappingStruct together, otherwise it will lead to an cambrian explosion of combination down the line.
From what I see the only difference is that the latter has the additional information about the metadata, but we could make it as well for the single value, there is no reason not to, or am i missing stg ?

Same for the SingleValues and SIngleStruct ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to fix in commit b33cd73:

  • Combine both single test cases into one, its table includes 4 single value slots and 1 Struct slot now.
  • Combine the mapping arguments to MappingExtractionArgs<V> (V could be Address or LargeStruct).
  • Update the merge case for the above single columns (4 single values and 1 Struct) and MappingExtractArgs<LargeStruct>.

/// Test arguments for single struct extraction
SingleStruct(SingleStructExtractionArgs),
/// Test arguments for mapping struct extraction
MappingStruct((MappingStructExtractionArgs, Option<LengthExtractionArgs>)),
Merge(MergeSource),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MergeSourceshould be adapted to be able to use the combined enum above, not just SingleValue and MappingValue (well it will be after you merge the values andstruct together as i suggest before)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this merge case to a more complicated case, it includes the single columns of 4 single values and 1 Struct, and the mapping columns of Struct now. Please correct me if I was wrong.

Comment on lines 51 to 52
outer_key_id: Option<u64>,
inner_key_id: Option<u64>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a bit hard for me to realize what should the API look like before this PR, but this definitely doesn't sound good. We need an API that is generic enough to cover

myVar uint256
myStruct MyStruct
myMapping mapping(uint256 => Address)
myStructMapping mapping(uint256 => MyStruct)
myArraySingle []uint256
myArrayStruct []MyStruct
myMappingArray mapping(uint256 => []MyStruct) 
myDoubleMapping mapping(uint256 => mapping(Address => MyStruct)) 

The last two are the most complicated ones but you see the point.
We should not have to ahve these two outer and inner keys if we don't need them here. It looks to me the StorageSlot structure because it's recursive could already support giving all the details for that. Is this correct or is it missing some things ?
It's ok we don't support the array and mapping of arrays use case now but we should eventually, so thinking about this in the most generic way should be the goal here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the StorageSlotInfo is a collection which fields should be used during the values extraction proving process in the integration test storage trie. I think it may be different in DQ (let me think more details in PR #404).

Maybe I should replace this outer_key_id and inner_key_id to a vector of mapping key IDs. What do you think?

  • Single slot has no mapping key ID (it only has value IDs).
  • Mapping slot has one mapping key ID.
  • Mapping of mappings slot has both the outer and inner mapping key IDs.

Copy link
Contributor Author

@silathdiir silathdiir Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove outer_key_id and inner_key_id from StorageSlotInfo in commit 50a58d9.

Copy link
Contributor

@nicholas-mainardi nicholas-mainardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave some comments but after a while I skipped table_source and indexing because I agree with @nikkolasg that the main change is to avoid having specific data structures for each variant: ideally, we should have a data structure for simple variables (where each could be a scalar or a struct), and one for mappings (where the mapping value can be either a simple variable or a struct). So I think it makes sense to review deeply these files only after these changes

mp2-v1/src/values_extraction/gadgets/column_gadget.rs Outdated Show resolved Hide resolved
mp2-v1/tests/common/cases/table_source.rs Show resolved Hide resolved
mp2-v1/tests/common/cases/table_source.rs Show resolved Hide resolved
mp2-v1/tests/common/cases/table_source.rs Show resolved Hide resolved
mp2-v1/tests/common/storage_trie.rs Show resolved Hide resolved
mp2-v1/tests/common/values_extraction.rs Outdated Show resolved Hide resolved
mp2-v1/src/api.rs Show resolved Hide resolved
Copy link
Collaborator

@nikkolasg nikkolasg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good ! Left a few comments tho that I think could make a cleaner approach as suggestion, but let's discuss, it's also LGTM as current code !

}

/// Common functions for a specific type to interact with the test contract
pub trait SimpleContractValue {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of the Simple here ? simple starts to get overloaded.
Maybe ContractManipulator or ContractController or stg ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to ContractController in commit 911aadb.

Comment on lines 49 to 52
async fn update_contract_single_values(&self, ctx: &TestContext, contract: &Contract);

/// Update the mapping values to the test contract.
async fn update_contract_mapping_values(&self, ctx: &TestContext, contract: &Contract);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why two methods different with the same inputs ? From what I see, only one or the other is implemented ?
So we could make the trait simpler no ?
stg like

trait ContractController {
   async fn current_values(...) -> Self;
   async fn update_contract(&self, ....);
}

wdyt ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the trait functions to current_values and update_contract in commit 911aadb.

metadata(&table_info)
}

fn storage_slots(&self, metadata: &[MetadataGadget]) -> Vec<StorageSlot> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not related to this PR but it's frankly confusing to see an Array of metadata gadget.

  • Gadget name refers usually to circuit, here we're not dealing with any circuits
  • The reason there's an array is just to separate the columns that corresponds to a certain EVM word. But a regular HashMap works as well. We could have a single struct Metadata { words: HashMap<EvmWord, MetadataSubset> for example.

and that would allow us to avoid doing weird shenanigans with re-ordering the columns according to the EVM words being used (here) etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rename the previous MetadataGadget to ColumnsMetadata, but leave build and assign functions to MetadataGadget (empty struct now) in commit 051bf96.

Metadata { words: HashMap<EvmWord, MetadataSubset>

Seems the ColumnsMetadata (or MetadataSubset) needs to include the all information to extract the EVM value, since one ColumnsMetadata belongs to one leaf node of values extraction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't find this super intuitive for the end user. Maybe we could change mp2-v1 APIs (i.e., like these methods) to take as input, in place of ColumnsMetadata, the description of the table (i.e., Vec<ColumnInfo>) and the set of ids of columns to be extracted, so that MetadataGadget (which is a circuit specific data structure, and so should not be directly exposed if possible) can always be constructed inside mp2-v1? Alternatively, in place of the set of ids of columns to be extracted, we might only provide the evm_word as input and then the columns to be extracted are the ones with ColumnInfo.evm_word == evm_word?
This would also allow to avoid having both MetadataGagdet and the "alias" struct ColumnsMetadata being publicly exposed, which looks a bit weird to me. Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I fix to pass the evm_word and table_info to these APIs, and extract the column info of same slot and evm word to build the metadata in internal. Please help review this commit 50a58d9, thanks.

panic!("Wrong slot number");
}
current_values
.update_contract_single_values(ctx, contract)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the updates are deterministic then couldn't we put that inside the trait implementation directly ? Instead of computing the updates outside of the trait (which requires us to be very specific about which struct / value etc it is) and then just ask the trait to update. Wdyt ? Maybe i'm missing stg obvious here, not sure..

Copy link
Contributor Author

@silathdiir silathdiir Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the update value is random (as +1 or -2), and we don't exactly know if slot (or field of a Struct) is the secondary index (when it's a secondary index update), I combine the both 4 single slots and 1 Struct into this SingleExtractionArgs and set a random slot_input as the secondary index (it could be one single slot or a field of Struct), so seems only SingleExtractionArgs knows which is the secondary index (not the Value which implements the trait).

Copy link
Contributor

@nicholas-mainardi nicholas-mainardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better, thanks for the significant refactoring. Still some smaller changes, but it's getting in a good shape ;)

mp2-v1/tests/common/cases/contract.rs Outdated Show resolved Hide resolved
mp2-v1/tests/common/cases/contract.rs Show resolved Hide resolved
mp2-v1/tests/common/cases/table_source.rs Outdated Show resolved Hide resolved
);
old_table_values.compute_update(&new_table_values[0])
let input = ExtractionProofInput::Single(ExtractionTableProof {
dimension: TableDimension::Compound,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is TableDimension still necessary or can we remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I delete the TableDimension in commit 29fd77b, re-run the integration test and it could work for me.

mp2-v1/tests/common/cases/indexing.rs Show resolved Hide resolved
};
debug!("MAPPING ZK COLUMNS -> {:?}", columns);
let index_genesis_block = ctx.block_number().await;
let row_unique_id = TableRowUniqueID::Mapping(columns.secondary.identifier());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The row unique id, in case of mappings, should be always computed from the mapping key, independently from whether the mapping key is the secondary index or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, I fix to the key ID in commit 2d1aa8a.

mp2-v1/tests/common/storage_trie.rs Outdated Show resolved Hide resolved
metadata(&table_info)
}

fn storage_slots(&self, metadata: &[MetadataGadget]) -> Vec<StorageSlot> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't find this super intuitive for the end user. Maybe we could change mp2-v1 APIs (i.e., like these methods) to take as input, in place of ColumnsMetadata, the description of the table (i.e., Vec<ColumnInfo>) and the set of ids of columns to be extracted, so that MetadataGadget (which is a circuit specific data structure, and so should not be directly exposed if possible) can always be constructed inside mp2-v1? Alternatively, in place of the set of ids of columns to be extracted, we might only provide the evm_word as input and then the columns to be extracted are the ones with ColumnInfo.evm_word == evm_word?
This would also allow to avoid having both MetadataGagdet and the "alias" struct ColumnsMetadata being publicly exposed, which looks a bit weird to me. Wdyt?

mp2-v1/src/api.rs Show resolved Hide resolved
Copy link
Contributor

@nicholas-mainardi nicholas-mainardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now! APIs for column gadget looks much better now, thanks for all the hard work on this PR

@silathdiir silathdiir changed the title fix: add single and mapping struct to integration test for generic extraction (Part 3) fix: add single and mapping struct to integration test for generic extraction (Part 4) Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants