
feat: add dummy circuit to final extraction for indexing Data with no provable extraction #415

Merged

Conversation

@silathdiir (Contributor) commented Nov 29, 2024

The related document: Indexing Data with No Provable Extraction

Summary

  • Add a dummy circuit to final extraction; it exports the input digests directly as public values (without proving).
  • Add compute_table_row_digest and merge_table_row_digests for table row digest computation.
  • Add run_no_provable_test_cases to wrap the original test cases, running the final extraction and checking against the DB results directly (without values extraction).

silathdiir and others added 30 commits October 7, 2024 05:50
@nicholas-mainardi nicholas-mainardi changed the base branch from generic-extraction-integration-test-mapping-of-mappings to feat/support_non_consecutive_blocks_ryhope February 6, 2025 11:50
@@ -219,6 +269,38 @@ impl CircuitInput {
let length_proof = ProofWithVK::deserialize(&length_proof)?;
Ok(Self::Lengthed(LengthedCircuitInput { base, length_proof }))
}
/// Instantiate inputs for the dummy circuit dealing with no provable extraction case
pub fn new_no_provable_input<PrimaryIndex: PartialEq + Eq + Default + Clone + Debug>(
block_number: U256,
Collaborator:

Can we rename this "primary index"?

Collaborator:

And shouldn't this be of type PrimaryIndex? Otherwise, why do we have a primary index generic in that case?

Contributor:

Renamed block_number -> primary_index, block_hash -> root_of_trust, and prev_block_hash -> prev_root_of_trust. Regarding the PrimaryIndex generic: I mostly introduced it to satisfy the trait bound for the CellsCollection provided as input to represent the rows of the table. It required some additional bounds, but I have changed the type of primary_index to PrimaryIndex as well; given that we have the generic type, I agree it makes sense to use it for the primary_index input parameter too.
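A minimal sketch of how the revised, generic signature might look; the struct and type definitions here are placeholders standing in for the crate's actual types, not the real implementation:

```rust
use std::fmt::Debug;

// Placeholder types standing in for the crate's actual types.
#[derive(Clone, Debug, PartialEq, Eq, Default)]
struct OffChainRootOfTrust(Vec<u8>);
type HashOutput = [u8; 32];

struct NoProvableInput<PrimaryIndex> {
    primary_index: PrimaryIndex,
    root_of_trust: OffChainRootOfTrust,
    prev_root_of_trust: HashOutput,
}

// The primary index is now generic rather than hard-coded to U256,
// matching the bounds required by the rows' CellsCollection.
fn new_no_provable_input<PrimaryIndex: PartialEq + Eq + Default + Clone + Debug>(
    primary_index: PrimaryIndex,
    root_of_trust: OffChainRootOfTrust,
    prev_root_of_trust: HashOutput,
) -> NoProvableInput<PrimaryIndex> {
    NoProvableInput { primary_index, root_of_trust, prev_root_of_trust }
}
```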

pub fn new_no_provable_input<PrimaryIndex: PartialEq + Eq + Default + Clone + Debug>(
block_number: U256,
block_hash: OffChainRootOfTrust,
prev_block_hash: HashOutput,
Collaborator:

Do we have to provide one? If it's not equal to the previous insertion, will it fail? Couldn't we simply put a dummy value inside the API?

Contributor:

Yes, if we don't provide the block_hash used for the previous primary index here, a constraint in the IVC proof will fail. I think it would be simpler and cleaner to provide the previous root of trust here rather than having to handle that constraint differently in the IVC proof, no?

Contributor:

As agreed in DM, in commit 2f71b25 I changed the interface of this method to take as input the previous IVC proof rather than the previous block hash, so that we can fetch the previous block hash from the proof internally

@@ -45,6 +48,15 @@ pub(crate) const OUTER_KEY_ID_PREFIX: &[u8] = b"\0OUT_KEY";

pub(crate) const BLOCK_ID_DST: &[u8] = b"BLOCK_NUMBER";

/// Compute the identifier for a column of a table containing off-chain data
pub fn identifier_offchain_column(table_name: &str, column_name: &str) -> u64 {
let inputs: Vec<F> = vec![table_name, column_name]
Collaborator:

Should we add a DST just to be safe? Something like OFFCHAIN || table_name || col_name?

Contributor:

Done in commit 4a513aa
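The suggested domain separation could look roughly like this. This is only an illustration: it uses std's DefaultHasher, whereas the real code hashes field elements with the circuit's hash function, and the OFFCHAIN prefix constant is a hypothetical name:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical DST prefix for off-chain column identifiers.
const OFFCHAIN_DST: &[u8] = b"OFFCHAIN";

/// Compute a domain-separated identifier: H(OFFCHAIN || table_name || column_name).
fn identifier_offchain_column(table_name: &str, column_name: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    OFFCHAIN_DST.hash(&mut hasher);
    table_name.as_bytes().hash(&mut hasher);
    column_name.as_bytes().hash(&mut hasher);
    hasher.finish()
}
```

The DST ensures identifiers for off-chain columns live in their own domain, so they cannot collide with identifiers derived for other purposes from the same name strings.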

ColumnMetadata::NoPrimaryKey(REMAINING_COLUMN_NAMES[0].to_string()),
ColumnMetadata::NoPrimaryKey(REMAINING_COLUMN_NAMES[1].to_string()),
];
let mut off_chain_data_args = OffChainDataArgs::new(
Collaborator:

Could we name it OffchainTable?

Contributor:

Renamed to OffChainTableArgs in commit 4a513aa to keep naming consistent with the other table types

Comment on lines 1860 to 1861
// there is no previous primary index, so we generate previous hash at random
HashOutput::from(array::from_fn(|_| rng.gen()))
Collaborator:

Even if there is, could we avoid relying on this?

Contributor:

AFAIU this is asking the same as this comment?

Contributor:

This has been removed after the API changes in commit 2f71b25

prev_hash,
primary_index: self.last_update_epoch,
rows,
primary_key_columns: self.primary_key_column_ids(),
Collaborator:

This is confusing; there should only be ONE primary key column, no?

Contributor:

Not necessarily: a primary key may be composed of multiple columns. Think of mapping-of-mappings tables, where the primary key is given by the keys of both the outer and the inner mappings. To clarify, primary key != primary index: the primary key is the set of columns that uniquely identifies a row, while the primary index is just an index we build on a column (which isn't necessarily unique for each row). I added a comment to ColumnMetadata of the OffchainTableArgs data structure to clarify the meaning of primary key columns.
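The distinction can be illustrated with a toy example (all names here are hypothetical, not the crate's types): in a mapping-of-mappings table, the primary key spans two columns, while the primary index is a separate, possibly non-unique column:

```rust
use std::collections::HashSet;

// A toy row of a mapping-of-mappings table.
struct Row {
    outer_key: u64, // part of the primary key
    inner_key: u64, // part of the primary key
    block: u64,     // primary index: indexed, but not necessarily unique per row
    value: u64,
}

// The composite primary key (outer_key, inner_key) must be unique across rows;
// the primary index (block) need not be.
fn primary_keys_unique(rows: &[Row]) -> bool {
    let mut seen = HashSet::new();
    rows.iter().all(|r| seen.insert((r.outer_key, r.inner_key)))
}
```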

Collaborator:

Ahhh gotcha. ok. @delehef something we discussed this morning.

@Zyouell (Contributor) left a comment:

Looks pretty good to me! I had one or two comments, but they are really more a matter of personal taste than actual needs, so I've mentioned them anyway and will leave it up to you to decide whether they are good ideas or not!

}

/// Compute the row unique data using the set of column values provided as input
pub fn row_unique_data<'a, I: Iterator<Item = &'a [u8]>>(columns: I) -> HashOutput {
Contributor:

I believe if you changed this to row_unique_data<I: IntoIterator<Item = &[u8]>> and then changed the body to let packed_columns = columns.into_iter()..., it would let you pass anything to this function as long as it iterates over slices (and also get rid of the lifetime), which would mean we wouldn't have to turn everything into vec![stuff].into_iter() when we call this function above.

Contributor:

Done in commit 0cf3693, though it didn't get rid of the lifetime, but that's ok I guess
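The suggested bound change can be sketched like this (the hash body is a stand-in for the real packing logic; and as noted, the lifetime does remain, since the items are borrowed slices):

```rust
// With IntoIterator, callers can pass a Vec, an array, or any iterator of
// slices directly, without an explicit .into_iter() at every call site.
fn row_unique_data<'a, I: IntoIterator<Item = &'a [u8]>>(columns: I) -> u64 {
    // FNV-1a, purely for illustration; the real code uses the circuit hash.
    let mut acc: u64 = 0xcbf29ce484222325;
    for col in columns {
        for &b in col {
            acc = (acc ^ b as u64).wrapping_mul(0x100000001b3);
        }
    }
    acc
}
```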

@@ -672,7 +672,7 @@ pub(crate) async fn cook_query_no_matching_entries(
) -> Result<QueryCooking> {
let initial_epoch = table.row.initial_epoch().await;
// choose query bounds outside of the range [initial_epoch, last_epoch]
let min_block = 0;
let min_block = max(0, initial_epoch - 2) as usize;
Contributor:

This seems pointless to me as a usize can't be smaller than zero

Contributor:

initial_epoch is a signed integer, so subtracting 2 could yield a negative value, that's why I added the max method. Does it make sense?

Contributor:

Oh yeah, of course, you are converting to usize after the call to max, my bad
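The order of operations is the point of the exchange above: clamping on the signed type before the cast avoids a huge wrapped value. A minimal sketch (assuming initial_epoch is i64; the exact signed type in the codebase may differ):

```rust
use std::cmp::max;

// initial_epoch is signed, so initial_epoch - 2 can go negative;
// clamping to 0 *before* the usize cast keeps the result sane.
// Casting a negative i64 to usize directly would wrap to a huge number.
fn min_block(initial_epoch: i64) -> usize {
    max(0, initial_epoch - 2) as usize
}
```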

@@ -110,11 +110,28 @@ pub async fn test_query(ctx: &mut TestContext, table: Table, t: TableInfo) -> Re
| TableSource::MappingStruct(_, _)
| TableSource::MappingOfSingleValueMappings(_)
| TableSource::MappingOfStructMappings(_) => query_mapping(ctx, &table, &t).await?,
TableSource::OffChain(_) => {
Contributor:

As an alternative to tables having to keep track of where they came from (because in my opinion, once a table is a "table" it shouldn't care about where it came from), could we make query_table_with_no_consecutive_blocks() return an Option<QueryCooking> and then just handle the None case in the query_mapping function?

Contributor:

Done in commit 0cf3693, nice suggestion, thanks
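The shape of the suggested refactor, as a hedged sketch (all names and signatures here are hypothetical simplifications, not the crate's actual API): the cooking function returns None when there is no usable epoch range, and the caller handles that case instead of branching on the table's origin:

```rust
struct QueryCooking {
    min_block: usize,
    max_block: usize,
}

// Returns None when the table has no epoch range to cook a query from,
// letting the caller handle the case without knowing the table's origin.
fn cook_query(initial_epoch: Option<usize>, last_epoch: Option<usize>) -> Option<QueryCooking> {
    let (min_block, max_block) = (initial_epoch?, last_epoch?);
    Some(QueryCooking { min_block, max_block })
}

fn describe(cooking: Option<QueryCooking>) -> String {
    match cooking {
        Some(c) => format!("query over [{}, {}]", c.min_block, c.max_block),
        None => "no provable query range; skipping".to_string(),
    }
}
```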

@mergify mergify bot mentioned this pull request Mar 5, 2025
Base automatically changed from feat/support_non_consecutive_blocks_ryhope to feat/new_extraction_features March 5, 2025 22:38
@nicholas-mainardi nicholas-mainardi merged commit 9d292a7 into feat/new_extraction_features Mar 6, 2025
5 checks passed
@nicholas-mainardi nicholas-mainardi deleted the indexing-with-no-probvable-ext branch March 6, 2025 09:18