Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add format validation to era2-stats binary #1555

Merged
merged 3 commits into from
Oct 27, 2024

Conversation

morph-dev
Copy link
Collaborator

What was wrong?

The "era2-stats" binary doesn't check validity of the file.

How was it fixed?

Added validation. The only thing missing is checking that state data actually results in the canonical state trie, but that is covered by import-state subcommand of the trin-execution.

To-Do

@morph-dev morph-dev requested a review from KolbyML October 25, 2024 20:31
@morph-dev morph-dev self-assigned this Oct 25, 2024
@KolbyML
Copy link
Member

KolbyML commented Oct 25, 2024

Checking for sequentiality and if the content is valid is 2 different things no. I guess you could say we are checking if the files our format specification which includes sequentially. But I would say just saying "validation" is misleading. As we are not checking if the data itself is valid, this PR is checking if our the file follows the format requirements we laid out.

Even if the accounts are out of order, the data can still be valid and generate the canonical state trie, so I think the wording is misleading.

@morph-dev
Copy link
Collaborator Author

Even if the accounts are out of order, the data can still be valid and generate the canonical state trie, so I think the wording is misleading.

How would you phrase it?

Copy link
Member

@KolbyML KolbyML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just 2 idea's maybe there is a better one? I just would like it to be more specific so there isn't the implication this verifies we can generate the expected trie. As just because a file isn't sequentially ordered doesn't mean it isn't a valid input to generate the canonical trie order doesn't matter for that.

We want to ensure sequentially as that is the most optimized case for importing the file era2 file. In the importer we should throw a warning if a file doesn't have it's keys linear etc due to that.

So any phase which is more specific and doesn't imply this is validating the data itself is correct or not I think would be fine.

e2store/src/bin/era2-stats.rs Outdated Show resolved Hide resolved
e2store/src/bin/era2-stats.rs Outdated Show resolved Hide resolved
@morph-dev morph-dev changed the title feat: add validation to era2-stats binary feat: add format validation to era2-stats binary Oct 26, 2024
@morph-dev
Copy link
Collaborator Author

Updated as suggested.

@morph-dev morph-dev requested a review from KolbyML October 26, 2024 13:13
Copy link
Member

@KolbyML KolbyML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Collaborator

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

///
/// It can be run with following command:
///
/// ```bash
/// cargo run -p e2store --bin era2-stats --features era2-stats-binary -- <path>
/// cargo run --release -p e2store --bin era2-stats --features era2-stats-binary -- <path>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is just because the operation is slow, so --release helps speed it up?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct.

Comment on lines 82 to 88
if index + 1 < account.storage_count {
assert_eq!(
storage.len(),
10_000_000,
"Non-final storage entry doesn't have 10M items",
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a minute to grok this (granted, I was unfamiliar with the era format). Maybe it's the double-negative in the error message? A quick comment might help speed up comprehension for era noobs. Something like:

Suggested change
if index + 1 < account.storage_count {
assert_eq!(
storage.len(),
10_000_000,
"Non-final storage entry doesn't have 10M items",
);
}
// Validate that every chunk of storage is full, with 10M items (skipping the final chunk)
if index + 1 < account.storage_count {
assert_eq!(
storage.len(),
10_000_000,
"Non-final storage entry doesn't have 10M items",
);
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment, removed double-negative and added more details to the message.

e2store/src/bin/era2-stats.rs Outdated Show resolved Hide resolved
@morph-dev morph-dev merged commit bf02bc3 into ethereum:master Oct 27, 2024
9 checks passed
@morph-dev morph-dev deleted the era2_validation branch October 27, 2024 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants