Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[move] Added bytecode map generation during bytecode disassembly #20797

Merged
merged 13 commits into from
Jan 10, 2025

Conversation

awelc
Copy link
Contributor

@awelc awelc commented Jan 6, 2025

Description

This PR adds support for generating "bytecode maps" (source maps for disassembled bytecode)

Test plan

A new test has been added and all other test have to pass

@awelc awelc requested a review from tzakian January 6, 2025 22:05
@awelc awelc self-assigned this Jan 6, 2025
Copy link

vercel bot commented Jan 6, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 9, 2025 10:44pm
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Jan 9, 2025 10:44pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Jan 9, 2025 10:44pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Jan 9, 2025 10:44pm

@awelc awelc temporarily deployed to sui-typescript-aws-kms-test-env January 6, 2025 22:05 — with GitHub Actions Inactive
)
.map_err(|e| Error::Internal(format!("Error creating disassembler: {e}")))
.extend()?
.disassemble()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about this. I'd probably prefer something like adding a new function, e.g., disasssemble_with_source_map (or something like that) that would also generate the source map during disassembly, and then keeping this function the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines 75 to 82
if bytecode_map {
d.generate_bytecode_map();
}
let (disassemble_string, bcode_map) = d.disassemble()?;
if bytecode_map {
println!("{}", serialize_to_json_string(&bcode_map)?);
}
println!("{}", disassemble_string);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think places like this will be nicer with the two function approach as this will also allow removing the generate_bytecode_map method as well. E.g., something like this

Suggested change
if bytecode_map {
d.generate_bytecode_map();
}
let (disassemble_string, bcode_map) = d.disassemble()?;
if bytecode_map {
println!("{}", serialize_to_json_string(&bcode_map)?);
}
println!("{}", disassemble_string);
let diss = if bytecode_map {
let (disassemble_string, bcode_map) = d.disassemble_with_source_map()?;
println!("{}", serialize_to_json_string(&bcode_map);
disassemble_string
} else {
d.disassemble()?;
};
println!("{}", disassemble_string);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it allowed me to remove this function but it also means that having a global flag is no longer necessary (and somewhat awkward even)

@@ -253,9 +272,59 @@ pub fn run_move_unit_tests<W: Write + Send>(
let coverage_map = CoverageMap::from_trace_file(trace_path);
output_map_to_file(coverage_map_path, &coverage_map).unwrap();
}
if save_disassembly {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get why you put it here, since for the immediate use case it's only needed during testing, but producing disassembly here feels wrong to me.

IMO it feels like an option that should be able to be fed into the build process, and should be done when saving a compiled package out on that side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally followed my principle to avoid modifications to other crates but it certainly makes sense in this case. I think it also made the code cleaner as I didn't have to repeat some of the boilerplate.

@@ -360,7 +448,20 @@ impl<'a> Disassembler<'a> {
"",
buffer,
|buffer, (name, ty)| {
any_write!(buffer, "\t{name}: ")?;
any_write!(buffer, "\t")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a lot of these places could be simplified if any_write/any_writeln returned back start and end offsets. e.g., you'd then have something like

let (fname_start, fname_end) = any_write!(buffer, "{name}")?;

I'd also see about abstracting the source map generation and location generation as well. I think just a method would be fine for this. e.g.,

fn map_location(&self, start: u32, end: u32, action: FnMut(Loc) -> ()) {
   if self.bytecode_map {
     let loc = Loc::new(FileHash::empty(), start, end);
     action(loc)
   }
}

With that this part could look something like

let (start, end) = any_write!(buffer, "{name}")?;
self.map_location(start, end, |loc| bcode_map.add_struct_field_mapping(struct_def_idx, loc))

Alternately, if map_location took a tuple this could be

self.map_location(any_write!(buffer, "{name}")?, |loc| bcode_map.add_struct_field_mapping(struct_def_idx, loc));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did something similar, but I actually pulled the Loc generation into macros. I think it's probably simpler and thus possibly more readable than the callback solution

any_write!(buffer, " }}")
any_write!(buffer, " }}")?;
// perhaps surprisingly, but the location is of the whole variant
// and not just of its name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed surprising. Maybe we should make it just be of its name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't want to sneak it in here, but can certainly do it in a separate PR if need be

unit: &CompiledUnitWithSource,
) -> Result<()> {
let root_package = self.package.compiled_package_info.package_name;
let bytecode_modules_dir = CompiledPackageLayout::CompiledModules.path();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm of two minds on this. I think I'm fine with this, but part of me thinks this might be nice to be in its own separate top-level dir under the build/<pkg_name> (i.e., a sibling of bytecode_modules/sources (Etc)).

Thoughts? I'm pretty open to either option I think

Copy link
Contributor Author

@awelc awelc Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's a great point. Putting them in a separate directory for trace-debugger to pick up will make it easier down the line - we can have other methods of generating disassembly (e.g., from the replay tool) to keep the same directory structure. I am putting them into build/disassembly directory then

Comment on lines 408 to 409
bcode_map_gen: bool,
bcode_map: &mut SourceMap,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be so nitty, but I only mention it as I think it will clean up the logic...

What about passing bcode_map: &mut Option<SourceMap> here. Then below, instead of things like

if bcode_map_gen {
   bcode_map.add_top_level_struct_mapping(...)
}

This could be rewritten as

bcode_map.map(|m| m.add_top_level_struct_mapping(..));

or even as

if let Some(m) = bcode_map {
   m.add_top_level_struct_mapping(...);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea - will also let me shave off a parameter in a few places

// NB: The order in which these are called is important as each function is effectful.
self.print_header(buffer)?;
let mut bcode_map = self.print_header(buffer)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment below but I think turning this into an option may make things even cleaner.

SignatureToken::U256 => any_write!(buffer, "u256"),
SignatureToken::Address => any_write!(buffer, "address"),
SignatureToken::Signer => any_write!(buffer, "signer"),
SignatureToken::Bool => any_write!(buffer, "bool").map(|_| ()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO would be cleaner to do something like

match sig_tok {
 ....
}.map(|_| ())

or to ? all of these and then have an Ok(()) at the bottom of the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that some match arms return Result<()> instead of Result<Loc>, and I had to unify them which made this look uglier...

Copy link
Contributor

@tzakian tzakian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small nit to consider, but otherwise LGTM!

@@ -279,13 +279,14 @@ impl<'a> Disassembler<'a> {
bcode_map_gen: bool,
) -> Result<SourceMap> {
// NB: The order in which these are called is important as each function is effectful.
let mut bcode_map = self.print_header(buffer)?;
let mut res = Some(self.print_header(buffer)?);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: consider if this needs to make the source map at all, but I think this is fine as-is if you'd prefer to keep it like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it decided to let it simply return largely empty SourceMap rather than None is that on the client side we then don't have to do another check (and possibly user-facing message) when unwrapping (e.g. for printing)

@awelc awelc merged commit 2b655b2 into main Jan 10, 2025
51 checks passed
@awelc awelc deleted the aw/bytecode-map branch January 10, 2025 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants