[WIP] Add support for DWARF compile units in wasm binaries #210
dschuff
commented
Jul 31, 2020
- Add support for reading DWARF data from wasm custom sections
- Use the common DWARF compile unit parser to support Bloaty's standard compileunit sink
Hello! |
Hi @dschuff, thanks for the PR! Does the PR seem to give reasonable results when you run it on WebAssembly files? You could add some unit tests if you wanted, and check in some .wasm binaries to test against. In general Bloaty is hard to unit test because compiler output is rather unpredictable. Also, improvements in Bloaty can invalidate existing unit tests, for example if we increase our coverage. |
(sorry, was out of town this week) Actually I do have one question about the output. With this PR it looks something like this for wasm:
(ignore the NANs in the "VM Size" column, VM size isn't meaningful for wasm) |
... oh, it must be the case that the size for e.g.
which would seem to indicate that the Code section is being shown as monolithic and no parts of it are contributing to the CU totals (which of course is kind of the whole point). Do you have any idea how I might figure out how to fix that? |
Sorry for the slow reply on this! Ultimately the memory map, which you can see with Usually with DWARF we are relying on VM addresses to break down compile units. But with WASM there are no VM addresses available (at least that was my experience when implementing initial WASM support). Does DWARF for WASM contain enough information to break down the file ranges? |
OK, I think I understand what you're saying, but first a question about what I should expect to see and what the numbers should mean.
Since that 12.6Mi size is greater than the whole code section, I'm sure that the number includes the debug info (aggregated across all the different debuginfo sections) for that CU. When I run bloaty with on that same ELF binary with
In other words, bloaty can break down the text section based on CU information from the debuginfo. I guess you're saying that bloaty uses the VM addresses (which, now that I think about it, makes sense because address fields in the debug info are VM addresses). |
I took another crack at this. Here I made the "VM" space represent just space in the code section. It's a break from what "VM" is supposed to mean (particularly for wasm, which doesn't map the code section at runtime at all). But setting up a VM mapping this way does make DWARF "just work" because the VM addresses in DWARF represent code section offsets. It also has another nice property when just using the wasm "name" section (which is used by bloaty's symbol sink) that it allows you to ignore the space taken up by the name section itself (which will often be stripped out like debug info before shipping). Separately from DWARF, I also thought it might be nice to do sort of the opposite, and make "VM" refer to just the data section, which is actually initialized into memory at runtime (since it's useful to be able to profile bloat in the data section, separately from code bloat). This would repurpose the "VM" idea in a different, incompatible way (which doesn't match the way it's used by DWARF). |
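To make the idea above concrete, here is a minimal sketch of why treating code section offsets as "VM" addresses lets DWARF ranges line up with file ranges. It is not Bloaty's code: `CodeSection`, `FileRange`, and `DwarfToFileRange` are hypothetical names, and the sketch assumes DWARF addresses are byte offsets measured from the start of the Code section's contents.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical description of where the wasm Code section's contents
// live inside the .wasm file.
struct CodeSection {
  uint64_t file_offset;  // file offset of the first byte of the contents
  uint64_t size;         // size of the contents in bytes
};

// A half-open byte range [start, start + size) in the file.
struct FileRange {
  uint64_t start;
  uint64_t size;
};

// Translate a DWARF address range into a file range. Assumption: the DWARF
// address (low_pc) is an offset relative to the Code section's contents.
// Returns std::nullopt if the range does not fit inside the section.
std::optional<FileRange> DwarfToFileRange(const CodeSection& code,
                                          uint64_t low_pc, uint64_t size) {
  // The section itself must not wrap around the 64-bit offset space.
  assert(code.file_offset <= UINT64_MAX - code.size);
  if (low_pc > code.size || size > code.size - low_pc) {
    return std::nullopt;
  }
  return FileRange{code.file_offset + low_pc, size};
}
```

With a mapping like this, a "VM" range and its mirrored file range differ only by the constant `file_offset`, which is why a fake VM mapping over the code section is enough for the existing DWARF code to attribute code bytes to compile units.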
I think this is the right answer. In Bloaty, "VM" should describe parts of the binary that will be mapped directly into memory at runtime. It sounds like for WASM, this only applies to data, not code. So VM in WASM will probably want to describe data only. If we can use DWARF data to refine the "File size" report for the code section, that seems ideal. By the way, another contributor has recently started adding yaml2obj tests for WASM, which make for much more robust tests: https://github.com/google/bloaty/tree/master/tests/wasm |
Yes, this does make sense to me, but...
I don't actually know how to do this without going and changing all the DWARF parsing code, since it wants to add VM ranges. Maybe we could somehow actually have 2 separate "VM" address spaces? One for data, and one for the code section (which could then be displayed separately, or used to refine the file size report, or something)?
Yes, I used it on my previous PR, and (coming from the LLVM world) it's nice! |
Or, to put it another way: Without adding the fake VM range backing the code section as this PR currently does, there are no file ranges added for the code section when reading the DWARF sections (the only file ranges that show up are the DWARF sections themselves). But with this PR, the file ranges in the code section seem to be properly accounted for (in addition to the VM ranges that mirror them). So I don't know how to get the file ranges in the code section reported without also having the VM ranges. |
Yes, I see the conundrum. Bloaty's DWARF support is currently hard-coded to add VM ranges, not file ranges. This is "correct" given the definition of DWARF. But WASM uses DWARF in a nonstandard way, due to WASM's design, which is significantly different from ELF or Mach-O. What if we added some new functions to
enum class RangeType {
  kVM,
  kFile,
};
class RangeSink {
 public:
  void AddRange(const char* analyzer, RangeType type, uint64_t start, uint64_t size);
};
Then the DWARF routines could take Would that work?
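As a rough illustration of the proposal above, the sketch below shows one way a sink could route ranges by `RangeType`. It is a toy under stated assumptions, not Bloaty's actual `RangeSink`: `ToyRangeSink`, `Range`, `AddCompileUnitRange`, and the `"dwarf_compileunit"` label are all hypothetical, and real range bookkeeping would be more involved than two vectors.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class RangeType {
  kVM,
  kFile,
};

// One recorded range, tagged with the analyzer that reported it.
struct Range {
  std::string analyzer;
  uint64_t start;
  uint64_t size;
};

// Toy stand-in for a range sink (hypothetical): it keeps VM ranges and
// file ranges in separate lists and lets the caller choose the domain.
class ToyRangeSink {
 public:
  void AddRange(const char* analyzer, RangeType type, uint64_t start,
                uint64_t size) {
    assert(analyzer != nullptr);
    std::vector<Range>& dest =
        (type == RangeType::kVM) ? vm_ranges_ : file_ranges_;
    dest.push_back(Range{analyzer, start, size});
  }

  const std::vector<Range>& vm_ranges() const { return vm_ranges_; }
  const std::vector<Range>& file_ranges() const { return file_ranges_; }

 private:
  std::vector<Range> vm_ranges_;
  std::vector<Range> file_ranges_;
};

// Hypothetical DWARF-side helper: the binary format decides which domain
// DWARF addresses belong to and passes that choice in as `type`.
void AddCompileUnitRange(ToyRangeSink* sink, RangeType type, uint64_t low_pc,
                         uint64_t size) {
  sink->AddRange("dwarf_compileunit", type, low_pc, size);
}
```

Under this sketch, an ELF reader would pass `RangeType::kVM` and a WASM reader would pass `RangeType::kFile` (after converting code section offsets to file offsets), so the DWARF parsing code itself would not need to know which format it is reading.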
I think it does, see the tests in tests/dwarf, which uses I don't know if this works in WASM though... |