Post-hoc allocation site analysis #99
Comments
(Of course the right solution is to extend the toolchain, e.g. by expanding what is captured in debugging information, such that it's possible to reconstruct the |
Roughly the solution I have in mind is:
Chunks that use functionlike macros in a functionlike way will have "undeclared function" problems but will probably parse. Chunks that use functionlike macros in other ways will probably have syntactic errors. Chunks that use non-functionlike macros will have name resolution problems for types or variables that are being referenced through a macro. Will they still parse? Not in general... the classic C |
Probably the best we could do is pre-scan for unresolvable identifiers, then make a guess about which is a type. This is overkill for now... just discard the chunks that fail to parse, i.e. hope they are not functions that do allocations.
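To make the "pre-scan, then guess which is a type" idea concrete, here is a minimal sketch, not anything from the existing tooling. It relies on one observation: in preprocessed C, two adjacent non-keyword identifiers cannot occur inside an expression, so an unknown identifier immediately followed by another identifier is probably a type name. The function name `guess_type_names` and the `known_names` parameter are hypothetical, and the tokeniser ignores comments and string literals.

```python
import re

# C keywords that may legitimately sit next to an identifier.
C_KEYWORDS = set("""auto break case char const continue default do double else
enum extern float for goto if inline int long register restrict return short
signed sizeof static struct switch typedef union unsigned void volatile
while""".split())

# Crude tokeniser: identifiers, numbers, or any single non-space character.
TOKEN = re.compile(r'[A-Za-z_]\w*|\d\w*|\S')
IDENT = re.compile(r'[A-Za-z_]\w*')
TAG_KEYWORDS = {'struct', 'union', 'enum'}

def guess_type_names(chunk, known_names=frozenset()):
    """Guess which unresolved identifiers in `chunk` are type names.

    Heuristic only: an unknown identifier directly followed by another
    identifier (as in `size_t n`) is assumed to be a type. Identifiers
    that follow struct/union/enum are tags, so they are skipped.
    """
    tokens = TOKEN.findall(chunk)
    guessed = set()
    for i in range(len(tokens) - 1):
        cur, nxt = tokens[i], tokens[i + 1]
        prev = tokens[i - 1] if i > 0 else None
        if (IDENT.fullmatch(cur) and IDENT.fullmatch(nxt)
                and cur not in C_KEYWORDS and nxt not in C_KEYWORDS
                and cur not in known_names
                and prev not in TAG_KEYWORDS):
            guessed.add(cur)
    return guessed

example = 'size_t n = strlen(s); struct foo bar; foo_t *p = malloc(n);'
guesses = guess_type_names(example)
```

Note that `foo_t *p` is deliberately not guessed here: that form is the ambiguous one, and this heuristic only covers the cases where adjacency settles the question.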
I think the slight generalisation of this to a "C-repairing parser" could be a nice student project. Step 1 is to write a parser for modern / GNU-extended C that is tolerant of type/var ambiguities, producing multiple parses if necessary. Step 2 is to traverse the resulting ASTs to scrape a list of the depended-on definitions for each top-level definition -- again, there may be "type or value" disjunction as a result of the ambiguity. Step 3 is to synthesise definitions of the dependencies, using the DWARF information. Extensions could try to work with syntax-generating macros somehow (e.g. imagine how to handle Linux's MODULE-* macros, say).
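As a toy illustration of Steps 1 and 2, the sketch below handles just one statement shape, `T * x;`, which reads as a pointer declaration if `T` names a type and as a multiplication if it names a value. It is nowhere near a real parser: the names `readings` and `dependencies` and the tuple encoding are made up for illustration. The point is only to show what "multiple parses" and a "type or value" disjunction could look like as data.

```python
import re

# Only the statement shape `IDENT * IDENT ;` is recognised.
AMBIGUOUS = re.compile(r'\s*([A-Za-z_]\w*)\s*\*\s*([A-Za-z_]\w*)\s*;\s*')

def readings(stmt, known_types=frozenset(), known_values=frozenset()):
    """Return every reading of `stmt` consistent with what is known.

    With no knowledge about the left-hand name, both readings survive.
    """
    m = AMBIGUOUS.fullmatch(stmt)
    if m is None:
        return []
    left, right = m.groups()
    out = []
    if left not in known_values:
        out.append(('declaration', left, right))   # `right` is a pointer to `left`
    if left not in known_types:
        out.append(('expression', left, right))    # `left` multiplied by `right`
    return out

def dependencies(stmt, **known):
    """Map each depended-on name to the set of kinds it might have."""
    deps = {}
    for kind, left, right in readings(stmt, **known):
        if kind == 'declaration':
            deps.setdefault(left, set()).add('type')
        else:
            deps.setdefault(left, set()).add('value')
            deps.setdefault(right, set()).add('value')
    return deps

both = readings('T * x;')
deps = dependencies('T * x;')
```

With nothing known, `T` comes out as `{'type', 'value'}`, which is the disjunction Step 3 would then have to resolve against the DWARF information.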
It would be very useful to be able to run a `dumpallocs`-style analysis for a built binary, given its debug info and source tree but without rebuilding it -- which takes time.

This is particularly important for introspecting on the ld.so (#98) because we don't want to have to rebuild the whole ld.so, but we could reasonably expect a source tree to be available.
An obvious problem is that `dumpallocs` requires a complete `.i` file to analyse, but DWARF does not currently include enough information to reconstruct the `.i` for any compilation unit. It does, however, contain various hints: many header files will be mentioned in the line table (but many won't!) and we have information about the compiler including its version (if that matters) and some of its command-line arguments (apparently just the `cc1`-style ones, though).

(One interesting question is to what extent the macro information, enabled at `-g3` with GCC, fills these gaps. I'm not pursuing this because it's almost never used in the field.)

I had a go at creating a simple awk script that mimics the preprocessor and tries to reconstruct where the included files came from. On simple examples this works*, but on realistic examples (e.g. compilation units from glibc's ld.so), even with maximum guesswork, it falls down for a number of reasons:
`#include_next`, again in the absence of information about include paths' ordering

What might be a better way forward is a way to reconstitute a 'good enough' `.i` file even in the absence of such information. That would require a forgiving parser. C is already forgiving about missing function prototypes, so we'd mostly be worried about type information (which of course affects the parse tree!). We could even use `dwarfidl` to generate a rendering of all the type information up-front, and then our parser would just have to be forgiving of duplicates.

Some problems might remain, e.g. function-like macros used as syntax generators -- if we don't manage to slurp the definition of such a macro, we will choke. However, we expect to get most headers, so it could still work most of the time.
* The definition of 'works' here is already a bit generous. Since we don't have information about `-D` options on the command line, or builtin macros defined by the compiler, we don't know which `#if`/`#ifdef` branches are taken. So the tool will follow both branches while keeping track of a 'path condition'. The idea would then be to synthesise an environment of `-D` options that could have generated the output file's DWARF, e.g. would have included the set of embodied header files that the line table reports (among others, but hitting all of those). That is already into SMT-solver territory.
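The 'path condition' bookkeeping from the footnote can be sketched roughly as follows. This is an illustrative Python rewrite of the idea, not the awk script itself; the function name `includes_with_conditions` and the string encoding of conditions are assumptions, and it does not handle comments, line continuations or macro expansion. Every `#include` is recorded together with the conditions under which it would be reached, with both arms of each conditional followed.

```python
import re

DIRECTIVE = re.compile(r'^\s*#\s*(\w+)\s*(.*?)\s*$')

def frame_condition(prior, current):
    """Condition for one #if frame: all earlier arms false, current arm true."""
    parts = [f'!({p})' for p in prior]
    if current is not None:
        parts.append(f'({current})')
    return ' && '.join(parts)

def includes_with_conditions(text):
    """Return (header, path_condition) pairs for each #include in `text`.

    The path condition is a tuple with one entry per enclosing conditional.
    """
    stack = []   # one [prior_conditions, current_condition] pair per open #if
    found = []
    for line in text.splitlines():
        m = DIRECTIVE.match(line)
        if not m:
            continue
        kind, rest = m.groups()
        if kind == 'if':
            stack.append([[], rest])
        elif kind == 'ifdef':
            stack.append([[], f'defined({rest})'])
        elif kind == 'ifndef':
            stack.append([[], f'!defined({rest})'])
        elif kind in ('elif', 'else') and stack:
            prior, current = stack[-1]
            if current is not None:
                prior = prior + [current]
            stack[-1] = [prior, rest if kind == 'elif' else None]
        elif kind == 'endif' and stack:
            stack.pop()
        elif kind in ('include', 'include_next'):
            found.append((rest, tuple(frame_condition(p, c) for p, c in stack)))
    return found

sample = '''#include <a.h>
#ifdef FOO
#include "b.h"
#else
#include "c.h"
#endif
'''
result = includes_with_conditions(sample)
```

Finding a set of `-D` options that satisfies the path conditions of every header the line table reports would then be the constraint-solving step the footnote alludes to; nothing here attempts that.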