offsetof operator and struct info extracting #1
Thierry, do you think https://github.com/ThierryGoubier/SmaCC could help me here? |
@marianopeck Yes, it would do the job with a few modifications. What the C parser in SmaCC is missing is symbol table management (mostly a problem because of typedefs). There is an extended grammar which handles that, and a few other possible techniques. I think a full parser is easier to write in this case than a partial one. Most of the time in C parser development is spent dealing with the not-so-standard C found in the headers... |
By 'There is an extended grammar which handles that', do you mean there is an extended grammar, but not the one in SmaCC? And what are the other 'possible techniques'? I would like to at least cover the not-so-complicated cases as a first step. Thanks! |
Yes, it is a specific grammar, longer than the "ansi" one, that allows a typedef to be redefined as a variable. I forgot to get it from my work references, so I will have to look for it tomorrow. But it solves only part of the problem. The modifications require managing the symbol table: registering type declarations and ensuring that they are reported as typedef tokens instead of identifier tokens. Not too hard. The problems:
So I wonder if the right approach wouldn't be to start from the documentation, and include the tools (C program generation and C parser) to be able to both parse chunks of documentation (for example, the function prototypes in the man pages) and generate the C program that outputs the values (type sizes, define values) as well as the Smalltalk code (FFI entry points). That way, it would help to create and document the API (since an FFI writer will usually build a higher-level interface on top of it). |
Found it! It's the Roskind C grammar [1], with some nice explanations [2]. In the literature, there are a few papers about alternative approaches. I found a different one a few years ago. Small notes: for getting all defines:
For getting the structures, CLANG could be used. [1] http://www.ccs.neu.edu/research/demeter/tools/master/doc/headers/C++Grammar/c4.y |
Noob question... how difficult is it to adapt the CParser to use that grammar instead? Another question... by using clang, do you mean doing something like this? For example, I wanted to get the
clang -cc1 -ast-dump -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/spawn.h > spawn2.output 2>&1
And the interesting part is:
|
Yes, and it seems to be very different on mine (I still have a few issues with #includes).
|
Maybe |
You'll get them by your C program anyway... |
This seems to work better. For example, let's say I want to know how SQFile is laid out. Then:
clang -cc1 -fdump-record-layouts -I/Users/mariano/Pharo/git/pharo-vm/platforms/Cross/vm -I/Users/mariano/Pharo/git/pharo-vm/platforms/Mac\ OS/vm/ -I/Users/mariano/Pharo/git/pharo-vm/src/vm/ -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include /Users/mariano/Pharo/git/pharo-vm/platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c > vm.output
Gives me:
*** Dumping AST Record Layout
Type: SQFile
Record:
Layout: <ASTRecordLayout
Size:256
DataSize:256
Alignment:64
FieldOffsets: [0, 64, 128, 192, 200, 208, 216]> |
Yes, and if you have the offsets, then you have everything necessary.
|
Yes! The problem is... I cannot change the alignment from 64 bits to 32 bits... I tried everything:
CFLAGS="-m32" CPPFLAGS="-m32" clang -cc1 -fdump-record-layouts -I/Users/mariano/Pharo/git/pharo-vm/platforms/Cross/vm -I/Users/mariano/Pharo/git/pharo-vm/platforms/Mac\ OS/vm/ -I/Users/mariano/Pharo/git/pharo-vm/src/vm/ -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include /Users/mariano/Pharo/git/pharo-vm/platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c
But no way... are you able to get 32 bits? |
Good point. I'll have a look into that later (change the gcc installation selected by clang?) |
Let me ask something else while we try to figure out the 32 bits problem... Looking at this post http://eli.thegreenplace.net/2012/12/17/dumping-a-c-objects-memory-layout-with-clang do you think we should use the same tip mentioned at the end (run with |
Well, yes, it is an idea because handling includes is complex; for example, in my experiments, I had to point it (with -I) to the headers associated with a specific version of gcc. Now, if you got your includes right by hand, then the result should be the same as preprocessing with -E (but easier to write). |
As recommended by Eliot Miranda:
"Here's another tip. offsetof is an operator in the same class as sizeof that answers the byte offset of a field in a structure. So you can use it to print out the offsets of fields in structures. So offsetof(type,field) evaluates to the zero-relative offset of field in type. So to analyse structures
a) use cc -E to generate raw C containing struct and typedefs
b) using as simple a C parser as you can write, find all the struct and type definitions (you don't have to parse function bodies; you can just scan for { and matching }, being careful to skip over { and } in strings).
c) for all the fields in all the structure definitions generate something like
printFieldOffset(struct stat,st_ino)
If you look at the grammar at the end of the C book it's quite minimal. Given that you only want to parse declarations you may be able to use e.g. PetitParser to write your own. Here for example is a complete C grammar in YACC. Quite small. https://www.lysator.liu.se/c/ANSI-C-grammar-y.html
Remember you can hack around all the expression parts by tokenising (which will convert strings to tokens) and then searching for function definitions and skipping the parts from { to }. So the part of the grammar necessary for parsing function declarations and typedefs is small. But being able to parse the declaration subset of C is extremely valuable.
"