offsetof operator and struct info extracting #1

Open
marianopeck opened this issue Jan 26, 2016 · 15 comments

Comments

@marianopeck
Owner

As recommended by Eliot Miranda:

"Here's another tip. offsetof is an operator in the same class as sizeof that answers the byte offset of a field in a structure. So you can use it to print out the offsets of fields in structures. So offsetof(type,field) evaluates to the zero-relative offset of field in type. So to analyse structures
a) use cc -E to generate raw C containing struct and typedefs
b) using as simple a C parser as you can write, find all the struct and type definitions (you don't have to parse function bodies; you can just scan for { and matching }, being careful to skip over { and } in strings).
c) for all the fields in all the structure definitions generate something like
printFieldOffset(struct stat,st_ino)

If you look at the grammar at the end of the C book, it's quite minimal. Given that you only want to parse declarations, you may be able to use e.g. PetitParser to write your own. Here, for example, is a complete C grammar in YACC. Quite small. https://www.lysator.liu.se/c/ANSI-C-grammar-y.html

Remember you can hack around all the expression parts by tokenising (which will convert strings to tokens) and then searching for function definitions and skipping the parts from { to }. So the part of the grammar necessary for parsing function declarations and typedefs is small. But being able to parse the declaration subset of C is extremely valuable.

"
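As a rough illustration of step c), here is a minimal sketch of what a generated program could look like. `offsetof` is provided by `<stddef.h>`; the function name `print_stat_layout` and the choice of `struct stat` members are assumptions made for this example, and a real generator would emit one `printFieldOffset` line per field it found while parsing.

```c
#include <assert.h>
#include <stddef.h>    /* offsetof */
#include <stdio.h>
#include <sys/stat.h>  /* struct stat */

/* Print the byte offset and size of one member, in the spirit of the
 * printFieldOffset(struct stat,st_ino) line suggested above. */
#define printFieldOffset(type, field)                          \
    printf("%s.%s offset=%zu size=%zu\n", #type, #field,       \
           offsetof(type, field), sizeof(((type *)0)->field))

/* Hypothetical driver: prints a few members of struct stat and
 * returns how many member lines were printed. */
int print_stat_layout(void) {
    int printed = 0;
    printf("sizeof(struct stat)=%zu\n", sizeof(struct stat));
    printFieldOffset(struct stat, st_dev);  printed++;
    printFieldOffset(struct stat, st_ino);  printed++;
    printFieldOffset(struct stat, st_size); printed++;
    /* Sanity check: a member must lie entirely inside the structure. */
    assert(offsetof(struct stat, st_size) +
           sizeof(((struct stat *)0)->st_size) <= sizeof(struct stat));
    return printed;
}
```

Calling `print_stat_layout()` from `main` prints one line per member. The values depend on the platform and compiler flags, which is the reason for asking the compiler rather than computing them by hand.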

@marianopeck
Owner Author

Thierry, do you think https://github.com/ThierryGoubier/SmaCC could help me here?

@ThierryGoubier

@marianopeck Yes, it would do the job with a few modifications.

What the C parser in SmaCC is missing is the symbol table management (mostly problematic because of typedefs). There is an extended grammar which handles that, and a few other possible techniques. I think a full parser in that case is easier to write than a partial one. Most of the time in a C parser development is spent dealing with not so standard C found in the headers...

@marianopeck
Owner Author

Hi @ThierryGoubier

By 'There is an extended grammar which handles that', do you mean there is an extended grammar, but it is not the one in SmaCC?

And what are the other 'possible techniques'?
And which 'few modifications' would be needed?

I would like to at least cover the not-so-complicated cases in a first step.

Thanks!

@ThierryGoubier

Yes, it is a specific grammar, longer than the ANSI one, that allows a typedef name to be redefined as a variable. I forgot to get it from my work references, so I will have to look for it tomorrow. But it solves only part of the problem.

The modifications require managing the symbol table: registering type declarations and ensuring that those names are reported as typedef tokens instead of identifier tokens. Not too hard.
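A minimal sketch of that symbol-table idea, assuming a flat table with no scoping: the parser registers each name introduced by a typedef, and the lexer consults the table before deciding which token kind to report. All names here (`typedef_table`, `register_typedef`, `classify_identifier`, the token kinds) are hypothetical and are not SmaCC's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token kinds the lexer can report for a bare word. */
enum token_kind { TOK_IDENTIFIER, TOK_TYPE_NAME };

#define MAX_TYPEDEFS 256

/* Flat table of names introduced by typedef declarations. */
struct typedef_table {
    const char *names[MAX_TYPEDEFS];
    size_t count;
};

/* Called by the parser when it reduces a typedef declaration.
 * Returns 0 on success, -1 if the table is full. */
int register_typedef(struct typedef_table *table, const char *name) {
    assert(table != NULL && name != NULL);
    if (table->count >= MAX_TYPEDEFS)
        return -1;
    table->names[table->count++] = name;
    return 0;
}

/* Called by the lexer for every word that is not a keyword. */
enum token_kind classify_identifier(const struct typedef_table *table,
                                    const char *name) {
    for (size_t i = 0; i < table->count; i++) {
        if (strcmp(table->names[i], name) == 0)
            return TOK_TYPE_NAME;
    }
    return TOK_IDENTIFIER;
}
```

This sketch deliberately ignores block scope, so it cannot handle a typedef name that is later redeclared as a variable in an inner scope; handling that case would need a scoped table on top of this.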

The problems:

  • A lot of the information in the headers is in #defines, and those defines are wiped out when preprocessing (gcc -E).
  • Writing your own preprocessor is a messy and lengthy job, usually avoided, in part because the ifdefs in your header files are a mess to track.
  • There is an option for gcc -E to keep defines in the output, but then you get a whole mess of unrelated defines along with the ones you want.
  • Some defines do not resolve to functions or values, but instead to pure C code (#define MACRO ({ ... dozens of lines of C ... })).
  • Your C parser does not handle that part, but requires preprocessing to be done.
  • Functions, definitions and defines depend on the options you pass to the preprocessor (such as the level of C).
  • Headers are often full of junk you are not interested in when you use the library.

So I wonder if the right approach wouldn't be to start from the documentation, and include the tools (C program generation and C parser) needed both to parse chunks of documentation (for example, the function prototypes in the man pages) and to generate the C program that outputs the values (type sizes, define values) as well as the Smalltalk code (FFI entry points). That way, it would help to create and document the API (since an FFI writer will usually build a higher-level interface on top of it).
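A generated C program that outputs type sizes and define values could look roughly like the sketch below. The macro names `PRINT_SIZE` and `PRINT_CONST`, the function `print_ffi_values`, and the particular types and constants are assumptions chosen for illustration; a generator would fill them in from the prototypes it parsed.

```c
#include <assert.h>
#include <fcntl.h>      /* O_RDONLY, O_WRONLY, O_CREAT */
#include <stdio.h>
#include <sys/types.h>  /* off_t, pid_t */

/* Each macro prints one line and evaluates to 1 if printing succeeded.
 * #type and #name stringify the argument without expanding it, so the
 * symbolic name is printed next to the value the compiler sees. */
#define PRINT_SIZE(type)  (printf("sizeof(%s)=%zu\n", #type, sizeof(type)) > 0)
#define PRINT_CONST(name) (printf("%s=%ld\n", #name, (long)(name)) > 0)

/* Hypothetical generated body: returns the number of lines printed. */
int print_ffi_values(void) {
    int lines = 0;
    /* Sanity check: the two access modes must be distinct constants. */
    assert(O_RDONLY != O_WRONLY);
    lines += PRINT_SIZE(off_t);
    lines += PRINT_SIZE(pid_t);
    lines += PRINT_CONST(O_RDONLY);
    lines += PRINT_CONST(O_WRONLY);
    lines += PRINT_CONST(O_CREAT);
    return lines;
}
```

Because the values come from the compiler after preprocessing, this sidesteps the problem of defines being wiped out by `gcc -E`, at the cost of having to compile and run the program once per target platform.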

@ThierryGoubier

Found it! It's the Roskind C grammar [1], with some nice explanations [2]. In the literature, there are a few papers about alternative approaches. I found a different one a few years ago.

Small notes: for getting all defines:

gcc -E -dM

For getting the structures, Clang could be used.

[1] http://www.ccs.neu.edu/research/demeter/tools/master/doc/headers/C++Grammar/c4.y
[2] https://pdos.csail.mit.edu/archive/l/c/roskind.html

@marianopeck
Owner Author

Hi @ThierryGoubier

Noob question... how difficult is it to adapt the CParser to use that grammar instead?

Another question... by using clang, do you mean doing something like this? For example, I wanted to get posix_spawn_file_actions_t from spawn.h, so I ran:

 clang -cc1 -ast-dump -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/spawn.h > spawn2.output 2>&1

And the interesting part is:

   |-TypedefDecl 0x10404ee30 <line:52:1, col:16> col:16 referenced posix_spawn_file_actions_t 'void *'

@ThierryGoubier

Yes, and the result is very different on my machine (I still have a few issues with #includes).

|-RecordDecl 0x224efc0 <line:44:9, line:50:1> line:44:9 struct definition
| |-FieldDecl 0x224f070 <line:46:3, col:7> col:7 __allocated 'int'
| |-FieldDecl 0x224f0d0 <line:47:3, col:7> col:7 __used 'int'
| |-RecordDecl 0x224f120 parent 0x21cc7c0 <line:48:3, col:10> col:10 struct __spawn_action
| |-FieldDecl 0x224f280 <col:3, col:26> col:26 __actions 'struct __spawn_action *'
| `-FieldDecl 0x224f310 <line:49:3, col:15> col:7 __pad 'int [16]'
|-TypedefDecl 0x224f3b0 <line:44:1, line:50:3> col:3 referenced posix_spawn_file_actions_t 'struct posix_spawn_file_actions_t':'posix_spawn_file_actions_t'

@marianopeck
Owner Author

Maybe -fdump-record-layouts ...

@ThierryGoubier
Copy link

You'll get them from your C program anyway...

@marianopeck
Owner Author

This seems to work better. For example, let's say I want to know how SQFile is laid out. Then:

 clang -cc1 -fdump-record-layouts -I/Users/mariano/Pharo/git/pharo-vm/platforms/Cross/vm -I/Users/mariano/Pharo/git/pharo-vm/platforms/Mac\ OS/vm/ -I/Users/mariano/Pharo/git/pharo-vm/src/vm/ -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include /Users/mariano/Pharo/git/pharo-vm/platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c > vm.output

Gives me:

*** Dumping AST Record Layout
Type: SQFile
Record: 
Layout: <ASTRecordLayout
  Size:256
  DataSize:256
  Alignment:64
  FieldOffsets: [0, 64, 128, 192, 200, 208, 216]>

@ThierryGoubier

Yes, and if you have the offsets, then you have everything necessary.

*** Dumping AST Record Layout
Type: posix_spawn_file_actions_t
Record: 
Layout: <ASTRecordLayout
  Size:640
  DataSize:640
  Alignment:64
  FieldOffsets: [0, 32, 64, 128]>
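The numbers in these dumps appear to be in bits rather than bytes (an alignment of 64 matches an 8-byte pointer alignment on a 64-bit target), so they need dividing by `CHAR_BIT` before being used as FFI byte offsets. A small sketch of that conversion, with hypothetical helper names and assuming no bit-fields are involved:

```c
#include <assert.h>
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>
#include <stdio.h>

/* Convert a bit count from the layout dump into bytes.
 * Assumes whole-byte fields, i.e. no bit-fields. */
size_t bits_to_bytes(size_t bits) {
    assert(bits % CHAR_BIT == 0);
    return bits / CHAR_BIT;
}

/* Print the byte offset of every field; returns the byte offset of
 * the last field (0 if there are no fields). */
size_t print_byte_offsets(const char *type, const size_t *bit_offsets,
                          size_t count) {
    size_t last = 0;
    for (size_t i = 0; i < count; i++) {
        last = bits_to_bytes(bit_offsets[i]);
        printf("%s field %zu: byte offset %zu\n", type, i, last);
    }
    return last;
}
```

Under that reading, the SQFile dump (Size:256, FieldOffsets up to 216) describes a 32-byte structure whose last field starts at byte 27, and the posix_spawn_file_actions_t dump (Size:640) describes an 80-byte structure.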

@marianopeck
Owner Author

Yes! The problem is... I cannot change the alignment from 64 bits to 32 bits. I tried everything, including -m32, and even this:

CFLAGS="-m32" CPPFLAGS="-m32"  clang -cc1 -fdump-record-layouts -I/Users/mariano/Pharo/git/pharo-vm/platforms/Cross/vm -I/Users/mariano/Pharo/git/pharo-vm/platforms/Mac\ OS/vm/ -I/Users/mariano/Pharo/git/pharo-vm/src/vm/ -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include /Users/mariano/Pharo/git/pharo-vm/platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c

But no luck... are you able to get 32 bits?

@ThierryGoubier

Good point. I'll have a look into that later (change the gcc installation selected by clang?)
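One possible explanation, offered as an assumption rather than a confirmed diagnosis: `-m32` is an option of the clang driver, while `clang -cc1` invokes the frontend directly, and `CFLAGS`/`CPPFLAGS` are build-system conventions that clang itself does not read. If that is the cause, the target could be selected with the frontend's `-triple` option, or by going through the driver with `-m32 -fsyntax-only -Xclang -fdump-record-layouts` (neither was tried in this thread). Whichever flags are used, a tiny check like the following sketch, compiled through the driver with the same flags, shows which pointer width they actually produce; the function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>  /* CHAR_BIT */
#include <stdio.h>

/* Report the pointer width of the target this file was compiled for.
 * Returns 32 or 64. */
int pointer_width_bits(void) {
    int bits = (int)(sizeof(void *) * CHAR_BIT);
    /* This sketch only expects the two ABIs discussed in the thread. */
    assert(bits == 32 || bits == 64);
    printf("pointer width: %d bits\n", bits);
    return bits;
}
```

If this reports 64 when built with the flags meant to select a 32-bit target, the flags are not reaching the compiler, and the record layouts will keep showing 64-bit alignment.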

@marianopeck
Owner Author

Hi @ThierryGoubier

Let me ask something else while we try to figure out the 32-bit problem... Looking at this post http://eli.thegreenplace.net/2012/12/17/dumping-a-c-objects-memory-layout-with-clang do you think we should apply the tip mentioned at the end (run with -E first) and then run clang -cc1 -fdump-record-layouts on the result of -E?

@ThierryGoubier

Well, yes, it is a good idea, because handling includes is complex; for example, in my experiments, I had to point it (with -I) to the headers associated with a specific version of gcc. Now, if you get your includes right by hand, the result should be the same as preprocessing with -E first (which is just easier to write).
