Add C source information to `Hs` and `SHs` ASTs #316

TravisCardwell · 2024-11-28T23:07:59Z

We would like to track how various parts of our generated ASTs are created, referencing the C source.

One motivation for this is test generation (#22), which requires generating both C and Haskell code for testing. Generating the C test code from a C header is not a good option because we would need to reimplement a lot of the logic for translating from C to Haskell. If the Haskell AST includes C source information, we could traverse the Haskell AST and determine exactly what test functions are required, perhaps referencing the C header to get C details.

For example, for a given data declaration (called Struct in Hs and Record in SHs), it is useful to know the name of the corresponding C type, including the C namespace. When generating C code, the namespace determines how an identifier is written.

- Ordinary namespace: foo
- struct tag namespace: struct foo
- union tag namespace: union foo
- enum tag namespace: enum foo

C source information can also be used to improve the generated Haddock documentation (#26). For example, we could output the corresponding C names to help users understand/confirm which Haskell declaration maps to which C declaration.

We should include source locations, which may optionally be output in LINE pragmas (#74). We could even consider including source location information in generated Haddock documentation.

Related to #23 (which is for the high-level API)


TravisCardwell · 2024-11-28T23:10:13Z

One idea is to change CName to be a phantom type with a Namespace parameter, like HsName.

data Namespace =
    NsOrdinary
  | NsStructTag
  | NsUnionTag
  | NsEnumTag

newtype CName (ns :: Namespace) = CName { getCName :: Text }
  ...

This type does not include member namespaces, which are separate per structure/union. Discussing with @edsko yesterday, he suggested that we may want to model member namespaces as well, perhaps using a GADT.
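To make the idea concrete, here is a minimal sketch of how a namespace-indexed CName could be rendered when generating C code. The `KnownNamespace` class and `renderCName` function are hypothetical names used only for illustration; member namespaces are not modelled.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, OverloadedStrings #-}

import Data.Text (Text)

data Namespace
  = NsOrdinary
  | NsStructTag
  | NsUnionTag
  | NsEnumTag

newtype CName (ns :: Namespace) = CName { getCName :: Text }

-- Hypothetical class (not part of the proposal): maps each promoted
-- namespace to the keyword prefix needed to write the name in C.
class KnownNamespace (ns :: Namespace) where
  nsPrefix :: proxy ns -> Text

instance KnownNamespace 'NsOrdinary  where nsPrefix _ = ""
instance KnownNamespace 'NsStructTag where nsPrefix _ = "struct "
instance KnownNamespace 'NsUnionTag  where nsPrefix _ = "union "
instance KnownNamespace 'NsEnumTag   where nsPrefix _ = "enum "

-- Render a name as it would be written in C source.
-- The CName value itself serves as the proxy for its namespace.
renderCName :: KnownNamespace ns => CName ns -> Text
renderCName name = nsPrefix name <> getCName name
```

With this, `renderCName (CName "foo" :: CName 'NsStructTag)` yields `struct foo`, while the same text in the ordinary namespace yields `foo`.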

C identifier namespace reference:

C99 6.2.3
C11 6.2.3
C17 6.2.3
C23 6.2.3
- Added standard attributes and attribute prefixes
- Added trailing identifier in an attribute prefixed token

phadej · 2024-11-28T23:23:28Z

> struct tag namespace: struct foo

This doesn't work in general. Counterexample:

struct foo {
  struct { int x; int y; } bar;
  int z;
};

We will create CFooBar, which doesn't correspond to any type we could reference in C.

More generally, what you propose is essentially delaying name mangling.

TravisCardwell · 2024-11-28T23:42:12Z

Thanks for the example!

I think that we should track how parts of our generated ASTs are created, so in this case we could include information for CFooBar that records that it is created for an anonymous structure. When generating tests, this information lets us know that we cannot create some tests for that type. For example, the PokePeekXSameSemanticsX test can be implemented because it does not require C, while the HsSizeOfXEqCSizeOfX test cannot be implemented because there is no way to reference the anonymous structure in C.
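As a rough sketch of that decision, assuming a hypothetical `Origin` annotation on each generated type (the `Origin`, `TestKind`, and `testsFor` names are invented here; only the two test names come from the discussion):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Hypothetical annotation recording how a Haskell type was created.
data Origin
  = FromNamedType Text
    -- ^ C spelling of a type that can be referenced, e.g. "struct foo"
  | FromAnonymousField Text Text
    -- ^ Spelling of the enclosing type and the name of the field that
    -- declares the anonymous structure, e.g. "struct foo" and "bar"

data TestKind
  = PokePeekXSameSemanticsX  -- Haskell only, no C required
  | HsSizeOfXEqCSizeOfX      -- must name the type in C
  deriving (Eq, Show)

-- Select the tests that can be generated for a type with the given origin.
testsFor :: Origin -> [TestKind]
testsFor (FromNamedType _)        = [PokePeekXSameSemanticsX, HsSizeOfXEqCSizeOfX]
testsFor (FromAnonymousField _ _) = [PokePeekXSameSemanticsX]
```

For CFooBar, the origin would be `FromAnonymousField "struct foo" "bar"`, so only the PokePeekXSameSemanticsX test is selected.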

I do not mean to suggest that name mangling should be delayed. A Struct should continue to contain structName :: HsName NsTypeConstr. The suggestion is to add information about how/why the particular Struct is created.

Regarding documentation, we could generate documentation like the following:

This type corresponds to the anonymous @struct@ defined for field @bar@ of
@struct foo@.  Source: @foobar.h@ line 121
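A comment like this could be produced from the recorded origin. The following is only a sketch: the `AnonymousFieldOrigin` record and `haddockFor` function are hypothetical, and the real AST may represent source locations differently.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical record describing an anonymous structure's origin.
data AnonymousFieldOrigin = AnonymousFieldOrigin
  { originParent :: Text  -- spelling of the enclosing type, e.g. "struct foo"
  , originField  :: Text  -- field that declares the anonymous structure
  , originFile   :: Text  -- header file name
  , originLine   :: Int   -- line number in the header
  }

-- Build the Haddock text (on a single line) for such a type.
haddockFor :: AnonymousFieldOrigin -> Text
haddockFor o = T.concat
  [ "This type corresponds to the anonymous @struct@ defined for field @"
  , originField o
  , "@ of @"
  , originParent o
  , "@.  Source: @"
  , originFile o
  , "@ line "
  , T.pack (show (originLine o))
  ]
```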

phadej · 2024-11-28T23:51:49Z

> The suggestion is to add information about how/why the particular Struct is created.
>
> Regarding documentation, we could generate documentation like the following:
>
> This type corresponds to the anonymous @struct@ defined for field @bar@ of
> @struct foo@.  Source: @foobar.h@ line 121

That is essentially preserving exactly the information (to be) passed to the name mangling machinery. (And the source location, but that is a purely informative bit.)

TravisCardwell · 2024-11-29T00:09:06Z

Indeed. Perhaps another way to put it is that name mangling is not invertible.

When generating tests, the name CFooBar does not provide enough information for us to determine which tests need to be created for it. Additional information is required. If such information is included in the AST, it can be used to make the necessary decisions. With the current ASTs, generating tests for CFooBar requires that we process the C header (again) to determine how/why CFooBar is created.

Documentation like the above example could help users understand/confirm which Haskell declaration maps to which C declaration. When a user sees the name CFooBar, they may want to confirm that it is for an anonymous struct and not a C type named foo_bar. I imagine that it is not necessary in many cases, but it would likely be appreciated when the C API uses similar names that may be confusing, especially after translation to Haskell.

phadej · 2024-11-29T16:03:54Z

I think that the easiest way forward is to add a field with the clang_getTypeSpelling result. We only have to filter out invalid spellings (like struct (unnamed struct at ex.h:2:2)), but AFAICT these are easy to spot (they end with the invalid character `)`).

CXString clang_getTypeSpelling (CXType CT)
Pretty-print the underlying type using the rules of the language of the translation unit from which it came.

That string is not a CName, but I don't think we need to parse it in any way; we can simply use it as is.
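A minimal sketch of that filter, assuming the spelling has already been obtained from libclang as a `Text` (the `usableSpelling` name is hypothetical). It relies only on the heuristic above, so it is meant for struct/enum/typedef declarations; function types, for example, also have spellings that end with `)`.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Keep a type spelling only if it looks like something that can be
-- written in C source.
--
-- Assumption (from the heuristic above): spellings of unnamed types
-- end with ')', as in "struct (unnamed struct at ex.h:2:2)".
usableSpelling :: Text -> Maybe Text
usableSpelling spelling
  | T.null spelling             = Nothing
  | ")" `T.isSuffixOf` spelling = Nothing
  | otherwise                   = Just spelling
```

The `Maybe Text` result can be stored as is, with `Nothing` marking types that cannot be referenced from generated C code.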

TravisCardwell · 2024-12-02T00:35:00Z

Thank you very much for the suggestion. I will give that a try.

This commit adds a type spelling field to the following ASTs:

* `C` phase: `Struct`, `Enu`, `Typedef`
    * Type: `Text`
* `Hs` phase: `Struct`, `Newtype`
    * Type: `Maybe Text`
* `SHs` phase: `Record`, `Newtype`
    * Type: `Maybe Text`

As suggested in #316, the string is not parsed. This commit simply uses `Text`, but we could implement a `newtype` wrapper if desired.

The goal is to make it easier to generate tests for structures, enumerations, and `typedef`s of structures/enumerations. Note that this is also required for unions, but those are not implemented yet.

This is the minimal change required to do this; the `Maybe` is needed because it does *not* track the type spelling for macros.

(Cherry-picked from `source-info` for experimentation)

edsko · 2024-12-04T08:35:01Z

After discussing this with @TravisCardwell, the idea to annotate the Haskell tree with some kind of Reason or Origin that gives us information such as

This type corresponds to the anonymous @struct@ defined for field @bar@ of
@struct foo@.  Source: @foobar.h@ line 121

is useful for both test generation and documentation generation. The cases behind @phadej's objection that the proposal as originally stated doesn't quite work ("which doesn't correspond to any type we could reference in C.") are especially important to consider, both for tests and for documentation generation. We agreed that Travis will submit a draft PR with an initial attempt at this so that we have something concrete to discuss and refine.
