Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal default name mangling #331

Open
TravisCardwell opened this issue Dec 4, 2024 · 6 comments
Open

Minimal default name mangling #331

TravisCardwell opened this issue Dec 4, 2024 · 6 comments

Comments

@TravisCardwell
Copy link
Collaborator

In #312 (comment) @phadej wrote:

IMHO, the default name mangling should do as little as necessary to make valid Haskell names.

Doing underscores to camelCase like translations feels a bit tricky. Perfectly we will be able to describe the mangling rules in few sentences. Users should be able to predict what the Haskell names will be. IMHO it doesn't matter what are the "idiomatic" names in Haskell.

In particular, recently I started to dislike the C prefix. Why we have it. (Foreign.C.Types has it for a reason, but we don't share the same reasoning: there are no types to clash with).

In #312 (comment) @TravisCardwell replied:

This is a question for @edsko, but I think it is just a matter of taste/preference.

A few technical notes:

  • A prefix is required for type constructors to preserve leading underscores without translation or escaping, since type constructor names may not start with an underscore.

  • When preserving C names as much as possible, a prefix makes it possible to avoid converting the case of the first letter of the C name.

I discussed this with @edsko yesterday.

We currently provide two sets of name manglers. The default (defaultNameMangler) creates Haskell-style type/constructor names while variables are minimally mangled. It is safer than haskellNameMangler, which creates Haskell-style variable names as well. By "safety," I mean risk of name collision. "Safe" means relatively low risk. In addition, the API enables developers to create their own name manglers.

@edsko is fine with making the default do as little as necessary to make valid Haskell names. Users who want Haskell-style names can use the haskellNameMangler.

@phadej Do you think it is worth adding a prefix in order to avoid case conversion? Example: C types foo and Foo may both exist.

If we do not prefix every type constructor name, we have to decide what to do about leading underscores. Should we include a prefix only in that case?

@phadej
Copy link
Collaborator

phadej commented Dec 5, 2024

Example: C types foo and Foo may both exist.

Would they really in any reasonable library?

@TravisCardwell
Copy link
Collaborator Author

TravisCardwell commented Dec 5, 2024

I read that as grounds to not prefix by default. Anybody who uses a library that causes such collisions should work around them by configuring a name mangler that uses prefixes. The name mangling DSL makes it easy to do so, so I think that this is a very reasonable approach.

Do the following strategies look like what you have in mind?

  • Module names, (type class names), type constructor names:
    1. Escape any invalid characters
    2. Convert first character to uppercase
    3. Add a (C) prefix only if the first character is invalid as a first character (such as an underscore)
  • Constructor names: always the same as the type constructor
  • (Type variable names), variable names:
    1. Escape any invalid characters
    2. Convert first character to lowercase
  • Always join by underscore when joining (parts of) names.
  • Special case: use un prefix for newtype wrappers (created for typedefs).

The example from #306 (reproduced below) is a good one.

typedef struct _linked_list {
  int x;
  struct _linked_list* next; /* linked_list doesn't exist here yet */
} linked_list;

Following the above strategies, we would generate the following.

data C_linked_list = C_linked_list {
      c_linked_list_x    :: FC.CInt
    , c_linked_list_next :: FC.Ptr C_linked_list
    }

newtype Linked_list = Linked_list {
      un_Linked_list :: C_linked_list
    }

Since that is a common convention, however, perhaps it would be worthwhile to handle it specially? For example, we could generate the following, dropping the underscore and treating it like the C names are the same (no newtype).

data Linked_list = Linked_list {
      linked_list_x    :: FC.CInt
    , linked_list_next :: FC.Ptr Linked_list
    }

@phadej
Copy link
Collaborator

phadej commented Dec 5, 2024

we could generate the following, dropping the underscore and treating it like the C names are the same

We can't. Firstly it's quite tricky to implement. Currently translation only considers the C names (and those only in quite special cases). The name mangling is an utility.

And secondly, the less surprising (i.e. more predictable) the pipeline is the better.

TL;DR lets keep it simple

@TravisCardwell
Copy link
Collaborator Author

Anonymous structures for named fields are not being named correctly. Some name mangling functionality is now implemented in mkDefnName, in HsBindgen.C.Fold.Type.

If we do this, then users are not able to customize how such structures are named. I can change the implementation to be consistent with the new name mangling defaults, but it would of course not be consistent with the haskellNameMangler or user-defined name manglers.

Implementing name mangling in (just) HsBindgen.Hs.AST.Name allows us to provide consistent name mangling across different name mangling options, as well as allows users to define their own name mangling rules. There is already a context for this, but a difference is that the mkDefnName implementation uses all of the nested structure names (Path). We can modify the context to provide this information.

Thoughts?

@TravisCardwell
Copy link
Collaborator Author

TravisCardwell commented Dec 8, 2024

Notes while reviewing the new defaults:

  • There are some single-letter constants defined in distilled_lib_1.h; for example C constant A is translated to a in Haskell. Users working with such code may run into compiler warnings when such names collide with local variable names. This could be a good use for an override. We could add a context that allows users to customize how such names are mangled, but perhaps this is not necessary.
  • Example constant name: C constant SOME_DEFINED_CONSTANT is translated to sOME_DEFINED_CONSTANT in Haskell.

Unrelated issues:

TravisCardwell added a commit that referenced this issue Dec 8, 2024
This commit changes name mangling so that there are minimal changes by
default, not attempting to create Haskell-style names.  One significant
difference with the previous implementation is that it can now add a
prefix if/when the first character is not valid, used to handle leading
underscores when creating type constructors.

`MkHsName` is removed, replaced by `mkHsNamePrefixInvalid` and
`mkHsNameDropInvalid` functions that are now passed to `translateName`
like other options.

`ctxFieldVarSingleConstr` is removed since constructors now have the
same name as types by default, and we do not generate sum types anyway.
If we do so in the future, we need to decide how to do name mangling for
constructors as well as accessors.

`HsBindgen.C.Fold.Type.mkDefnName` is changed to follow the conventions
of the new defaults.
TravisCardwell added a commit that referenced this issue Dec 10, 2024
This commit changes name mangling so that there are minimal changes by
default, not attempting to create Haskell-style names.  One significant
difference with the previous implementation is that it can now add a
prefix if/when the first character is not valid, used to handle leading
underscores when creating type constructors.

`MkHsName` is removed, replaced by `mkHsNamePrefixInvalid` and
`mkHsNameDropInvalid` functions that are now passed to `translateName`
like other options.

`ctxFieldVarSingleConstr` is removed since constructors now have the
same name as types by default, and we do not generate sum types anyway.
If we do so in the future, we need to decide how to do name mangling for
constructors as well as accessors.

`HsBindgen.C.Fold.Type.mkDefnName` is changed to follow the conventions
of the new defaults.
@TravisCardwell
Copy link
Collaborator Author

Anonymous structures for named fields are not being named correctly. ...

I ended up needing to work on this in the source-info branch, so I cherry-picked the commit implementing minimal default name mangling.

I had to change how overrides work. The initial implementation allowed users to override name mangling by specifying the C name. This is problematic when the name of an anonymous structure/union needs to be overridden. Specifying a C name worked when some name mangling was performed while creating the C AST, but it does not work when all name mangling is done by the name mangler. In the new implementation, users override by specifying the translated Haskell name, which allows overriding names for anonymous structures/unions. Note that since overrides must now be performed after creating the Haskell name, the parameter order of translateName is updated for consistency

The next commit refactors struct name mangling in a way that I hope is acceptable, minimizing changes. It removes DefnName and instead tracks the "path" of a structure, renaming the type to DeclPath. Function mkDefnName is now implemented as part of name mangling (as getDeclPathParts), in a way that works with name mangling options. Context AnonNamedFieldTypeConstrContext is removed, replaced with StructTypeConstrContext that contains a DeclPath instead of a CName. New function translateDeclPath is a high-level function for translating DeclPaths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants