Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to treat structs and newtypes #306

Closed
phadej opened this issue Nov 26, 2024 · 9 comments
Closed

How to treat structs and newtypes #306

phadej opened this issue Nov 26, 2024 · 9 comments

Comments

@phadej
Copy link
Collaborator

phadej commented Nov 26, 2024

How to treat structs and newtypes, there are various cases explained below;
and I suggest:

  • Treat structs (enums, unions) and typedefs as names in the same namespace (like C++ and rust-bindgen):
  • For typedef struct s { ... } t generate only one data declration when
    there is no name s or s and t are the same name; otherwise (when s and t are different),
    generate a data declration for struct, and newtype for typedef.
  • Always treat nested structs as separate data-types.

The nested structs is not as closely related to two first bulled points,
but it does affect what name mangling have to support (i.e. does it need to support no-struct-name no-field-name or not).


The original issue is that C allows us to do

struct foo { int x; int y; };
typedef double foo;

void fun1(struct foo x);
void fun2(foo x);

Currently we'd create two type definitions (CFoo). So does `rust-bindgen:

pub struct foo {
    pub x: ::std::os::raw::c_int,
    pub y: ::std::os::raw::c_int,
}

pub type foo = f64;
extern "C" {
    pub fn fun1(x: foo);
}
extern "C" {
    pub fn fun2(x: foo);
}

which is wrong. A valid reasoning is that the above C example
is invalid as C++:

ex.c:2:16: error: conflicting declaration ‘typedef double foo’
    2 | typedef double foo;
      |

So how to deal with these errors: I suggest that we follow rust-bindgen
example and not complicate matters.


When we have

typedef struct { int x; int y } foo;

it's reasonable to generate data Foo = MkFoo { fooX :: Int, fooY :: Int }.

Also when we have

typedef struct foo { int x; Int y } foo;

it's reasonable to generate only data Foo = MkFoo { foo X :: Int, fooY :: Int }.

Note: this only is reasonable if we treat struct foo and typedef ... foo as the same: If we decide to treat them differently, we would need to generate a type for struct foo and a (new)type for typedef ... foo, i.e.:

data StructFoo = MkStructFoo { foo X :: Int, fooY :: Int }
newtype Foo = MkFoo StructFoo

A third variant is when struct tag and newtype name differ:

typedef struct bar { int x; Int y } foo;

this is not uncommon in C, as you have to write:

typedef struct _linked_list {
  int x;
  struct _linked_list* next; /* linked_list doesn't exist here yet */
} linked_list;

rust-bindgen doesn't try to be too smart here, and generates:

pub struct _linked_list {
    pub x: ::std::os::raw::c_int,
    pub next: *mut _linked_list,
}
pub type linked_list = _linked_list;

and I suggest that we similarly generate

data C_LinkedList = MkC_LinkedList { x :: Int, next :: Ptr C_LinkedList }
newtype CLinkedList = MkCLinkedList C_LinkedList

Note: rust-bindgen uses type alias for newtypes)
Note: @TravisCardwell we need to make sure that underscore-difference does result in different Haskell names, as above example is not uncommon. See e.g. https://stackoverflow.com/a/32405369 where from I borrowed the example.

We could be smart and do just

data CLinkedList = MKCLinkedList { x :: Int, next :: Ptr CLinkedList }

using the typedef name, but than we'd really need a lookup table
to convert possible struct _linked_list usages (in function signatures)
to the newtype name. Not impossible, but it's an additional complexity.

Note: this is consistent even if the C code has forward declrations,
and splits struct definition and typedef:

struct foo;
struct foo { int x; int y; };
typedef struct foo foo;

In this case we can think that we omit the typedef definition,
when it doesn't introduce a new name!


Nested (possibly anonymous and unnamed) structures: even for

no-struct-name no-field-name

struct foo {
  int x;
  struct {
	  int y;
	  int z;
  };
};

rust-bindgen generates

pub struct foo {
    pub x: ::std::os::raw::c_int,
    pub __bindgen_anon_1: foo__bindgen_ty_1,
}
pub struct foo__bindgen_ty_1 {
    pub y: ::std::os::raw::c_int,
    pub z: ::std::os::raw::c_int,
}

similarly struct-name no-field-name

struct foo {
  int x;
  struct bar {
	  int y;
	  int z;
  };
};

int fun1(struct bar x);

results into

pub struct foo {
    pub x: ::std::os::raw::c_int,
}
pub struct foo_bar {
    pub y: ::std::os::raw::c_int,
    pub z: ::std::os::raw::c_int,
}
extern "C" {
    pub fn fun1(x: foo_bar) -> ::std::os::raw::c_int;
}

what's interesting is that struct bar name is manged with foo,
though in C in doesn't need to (and looks like rust-bindgen has some name mangling type-lookup table).
(And this code is valid in C++ too).

then no-struct-name field-name

struct foo {
  int x;
  struct {
	  int y;
	  int z;
  } bar;
};

is converted to

pub struct foo {
    pub x: ::std::os::raw::c_int,
    pub bar: foo__bindgen_ty_1,
}
pub struct foo__bindgen_ty_1 {
    pub y: ::std::os::raw::c_int,
    pub z: ::std::os::raw::c_int,
}

Note: no field name used in generated type, rust-bindgen could be smarter here

The last case, struct-name field-name:

struct foo {
  int x;
  struct bar {
	  int y;
	  int z;
  } baz;
};
pub struct foo {
    pub x: ::std::os::raw::c_int,
    pub baz: foo_bar,
}
pub struct foo_bar {
    pub y: ::std::os::raw::c_int,
    pub z: ::std::os::raw::c_int,
}

So for nested structures I suggest that we again follow rust-bindgen,
and always create a separate structure for them, even in no-struct-name no-field-name case.
This will be uniform approach.

@TravisCardwell
Copy link
Collaborator

Related to #189

@TravisCardwell
Copy link
Collaborator

  • Treat structs (enums, unions) and typedefs as names in the same namespace

This sounds good to me.

  • For typedef struct s { ... } t generate only one data declration when there is no name s or s and t are the same name;

This sounds good to me.

Perhaps we could generate a newtype when appropriate.

otherwise (when s and t are different), generate a data declration for struct, and newtype for typedef.

Another option is to translate a C typedef to a Haskell type alias. Perhaps this would be a pretty faithful translation, and it would only introduce a type constructor. Perhaps the type would be easier to use, without having to unwrap, especially use of instances.

  • Always treat nested structs as separate data-types.

This sounds good to me.

@TravisCardwell
Copy link
Collaborator

struct foo { int x; int y; };
typedef double foo;

I agree that we should just translate normally and let the conflicting definitions of a CFoo type constructor cause a compilation error.

@TravisCardwell
Copy link
Collaborator

typedef struct _linked_list {
  int x;
  struct _linked_list* next; /* linked_list doesn't exist here yet */
} linked_list;

In the default name manglers, such leading underscores are dropped.

λ: import HsBindgen.Hs.AST.Name
λ: mangleTypeConstrName defaultNameMangler $ TypeConstrContext "_linked_list"
"CLinkedList"
it :: HsName NsTypeConstr

There is a trade-off between safety and taste/style, and we deliberately chose to create Haskell-style type constructor names at the expense of safety. A user who encounters such an issue should create a custom name mangler that handles the types of names in their C code. Perhaps we need to reconsider this decision?

data C_LinkedList = MkC_LinkedList { x :: Int, next :: Ptr C_LinkedList }

We can implement this suggestion by handling a leading underscore as a special case. If we do this, should it be done for all names or just type constructors?

@TravisCardwell
Copy link
Collaborator

struct foo {
  int x;
  struct {
    int y;
    int z;
  };
};
struct foo {
  int x;
  struct bar {
    int y;
    int z;
  };
};

Oh! I missed these cases! Context TypeConstrContext has an AnonNamedFieldTypeConstrContext for anonymous structures/unions for named fields, but I did not realize that unnamed fields are permitted after all.

For the first case, perhaps something like the following context would work?

data TypeConstrContext =
  ...
  | -- | Context for anonymous structures/unions for unnamed fields
    AnonUnnamedFieldTypeConstrContext {
      -- | Closest named ancestor type context
      ctxAnonUnnamedFieldTypeConstrAncestorCtx :: TypeConstrContext
    , -- | Preorder index (1-based) of unnamed fields in the root type
      ctxAnonUnnamedFieldTypeConstrIdx :: Word
    }

For the second case, it seems like we could translate to CBar as usual. To follow rust-bindgen, however, perhaps something like the following context would work?

data TypeConstrContext =
  ...
  | -- | Context for named structures/unions for unnamed fields
    UnnamedFieldTypeConstrContext {
      -- | Closest named ancestor type context
      ctxUnnamedFieldTypeConstrAncestorCtx :: TypeConstrContext
    , -- | C name for the type
      ctxUnnamedFieldTypeConstrCName :: CName
    }

@phadej
Copy link
Collaborator Author

phadej commented Nov 27, 2024

Another option is to translate a C typedef to a Haskell type alias.

That was already decided in #173 in favour of newtype. I dont see any new info that would make us to reconsider that choice.

@edsko
Copy link
Collaborator

edsko commented Nov 28, 2024

Let's discuss these case by case.

no-struct-name no-field-name

struct foo {
  int x;
  struct {
	  int y;
	  int z;
  };
};

We should flatten here: x, y and z all become direct members of foo (this is also how they would be addressed in C). Unlike Rust bindgen, We should not generate anonymous structs with hard-to-predict (and unnecessary names).

struct-name no-field-name

struct foo {
  int x;
  struct bar {
	  int y;
	  int z;
  };
};

int fun1(struct bar x);

This is an utterly bizarre example 😬 This definition of bar does not affect foo at all (sizeof(foo) is 4). It's literally just the definition of struct bar in a weird place. We should pull it out and proceed as normal.

no-struct-name field-name

struct foo {
  int x;
  struct {
	  int y;
	  int z;
  } bar;
};

Here we should generate a nested struct: the name (already implemented in the current name mangler) is CFooBar, and it will have fields cFooBar_y and cFooBar_z. (Unlike Rust bindgen, we're not numbering anything).

struct-name field-name

struct foo {
  int x;
  struct bar {
	  int y;
	  int z;
  } baz;
};

Similar to the previous case, but it will be CBar rather than CFooBar.

@edsko
Copy link
Collaborator

edsko commented Nov 28, 2024

Regarding newtypesfor structtypedef`s: agreed, but we could consider some customization later (#308).

@edsko edsko closed this as completed Nov 28, 2024
@phadej
Copy link
Collaborator Author

phadej commented Nov 28, 2024

An example for myself

struct foo {
	struct {
		int z;
	} *x;
};

int main() {
	struct foo f = {0};
	f.x->z = 42; // segfaults as accessing nullptr, but valid C anyway
	return 0;
}

The no-struct-name field-name can be wrapped in pointers and arrays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants