This repository has been archived by the owner on Aug 2, 2019. It is now read-only.

Reduce special cases involving the void type #43

Open
wks opened this issue Oct 2, 2015 · 7 comments


@wks
Member

wks commented Oct 2, 2015

The current status

The void type is a special type in the Mu type system. It has no value, and thus many instructions/mechanisms have special cases for the void type.

Instructions that have special cases for void:

  • RET and RETVOID: Since void has no value (In fact it does. The return value of the BRANCH instruction, for example, is a value of the void type.), we needed a special syntax to return void, thus we have RETVOID.
  • The "new-stack clause" of the SWAPSTACK instruction: PASS_VALUE <T> %val and PASS_VOID: for the same reason why we have RET and RETVOID.

The trap handler has a special case for void:

Just like SWAPSTACK, the trap handler may rebind the thread to a stack and either "pass a value" or "pass void" or "throw an exception".

Other existing uses

Instructions that always return void: BRANCH, BRANCH2, SELECT, TAILCALL, RET, RETVOID, THROW, STORE, FENCE, and some common instructions: @uvm.kill_stack, @uvm.thread_exit, @uvm.native.unpin, @uvm.native.unexpose, @uvm.meta.load_bundle, @uvm.meta.load_hail, @uvm.meta.pop_frame, @uvm.meta.push_frame, @uvm.meta.enable_watchpoint, @uvm.meta.disable_watchpoint, @uvm.meta.set_trap_handler. These instructions do not return meaningful values.

Instructions that may sometimes return void: CALL, TRAP, WATCHPOINT, CCALL, SWAPSTACK. The callee, client, swappee, or whatever the other end of communication is, may not return a meaningful value.

Current properties of void

void can only be used in 3 cases:

  1. As the type of allocation units that do not represent values. Hence it is usable as the referent type of reference types and pointer types. e.g. You can run NEW <@void>. Each time you NEW a void, you have a new empty object, not the same as any other.
  2. As the fixed part of a hybrid to indicate the absence of the fixed part. e.g. hybrid<void int<64>> is a variable-length array of int<64>, without a fixed part.
  3. As the type of instructions or the return type of functions that do not return values. e.g. the BRANCH instruction returns void.
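As a loose analogy only (this is Python, not Mu), allocating a fieldless object behaves much like the NEW <@void> case above: every allocation yields a fresh identity even though there is no payload. The helper name `new_void` is made up for this sketch.

```python
def new_void():
    """Hypothetical stand-in for Mu's NEW <@void>: allocate an object with no fields."""
    return object()

a = new_void()
b = new_void()

# Two separately allocated empty objects are distinct references...
distinct = a is not b
# ...while a reference is always identical to itself.
same = a is a
```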

Other properties:

  • void has no value (in fact it does, as mentioned before)
  • void is neither a scalar type nor a composite type.
    • Only scalar types can be used for memory access: LOAD, STORE, ...
    • Only composite types have other types as components: fields/elements
    • void is neither storable nor loadable. It does not contain other parts. It cannot be part of a struct/array/vector, i.e. there is no "array of void". The "fixed part of a hybrid" is an exception.
  • void is native-safe: It can be returned from native functions; and there can be uptr<void>.

Proposed changes

value of void: Instead of "having no value", void now has exactly one value: NULL. This is consistent with Python: NoneType has only one value None.
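The Python behaviour referred to here can be checked directly. This is only an illustration of the "exactly one value" idea in Python; it says nothing about how a Mu implementation would represent the void value.

```python
NoneType = type(None)

# Calling the NoneType constructor does not create a second value:
# it hands back the same None object.
another = NoneType()
only_one_value = another is None

def returns_nothing():
    """A function without a return statement still yields the single None value."""
    pass

result = returns_nothing()
```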

void constant: We reuse the NULL literal to create a "void constant":

.const @VOID <@void> = NULL    // The only possible value of void.
// For the sake of consistency, we require the client to define it.
// 
// Alternative: make it a pre-defined value, such as the @uvm.predef.void_t type
// and the @uvm.predef.VOID value. We could define @uvm.predef.i8, @uvm.predef.i16,
// @uvm.predef.i32, @uvm.predef.i64, @uvm.predef.float, @uvm.predef.double,
// @uvm.predef.ref_void, @uvm.predef.ref_i32, ..., but the choice seems too arbitrary.

All existing instructions that return void return this NULL value. In theory, the following snippet is valid, but stupid:

%entry:
  %x = BRANCH %bb1

%bb1:
  RET <@void> %x  // return void. Should have said RET <@void> @VOID
  // or even "RET @VOID" omitting the type argument, because RET always returns the
  // return type of the current function. ADD, SUB, MUL ... would have to infer the operand
  // types if the operand type is not provided, but RET does not need to be inferred: the
  // function return type is explicit.

Remove the RETVOID instruction: Use RET <@void> @VOID instead, or simply RET @VOID.

Remove the SWAPSTACK clause PASS_VOID: Use PASS_VALUE <@void> @VOID instead. Unlike RET, the type parameter here is necessary: the type that the swappee expects is dynamic. It may expect a different type at a different SWAPSTACK site. Guessing the wrong type while swapping has undefined behaviour.

Trap handlers no longer need a PASS_VOID return case: Instead, pass a NULL constant.

New ways to use void

In addition to the existing three ways, i.e. empty objects, hybrid fixed part, empty return value, void can now be used in the following ways:

  • In RET to return from a function of void return type.
  • In SWAPSTACK to swap to a stack that does not expect to receive a value (it receives the NULL value of the void type).
  • In the trap handler, to rebind a stack which expects void.

They all fit into the category that "the other end of communication" does not pass a value.
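A rough analogy in Python, not Mu semantics: a generator suspended at a `yield` is a little like a swapped-away stack, and resuming it with `send(None)` plays the role of "passing the single void value" to a swappee that does not expect anything meaningful. The names `swappee` and `coro` are invented for this sketch.

```python
def swappee():
    """Plays the role of a stack that expects no meaningful value when resumed."""
    received = yield "ready"      # suspend; the resumer decides what to pass in
    yield ("got", received)       # report what was received on resumption

coro = swappee()
first = next(coro)                # run until the first suspension point
reply = coro.send(None)           # "pass void": hand over the one value None
```

Unlike SWAPSTACK, Python performs no type guessing here, so the undefined behaviour described for a mismatched type has no counterpart in this sketch.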

Things that should still be forbidden

void must not be a parameter type: I don't have a very compelling reason, but it is completely useless (only increases the apparent arity of a function).

void must not be part of a struct/array/vector or the variable part of a hybrid: Forbidding this gains us a very nice property: each field/element in any struct/array/vector/varpart has a different offset. In struct<@i32 void void void void @x>, since void should have size 0 and alignment 1 (in the sense that void can be allocated at any address a such that a % 1 == 0), void does not occupy any space. All of the void fields would then be at the same offset as @x. Another reason: C does not allow void to be a struct field.
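The offset collision can be made concrete with a small layout calculator. This is a sketch under stated assumptions: a simple "align up, then place" layout rule, an i32 of size 4 and alignment 4, and @x taken to be another i32 (the original does not say what @x is). Real Mu implementations may lay out structs differently.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def field_offsets(fields):
    """Return the offset of each (size, alignment) field, placed in order."""
    offsets = []
    offset = 0
    for size, alignment in fields:
        offset = align_up(offset, alignment)
        offsets.append(offset)
        offset += size
    return offsets

I32 = (4, 4)    # assumed size and alignment of int<32>
VOID = (0, 1)   # size 0, alignment 1, as described above

# struct<@i32 void void void void @x>, with @x assumed to be an i32
offsets = field_offsets([I32, VOID, VOID, VOID, VOID, I32])
```

Under these assumptions every field after the first lands at offset 4, so the four void fields and @x are indistinguishable by offset.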

Empty structs (struct<>) should be forbidden: For the same reason as void as a field. Just use void because it is so special. C forbids empty structs, too, but GCC allows them.

How about LLVM?

LLVM IR has two forms of the ret instruction:

  • ret <type> <value>, for example: ret i32 100
  • ret void, which returns void.

LLVM does not have a "void constant" either, since void is not a "first class type". In LLVM, only void and function types are not first class types. LLVM has both "function" types and "pointer to function" types.

The LLVM LangRef does not say parameter types cannot be void, but void is never used as a parameter type. In C, void is an incomplete type, and thus cannot be a parameter type.

@mn200

mn200 commented Oct 9, 2015

RetVoid was removed in the formal spec in microvm/uvm-hol@775713ce26e1

mn200 referenced this issue in microvm/uvm-hol Oct 9, 2015
Relates to microvm-meta/#43
@mn200

mn200 commented Oct 9, 2015

Commit microvm/uvm-hol@eced7b4 deals with the Things that should still be forbidden above. Note that we don't need to explicitly forbid Void in vector types because we already require the type argument to be scalar (and Void is not scalar).

@wks
Member Author

wks commented Oct 20, 2015

The other proposal #45 which allows zero or more return values will undo this change. Then we will have a real "unit" value: 0-tuple. Then:

  • Instructions, instead of returning (void)NULL, will return ().
  • Functions, instead of returning (void)NULL, will return ().
  • SWAPSTACK can pass <@T1 @T2 @T3> (%a %b %c) as well as <> (). As an instruction, it can receive (), too.

Then void can only be used in two cases:

  1. As ref<void>, iref<void>, weakref<void> and ptr<void> to refer/point to anything.
  2. As the fixed part of hybrid.
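For comparison, Python's tuples already behave like the "zero or more return values" model sketched here: the empty tuple acts as the unit value and larger tuples carry several results. The function names below are invented for illustration and do not correspond to Mu instructions.

```python
def branch_like():
    """An operation with no meaningful result returns the 0-tuple."""
    return ()

def quotient_and_remainder(a, b):
    """Two return values are simply a 2-tuple."""
    return (a // b, a % b)

unit = branch_like()
q, r = quotient_and_remainder(17, 5)

# The 0-tuple carries no components, and all 0-tuples compare equal.
unit_is_empty = len(unit) == 0 and unit == tuple()
```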

@mn200

mn200 commented Oct 20, 2015

Good point! Getting rid of (void)NULL can only be a good thing, so I'm quite happy with this. It does suggest to me that the first part of hybrid should just be a list of types. Then we only need case 1 of your list above. (And in that situation maybe void should be Any...)

@wks
Member Author

wks commented Oct 20, 2015

@mn200 Good idea. Alternatively we can make "the whole hybrid" as a counterpart of a C99 struct where the last field is a "flexible array element" (struct {int a; char fae[];} ). Then hybrid will be exactly like a C99 struct with a FAE. Then we can write: hybrid<F1 F2 F3 F4 V>, which is like struct {F1 f1; F2 f2; F3 f3; F4 f4; V v[];}.

Obviously struct<T1 T2 T3> is a prefix of hybrid<T1 T2 T3 V>.

Then we can also write hybrid<V>, which is just a variable-length array. We can get rid of the GETFIXEDPARTIREF instruction (which is just a no-op), and reuse GETFIELDIREF for the fixed part (which never summons nasal demons), but still use GETVARPARTIREF for the variable part (which may summon demons when the length of the variable part is 0).

Then void is basically an alias of "Any".
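The "struct is a prefix of hybrid" observation can be sketched with a small layout model. Everything here is an assumption for illustration: the "align up, then place" rule, the sizes and alignments chosen for the field types, and the function names. Access to a zero-length variable part, which would be undefined in Mu, is modelled as a checked IndexError.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def struct_layout(fields):
    """Return (field offsets, end offset) for a list of (size, alignment) fields."""
    offsets, offset = [], 0
    for size, alignment in fields:
        offset = align_up(offset, alignment)
        offsets.append(offset)
        offset += size
    return offsets, offset

def hybrid_layout(fixed_fields, var_elem):
    """hybrid<F1 ... Fn V>: fixed fields laid out as a struct, then the V array."""
    offsets, end = struct_layout(fixed_fields)
    return offsets, align_up(end, var_elem[1])

def var_elem_offset(var_start, var_elem, index, length):
    """Offset of element `index` of the variable part; out of bounds is an error here."""
    if not 0 <= index < length:
        raise IndexError("variable part index out of bounds")
    return var_start + index * var_elem[0]

I8, I32, I64 = (1, 1), (4, 4), (8, 8)   # assumed (size, alignment) pairs

struct_offsets, _ = struct_layout([I32, I64, I8])
hybrid_offsets, var_start = hybrid_layout([I32, I64, I8], I64)
_, bare_var_start = hybrid_layout([], I64)   # hybrid<V>: no fixed part
third = var_elem_offset(var_start, I64, 2, 5)
```

With these assumed sizes the fixed fields of the hybrid sit at exactly the struct's offsets, which is why a single field-access instruction could serve both.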

@mn200

mn200 commented Oct 20, 2015

Are irefs allowed to point one past the end of an object? Then even a zero length variable part won't cause a failure for GETVARPARTIREF.

@wks
Member Author

wks commented Oct 20, 2015

Out-of-bound irefs are currently not allowed. I am worried that the GC may not be able to trace out-of-bound irefs correctly.

@eliotmoss mentioned that they may be useful for some languages where array indices are not 0-based, but something like 100..200. This has implications for the GC: it must keep track of the object reference even if the iref points outside the object. This is easy to implement with fat pointers, but @steveblackburn has some very strong opinions against them.


2 participants