Skip to content
This repository has been archived by the owner on Aug 2, 2019. It is now read-only.

Latest commit

 

History

History
454 lines (346 loc) · 21.5 KB

threads-stacks.rest

File metadata and controls

454 lines (346 loc) · 21.5 KB

Threads and Stacks

One unique feature of Mu is the flexible relation between stacks and threads. A thread can swap between multiple stacks to achieve light-weighted context switch. This provides support for language features like co-routines and green threads.

On the other hand, Mu does allow multiple simultaneous threads. Mu threads, by design, can be implemented as native OS threads and make use of parallel CPU resources. Mu also has a memory model. See the Memory Model chapter for more information.

This chapter discusses Mu threads and Mu stacks. In this chapter, "thread" means "Mu thread" unless explicitly stated otherwise.

Concepts

A stack is the context of nested or recursive activations of functions.

NOTE: "Stack" here means the "control stack", or more precisely the "context" of execution. On a concrete machine, the context includes not only the stack, but also the CPU/register states. Mu abstracts the CPU state, modelling it as part of the state of the stack-top frame.

A stack has many frames, each of which is the context of one function activation. A frame contains the states of all local variables (parameters and instructions), the program counter and alloca cells (see Mu and the Memory). Each frame is associated with a version of a function.

NOTE: Because Mu allows function redefinition, a function may be redefined by the client, and newly created function activations (newly called functions) will use the new definition. But any existing function activations will still use their old definitions, thus a frame is only bound to a particular version of a function, not just a function. This is very important because Mu cannot magically translate the state of any old function activation to a new one. A redefined function may even have completely different meaning from the old one. Mu allows the client to do crazy things like redefining a factorial function to a Fibonacci function.

During on-stack replacement, the Mu client API can tell the client which version of which function any frame is executing and the value of KEEPALIVE variables. The client is responsible for translating the Mu-level states to the high-level language states.

A thread is the unit of CPU scheduling. A thread can be bound to a stack, in which case the thread executes using the stack as its context. The phrase "bind a stack to a thread" has the same meaning as "bind a thread to a stack". While a thread is executing on a stack, it changes the state of the stack, including changing the value of local variables by executing instructions, pushing or popping frames and allocating memory on the stack.

A stack can be bound to at most one thread at any moment.

A thread is always bound to one stack, with one exception: when executing a TRAP or WATCHPOINT instruction, it is temporarily unbound from its current stack. It either rebinds to a stack (may be the old stack or another stack) or terminates after returning from the trap handler.

TODO: microvm/microvm-meta#42 Extend the unbinding to undefined function handling.

State of Threads

The state of a thread include:

  • the stack it is bound to (explained in this chapter)

  • a thread-local object reference (see below)

  • a thread-local pinning multi-set (see object pinning)

    NOTE: Implementations may keep more thread-local states, such as the thread-local memory pool for the garbage collector. They are implementation details.

The thread-local object reference is an arbitrary object reference, and can be NULL. It is initialised when a thread is created. It can be read and modified by the thread itself. It can also be read and modified by the client in the trap handler, but the trap handler can only read and modify the thread-local object reference of the thread that triggered the trap. It cannot be read or modified in any other ways.

NOTE: This design ensures that:

  1. The access to the thread-local object reference itself is data-race-free.
  2. It is only a single object reference, so the reference can fit in a machine register. In this way, if the implementation reserves a register for that reference, accessing fields in the object it refers to can be as efficient as register-indexed addressing.

It also off-loads the responsibility of resizing or redefining the thread-local object to the client. If the client wishes to add more fields to that object (e.g. when more bundles are loaded), it can use watchpoints to stop existing threads and replace their thread-local object references in the trap handler.

States of Stacks and Frames

At any moment, the state of a frame is one of the following:

READY<Ts>
(Ts = T1 T2 T3 ..., a list of types) The frame is ready to resume when values of types T1 T2 T3 ... are supplied. Ts can be an empty list.
ACTIVE
The current frame is the top of a stack and a thread is executing on the stack.
DEAD
The frame is dead.

The state of a stack is the state of its top frame. In a bound stack, the top frame is in the ACTIVE state while all other frames are in the READY<Ts> state; in an unbound stack, all frames are in the READY<Ts> state, where Ts are specific to each frame. When killing a stack, all of its frames enter the DEAD state.

Calling, returning and exception throwing instructions change the state of frames, but since the stack is always running on the same stack, the state of the stack remains to be ACTIVE. CALL and CCALL change the state of the caller frame to READY<Ts> where Ts is the return type of the callee, but a new frame is created for the callee, entering the ACTIVE state immediately. RET and THROW remove frames from the top of the current stack, but resume a lower frame and change its state to ACTIVE.

Operations on remote stacks can change the state of stacks. The table below summarises important operations:

Operation Current Stack New/Destination Stack Affected frames
create new stack N/A READY<Ts> creates new frame
create new thread N/A READY<Ts> -> ACTIVE top
SWAPSTACK ACTIVE -> READY<Ts> or DEAD READY<Us> -> ACTIVE both top frames
killing a stack N/A READY<Ts> -> DEAD all frames
@uvm.thread_exit ACTIVE -> DEAD N/A all frames
trap to client ACTIVE -> READY<Ts> N/A top
popping a frame READY<Ts> -> READY<Us> N/A removes top frame
pushing a frame READY<Ts> -> READY<Us> N/A creates new frame

Stack and Thread Creation

Mu stacks and Mu threads can be created by Mu instructions @uvm.new_stack and NEWTHREAD, or the API function new_stack and new_thread.

When a stack is created, a Mu function must be provided. The stack will contain a frame created for the current version of the function (as seen by the current thread because of concurrency and the memory model). This frame is called the stack-bottom frame and the function is called the stack-bottom function.

NOTE: The stack-bottom frame is conceptually the last frame in a Mu stack and returning from that frame has undefined behaviour. But a concrete Mu implementation can still have its own frames or useful data below the stack-bottom frame. They are implementation-specific details.

The stack-bottom frame (and also the stack) is in the READY<Ts> state, where Ts are the parameter types of the stack-bottom function. The resumption point is the beginning of the function version.

When a thread is created, a stack must be provided as its initial stack. Creating a thread binds the thread to the stack, passing values or raising exception to it (explained later), thus the top frame will enter the ACTIVE state after the thread is created. A newly created thread starts execution immediately.

NOTE: Unlike Java, there is not a separate step to "start" a thread. A thread starts when it is created.

Thread Termination

A thread is terminated when it executes the @uvm.thread_exit instruction, or the client orders the current thread to terminate in a trap handler.

The @uvm.thread_exit instruction kills the current stack of the current thread.

Mu may change the value of threadref type to NULL if the thread it refers to is terminated.

Binding of Stack and Thread

Binding

Some actions, including the NEWTHREAD and the SWAPSTACK instruction, the new_thread API function and the trap handler, can bind a thread to a stack.

When binding a thread to a stack, the state of its top frame changes from READY<Ts> to ACTIVE. In this process, one of the following two actions shall be performed on the stack:

  • A binding operation can pass values of types Ts to the stack. In this case, the types Ts must match the expected types, and the stack receives the values. Ts can be an empty list.
  • A binding operation can raise an exception to the stack. In this case, the stack can be in READY<Ts> with any Ts and it receives the exception.

It gives undefined behaviour if the stack is not in the expected state.

Resumption Point

A frame in the READY<Ts> state has a resumption point. The resumption point determines how the received values and the received exception are processed when binding to a thread.

For a Mu frame, the resumption point is either the beginning of a function version, or an OSR point instruction in the function version.

  • In the former case, the Ts in READY<Ts> are the parameters of the function (also the entry block). Received values are bound to the parameters and the execution continues from the beginning of the entry block. Received exception is re-thrown.
  • In the latter case, the Ts types are determined by the concrete instructions. Specifically, Ts are the return types for CALL and CCALL, and are explicitly specified for TRAP, WATCHPOINT and SWAPSTACK. The received values are bound to the results of the OSR point instruction. The received exception is handle by the instruction or re-thrown depending on the instruction.

Undefined Mu functions behaves as defined in Mu IR.

Native frames can only enter the READY<Ts> state when it calls back to Mu. Thus the resumption point is where it will continue after the native-to-Mu call returns. The received values are the return values to the native function. Throwing exceptions to native frames has implementation-defined behaviour.

Mu gives the client a unified model of stack binding. The binding operation is only aware of the Ts types of the READY<Ts> state, but oblivious of the resumption point. Therefore it can resume any READY<Ts> stack in the same way, whether the resumption point is the beginning of a function, a call site, a trap or a swap-stack instruction, or a native frame.

Note to the Mu implementers: swap-stack, Mu-to-Mu calls and Mu-native calls may all have different calling conventions, but the implementation must present a unified "resumption protocol" to the client: all stack-binding operations work on all OSR point instructions, as long as the Ts in READY<Ts> match the passed values. In practice, some "adapter" frames may need to be inserted to convert one convention to another, but these frames must not be seen by the client. This implies that some API functions (especially stack introspection) must "lie" to the client about the presence of such frames.

For example, on x86_64, assume SWAPSTACK passes values to the other stack via rdi and rsi, but RET returns values to the caller via rax and rdx. If, during OSR, a frame pausing on the CALL instruction becomes the top frame of a stack, then the Mu implementation must also create some glue code and an adapter frame above that frame, so that when a thread SWAP-STACK to this stack, values passed in rdi and rsi can be moved to rax and rdx, respectively. This "adapter" frame must also recover callee-saved registers.

Unbinding

Some actions, including the @uvm.thread_exit, TRAP, WATCHPOINT and the SWAPSTACK instruction, can unbind a thread from a stack.

When unbinding a thread from a stack, one of the following two actions shall be performed on the stack:

An unbinding operation can leave the stack with a return types Ts. In this case, the state of its top frame changes from ACTIVE to READY<Ts> for some given Ts. The instruction becomes the resumption point of the frame.

An unbinding operation can kill the stack. In this case, the state of all frames of the stack changes from ACTIVE to DEAD. Specifically the @uvm.thread_exit kills the current stack and the SWAPSTACK instruction can do either option on the swapper.

Executing a TRAP or an enabled WATCHPOINT instruction implies an unbinding operation, leaving the top frame in a READY<Ts> state.

Swap-stack

Swap-stack is an operation that unbinds a thread from a stack and rebind that thread to a new stack. In a swap-stack operation, the stack to unbind from is called the swapper and the stack to bind to is called the swappee.

The SWAPSTACK instruction (see instruction-set.rest) performs a swap-stack operation.

A trap handler can do similar things as swap-stack by re-binding the current thread to a different stack.

Stack Destruction

The @uvm.kill_stack instruction, the kill_stack API function and all operations that perform unbinding operations can destroy a stack. Destroying a stack changes the state of all of its frames to DEAD.

If a stack becomes unreachable from roots, the garbage collector may kill the stack.

The Mu may change the value of stackref type to NULL if the stack it refers to is in the DEAD state.

Stack Introspection

Stacks in the READY<Ts> state can be introspected. Stacks in other states cannot.

The stack introspection API uses frame cursors. A frame cursor is a mutable opaque structure allocated by Mu. It refers to a Mu frame, and also keeps implementation-dependent states necessary to iterate through frames in a stack.

Note: The reason why it is mutable is that the cursor may be big. The states to be kept is specific to the implementation. Generally speaking, the more callee-saved registers there are, the bigger the cursor is. Allocating a new structure whenever moving down a frame may not scale for deep stacks.

The new_cursor API call allocates a frame cursor that refers to the top frame of a given stack, and returns a framecursorref that refers to the cursor. Then the client can use the next_frame API to move the cursor to the frame below. The copy_cursor copies the given frame cursor. The original frame cursor and the copied cursor can move down independently. This is useful when the client wishes to iterate through the stack in different paces. The close_cursor API closes the frame cursor and deallocates its resources.

It has undefined behaviour if a stack is bound to a thread or a stack is killed while there are frame cursors to its frames not closed. It has undefined behaviour if next_frame goes below the bottom frame.

Note: There are several reasons why it needs explicit closing.

  • It forces the client to avoid racing stack modification and stack introspection.
  • It will not force the Mu implementation to use a particular way to allocate such cursors. The Mu implementation can use malloc and free. If the implementation uses garbage collection for such cursors, it can still treat the close_cursor operation as a no-op.
  • An alternative is to close all related cursors automatically when the stack is re-bound. But that will involve one extra check for every swap-stack operation, which may be much more frequent than stack introspection, which is usually only used in exceptional cases.

The cur_func, cur_func_ver, cur_inst and dump_keepalives API calls take a framecursorref as argument, and returns the ID of the function, function version or current instruction of the frame, or dumps the keep-alive variables of the current instruction of the frame.

Multiple threads may introspect the stack concurrently as long as there is no concurrent modification using the on-stack replacement API (see below). However, it has undefined behaviour to operate on a closed frame cursor.

These operations can also be performed by their equivalent common instructions @uvm.meta.*.

On-stack Replacement

The client can pop and push frames from or to a stack.

The pop_frames_to API function takes a framecursorref as argument. It will pop all frames above the frame cursor, and the frame of the cursor becomes the new top frame.

Popping native frames has implementation-defined behaviour. It has undefined behaviour if a frame is popped but there are frame cursors referring to that frame.

The push_frame API function takes a stackref and a funcref as arguments. It creates a new frame on the top of the stack, using the current version (as seen by the current thread) of the given function. The resumption point is the beginning of the function version. The return types of the function must match the Ts of the state of the previous frame, which must be READY<Ts>.

There are equivalent common instructions in the IR, too.

It has undefined behaviour if

  • there are two API calls or equivalent instructions executed by two threads, and
  • one is new_cursor, next_frame, cur_func, cur_func_ver, cur_inst, dump_keepalives, pop_frames_to or push_frame, and
  • the other is pop_frames_to or push_frame, and
  • neither happens before the other.

After popping or pushing frames, the state of the stack become the state of the new top frame, which must be READY<Ts> for some Ts. The stack can be bound.

NOTE: For the ease of Mu implementation, the new function must continue from the beginning rather than an arbitrary instruction in the middle. Continuing from the middle of a function demands too much power from the code generator.

However, in most OSR scenarios, the desired behaviour is to continue from the point where the program left out for optimisation. The client can emulate the behaviour of continuing from the middle of a function by inserting a "prologue" in the high-level language in the beginning of the function. For example, in C, the client can add extra assignment expressions to initialise local variables to the previous context and use a goto statement to jump to the location to continue. Then the optimising compiler can remove unreachable code. As another example, if the client implements a JVM, it can insert Xstore instructions and a goto instruction to continue from the appropriate bytecode instruction. The optimising compiler can handle the rest.

TODO: With the goto-with-values form defined, we can extend the IR and the API so that execution can continue from an arbitrary basic block of an arbitrary function version, rather than just the beginning of a function.

Futex

Mu provides a mechanism similar to the Futex in the Linux kernel for implementing blocking locks and other synchronisation primitives.

There is a waiting queue for all memory locations that has some integer types. (See portability.rest for valid candidate types for Futex.)

The @uvm.futex.wait and the @uvm.futex.wait_timeout instructions put the current thread into the waiting queue of a memory location. Both @uvm.futex.wake and @uvm.futex.cmp_requeue wakes up threads in a waiting queue of a memory location.

NOTE: The term memory location is defined in Mu's sense and is abstract over physical memory or the virtual memory space given by the operating system. Even if a Mu implementation uses copying or replicating garbage collectors, the memory location in a heap object remains the same until the object is collected.

The Mu Futex is designed to be easy to map to the futex system call on Linux. With the presence of copying garbage collector, Mu may internally perform FUTEX_REQUEUE or FUTEX_CMP_REQUEUE operations to compensate the effect of object movements. It may put barriers around Futex-related Mu instructions when the GC is concurrently re-queuing threads.

When a thread is blocking on a futex, the state of its stack is still ACTIVE, making the impression that the thread is still "busy executing" the futex wait/wait_timeout instructions. Only the kernel knows whether it is doing an OS-level swap-stack, as it always does for context-switching.