From ff35ab7ebcc440ba4703062a6a9e89e574fcb1f7 Mon Sep 17 00:00:00 2001
From: Koen De Hondt
Date: Tue, 31 Dec 2024 13:44:01 +0100
Subject: [PATCH 1/2] =?UTF-8?q?Suggested=20improvements=20for=20=E2=80=9DP?=
 =?UTF-8?q?haro=20Bytecode=20Design=E2=80=9D?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../5-DeeperBytecode/methodsbytecode.md | 100 +++++++++---------
 1 file changed, 50 insertions(+), 50 deletions(-)

diff --git a/Part1-InterpreterAndBytecode/5-DeeperBytecode/methodsbytecode.md b/Part1-InterpreterAndBytecode/5-DeeperBytecode/methodsbytecode.md
index a0158f3..1a5e3bc 100644
--- a/Part1-InterpreterAndBytecode/5-DeeperBytecode/methodsbytecode.md
+++ b/Part1-InterpreterAndBytecode/5-DeeperBytecode/methodsbytecode.md
@@ -1,6 +1,6 @@
-## Pharo Bytecode Design
+## Pharo Bytecode Design

-In this chapter we will describe in details the bytecode in Pharo.
+In this chapter we will describe the bytecode in Pharo in detail.

 ### Bytecode Encoding and Optimizations

@@ -23,10 +23,10 @@ In the following section we will go into how such optimizations take place concr
 #### Optimising for Common Bytecode Instructions

 As we said before, the variable-length bytecode encoding allows for shorter bytecode sequences for common instructions.
-For example, we can take the most common bytecode from the Pharo12 release (build 1521) using the script that follows.
+For example, we can take the most common bytecode from the Pharo12 release (build 1521) using the script in Listing *@commonbytecode@*.
 The script takes all the compiled code (methods and blocks), decodes all their instructions and groups them by their bytes.
-```caption=Obtaining the most common bytecode instructions
+```caption=Obtaining the most common bytecode instructions.&anchor=commonbytecode
 ((CompiledCode allSubInstances flatCollect: [ :e | e symbolicBytecodes ])
     groupedBy: [ :symBytecode | symBytecode bytes ])
     associations
@@ -48,7 +48,7 @@ This tendency continues in the entire list of bytecode following an exponential
 The first fifty instructions happen tenths of thousands of times, while the vast majority appear less than a thousand.

 This observation is enough motivation to optimize such _very common_ cases.
-Indeed, amongst the 255 most common instructions, 183 are already encoded as a 1 byte instruction.
+Indeed, amongst the 255 most common instructions, 183 are encoded as a 1 byte instruction.

 #### Encoding of Single-byte instructions

@@ -56,8 +56,8 @@ Instructions such as `pop` or `push self` are single instructions that do not ne
 The encoding of these instructions is straight forward: they are given a single byte.
 For example, `pop` is encoded as 216, while `push self` is encoded as 76.

-There are however other common instructions that have parameters.
-This is the case, for example, of the `push instance variable` bytecode that is parameterized with the index of the reference slot in the receiver (the instance variable) to push.
+However, there are other common instructions that have parameters.
+This is the case, for example, for the `push instance variable` bytecode that is parameterized with the index of the reference slot in the receiver (the instance variable) to push.
 To encode this instruction as a single byte, the index is encoded within the instruction.
 That is, the bytecode `push instance variable at 1` is encoded as 0, the bytecode `push instance variable at 2` is encoded as 1.

@@ -78,9 +78,9 @@ An alternative way of seeing this encoding is to see that an instruction opcode

 Besides common instructions, another useful observation is that many instructions are usually combined together.
 Consider for example the statement `^ self`, which is commonly used to perform an early exit from a method, and inserted at the end of every method that does not have an explicit return.
-A naïve translation of `^self` could use the following sequence of instructions.
+A naive translation of `^self` could use the sequence of instructions in Listing *@returningself@*.

-```caption=A common bytecode sequence for returning self
+```caption=A common bytecode sequence for returning self.&anchor=returningself
 push self
 return top
 ```
@@ -94,14 +94,14 @@ Another source of overhead happens on the over-reliance on literals.
 In Pharo, each method has its own literal frame: literals and constants are not shared between methods, causing a potential redundancy and memory inefficiency.

 One way to minimize such overhead is to design special instructions for well-known constants.
-Constants such as `nil`, `true`, `false` need to be known by the VM for several tasks such as initializing instance variables, or interpret conditional jumps.
+Constants such as `nil`, `true`, `false` need to be known by the VM for several tasks such as initializing instance variables, or interpreting conditional jumps.
 The VM benefits from this knowledge to provide specialized instructions such as `push true` that do not fetch the `true` object from the method literal frame but from the pool of constants known from the VM.

-In the same venue, immediate objects can be crafted by the VM on the fly, avoiding the storage in the literal frame.
+In the same vein, immediate objects can be crafted by the VM on the fly, avoiding the storage in the literal frame.
 Instructions such as `push 0`, encoded as 80, represent the usage of constants that appear often, for example, in loops.
-When executing those instructions, the VM create an immediate object by tagging a well-known value.
+When executing those instructions, the VM creates an immediate object by tagging a well-known value.
-Finally, another variation of this optimization happens on common message sends _e.g.,_ arithmetic and comparisons selectors.
+Finally, another variation of this optimization happens on common message sends e.g., arithmetic and comparisons selectors.
 These selectors happen so often, that instead of storing the selector in the method's literal frame, they are stored in a global table of selectors called `special selectors`.

 The Pharo bytecode set defines `send special selector` instructions.
@@ -119,16 +119,16 @@ Actually, it is only the primitives that start with a 1-based index.

 In contrast:
 - bytecode instructions are encoded in a 0-based fashion, making the value `0` a valid encoded instruction.
-- all variables, temporaries and instance, use 0-based indexing. Thus, the bytecode to read the first instance variable is `push instance variable 0`. Similarly, the bytecode to read the first temporary variable is `push temporary variable 0`.
+- all variables, temporary and instance, use 0-based indexing. Thus, the bytecode to read the first instance variable is `push instance variable 0`. Similarly, the bytecode to read the first temporary variable is `push temporary variable 0`.

 #### Temporary Variables vs Arguments

-The Sista bytecode set inherits, mostly for historical reasons, several traits from previous the bytecode design.
+The Sista bytecode set inherits, mostly for historical reasons, several traits from previous bytecode design.
 One particularly interesting trait is that method arguments are modelled as the first (read-only) temporary variables in a method.
-For example, while the method that follows has syntactically one argument and one temporary variable, the underlying implementation will have two temporary variables, from which the first is an argument.
+For example, while the method in Listing *@tempsversusarguments@* has syntactically one argument and one temporary variable, the underlying implementation will have two temporary variables, from which the first is an argument.

-```caption=Arguments are the first temporaries in a method.
+```caption=Arguments are the first temporaries in a method.&anchor=tempsversusarguments
 MyClass >> methodWithOneArgAndOneTemp: arg
     | temp |

@@ -140,21 +140,21 @@ MyClass >> methodWithOneArgAndOneTemp: arg

 This decision impacts the bytecode design in different ways.

-1. First, to get the real number of temporaries in a method we need to substract the number of arguments from it.
+1. First, to get the real number of temporaries in a method we need to subtract the number of arguments from it. See Listing *@numberoftemporaries@*.
-```caption=Obtaining the real number of temporaries from a method.
+2. We need to know the number of arguments of a method to index its temporaries. For example, reading the nth real temporary variable in a method, we need to read the temporary at offset `numArgs + nth - 1` (remember that we need to subtract 1 because variable indexes are 0-based).
+
+```caption=Obtaining the real number of temporaries from a method.&anchor=numberoftemporaries
 realNumberOfTemporaries := aMethod numTemps - aMethod numArgs
 ```

-2. We need to know the arguments of a method to index its temporaries. For example, reading the nth real temporary variable in a method, we need to read the temporary at offset `numArgs + nth - 1`~(remember that we need to substract 1 because variable indexes are 0-based).
-
 #### Bytecode Extension Prefixes

 Some bytecode instructions are limited by the encoding: for example, 2-byte instructions usually use one byte as opcode and one byte as argument, limiting the argument to a maximum of 255 values.
-For example, the code that follows illustrates the code of the `long jump if false`, that jumps to a given target bytecode if a false is found in the stack. This 2-byte bytecode uses the second bytecode as a relative offset from the current bytecode.
+For example, the code in Listing *@jumpiffalse@* illustrates the code of the `long jump if false`, that jumps to a given target bytecode if false is found in the stack. This 2-byte bytecode uses the second bytecode as a relative offset from the current bytecode.
 Such a restriction can be too limiting for some applications.
 In the case of our example, this forbids us from having jumps longer than 255 bytes.

-```caption=Sketching the jump if false
+```caption=Sketching the jump if false.&anchor=jumpiffalse
 extJumpIfFalse
     | byte offset |

@@ -164,10 +164,10 @@ extJumpIfFalse
 ```

 To solve this issue, the sista bytecode includes two prefix instructions: `extension A` and `extension B`.
-Prefix instructions prefix normal instructions and work as meta-data for the following instruction: the semantics of a prefix depends on each instruction. Also, instructions that use an instruction _consume it_, zeroing its value for subsequent instructions.
-The actual implementation of the `long jump if false` bytecode adds uses the value of it's prefix (if any) to reach further jump offsets.
+Prefix instructions prefix normal instructions and work as meta-data for the next instruction: the semantics of a prefix depends on each instruction. Also, instructions that use an extension _consume it_, zeroing its value for subsequent instructions. See Listing *@jumpiffalsewithextensions@*.
+The actual implementation of the `long jump if false` bytecode uses the value of its prefix (if any) to reach further jump offsets.

-```caption=Sketching the jump if false with extensions
+```caption=Sketching the jump if false with extensions.&anchor=jumpiffalsewithextensions
 extJumpIfFalse
     byte := self fetchByte.
     offset := byte + (extB << 8).
@@ -175,12 +175,12 @@ extJumpIfFalse
     self jumplfFalseBy: offset
 ```

-The example that follows show two bytecode sequences, where the first jumps forward by 255 bytes while the second one jumps forward by 256 bytes.
+The example in Listing *@jumpmorethan255bytes@* shows two bytecode sequences, where the first jumps forward by 255 bytes while the second one jumps forward by 256 bytes.
 The first sequence does not require any extensions, while the second one uses an extension.
-Notice that the `long jump if false` computes it's offset as `byte + (extB << 8)`.
+Notice that the `long jump if false` computes its offset as `byte + (extB << 8)`.
 Thus, to compute an offset of 256, we should have an extension of value `1`, and a jump argument of value `0`.

-```caption=Jumping more than 255 bytes
+```caption=Jumping more than 255 bytes.&anchor=jumpmorethan255bytes
 "Jump forward 255 bytes"
 extJumpIfFalse 255

@@ -191,19 +191,19 @@ extJumpIfFalse 0

 In the case before, the prefix allows jumps to reach jump targets up to `65535` (`255 + (255 << 8)`).
 To support larger values, prefixes can be composed: an instruction can have many prefixes that cummulate.
-Following is the definition of the `extension A` bytecode, which takes the previous value of the extension A, shifts it 8 bits to the left and adds it to the given value.
+Listing *@extensionimplementation@* shows the definition of the `extension A` bytecode, which takes the previous value of the extension A, shifts it 8 bits to the left and adds it to the given value.

-```caption=The extension implementation
+```caption=The extension implementation.&anchor=extensionimplementation
 extABytecode
     extA := (extA bitShift: 8) + self fetchByte.
     self fetchNextBytecode
 ```

 Extensions are composed by adding many prefixes to a given instruction.
-For example, the following example shows a jump with two extensions of 1 and 2, to a jump with argument 3.
+For example, the example in Listing *@jumpmorethan65535bytes@* shows a jump with two extensions of 1 and 2, to a jump with argument 3.
 This computes the jump offset of 66051 with the formula `(((1 << 8) + 2) << 8 + 3)`.

-````caption=Combining extensions to jump above 65535 bytes
+````caption=Combining extensions to jump above 65535 bytes.&anchor=jumpmorethan65535bytes
 "Jump forward 66051 bytes"
 extA 1
 extA 2
@@ -212,15 +212,15 @@ extJumpIfFalse 3

 #### Super sends

-`super` sends deserve a special section for their own, specially because the Sista Bytecode set introduced _directed_ super sends in addition to the traditional ones.
-When using the `super` keyword in Pharo, a message send is issued starting the method-lookup from the superclass of the current method's class instead of the receiver's class.
-This means that to perform a super-send lookup, we need to access the current method's superclass, which should be encoded in the method or bytecode in some way.
+`super` sends deserve a special section of their own, because the Sista Bytecode set introduces _directed_ super sends in addition to the traditional ones.
+When using the `super` keyword in Pharo, a message send is issued starting the method lookup from the superclass of the current method's class instead of the receiver's class.
+This means that to perform a super send lookup, we need to access the current method's superclass, which should be encoded in the method or bytecode in some way.
 Pharo's bytecode set allows for encoding such information in two ways:

-- **Per-method encoded super sends:** Traditionally, each Pharo method contain as last literal a reference to it's class binding. When performing a normal super send, the algorithm fetches this last literal, then it's superclass, and starts the lookup algorithm from there.
-- **Per-call-site encoded super sends:** _Directed super sends_ allow to specify as a stack argument the class from where to start the lookup. Directed super sends allow to specify different lookup classes per call-site, by pushing the lookup-class as the last element on the stack. Although initially meant for super sends, this bytecode can be used to control message sends per-call-site at the expense of larger bytecode and literal frames.
+- **Per-method encoded super sends:** Traditionally, each Pharo method contains as last literal a reference to its class binding. When performing a normal super send, the algorithm fetches this last literal, then its superclass, and starts the lookup algorithm from there.
+- **Per-call-site encoded super sends:** _Directed super sends_ allow to specify as a stack argument the class from where to start the lookup. Directed super sends allow to specify different lookup classes per call-site, by pushing the lookup-class as the last element on the stack. Although initially meant for super sends, this bytecode can be used to control message sends per-call-site at the expense of larger bytecode and literal frames. See Listing *@directedsend@*.

-```caption=A directed send bytecode sequence
+```caption=A directed send bytecode sequence.&anchor=directedsend
 "This will lookup #some:message: starting from ClassToLookupFrom, using the receiver and argument found in the stack"
 push receiver
 push arg0
@@ -241,11 +241,11 @@ Most of the single-byte bytecode, if not all, are optimized versions of more gen
 - Bytecodes 224-247 are two-byte bytecode
 - Bytecodes 248-255 are three-byte bytecode

-In this section, we will refer to `byte0` as the first byte of a instruction, `byte1` as the second byte (if exists) and `byte2` as the third byte (if exists).
+In this section, we will refer to `byte0` as the first byte of an instruction, `byte1` as the second byte (if present) and `byte2` as the third byte (if present).
 #### Sista Optimized Bytecode

-The range 0-75 encodes four diferent families of very common instructions: push receiver instance variable, push literal variable, push literal constant and push temporary variable.
+The range 0-75 encodes four different families of very common instructions: push receiver instance variable, push literal variable, push literal constant and push temporary variable.
 Each family has many versions specialized for one particular argument, encoded as part of the instruction.

 | Bytes | Description | Arguments |
@@ -301,8 +301,8 @@ Backjumps (and thus loops) need to be encoded with the longer version (237)
 | Bytes | Description | Arguments |
 | --- | --- | --- |
 | 176-183 | Unconditional jump to offset `x` | x = 0x7 |
-| 184-191 | Coditional jump to offset `x` if top = `true` | x = byte0 && 0x7 |
-| 192-199 | Coditional jump to offset `x` if top = `false` | x = byte0 && 0x7 |
+| 184-191 | Conditional jump to offset `x` if top = `true` | x = byte0 && 0x7 |
+| 192-199 | Conditional jump to offset `x` if top = `false` | x = byte0 && 0x7 |

 The range 200-215 encodes _store and pop_ super instructions, which are very common when using single-statement assignments.

@@ -313,8 +313,8 @@ The range 200-215 encodes _store and pop_ super instructions, which are very com

 #### General forms

-Instructions that cannot be encoded in the previous ranges -- _e.g.,_ push instance variable number 20 -- can be encoded with a more general form.
-General forms can go beyond the limits of byte argument using extensions as described before.
+Instructions that cannot be encoded in the previous ranges -- e.g., push instance variable number 20 -- can be encoded with a more general form.
+General forms can go beyond the limits of byte arguments by using extensions as described before.
 The following tables illustrates such general forms.

 | Bytes | Description |
@@ -343,7 +343,7 @@ The following tables illustrates such general forms.
 #### Other instructions

-Finally some instructions that are rare enough do not have this distinction between the general and the optimized case.
+Finally some instructions that are rare enough do not have the distinction between the general and the optimized case.
 Some of the most notable instructions are:

 - **231 - Push array:** boxes the top `x` elements in the stack into an array, and pushes the array to the stack. Depending on the second byte, this bytecode may pop the `x` elements from the stack or not.
@@ -355,10 +355,10 @@ Some of the most notable instructions are:
 In this chapter we studied the actual encoding of Pharo instructions.
 Moreover, we explored many optimizations that can be done at the level of bytecode encoding.

-- Pharo's bytecode set has a variable encoding with instructions taking between 1 and 3 bytes
-- Encoding optimizations make methods shorter by having smaller bytecode sequencesx and less method literals
-- common bytecode instructions can be shortened and made special instructions, avoiding expensive literals and arguments
-- common bytecode sequences can be combined into (shorter!) super instructions too
+- Pharo's bytecode set has a variable encoding with instructions taking between 1 and 3 bytes.
+- Encoding optimizations make methods shorter by having smaller bytecode sequences and fewer method literals.
+- Common bytecode instructions can be shortened and made special instructions, avoiding expensive literals and arguments.
+- Common bytecode sequences can be combined into (shorter!) super instructions too.

 In the following chapters we will study the implementation of the Pharo interpreter and several of its portable optimizations.
 In later chapters we will study low-level optimizations of the interpreter thanks to the Slang framework that applies indirect threading, inlinings, and variable autolocalization.
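
The extension arithmetic described in the patched chapter can be checked with a short Pharo Playground sketch. It assumes only the two rules the chapter states: each extension prefix shifts the accumulated value left by 8 bits and adds its own byte, and `long jump if false` computes its offset as `byte + (ext << 8)`. The names `ext`, `prefixByte` and `offset` are illustrative and are not VM identifiers.

```smalltalk
"A minimal sketch, assuming the accumulation rule quoted in the chapter."
| ext offset |

"One prefix of value 1 and a jump argument of 0: 256 bytes forward."
ext := (0 bitShift: 8) + 1.
offset := 0 + (ext bitShift: 8).
self assert: offset = 256.

"Two prefixes (1, then 2) and a jump argument of 3: 66051 bytes forward."
ext := 0.
#(1 2) do: [ :prefixByte | ext := (ext bitShift: 8) + prefixByte ].
offset := 3 + (ext bitShift: 8).
self assert: offset = 66051
```

Both assertions agree with the offsets given in the chapter's listings for jumps beyond 255 and beyond 65535 bytes.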
From 79f762ff58bf6eb66e3067c1f4268d46f8ff997b Mon Sep 17 00:00:00 2001
From: Koen De Hondt
Date: Tue, 31 Dec 2024 13:52:01 +0100
Subject: [PATCH 2/2] Remove backtick and refer to listing 7-2

---
 .../5-DeeperBytecode/methodsbytecode.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Part1-InterpreterAndBytecode/5-DeeperBytecode/methodsbytecode.md b/Part1-InterpreterAndBytecode/5-DeeperBytecode/methodsbytecode.md
index 1a5e3bc..57c7d8c 100644
--- a/Part1-InterpreterAndBytecode/5-DeeperBytecode/methodsbytecode.md
+++ b/Part1-InterpreterAndBytecode/5-DeeperBytecode/methodsbytecode.md
@@ -65,9 +65,9 @@ Single-byte parametrized bytecode instructions are organized in ranges, often of
 For example, 1-byte `push instance variable` instruction is organized in a range of 16 instructions (2^4).
 1-byte `push instance variable` instructions are encoded with bytes from 0 to 15, parameterized with indexes from 1 to 16 respectively.

-An alternative way of seeing this encoding is to see that an instruction opcode is not the byte on itself but the most significant bits of the byte. If we consider again the range of bytecodes `push instance variable`, the most significant nibble remains always zero regardless of the bytecode, while the lowest part always changes following the index to push.
+An alternative way of seeing this encoding is to see that an instruction opcode is not the byte on itself but the most significant bits of the byte. If we consider again the range of bytecodes `push instance variable`, the most significant nibble remains always zero regardless of the bytecode, while the lowest part always changes following the index to push. See Listing *@nibbles@*.

-```caption=Understanding encoding and nibbles
+```caption=Understanding encoding and nibbles.&anchor=nibbles
 "The most significant nibble is always 0 for this range of bytecode"
 0 to: 15 do: [ :e | self assert: ((e >> 4) bitAnd: 16rF) = 0 ].
 "The least significant nibble is always the index to push"
@@ -203,7 +203,7 @@ Extensions are composed by adding many prefixes to a given instruction.
 For example, the example in Listing *@jumpmorethan65535bytes@* shows a jump with two extensions of 1 and 2, to a jump with argument 3.
 This computes the jump offset of 66051 with the formula `(((1 << 8) + 2) << 8 + 3)`.

-````caption=Combining extensions to jump above 65535 bytes.&anchor=jumpmorethan65535bytes
+```caption=Combining extensions to jump above 65535 bytes.&anchor=jumpmorethan65535bytes
 "Jump forward 66051 bytes"
 extA 1
 extA 2