interpreter.txt


    A PYTHON INTERPRETER

Python is a programming language while CPython is the primary
implementation of Python in C (and Python - yeah, I know).

A Python interpreter: an interpreter in the Python context
is a generic term. It could mean one of the following:
i) The Python REPL
ii) Running the Python code from start to finish.

In this mini-study, the term "INTERPRETER" focuses
on the final step in the execution of a Python program.

Before the interpreter assumes control; The lexer, parser,
and compiler acts on the source text to convert it to
structured code objects that contains instructions that the
interpreter understands.

NB: Python has a compiling step

The interpreter takes the code object and follows
the instructions.

NB: The compilation step in Python execution process does
relatively less work (the interpreter does more) than in a
compiled language.

    BUILDING THE INTERPRETER

Virtual machines are software that emulates physical
computer. The Python interpreter is a **virtual machine**
which works like a stack machine: it manipulates stack(s)
to perform its operations (as opposed to a register machine
which reads and write to a particular memory locations).

    why?

The interpreter is a bytecode interpreter. It takes
instruction sets called **bytecode** as input.

The lexer, parser, and compiler generate code objects.
Each code object contains a set of instructions to be
executed - bytecode - and other information that the
interpreter requires.

    code object = bytecode + other information

In building the interpreter, the operations are
represented as methods in an Interpreter class e.g
LOAD_VALUE, ADD_TWO_VALUES, PRINT_NUMBER etc. Each of
these operations act on a stack (represented as a list
object inside the interpreter).

LOAD_VALUE: adds element to the stack
ADD_TWO_VALUES: pops two elements from the stack, add them
                together and add the result to the stack
PRINT_NUMBER: pops an element from the stack, and prints
              the element.

  VARIABLES

Variables require an instruction for storing the value of
a variable, STORE_NAME; an instruction for retrieving the
value of a variable, LOAD_NAME; and a mapping of variable
names to values.

The mapping is achieved by defining an environment instance
variable - represented as a Python dictionary - on the
Interpreter class (this is a hack for now).

STORE_NAME: pops an element from the stack and store it on
            the environment variable with the appropriate key.
LOAD_NAME: given a key, get the associated value and add it
           to the stack

  REAL PYTHON BYTECODE

One byte is used to represent each instruction while
two bytes are used to represent arguments (names and
constants).

Python source
-------------

1 def cond():
2    x = 3
3    if x < 5:
4        return 'yes'
5    else:
6        return 'no'

Equivalent bytecode for **cond**
--------------------------------

  1           0 RESUME                   0

  2           2 LOAD_CONST               1 (3)
              4 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              8 LOAD_CONST               2 (5)
             10 COMPARE_OP               2 (<)
             14 POP_JUMP_IF_FALSE        1 (to 18)

  4          16 RETURN_CONST             3 ('yes')

  6     >>   18 RETURN_CONST             4 ('no')

Each line has about five(5) columns; and they are
explained thus:

first column - line of code in Python source
second column - bytecode index
third column - instruction itself
fourth column - argument to the instruction
fifth column - hint to what the argument means

Conditional and Loops in Python relies on jumping from
an instruction index to another.

  FRAMES

Python interpreter is a stack machine that perform operations by
pushing and popping values on the stack.

Frames are another layer of complexity housed in the Python
interpreter. They are created and destroyed on the fly
during program execution.

A frame is associated with each module, function call
and class definition.

    while each frame has a code object associated with it, a
    code object can have multiple frames.

There are also three types of stacks to consider:

- frames live on the **call stack** (think traceback of exceptions.
  Each line starting with "File "<stdin>", line number, ..."
  corresponds to a frame on the call stack).

- **data stack** is the one we've be manipulating in our make-shift
  interpreter.

- **block stack** is used for certain kinds of control flow,
  looping and exception handling.

NB: Each frame on the call stack has a DATA and BLOCK stack.

The bytecode instruction, RETURN_VALUE, which corresponds
to **return** statement in the code is used to pass values
between frames.

      i) the top value is popped off the data stack on the
         top frame on the call stack.

     ii) the entire top frame is then popped off.

    iii) the value from (i) above is pushed into the data
         stack of the next frame.


  EXAMINING THE INTERPRETER

VirtualMachine: It is the highest level of the interpreter.
It manages the call stack and mapping of instructions to operation.
Only one instance will be created each time the program is run.

Frame: An instance has one code object and manages the namespaces,
holds a reference to the calling frame and the last bytecode
instruction executed

Function: It is used in place of Python function so we can
control the creation of new frames. Every function call
creates a new frame.

Block: Wraps the three attributes of blocks (required so our interpreter
can run real Python code)