This document is based on the talk "RustPython: a Python implementation in Rust. Building a Python 3 interpreter in Rust" by Windel Bouwman (@windelbouwman) and Shing Lyu (@shinglyu) at FOSDEM, and on the lightning talk "RustPython: a Python implementation in Rust" by Lukas Prokop (@meisterluk) at pyGraz.
Rust is a relatively new programming language aimed at being a safe alternative to C. There are already efforts to write extension modules in Rust and load them into CPython. An entirely different approach is to re-implement the Python language itself in Rust.
This is what RustPython is about. The aim of the project is to create a Python interpreter written entirely in Rust. So far, we have used many of Rust's language features, such as vectors, hash maps and iterators.
To implement standard library modules, we can often just wrap existing Rust crates. For example, this is how the `json` module is implemented.
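As a rough illustration of the wrapping idea, the sketch below builds a hypothetical `math`-like module as a table from names to Rust functions. Each function only checks its arguments and then delegates to functionality Rust already provides (`f64::sqrt`, `f64::hypot`). The names `NativeFn`, `math_sqrt`, `math_hypot`, `make_math_module` and `call` are made up for this example and are not RustPython's API. A module wrapping an external crate would follow the same pattern, with the crate in place of the standard library.

```rust
use std::collections::HashMap;

/// Hypothetical signature for a native module function:
/// positional float arguments in, a float or an error message out.
type NativeFn = fn(&[f64]) -> Result<f64, String>;

/// Wraps `f64::sqrt`, adding Python-style argument and domain checks.
fn math_sqrt(args: &[f64]) -> Result<f64, String> {
    match args {
        [x] if *x >= 0.0 => Ok(x.sqrt()),
        [_] => Err("math domain error".to_string()),
        _ => Err("sqrt() takes exactly one argument".to_string()),
    }
}

/// Wraps `f64::hypot`.
fn math_hypot(args: &[f64]) -> Result<f64, String> {
    match args {
        [x, y] => Ok(x.hypot(*y)),
        _ => Err("hypot() takes exactly two arguments".to_string()),
    }
}

/// Builds the hypothetical module: a name -> function table.
fn make_math_module() -> HashMap<&'static str, NativeFn> {
    let mut module: HashMap<&'static str, NativeFn> = HashMap::new();
    module.insert("sqrt", math_sqrt);
    module.insert("hypot", math_hypot);
    module
}

/// Looks up `name` in the module and calls it, like `math.name(*args)` in Python.
fn call(module: &HashMap<&'static str, NativeFn>, name: &str, args: &[f64]) -> Result<f64, String> {
    let func = module
        .get(name)
        .ok_or_else(|| format!("module has no attribute '{}'", name))?;
    func(args)
}
```

The Rust side stays thin: the wrapped library does the real work, and the wrapper only translates arguments, results and errors between the two worlds.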
- Rust is safer than C
- In general, Rust allows you to focus on the actual implementation of the library
- Learn Rust
- Learn Python internals
- Create a Python implementation which is more memory-safe.
- Rust Crates
- Lexer, Parser, and Abstract Syntax Tree (AST)
- Compiler
- Virtual Machine (VM)
- Import System
- Built-in Objects
The CPython Design Strategy
The overall design of RustPython follows the CPython strategy.
A stripped-down dependency tree looks like this:

```
rustpython (RustPython interactive shell)
├── rustpython-parser (lexer, parser and AST)
├── rustpython-compiler (compiler)
│   ├── rustpython-bytecode (bytecode)
│   └── rustpython-parser (lexer, parser and AST, using `lalrpop`)
└── rustpython-vm (VM, compiler and built-in functions)
    ├── rustpython-bytecode (bytecode)
    ├── rustpython-compiler (compiler)
    └── rustpython-parser (lexer, parser and AST)
```
Crate | Description |
---|---|
rustpython-parser | Lexer, Parser and AST |
rustpython-vm | VM, compiler and built-in functions |
rustpython | Using above crates to create an interactive shell |
- A hand-coded lexer deals with Python's indentation (indent and dedent)
- The lexer converts Python source into tokens
- The parser is generated with lalrpop
- The parser converts tokens into an AST (Abstract Syntax Tree)
- The AST nodes are Rust structs and enums
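The indentation handling mentioned in the lexer bullets can be sketched as follows. This is a simplified illustration, not RustPython's actual lexer. The `Token` enum and `indent_tokens` function are hypothetical, and the sketch makes three simplifications: only spaces count as indentation, blank lines are skipped, and the rest of each line becomes a single `Line` token. The core idea is a stack of indentation widths. A deeper line pushes its width and emits `Indent`. A shallower line pops widths and emits one `Dedent` per pop, and it is an error if the line does not land on an enclosing level.

```rust
/// Hypothetical token type for this sketch.
#[derive(Debug, PartialEq)]
enum Token {
    Indent,
    Dedent,
    /// The rest of a line (a real lexer would split this into many tokens).
    Line(String),
}

/// Converts source text into tokens, inserting `Indent`/`Dedent` tokens
/// based on a stack of indentation widths.
fn indent_tokens(source: &str) -> Result<Vec<Token>, String> {
    let mut stack = vec![0usize]; // indentation widths of the enclosing blocks
    let mut tokens = Vec::new();
    for line in source.lines() {
        let content = line.trim_start_matches(' ');
        if content.is_empty() {
            continue; // blank lines do not affect indentation
        }
        let width = line.len() - content.len();
        if width > *stack.last().unwrap() {
            // Deeper than the current block: open a new block.
            stack.push(width);
            tokens.push(Token::Indent);
        } else {
            // Close blocks until we are back at a matching level.
            while width < *stack.last().unwrap() {
                stack.pop();
                tokens.push(Token::Dedent);
            }
            if width != *stack.last().unwrap() {
                return Err(format!("inconsistent dedent: {:?}", line));
            }
        }
        tokens.push(Token::Line(content.to_string()));
    }
    // Close any blocks still open at the end of the input.
    while stack.len() > 1 {
        stack.pop();
        tokens.push(Token::Dedent);
    }
    Ok(tokens)
}
```

With these tokens, the parser can treat `Indent` and `Dedent` like opening and closing braces, which is what makes an ordinary grammar (such as one written for `lalrpop`) usable for Python's indentation-based blocks.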
- The compiler turns Python syntax (AST) into bytecode
- CPython bytecode is not stable and varies wildly between versions
- Example bytecode, produced by this script:

```python
import dis
def f(a, b):
    return a + b + 3

if __name__ == "__main__":
    dis.dis(f.__code__)
```
Output (the exact instructions vary between CPython versions):

```
  3           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 LOAD_CONST               1 (3)
              8 BINARY_ADD
             10 RETURN_VALUE
```
- Idea: standardize this bytecode across Python implementations?
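To connect the parser and compiler stages, here is a minimal sketch that represents an expression AST as Rust enums and compiles it into stack-based bytecode shaped like the `dis` output above. The `Expr`, `Instruction`, `CodeObject`, `compile_expr` and `compile_return` names are hypothetical and far simpler than RustPython's real AST and `rustpython-bytecode` types. Local names are resolved to slot numbers through a caller-supplied list, similar to `LOAD_FAST`'s operand. Constants go into a table that starts empty.

```rust
/// Hypothetical AST for a tiny subset of Python expressions.
#[derive(Debug)]
enum Expr {
    Name(String),
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical stack-machine instructions, modelled on the CPython ones above.
#[derive(Debug, PartialEq)]
enum Instruction {
    LoadFast(usize),  // push local variable number N
    LoadConst(usize), // push constant number N
    BinaryAdd,        // pop two values, push their sum
    ReturnValue,      // pop a value and return it
}

/// Output of compilation: the instructions plus a constants table.
#[derive(Debug, Default)]
struct CodeObject {
    instructions: Vec<Instruction>,
    constants: Vec<i64>,
}

/// Emits code for the operands before the operator (post-order traversal),
/// so the operands are on the stack when the operator runs.
fn compile_expr(expr: &Expr, locals: &[&str], code: &mut CodeObject) -> Result<(), String> {
    match expr {
        Expr::Name(name) => {
            let slot = locals
                .iter()
                .position(|local| *local == name.as_str())
                .ok_or_else(|| format!("unknown local variable {}", name))?;
            code.instructions.push(Instruction::LoadFast(slot));
        }
        Expr::Int(value) => {
            code.constants.push(*value);
            code.instructions.push(Instruction::LoadConst(code.constants.len() - 1));
        }
        Expr::Add(left, right) => {
            compile_expr(left, locals, code)?;
            compile_expr(right, locals, code)?;
            code.instructions.push(Instruction::BinaryAdd);
        }
    }
    Ok(())
}

/// Compiles the body `return <expr>`.
fn compile_return(expr: &Expr, locals: &[&str]) -> Result<CodeObject, String> {
    let mut code = CodeObject::default();
    compile_expr(expr, locals, &mut code)?;
    code.instructions.push(Instruction::ReturnValue);
    Ok(code)
}
```

For `a + b + 3`, which parses as `(a + b) + 3`, with locals `["a", "b"]`, this emits `LoadFast(0)`, `LoadFast(1)`, `BinaryAdd`, `LoadConst(0)`, `BinaryAdd`, `ReturnValue`. That is the same sequence as CPython's listing. Only the constant index differs, because this sketch's constants table starts empty.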
- A fetch-and-execute loop:
  - Fetch a bytecode instruction.
  - Look it up in a `match` statement and perform the operation.
  - If the instruction is an `import`, run the whole loop again on the imported module's code.
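```rust
match &instruction {
    bytecode::Instruction::LoadConst { ref value } => {
        // Convert the constant into a Python object and push it on the value stack.
        let obj = self.unwrap_constant(vm, value);
        self.push_value(obj);
        Ok(None)
    }
    bytecode::Instruction::Import {
        ref name,
        ref symbol,
    } => self.import(vm, name, symbol),
    // ... other instructions ...
}
```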
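The fetch-and-execute loop can be sketched as a small, self-contained stack machine. The `Instruction` and `Frame` types are hypothetical and much simpler than RustPython's: values are plain `i64`s, errors are strings, and there is no `Import` instruction. The structure follows the bullets above. The loop fetches the instruction at the program counter, dispatches on it with `match` as in the fragment above, and performs the operation on a value stack.

```rust
/// Hypothetical instruction set for this sketch.
#[derive(Debug, Clone, Copy)]
enum Instruction {
    LoadFast(usize),
    LoadConst(usize),
    BinaryAdd,
    ReturnValue,
}

/// A minimal execution frame: code, constants, local variables and a value stack.
struct Frame<'a> {
    code: &'a [Instruction],
    constants: &'a [i64],
    locals: Vec<i64>,
    stack: Vec<i64>,
    pc: usize, // index of the next instruction to fetch
}

impl<'a> Frame<'a> {
    fn pop(&mut self) -> Result<i64, String> {
        self.stack.pop().ok_or_else(|| "stack underflow".to_string())
    }

    /// Runs the fetch-and-execute loop until a `ReturnValue` instruction.
    fn run(&mut self) -> Result<i64, String> {
        loop {
            // Fetch the next instruction and advance the program counter.
            let instruction = *self
                .code
                .get(self.pc)
                .ok_or_else(|| "ran past the end of the code".to_string())?;
            self.pc += 1;
            // Dispatch on the instruction and perform the operation.
            match instruction {
                Instruction::LoadFast(slot) => {
                    let value = *self.locals.get(slot).ok_or("bad local slot")?;
                    self.stack.push(value);
                }
                Instruction::LoadConst(index) => {
                    let value = *self.constants.get(index).ok_or("bad constant index")?;
                    self.stack.push(value);
                }
                Instruction::BinaryAdd => {
                    let right = self.pop()?;
                    let left = self.pop()?;
                    let sum = left.checked_add(right).ok_or("integer overflow")?;
                    self.stack.push(sum);
                }
                Instruction::ReturnValue => return self.pop(),
            }
        }
    }
}
```

Following the description above, an `import` instruction would run this same loop in a new frame for the imported module's code before continuing.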
- CPython manages object lifetimes with reference counting, and RustPython does the same.
- The type `PyObjectRef` (`rustpython_vm::pyobject::PyObjectRef`) is a reference-counted pointer to the actual Python object.
- We use Rust's `Rc` (`std::rc::Rc`) to count the references to Python objects, and `RefCell` (`std::cell::RefCell`) so that these shared objects can still be mutated.
- The struct `PyObject` (`rustpython_vm::pyobject::PyObject`) has a type, which is itself a `PyObjectRef` (`rustpython_vm::pyobject::PyObjectRef`). In addition, each object has a dictionary in the form of a `HashMap` (`std::collections::HashMap`), which is very dynamic and fast, so you can easily set and get attributes at will.
- For certain Python types, we also need to store a Rust payload internally, for instance the `BigInt` value of an `int` or the `f64` value of a `float`:
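```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

use num_bigint::BigInt;
use num_complex::Complex64;

// A shared, mutable, reference-counted pointer.
pub type PyRef<T> = Rc<RefCell<T>>;
pub type PyObjectRef = PyRef<PyObject>;

pub struct PyObject {
    pub kind: PyObjectKind,                 // the Rust payload
    pub typ: Option<PyObjectRef>,           // the object's Python type
    pub dict: HashMap<String, PyObjectRef>, // the object's attributes
}

pub enum PyObjectKind {
    String {
        value: String,
    },
    Integer {
        value: BigInt,
    },
    Float {
        value: f64,
    },
    Complex {
        value: Complex64,
    },
    Bytes {
        value: Vec<u8>,
    },
    // ... more variants ...
}
```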
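The following sketch shows the `Rc<RefCell<...>>` pattern in action. Several handles share one object, `Rc::strong_count` reports the current reference count, and attributes live in a `HashMap`. This `PyObject` is a simplified, hypothetical stand-in (two payload variants, `i64` instead of `BigInt`, no `typ` field) so that it compiles without external crates. The helpers `new_object`, `set_attr`, `get_attr` and `int_value` are made up for illustration.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

type PyRef<T> = Rc<RefCell<T>>;
type PyObjectRef = PyRef<PyObject>;

/// Simplified payloads; the real enum has many more variants.
enum PyObjectKind {
    Integer { value: i64 },
    String { value: String },
}

struct PyObject {
    kind: PyObjectKind,
    dict: HashMap<String, PyObjectRef>,
}

/// Hypothetical helper: wraps a payload in a new reference-counted object.
fn new_object(kind: PyObjectKind) -> PyObjectRef {
    Rc::new(RefCell::new(PyObject { kind, dict: HashMap::new() }))
}

/// Sets an attribute; `borrow_mut` gives mutable access through a shared `Rc`.
fn set_attr(obj: &PyObjectRef, name: &str, value: PyObjectRef) {
    obj.borrow_mut().dict.insert(name.to_string(), value);
}

/// Gets an attribute; cloning the `Rc` increments the reference count.
fn get_attr(obj: &PyObjectRef, name: &str) -> Option<PyObjectRef> {
    obj.borrow().dict.get(name).cloned()
}

/// Reads the integer payload, if the object holds one.
fn int_value(obj: &PyObjectRef) -> Option<i64> {
    let object = obj.borrow();
    match object.kind {
        PyObjectKind::Integer { value } => Some(value),
        PyObjectKind::String { .. } => None,
    }
}
```

For example, storing an object as an attribute and fetching it again each add one to its `Rc::strong_count`, and dropping a handle subtracts one. The object is freed when the count reaches zero. As with any `Rc`-based structure, reference cycles (such as an object whose `dict` contains itself) are never freed by reference counting alone.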
- Built-in Python functions are implemented in Rust, for example `all`:
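```rust
fn builtin_all(vm: &mut VirtualMachine, args: PyFuncArgs) -> PyResult {
    // One required argument, `iterable`, with no type restriction (`None`).
    arg_check!(vm, args, required = [(iterable, None)]);
    // Collect the elements of the iterable.
    let items = vm.extract_elements(iterable)?;
    for item in items {
        // Evaluate the truthiness of each element.
        let result = objbool::boolval(vm, item)?;
        if !result {
            // Short-circuit on the first falsy element.
            return Ok(vm.new_bool(false));
        }
    }
    Ok(vm.new_bool(true))
}
```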
- `PyFuncArgs` (`rustpython_vm::function::PyFuncArgs`): contains all the positional and keyword arguments.
- `arg_check!` (`rustpython_vm::arg_check`): a macro that checks the types of the arguments passed into the function.
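To show the shape of such a builtin without the VM, here is a self-contained sketch of `all` over a hypothetical, simplified value type. `Value`, `FuncArgs`, `truthy` and this `builtin_all` are illustrative stand-ins for `PyObjectRef`, `PyFuncArgs`, `objbool::boolval` and the real function. The argument-count check is hand-written here rather than done by `arg_check!`, and only lists count as iterables.

```rust
/// Hypothetical, simplified Python values.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    None,
    Bool(bool),
    Int(i64),
    Str(String),
    List(Vec<Value>),
}

/// Simplified stand-in for `PyFuncArgs` (positional arguments only).
struct FuncArgs {
    args: Vec<Value>,
}

/// Python truthiness for the simplified values.
fn truthy(value: &Value) -> bool {
    match value {
        Value::None => false,
        Value::Bool(b) => *b,
        Value::Int(i) => *i != 0,
        Value::Str(s) => !s.is_empty(),
        Value::List(items) => !items.is_empty(),
    }
}

/// Sketch of `all(iterable)`: check the arguments, extract the elements,
/// and short-circuit on the first falsy element.
fn builtin_all(args: FuncArgs) -> Result<Value, String> {
    // Hand-written check for exactly one required argument.
    let iterable = match args.args.as_slice() {
        [one] => one,
        other => return Err(format!("all() takes exactly one argument ({} given)", other.len())),
    };
    let items = match iterable {
        Value::List(items) => items,
        _ => return Err("object is not iterable".to_string()),
    };
    for item in items {
        if !truthy(item) {
            return Ok(Value::Bool(false));
        }
    }
    Ok(Value::Bool(true))
}
```

The explicit loop mirrors the RustPython version. In plain Rust, `items.iter().all(truthy)` would express the same short-circuiting logic in one line.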