Cython tips
Cython compiles Python code to machine code (by way of generated C). It's a useful tool for speeding up Python code, especially inner loops over arrays with static types. It can even run multiple threads over code that doesn't need the GIL (Python's Global Interpreter Lock). It's in active development and heavily used by many projects.
There are other ways to speed up Python code, e.g. Nuitka, Numba, Numexpr, Pyjion, PyPy, Pyston, Pythran, and Weld, not to mention running array computations in NumPy.
Cython does a lot for you but it gets tricky to use, perhaps because it combines the complexity of Python with the complexity of C and the interactions between them.
- Cython docs
- Cython Changelog
- Blog posts like What's new in Cython 0.29? reveal a lot that isn't in the docs.
- Packaging for distribution Python, distutils, and Cython.
Cython is a superset of Python that adds in C features, optional static types, and all the semantics of their interactions. So it's a great way to call C code from Python, but not simple. Cython maintains Python source compatibility, e.g. keeping Python operator precedence but adding in an address-of operator `&variable` and a type cast `<char *> buffer`.
When you use its C-level features, you're responsible for C-level memory management. For instance a function can accept various types of array values (NumPy, array.array, Cython array, buffer interface) as a typed `memoryview` and then access the elements directly without calling into the Python interpreter. You can even release the GIL and operate on the array with parallel threads.
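As a rough plain-Python analogue (this is the built-in `memoryview`, not Cython's typed memoryview), the sketch below shows one function accepting different objects that export the buffer interface. The function name `total` is made up for illustration; in Cython the parameter would instead be declared with a typed memoryview such as `double[:]`.

```python
from array import array


def total(values):
    """Sum the items of any object that exports the buffer interface."""
    # memoryview() wraps the exporter's buffer without copying its data.
    with memoryview(values) as view:
        return sum(view)


doubles = array("d", [1.5, 2.5, 3.0])   # buffer of C doubles
raw = bytearray(b"\x01\x02\x03")        # buffer of unsigned bytes

print(total(doubles))  # 7.0
print(total(raw))      # 6
```

The plain-Python version still goes through the interpreter for each element. The point of the Cython typed memoryview is that the same access compiles down to direct C indexing.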
If you get a raw C pointer to the array's buffer `&array[0]` or to a local variable `&count`, you must ensure those pointers remain valid as long as needed.
Python (the CPython implementation, that is) manages memory via reference counting with occasional object cycle detection. There's no memory compaction, so you don't have to lock nodes from relocating. There's no GC tracing of pointers. Python cannot ref-count raw C pointers. So while using a pointer to an array object's buffer, you must hold a reference to that Python object to keep it allocated.
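The lifetime rule can be observed from plain Python, assuming CPython's reference counting. In this sketch a built-in `memoryview` stands in for a raw C pointer (a real pointer gets no such protection), and the function name `pin_demo` is hypothetical. While the view is exported, the array refuses to resize (which would move its buffer), and the view's reference keeps the array allocated after the last named reference is gone.

```python
import weakref
from array import array


def pin_demo():
    data = array("d", [1.0, 2.0, 3.0])
    alive = weakref.ref(data)      # lets us observe when the array is freed
    view = memoryview(data)        # holds a reference to `data`

    try:
        data.append(4.0)           # resizing could relocate the buffer
        resized = True
    except BufferError:
        resized = False            # refused while a buffer is exported

    del data                       # drop the only named reference
    kept_alive = alive() is not None   # the view still owns a reference
    first = view[0]                # buffer is still valid to read

    view.release()                 # give up the buffer and the reference
    del view
    freed = alive() is None        # CPython frees at refcount zero
    return resized, kept_alive, first, freed


print(pin_demo())  # (False, True, 1.0, True)
```

With a raw pointer in Cython there is no `BufferError` and no automatic reference, so the equivalent of `view` is a Python variable you must keep around yourself for as long as the pointer is in use.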
A Cython function can be declared with `def`, `cdef`, or `cpdef`:
- `def` defines a function with Python linkage. Its args and result must be Python objects. Arg type specs are optional, and if given, the function preamble code will check those types on entry so you can use the internal representations directly.
- `cdef` defines a function with C linkage that cannot be called from Python code. Those args and result can be Python and C objects.
- `cpdef` creates a `cdef` function and also a Python wrapper function.
`cdef class` defines an extension class.
- It can have `__cinit__()` and `__dealloc__()` "methods" which are really hooks for the extension mechanism to call the object's C-level allocation and deallocation. You can't call them directly. An extension class can also have an `__init__()` method, and as with ordinary Python classes, it's possible to "new" an object without calling its `__init__()` method. In contrast, `__cinit__()` is guaranteed to be called exactly once. It can do additional initialization beyond C memory allocation, but object construction hasn't finished building a Python object yet, so beware.
- `__cinit__()` and `__dealloc__()` are weird in another way: You must declare them with `def` even though they only have C linkage.
- Actually all the "dunder" special methods of extension types must be declared with `def`, not `cdef`, and Python uses special calling conventions to invoke them.
- An extension class cannot have a `__del__()` method.
- Declare an attribute `cdef dict __dict__` to enable dynamic attributes. Otherwise the instances have no `__dict__`.
- Declare an attribute `public` to generate get/set property accessors or `readonly` to generate a getter. There's also the more general property mechanism.
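The remark above about creating an object without calling `__init__()` can be seen with an ordinary Python class. This is a minimal sketch in plain Python, not an extension class, and the class name `Tracked` is made up. It shows why setup that must always happen belongs in `__cinit__()` rather than `__init__()`.

```python
import copy


class Tracked:
    init_calls = 0  # counts how many times __init__ actually ran

    def __init__(self, value):
        Tracked.init_calls += 1
        self.value = value


a = Tracked(1)               # normal construction: __new__ then __init__
b = Tracked.__new__(Tracked)  # allocation only, __init__ is skipped
c = copy.copy(a)             # copy rebuilds the object without __init__

print(Tracked.init_calls)    # 1
print(hasattr(b, "value"))   # False
print(c.value)               # 1
```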
If you get the sense that Cython has a bunch of special rules to learn, that's right.