Describe GC for free-threaded build in the GC design doc (#1263)
Co-authored-by: Ezio Melotti <[email protected]>
colesbury and ezio-melotti authored Feb 6, 2024
1 parent bdec818 commit fe3722d
Showing 1 changed file with 134 additions and 33 deletions.
167 changes: 134 additions & 33 deletions internals/garbage-collector.rst
@@ -53,9 +53,29 @@ is needed to clean these reference cycles between objects once they become
unreachable. This is the cyclic garbage collector, usually called just Garbage
Collector (GC), even though reference counting is also a form of garbage collection.

Starting in version 3.13, CPython contains two GC implementations:

* The default build implementation relies on the :term:`global interpreter
lock` for thread safety.
* The free-threaded build implementation pauses other executing threads when
performing a collection for thread safety.

Both implementations use the same basic algorithms, but operate on different
data structures. The :ref:`gc-differences` section summarizes the
differences between the two GC implementations.
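
A quick way to check which implementation a given interpreter uses is to
inspect the build configuration from Python. This is only an illustrative
sketch; it assumes Python 3.13 or later, where ``sys._is_gil_enabled()``
exists:

.. code-block:: python

    import sys
    import sysconfig

    # "Py_GIL_DISABLED" is 1 on free-threaded builds and 0 (or None) otherwise.
    if sysconfig.get_config_var("Py_GIL_DISABLED"):
        print("free-threaded build: the GC pauses other threads to collect")
    else:
        print("default build: the GC relies on the GIL for thread safety")

    # A free-threaded build can still run with the GIL enabled at runtime
    # (for example when PYTHON_GIL=1 is set), so report that separately.
    print("GIL currently enabled:", sys._is_gil_enabled())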


Memory layout and object structure
==================================

The garbage collector requires additional fields in Python objects to support
garbage collection. These extra fields are different in the default and the
free-threaded builds.


GC for the default build
------------------------

Normally the C structure supporting a regular Python object looks as follows:

.. code-block:: none
@@ -107,6 +127,44 @@ isn't running at all!), and merging partitions, all with a small constant number
With care, they also support iterating over a partition while objects are being added to - and
removed from - it, which is frequently required while GC is running.
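
To see why doubly linked lists fit these requirements, here is a small Python
sketch (the ``Partition`` and ``Node`` names are illustrative, not CPython's)
of a sentinel-based list whose insert, remove, and move operations all take a
constant number of pointer updates:

.. code-block:: python

    class Node:
        __slots__ = ("obj", "prev", "next")

        def __init__(self, obj=None):
            self.obj = obj
            self.prev = self.next = self

    class Partition:
        """A circular doubly linked list of objects, with a sentinel node."""

        def __init__(self):
            self.sentinel = Node()

        def append(self, node):
            # O(1): splice the node in just before the sentinel.
            tail = self.sentinel.prev
            tail.next = node
            node.prev = tail
            node.next = self.sentinel
            self.sentinel.prev = node

        @staticmethod
        def remove(node):
            # O(1): unlink the node from whichever partition holds it.
            node.prev.next = node.next
            node.next.prev = node.prev

        def move_to(self, node, other):
            # O(1): move a node from this partition to another one.
            self.remove(node)
            other.append(node)

        def __iter__(self):
            # Reading ``next`` before yielding makes it safe to move or
            # remove the current node while iterating.
            node = self.sentinel.next
            while node is not self.sentinel:
                nxt = node.next
                yield node
                node = nxt

    to_scan, tentatively_unreachable = Partition(), Partition()
    for name in ("link_1", "link_2", "link_3"):
        to_scan.append(Node(name))
    for node in to_scan:
        if node.obj == "link_3":
            to_scan.move_to(node, tentatively_unreachable)
    print([n.obj for n in tentatively_unreachable])  # ['link_3']

The default build implements the same idea in C, with the list links embedded
in each object's ``PyGC_Head``, so no separate node allocations are needed.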

GC for the free-threaded build
------------------------------

In the free-threaded build, Python objects contain a 1-byte field
``ob_gc_bits`` that is used to track garbage collection related state. The
field exists in all objects, including ones that do not support cyclic
garbage collection. The field is used to identify objects that are tracked
by the collector, ensure that finalizers are called only once per object,
and, during garbage collection, differentiate reachable vs. unreachable objects.

.. code-block:: none

    object -----> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
                  |                    ob_tid                      | |
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
                  | pad | ob_mutex | ob_gc_bits |  ob_ref_local   | |
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD
                  |                  ob_ref_shared                 | |
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
                  |                    *ob_type                    | |
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
                  |                      ...                       |

Note that not all fields are to scale. ``pad`` is two bytes, ``ob_mutex`` and
``ob_gc_bits`` are each one byte, and ``ob_ref_local`` is four bytes. The
other fields, ``ob_tid``, ``ob_ref_shared``, and ``ob_type``, are all
pointer-sized (i.e., eight bytes on a 64-bit platform).


The garbage collector also temporarily repurposes the ``ob_tid`` (thread ID)
and ``ob_ref_local`` (local reference count) fields for other purposes during
collections.
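
Adding up the sizes listed above gives the total size of the free-threaded
object header on a 64-bit platform; the comparison with the default build's
header (a reference count plus a type pointer) is included here as an aside:

.. code-block:: python

    # Field sizes in bytes, taken from the free-threaded layout above
    # (64-bit platform).
    free_threaded_header = {
        "ob_tid": 8,
        "pad": 2,
        "ob_mutex": 1,
        "ob_gc_bits": 1,
        "ob_ref_local": 4,
        "ob_ref_shared": 8,
        "*ob_type": 8,
    }
    print(sum(free_threaded_header.values()))  # 32 bytes of PyObject_HEAD

    # The default build's header is just ob_refcnt plus *ob_type:
    print(8 + 8)  # 16 bytes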


C APIs
------

Specific APIs are offered to allocate, deallocate, initialize, track, and untrack
objects with GC support. These APIs can be found in the `Garbage Collector C API
documentation <https://docs.python.org/3.8/c-api/gcsupport.html>`_.
Expand Down Expand Up @@ -139,14 +197,11 @@ the interpreter create cycles everywhere. Some notable examples:
* When representing data structures like graphs, it is very typical for them to
have internal links to themselves.

To correctly dispose of these objects once they become unreachable, they need
to be identified first. To understand how the algorithm works, let’s take
the case of a circular linked list which has one link referenced by a
variable ``A``, and one self-referencing object which is completely
unreachable:

.. code-block:: python
@@ -171,10 +226,17 @@ is completely unreachable:
>>> gc.collect()
2

The GC starts with a set of candidate objects it wants to scan. In the
default build, these "objects to scan" might be all container objects or a
smaller subset (or "generation"). In the free-threaded build, the collector
always scans all container objects.

The objective is to identify all the unreachable objects. The collector does
this by identifying reachable objects; the remaining objects must be
unreachable. The first step is to identify all of the "to scan" objects that
are **directly** reachable from outside the set of candidate objects. These
objects have a refcount larger than the number of incoming references from
within the candidate set.
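
The following toy sketch shows the idea in Python. The dict-based object
graph and the function are purely illustrative (CPython does this in C,
using ``tp_traverse`` and the per-object ``gc_ref`` copy described below):
internal references are subtracted from a scratch copy of the reference
counts, objects whose count stays positive are kept as roots, and everything
reachable from those roots is treated as alive.

.. code-block:: python

    def find_unreachable(candidates, refcount, references):
        # Copy the real reference counts into a scratch field (``gc_ref``).
        gc_ref = dict(refcount)

        # Subtract one for every reference that comes from another candidate.
        for obj in candidates:
            for referent in references[obj]:
                if referent in gc_ref:
                    gc_ref[referent] -= 1

        # Objects that still have a positive count are directly reachable
        # from outside the candidate set; everything transitively reachable
        # from them is reachable too.
        roots = [obj for obj in candidates if gc_ref[obj] > 0]
        reachable = set(roots)
        while roots:
            for referent in references[roots.pop()]:
                if referent in gc_ref and referent not in reachable:
                    reachable.add(referent)
                    roots.append(referent)

        # Whatever was never reached must be garbage.
        return [obj for obj in candidates if obj not in reachable]

    # The circular linked list example from above: ``A`` keeps link_1 alive,
    # while the self-referencing link_4 is garbage.
    refcount = {"link_1": 2, "link_2": 1, "link_3": 1, "link_4": 1}
    references = {
        "link_1": ["link_2"],
        "link_2": ["link_3"],
        "link_3": ["link_1"],
        "link_4": ["link_4"],
    }
    print(find_unreachable(list(refcount), refcount, references))  # ['link_4']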

Every object that supports garbage collection will have an extra reference
count field initialized to the reference count (``gc_ref`` in the figures)
@@ -273,23 +335,20 @@ Once the GC knows the list of unreachable objects, a very delicate process start
with the objective of completely destroying these objects. Roughly, the process
follows these steps in order:

1. Handle and clear weak references (if any). Weak references to unreachable objects
are set to ``None``. If the weak reference has an associated callback, the callback
is enqueued to be called once the clearing of weak references is finished. We only
invoke callbacks for weak references that are themselves reachable. If both the weak
reference and the pointed-to object are unreachable we do not execute the callback.
This is partly for historical reasons: the callback could resurrect an unreachable
object and support for weak references predates support for object resurrection.
Ignoring the weak reference's callback is fine because both the object and the weakref
are going away, so it's legitimate to say the weak reference is going away
first (see the example after this list).
2. If an object has legacy finalizers (``tp_del`` slot) move it to the
``gc.garbage`` list.
3. Call the finalizers (``tp_finalize`` slot) and mark the objects as already
finalized to avoid calling finalizers twice if the objects are resurrected or
if other finalizers have removed the object first.
4. Deal with resurrected objects. If some objects have been resurrected, the GC
finds the new subset of objects that are still unreachable by running the cycle
detection algorithm again and continues with them.
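
To make step 1 concrete, the following runnable example builds an
unreachable cycle and attaches a *reachable* weak reference with a callback;
collecting the cycle clears the weak reference and runs the callback:

.. code-block:: python

    import gc
    import weakref

    class Node:
        def __init__(self):
            self.next = None

    # Build an unreachable reference cycle: a -> b -> a.
    a = Node()
    b = Node()
    a.next = b
    b.next = a

    def report(ref):
        print("callback ran; the referent is gone")

    # The weak reference itself stays reachable through the name ``wr``.
    wr = weakref.ref(a, report)

    del a, b       # the cycle is now unreachable, but not yet collected
    gc.collect()   # clears the weak reference and runs the callback
    print(wr())    # None

If ``wr`` had itself been part of the unreachable garbage, the collector
would have cleared it without invoking ``report``, as described above.
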
@@ -300,12 +359,12 @@ follows these steps in order:
Optimization: generations
=========================

In order to limit the time each garbage collection takes, the GC
implementation for the default build uses a popular optimization:
generations. The main idea behind this concept is the assumption that most
objects have a very short lifespan and can thus be collected soon after their
creation. This has proven to be very close to the reality of many Python
programs as many temporary objects are created and destroyed very quickly.

To take advantage of this fact, all container objects are segregated into
three spaces/generations. Every new
@@ -317,6 +376,9 @@ the same object survives another GC round in this new generation (generation 1)
it will be moved to the last generation (generation 2) where it will be
surveyed the least often.

The GC implementation for the free-threaded build does not use multiple
generations. Every collection operates on the entire heap.
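
On the default build, this generational bookkeeping is observable from Python
via the ``gc`` module. The exact numbers vary by CPython version and
workload; they are shown only as an illustration:

.. code-block:: python

    import gc

    # Collection thresholds for the three generations (default build).
    print(gc.get_threshold())   # for example: (700, 10, 10)

    # Per-generation counters used to decide when a collection runs.
    print(gc.get_count())

    # Collect only the youngest generation, or the entire heap.
    gc.collect(0)
    gc.collect()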

In order to decide when to run, the collector keeps track of the number of object
allocations and deallocations since the last collection. When the number of
allocations minus the number of deallocations exceeds ``threshold_0``,
@@ -497,6 +559,45 @@ tracking status of the object.
True
.. _gc-differences:

Differences between GC implementations
======================================

This section summarizes the differences between the GC implementation in the
default build and the implementation in the free-threaded build.

The default build implementation makes extensive use of the ``PyGC_Head`` data
structure, while the free-threaded build implementation does not use that
data structure.

* The default build implementation stores all tracked objects in a doubly
linked list using ``PyGC_Head``. The free-threaded build implementation
instead relies on the embedded mimalloc memory allocator to scan the heap
for tracked objects.
* The default build implementation uses ``PyGC_Head`` for the unreachable
object list. The free-threaded build implementation repurposes the
``ob_tid`` field to store a linked list of unreachable objects.
* The default build implementation stores flags in the ``_gc_prev`` field of
``PyGC_Head``. The free-threaded build implementation stores these flags
in ``ob_gc_bits``.


The default build implementation relies on the :term:`global interpreter lock`
for thread safety. The free-threaded build implementation has two "stop the
world" pauses, in which all other executing threads are temporarily paused so
that the GC can safely access reference counts and object attributes.

The default build implementation is a generational collector. The
free-threaded build is non-generational; each collection scans the entire
heap.

* Keeping track of object generations is simple and inexpensive in the default
build. The free-threaded build relies on mimalloc for finding tracked
objects; identifying "young" objects without scanning the entire heap would
be more difficult.


.. admonition:: Document History
:class: note
