TSM is a way to provide safe memory access for lockfree data structures.
Every heap address H
“has a type” because a type ID can be computed from
H
. linalloc()
allocates memory of a given type. linref_up(H, T)
ensures
that H
has type T
and isn’t reallocated with a different type.
The point is that, unlike schemes which block reuse of referenced memory,
TSM allows H
to be freed and reused by an allocation of type T
.
For example, consider a doubly-linked list used in a lockfree kernel:
- Suppose
lflist_del(A)
has readP := A->prev
and decided to writeP->next
with a CAS. IfP
is freed and reallocated as a string, the CAS can spuriously succeed if*P
is an unlucky string which resembles the expected list node.linref_up(P, "list node")
prevents this.
linref_up()
’s guarantee is weak, but the usual stronger guarantee turns
out to be irrelevant in a robust program.
lflist_enq()
can’t make allocations because its callers can’t really afford to handle allocation failures. So the list operates on “anchors” preallocated inside enqueuable objects. Successive enqueues of an object must reuse its anchor.- Lockfree progress would be impractical, at best, if
lflist_enq(A)
weren't free to modifyA
because some out of date lflist function were operating onA
. - So every lflist function must be prepared to deal with any anchor
being deleted and re-enqueued just before a
CAS
. Generally, they must at least make everyCAS
contingent on a version count that’s been live for some critical interval. linref_up()
proves that such a count exists, and it wouldn’t make the lflist simpler or more efficient if you knew thatA
also wasn’t freed for this interval.
As I describe, TSM has a simple and cheap refcounting implementation. However, it should be compatible with other schemes for keeping track of references.
Nalloc is a TSM-enabled slab allocator. It allocates “lineages” from “heritages” in O(1) uncontended.
TSM is a minor addition to a slab allocator. Nalloc is just a basic (lockfree) slab allocator plus a refcount and type object pointer per slab.
A lineage is a block of memory. Lineage L
has a type in the sense that a
pointer to a type object Y
can be computed from L
’s address.
-
Y
stores the size of lineages of its type; and a lineage initialization functionlin_init()
, to be run during allocation. -
Roughly speaking, if
linref_up(L, Y)
succeeds, the following holds untillinref_down(L, Y)
:- L was last allocated with
linalloc("Y")
, and thusY->lin_init(L)
completed. linref_up(L, Y')
will succeed iffY' == Y
- in any thread.- If
L
is freed and reallocated:- It must be reallocated with type
Y
. - nalloc won’t write to any part of
L
other than its lowest word. This means thatY->lin_init(L)
won’t run again.
- It must be reallocated with type
- L was last allocated with
-
In other words, all threads will agree that
L
has typeY
, and has been initialized according toY
. If threads cooperate to make sureL
satisfies some invariant ofY
, nalloc won’t ruin it by clobberingL
. -
See nalloc.h for a more precise description.
-
It’s called a lineage because it represents multiple generations of a C memory object.
A slab is an array of lineages and some metadata in a footer. All lineages in a given slab have the same type and thus size. All slabs have the same size and are naturally aligned, so that the slab of a lineage can be computed directly from its address.
- Each slab stores a refcount and a pointer to struct type, and
L
’s type is the type of its containing slab.linref_up(L,_)
is just aCAS2
onslab_of(L)
’s type fields.
A heritage is a pool of slabs having the same type.
- NB: they can be thread/CPU-local but wk doesn’t exploit this.
nalloc defines some internal heritages with “types” meant to represent
plain char buffers of different size classes. malloc()
is just a wrapper
around linalloc()
with one these heritages.
Lockfree slab allocation is relatively simple. nalloc elaborates on the basic idea of using lfstacks to represent free lineages in a slab and free slabs in a heritage.
linalloc(H)
pops a slab S
from heritage H
, fetches a lineage from S
, and
returns S
to H
iff S
has lineages remaining. If not, S
is said to be
“lost”.
- NB: empty slabs aren’t returned so that search is
O(1)
. The main challenge of nalloc is to make this work without leaking memory.
S
splits its free lineages into 3 sets. This is kind of a relic of old
designs, but it still has benefit.
- A “main” non-lockfree stack
S->local_blocks
. - An lfstack
S->hot_blocks
, on a separate cache line.- NB: it accumulates freed lineages and is collected in batches. It lets you avoid locked instructions and shared cache line writes.
- The maximal contiguous set of free lineages beginning at the start
of the slab is represented only as a count
S->contig_blocks
.- NB: it makes slab initializion cheaper. It encourages cache- and prefetcher-friendly access patterns, not just inside nalloc.
Slabs are initialized with all lineages in contig_blocks
.
alloc_from_slab(S)
in linalloc()
attempts to allocate from
contig_blocks
, local_blocks
, and hot_blocks
, in that order.
- If it empties local_blocks, it uses a variant of
lstack_clear()
to transfer all lineages fromhot_blocks
ontolocal_blocks
inO(1)
.
linfree(L)
pushes L
to slab_of(L)->hot_blocks
, freeing the slab when
appropriate.
The main subtlety is that linfree(L)
needs to know whether S := slab_of(L)
has been lost, in which case it must return S
to its former heritage H
or
arrange for it to be freed.
- Multiple
linfree(L)
calls can findS
lost, but only one should return it because my lockfree stack deliberately doesn’t support concurrent pushes of the same “node”. More subtly, it’s not simple forlinfree()
to agree with an in-flightlinalloc()
on whetherS
is actually lost. - The key observation is that
S->hot_blocks
is always cleared entirely with anlfstack_clear()
. Unlike a stack subjected to the well-known lockfreepop()
algorithm,S->hot_blocks
doesn’t need to store a version count. - So
S->hot_blocks.gen
instead stores an authoritative “lost” flag and the stack’s size. The flag is set whenlinalloc()
findsS->hot_blocks
empty after also exhausting the other sets. It’s unset by the nextlinfree()
toS
. - If this
linfree()
decides to freeS
, it suffices to not return it toH
. No future allocations fromS
will happen, and the finallinfree()
toS
will know to free it.
The biggest problem with nalloc is probably reporting failure in
linref_up(P)
when P
isn’t on the nalloc heap. If it fails to detect this
case, then it may spuriously write to P’s non-existant slab metadata.
- Luckily, wk can easily validate
P
because its heap is contiguous and non-shrinking. - The userspace heap isn’t, since it uses non-fixed mappings and should return memory to the system. Further, the user may have made mappings outside of nalloc’s control.
- Unfortunately, the current solution for userspace nalloc is to just not return virtual memory to the kernel. It can remap free slabs as “guard” pages to return physical memory, but the current implementation doesn’t.
- This way, if
P
was previously on the heap, thenlinref_up(P)
can expect a segfault becauseP
couldn’t have been remapped outside of nalloc - barring unsafe fixed-mapping tricks outside of nalloc. - If
P
was never on the heap, thenlinref_up(P, Y)
can still spuriously succeed ifP
is on a non-nalloc-mapped page that resembles a slab of typeY
. I don’t expect a program to generate such aP
unless it takes pointers from untrusted sources. In that case, it’s not secure.