Skip to content

Commit

Permalink
New post about Dagstuhl
Browse files Browse the repository at this point in the history
  • Loading branch information
alastairreid committed Oct 15, 2023
1 parent 88fcf20 commit b3207cf
Showing 1 changed file with 143 additions and 0 deletions.
143 changes: 143 additions & 0 deletions _posts/2023-10-15-dagstuhl.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
---
layout: post
title: "Dagstuhl: Formal Methods for Correct Persistent Programming"
---

Last week, I boarded the train to Wadern, Germany (through London, Paris
and Saarbrücken) to attend
[Dagstuhl seminar 23412](https://www.dagstuhl.de/en/seminars/seminar-calendar/seminar-details/23412)
on "Formal Methods for Correct Persistent Programming".

## What is persistent programming?

Persistent programming is concerned with memory that retains its contents
after a power loss.
While magnetic disks and solid state disks satisfy that definition,
the main focus was on non-volatile memory that provides fast access
(not too much slower than DRAM) and fine granularity (notionally byte addressable
but, in practice, at the granularity of a cache line (64 bytes)).

Non-volatile memory can be built using DRAM and a battery (and some non-volatile
memories require the use of battery-backed caches for safety) but it is usually based
on some more exotic technology such as memristors, spin, ferro-electric FETs,
phase change, etc.
These technologies promise to provide much higher capacity at much lower
cost per gigabyte and much lower power consumption.
Sadly, for the field, the [most prominent non-volatile memory product](https://en.wikipedia.org/wiki/3D_XPoint)
was shut down last summer but we did not feel that this was the end of
persistent systems.

## What makes persistent programming hard?

There seem to be two main issues in persistent programming.
The first is that your code has to be able to cope with a power failure at
any moment so every write to non-volatile memory must take care that

- Any data that it refers to is also in non-volatile memory because,
otherwise, that data would be lost if the power failed.

- It is possible to recover from a power failure after
every single write. Your data structure in non-volatile
memory must never be in an inconsistent state.

These properties are checked with type systems and/or formal verification
of the software.

The second, and more challenging problem, is that, for performance
reasons, memory systems reorder memory reads and writes
so even if you write to address 'x' then address 'y' and then print
"It is now safe to turn your computer off" on the screen, those
three steps could occur in the reverse order.
That is, all modern computer systems provide some form of "weak memory consistency."
Obviously, this reordering makes it harder to ensure that
the data structure in the non-volatile memory is always in a recoverable
state.

To avoid problems with weak memory, you need to add memory fences and flush cache
contents out to the non-volatile memory but, if you do this too much,
performance will suffer.
So the trick is to add the bare minimum of fences and flushes to ensure
that your data is safe.
Unsurprisingly, this is what most of our discussion centered on with questions like

1. What exactly do the fence and flush instructions guarantee?
The more precisely we understand what they do, the closer we can get to
the edge (without stepping over).

Note that different ISAs have different properties so sometimes you
need different code for different systems to make it simultaneously
fast and sound.

2. How can we confirm that our understanding of the memory system is correct?
Power cycling is too slow to test exhaustively by repeatedly
turning off the power and checking that the
memory got the values that we expect!

3. How should we formalize our understanding of the memory
system's guarantees? The most popular approach seemed
to use the "cat" declarative notation that is supported by the
["herd"](http://diy.inria.fr/) tools and others.

4. Is one memory system stronger than another?
A program written for one memory system should also
work for any stronger memory system.

5. Given an algorithm or a C program or a machine-code program,
can we formally verify that it contains enough
fences and flushes to ensure that we can always
recover from a power failure.

Lots of fun stuff to discuss!

## What else was discussed?

A related, but subtly different, topic is intermittent computing.
This is concerned with systems that use energy harvesting such as
solar power, vibrations, RF power, etc. instead of having
mains power or a long-life battery.
The challenge here is that you may only have a few milliseconds
of power before it disappears with no warning.
To deal with this, your program must contain frequent 'checkpoints'
where the state of the system is saved to non-volatile memory
and the code between checkpoints must be idempotent because
that code will be re-executed when power is restored.
(There are usually no weak memory effects to worry about because
embedded systems are designed with a different set of performance
constraints.)

And, of course, you don't need special hardware to get persistence:
your filesystem provides a reliable way to save the state of your
system to disk.
So it's possible to create interesting persistent applications and tools that
anybody can run.


## What is Dagstuhl?

To quote the [Dagstuhl website](https://www.dagstuhl.de/en/institute/organization),
"Schloss Dagstuhl, the Leibniz Center for Informatics was originally founded in 1990
to provide a retreat for world class research and training in computer science."
In practice, this means that 30 or so researchers travel to a place out in the
Saarland countryside that provides the perfect environment for focusing on a topic and meeting
other researchers working in the field.
A few things that stand out are the excellent conference facilities; the many smaller meeting rooms
in case a group want to split off for a more focused discussion; the kitchen staff
randomly assigning seats at lunch and dinner; and the potential for walks in
the surrounding countryside or billiards in the games room.

[Dagstuhl](https://www.dagstuhl.de/en) is well known among researchers in
continental Europe but, even if you have never heard of Dagstuhl, you have
probably heard of [DBLP](https://dblp.org/) the CS publication database.

During the week, we shared Dagstuhl with a seminar on [Accountable Software
Systems](https://www.dagstuhl.de/en/seminars/seminar-calendar/seminar-details/23411).
In the mornings and evenings we would talk to those researchers about their research
and how their seminar was going.
There were some similarities to our group (some of them also work on formal methods) but
they were a much more diverse group with lawyers, control theorists, mathematicians, etc.
so they were working hard to build a shared understanding of what accountability is
while we largely understood the problems and were working on solutions.
A very different meeting!



0 comments on commit b3207cf

Please sign in to comment.