diff --git a/_posts/2023-10-15-dagstuhl.md b/_posts/2023-10-15-dagstuhl.md new file mode 100644 index 0000000000000..478ef298e0bee --- /dev/null +++ b/_posts/2023-10-15-dagstuhl.md @@ -0,0 +1,143 @@ +--- +layout: post +title: "Dagstuhl: Formal Methods for Correct Persistent Programming" +--- + +Last week, I boarded the train to Wadern, Germany (through London, Paris +and Saarbrücken) to attend +[Dagstuhl seminar 23412](https://www.dagstuhl.de/en/seminars/seminar-calendar/seminar-details/23412) +on "Formal Methods for Correct Persistent Programming". + +## What is persistent programming? + +Persistent programming is concerned with memory that retains its contents +after a power loss. +While magnetic disks and solid state disks satisfy that definition, +the main focus was on non-volatile memory that provides fast access +(not too much slower than DRAM) and fine granularity (notionally byte addressable +but, in practice, at the granularity of a cache line (64 bytes)). + +Non-volatile memory can be built using DRAM and a battery (and some non-volatile +memories require the use of battery-backed caches for safety) but it is usually based +on some more exotic technology such as memristors, spin, ferro-electric FETs, +phase change, etc. +These technologies promise to provide much higher capacity at much lower +cost per gigabyte and much lower power consumption. +Sadly, for the field, the [most prominent non-volatile memory product](https://en.wikipedia.org/wiki/3D_XPoint) +was shut down last summer but we did not feel that this was the end of +persistent systems. + +## What makes persistent programming hard? + +There seem to be two main issues in persistent programming. +The first is that your code has to be able to cope with a power failure at +any moment so every write to non-volatile memory must take care that + +- Any data that it refers to is also in non-volatile memory because, + otherwise, that data would be lost if the power failed. + +- It is possible to recover from a power failure after + every single write. Your data structure in non-volatile + memory must never be in an inconsistent state. + +These properties are checked with type systems and/or formal verification +of the software. + +The second, and more challenging problem, is that, for performance +reasons, memory systems reorder memory reads and writes +so even if you write to address 'x' then address 'y' and then print +"It is now safe to turn your computer off" on the screen, those +three steps could occur in the reverse order. +That is, all modern computer systems provide some form of "weak memory consistency." +Obviously, this reordering makes it harder to ensure that +the data structure in the non-volatile memory is always in a recoverable +state. + +To avoid problems with weak memory, you need to add memory fences and flush cache +contents out to the non-volatile memory but, if you do this too much, +performance will suffer. +So the trick is to add the bare minimum of fences and flushes to ensure +that your data is safe. +Unsurprisingly, this is what most of our discussion centered on with questions like + +1. What exactly do the fence and flush instructions guarantee? + The more precisely we understand what they do, the closer we can get to + the edge (without stepping over). + + Note that different ISAs have different properties so sometimes you + need different code for different systems to make it simultaneously + fast and sound. + +2. How can we confirm that our understanding of the memory system is correct? + Power cycling is too slow to test exhaustively by repeatedly + turning off the power and checking that the + memory got the values that we expect! + +3. How should we formalize our understanding of the memory + system's guarantees? The most popular approach seemed + to use the "cat" declarative notation that is supported by the + ["herd"](http://diy.inria.fr/) tools and others. + +4. Is one memory system stronger than another? + A program written for one memory system should also + work for any stronger memory system. + +5. Given an algorithm or a C program or a machine-code program, + can we formally verify that it contains enough + fences and flushes to ensure that we can always + recover from a power failure. + +Lots of fun stuff to discuss! + +## What else was discussed? + +A related, but subtly different, topic is intermittent computing. +This is concerned with systems that use energy harvesting such as +solar power, vibrations, RF power, etc. instead of having +mains power or a long-life battery. +The challenge here is that you may only have a few milliseconds +of power before it disappears with no warning. +To deal with this, your program must contain frequent 'checkpoints' +where the state of the system is saved to non-volatile memory +and the code between checkpoints must be idempotent because +that code will be re-executed when power is restored. +(There are usually no weak memory effects to worry about because +embedded systems are designed with a different set of performance +constraints.) + +And, of course, you don't need special hardware to get persistence: +your filesystem provides a reliable way to save the state of your +system to disk. +So it's possible to create interesting persistent applications and tools that +anybody can run. + + +## What is Dagstuhl? + +To quote the [Dagstuhl website](https://www.dagstuhl.de/en/institute/organization), +"Schloss Dagstuhl, the Leibniz Center for Informatics was originally founded in 1990 +to provide a retreat for world class research and training in computer science." +In practice, this means that 30 or so researchers travel to a place out in the +Saarland countryside that provides the perfect environment for focusing on a topic and meeting +other researchers working in the field. +A few things that stand out are the excellent conference facilities; the many smaller meeting rooms +in case a group want to split off for a more focused discussion; the kitchen staff +randomly assigning seats at lunch and dinner; and the potential for walks in +the surrounding countryside or billiards in the games room. + +[Dagstuhl](https://www.dagstuhl.de/en) is well known among researchers in +continental Europe but, even if you have never heard of Dagstuhl, you have +probably heard of [DBLP](https://dblp.org/) the CS publication database. + +During the week, we shared Dagstuhl with a seminar on [Accountable Software +Systems](https://www.dagstuhl.de/en/seminars/seminar-calendar/seminar-details/23411). +In the mornings and evenings we would talk to those researchers about their research +and how their seminar was going. +There were some similarities to our group (some of them also work on formal methods) but +they were a much more diverse group with lawyers, control theorists, mathematicians, etc. +so they were working hard to build a shared understanding of what accountability is +while we largely understood the problems and were working on solutions. +A very different meeting! + + +