0.2.7

@aphyr aphyr released this 30 Jun 16:07
· 240 commits to main since this release

This release introduces improved performance for control/exec by default. It adds new features for testing filesystem failures: nemesis/bitflip, which flips random bits in files, and lazyfs (experimental, known bugs) which loses un-fsynced writes to files. It fixes several minor bugs--for instance, failing to thread state correctly through the nemesis setup lifecycle--and catches up to new APIs and file locations in recent versions of Debian.

As an aside: getting Jepsen to run in Docker has been an ongoing tirefire for years, and the docs now recommend using plain old LXC or AWS instead.

API Changes

  • The default remote for SSH is now control.sshj, not control.clj-ssh. This has been available for a few releases now, and is significantly faster than clj-ssh. This should be a basically transparent change, but some error messages thrown during e.g. unstable connections might change, and you might encounter different behavior around how it handles host and identity keys, agents, etc.
  • The db/Process protocol is now called db/Kill; the metaprogramming hacks we had to do to call it Process were fragile under some AOT scenarios. Process remains as an alias.
  • util/await-fn now catches all Exceptions, rather than just RuntimeExceptions. It turns out some things you'd really like to retry, like SQL connection exceptions from JDBC, aren't RuntimeExceptions.
  • control.util/grepkill now uses pgrep to kill processes, rather than grep. Some tests were killing unexpected processes.
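The await-fn change above comes down to which exception classes count as retryable. As a rough illustration (written in Python rather than Clojure, and not Jepsen's implementation; the name await_result and its parameters are hypothetical), a retry loop that catches the broad Exception class rather than only RuntimeError might look like this:

```python
import time


def await_result(f, timeout=5.0, retry_interval=0.01):
    """Call f until it returns without raising, or until timeout elapses.

    Hypothetical helper for illustration only. It catches Exception, not
    just RuntimeError, so failures such as ConnectionError (which is not
    a RuntimeError) are retried too.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return f()
        except Exception as e:  # deliberately broader than RuntimeError
            if time.monotonic() >= deadline:
                # Give up, keeping the last failure as the cause.
                raise TimeoutError("timed out waiting for f to succeed") from e
            time.sleep(retry_interval)
```

Had the except clause named only RuntimeError, a connection failure on the first attempt would escape the loop immediately instead of being retried.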

Bugfixes

  • db/Process could, under some AOT scenarios, get compiled multiple times and fail to register as the same protocol. This meant that tests could quietly fail to actually kill a process because they thought the DB didn't support the Process protocol. This should hopefully be fixed now by Even More Metaprogramming Hacks, but we recommend moving to db/Kill just in case there are more bugs along these lines.
  • lein run serve no longer trusts the local clock when listing local tests. This should fix issues with copying files from a machine in the future to one in the past, and those tests not showing up until the second node's clock catches up.
  • The control.sshj remote now respects {:dummy? true}.
  • nemesis.time's programs for bumping and strobing the clock had stopped running properly on newer platforms, thanks to a change which made it illegal to pass both a time and a timezone to settimeofday. We didn't change the timezone, but the call still failed. This is now fixed.
  • core/run! discarded the return value of nemesis/setup! and used the original nemesis throughout the test. Now it correctly uses the returned nemesis.
  • nemesis/Validate returned invocation, not completion ops, and also did nothing after the initial setup! call. Both of these bugs are now fixed, which should provide better error guidance to users who make mistakes writing nemeses.
  • tcpdump is located in /usr/bin on more recent versions of Debian; we now use the newer path.
  • control/exec no longer incorrectly reports a command's STDIN as nil when throwing exceptions.
  • docker/bin/up works on OS X again.
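The core/run! fix above is an instance of a common bug: a setup step returns a new object, and the caller keeps using the old one. The sketch below shows the pattern in Python under stated assumptions. SketchNemesis, run_discarding_setup, and run_using_setup are hypothetical names for illustration, not Jepsen code.

```python
class SketchNemesis:
    """Hypothetical nemesis whose setup returns a new, ready instance."""

    def __init__(self, ready=False):
        self.ready = ready

    def setup(self, test):
        # Setup does not mutate self; it returns the nemesis to use.
        return SketchNemesis(ready=True)

    def invoke(self, op):
        if not self.ready:
            raise RuntimeError("nemesis was never set up")
        return {**op, "type": "info"}


def run_discarding_setup(nemesis, test, op):
    """The buggy shape: setup's return value is thrown away."""
    nemesis.setup(test)
    return nemesis.invoke(op)  # still the original, un-set-up nemesis


def run_using_setup(nemesis, test, op):
    """The fixed shape: the returned nemesis is used from here on."""
    nemesis = nemesis.setup(test)
    return nemesis.invoke(op)
```

With a nemesis that mutates itself during setup, both shapes behave the same, which is why this kind of bug can go unnoticed until a nemesis relies on the returned value.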

New Features

  • jepsen.lazyfs, an experimental project for simulating the loss of un-fsynced writes, is now available. It does not work correctly--lazyfs has both crash and safety bugs in this version--but it still might help you find bugs.
  • nemesis/bitflip is a new nemesis which can flip a random fraction of bits in a file. Helpful for fuzzing DBs' ability to handle filesystem corruption.
  • store/fressian now serializes exceptions as data. A recurring problem in Jepsen tests is having a Throwable get into the history somewhere, and then exploding the serializer when it comes time to write the test. This is especially frustrating when nothing in the test itself logs that exception--you have no idea where it's coming from. Jepsen now serializes exceptions to data; this will not round-trip properly, but it does help you figure out the exception and operation that went wrong. These exceptions are also logged at level WARN during serialization. At the REPL you can load the test and use a new utility function, jepsen.util/deepfind, to find the offending object.
  • util/rand-exp generates random, exponentially-distributed values around a given mean.
  • tests.cycle.wr now has a test constructor and docstring aligned with tests.cycle.list-append, as well as updated docs.
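To make the ideas behind nemesis/bitflip and util/rand-exp concrete, here is a small Python sketch. It is not Jepsen's code: flip_bits and rand_exp are hypothetical names, the sketch works on in-memory bytes rather than files on a remote node, and it assumes each bit is flipped independently with a given probability, which may differ from how Jepsen chooses bits.

```python
import random


def flip_bits(data: bytes, probability: float, rng: random.Random) -> bytes:
    """Return a copy of data with each bit flipped with the given probability."""
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < probability:
                out[i] ^= 1 << bit  # XOR with a one-bit mask flips that bit
    return bytes(out)


def rand_exp(mean: float, rng: random.Random) -> float:
    """Draw an exponentially distributed value with the given mean."""
    # expovariate takes the rate (lambda), which is 1 / mean.
    return rng.expovariate(1.0 / mean)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so runs are repeatable
    corrupted = flip_bits(b"some database page", 0.01, rng)
    print(corrupted)
    print(rand_exp(10.0, rng))
```

Passing an explicit random.Random instance keeps the corruption reproducible, which helps when trying to replay a failure.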

Small Changes

  • We used to round off milliseconds in tests' :start-time, but this causes collisions when you run multiple tests in the same second. We now use millisecond resolution again.
  • reconnect now passes through InterruptedIOException in the same way as InterruptedException, which should speed up/clarify the abort procedure when something goes wrong in e.g. DB setup using the control.sshj remote.
  • util/stop-daemon! now throws a timeout when the kill operation hangs.
  • nemesis.time now throws more informative errors when compilation fails.
  • New tests for nemeses
  • Tests are a little quieter about logging now.
  • Clojure 1.11.1
  • Unilog 0.7.30
  • SSHJ 0.33.0
  • Fipp 0.6.26
  • Elle 0.1.5
  • HTTP-kit 2.6.0