0.2.7
This release introduces improved performance for control/exec by default. It adds new features for testing filesystem failures: nemesis/bitflip, which flips random bits in files, and lazyfs (experimental, known bugs), which loses un-fsynced writes to files. It fixes several minor bugs--for instance, failing to thread state correctly through the nemesis setup lifecycle--and catches up to new APIs and file locations in recent versions of Debian.
As an aside: getting Jepsen to run in Docker has been an ongoing tirefire for years, and the docs now recommend using plain old LXC or AWS instead.
API Changes
- The default remote for SSH is now control.sshj, not control.clj-ssh. This has been available for a few releases now, and is significantly faster than clj-ssh. This should be a basically transparent change, but some error messages thrown during e.g. unstable connections might change, and you might encounter different behavior around how it handles host and identity keys, agents, etc.
- The db/Process protocol is now called db/Kill; the metaprogramming hacks we had to do to call it Process were fragile under some AOT scenarios. Process remains as an alias.
- util/await-fn now catches all Exceptions, rather than just RuntimeExceptions. It turns out some things you'd really like to retry, like SQL connection exceptions from JDBC, aren't RuntimeExceptions.
- control.util/grepkill now uses pgrep to kill processes, rather than grep. Some tests were killing unexpected processes.
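To illustrate why the util/await-fn change matters, here is a minimal sketch of a retry loop that catches Exception rather than RuntimeException. This is not Jepsen's implementation: retry-until and its max-attempts and interval-ms parameters are hypothetical names invented for this example.

```clojure
(defn retry-until
  "Hypothetical helper, not part of Jepsen. Calls f until it returns without
  throwing, sleeping interval-ms between attempts. Rethrows the last
  exception once max-attempts calls have failed."
  [f max-attempts interval-ms]
  (loop [attempt 1]
    (let [result (try
                   {:value (f)}
                   ;; Interrupts are not retried; pass them straight through.
                   (catch InterruptedException e
                     (throw e))
                   ;; Catching Exception, not just RuntimeException, means
                   ;; checked exceptions such as java.sql.SQLException are
                   ;; retried as well.
                   (catch Exception e
                     (if (< attempt max-attempts)
                       {:error e}
                       (throw e))))]
      (if (contains? result :value)
        (:value result)
        (do (Thread/sleep (long interval-ms))
            (recur (inc attempt)))))))

;; Usage: a function that fails twice with a checked exception, then succeeds.
(def calls (atom 0))

(defn flaky-connect []
  (if (< (swap! calls inc) 3)
    (throw (java.sql.SQLException. "connection refused"))
    :connected))
```

A loop that caught only RuntimeException would let the first SQLException escape, which is the behavior the release moves away from.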
Bugfixes
- db/Process could, under some AOT scenarios, get compiled multiple times and fail to register as the same protocol. This meant that tests could quietly fail to actually kill a process because they thought the DB didn't support the Process protocol. This should hopefully be fixed now by Even More Metaprogramming Hacks, but we recommend moving to db/Kill just in case there are more bugs along these lines.
- lein run serve no longer trusts the local clock when listing local tests. This should fix issues with copying files from a machine in the future to one in the past, and those tests not showing up until the second node's clock catches up.
- The control.sshj remote now respects {:dummy? true}.
- nemesis.time's programs for bumping and strobing the clock no longer ran properly on newer platforms, thanks to a change which made it illegal to pass a time and a timezone to settimeofday. We didn't change the timezone, but it still failed to run.
- core/run! discarded the return value of nemesis/setup! and used the original nemesis throughout the test. Now it correctly uses the returned nemesis.
- nemesis/Validate returned invocation, not completion ops, and also did nothing after the initial setup! call. Both of these bugs are now fixed, which should provide better error guidance to users who make mistakes writing nemeses.
- tcpdump is located in /usr/bin on more recent versions of Debian; we now use the newer path.
- control/exec no longer incorrectly reports a command's STDIN as nil when throwing exceptions.
- docker/bin/up works on OS X again.
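The core/run! fix above comes down to using the value that setup! returns instead of the object it was called on. The following is a simplified, self-contained sketch of that pattern. The Nemesis protocol here is a stand-in defined for this example, not jepsen.nemesis itself, and FlagNemesis, run-discarding, and run-threading are hypothetical names.

```clojure
;; Stand-in protocol for illustration only; Jepsen's real nemesis protocol
;; has more methods.
(defprotocol Nemesis
  (setup! [this test]
    "Prepares the nemesis and returns the nemesis to use from now on."))

;; A nemesis whose setup! returns a modified copy rather than mutating itself.
(defrecord FlagNemesis [ready?]
  Nemesis
  (setup! [this test]
    (assoc this :ready? true)))

(defn run-discarding
  "The buggy shape: calls setup! but keeps using the original nemesis."
  [nemesis test]
  (setup! nemesis test)
  nemesis)

(defn run-threading
  "The fixed shape: uses whatever setup! returned for the rest of the run."
  [nemesis test]
  (let [nemesis' (setup! nemesis test)]
    nemesis'))
```

With an immutable record, run-discarding hands back a nemesis that never saw its own setup, while run-threading carries the state forward.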
New Features
- jepsen.lazyfs, an experimental project for simulating the loss of un-fsynced writes, is now available. It does not work correctly--lazyfs has both crash and safety bugs in this version--but it still might help you find bugs.
- nemesis/bitflip is a new nemesis which can flip a random fraction of bits in a file. Helpful for fuzzing DBs' ability to handle filesystem corruption.
- store/fressian now serializes exceptions as data. A recurring problem in Jepsen tests is having a Throwable get into the history somewhere, and then exploding the serializer when it comes time to write the test. This is especially frustrating when nothing in the test itself logs that exception--you have no idea where it's coming from. Jepsen now serializes exceptions to data; this will not round-trip properly, but it does help you figure out the exception and operation that went wrong. These exceptions are also logged at level WARN during serialization. At the repl you can load the test and use a new utility function, jepsen.util/deepfind, to find the offending object.
- util/rand-exp generates random, exponentially-distributed values around a given mean.
- tests.cycle.wr now has a test constructor and docstring aligned with tests.cycle.list-append, as well as updated docs.
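As a rough picture of what "flipping a random fraction of bits" means for nemesis/bitflip, here is an in-memory sketch that works on a byte array. It is an assumption-laden illustration, not how Jepsen corrupts files on remote nodes: flip-bits and its probability parameter are hypothetical names, and each bit is flipped independently.

```clojure
(defn flip-bits
  "Hypothetical helper, not part of Jepsen. Returns a copy of data in which
  each bit has been flipped independently with the given probability
  (0.0 to 1.0). The input array is left untouched."
  [^bytes data probability]
  (let [^bytes out (aclone data)]
    (dotimes [i (alength out)]
      (dotimes [bit 8]
        (when (< (rand) probability)
          ;; XOR with a single-bit mask toggles exactly that bit.
          (aset-byte out i (unchecked-byte
                             (bit-xor (long (aget out i))
                                      (bit-shift-left 1 bit)))))))
    out))

;; Usage: corrupt roughly 1% of the bits in a small buffer.
(def original (byte-array [0 1 -1 127]))
(def corrupted (flip-bits original 0.01))
```

A probability of 0.0 leaves the data unchanged and 1.0 inverts every bit; values in between give the kind of sparse corruption that is useful for fuzzing.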
Small Changes
- We used to round off milliseconds in tests' :start-time, but this causes collisions when you run multiple tests in the same second. We now use millisecond resolution again.
- reconnect now passes through InterruptedIOException in the same way as InterruptedException, which should speed up/clarify the abort procedure when something goes wrong in e.g. DB setup using the control.sshj remote.
- util/stop-daemon! now throws a timeout when the kill operation hangs.
- nemesis.time now throws more informative errors when compilation fails
- New tests for nemeses
- Tests are a little quieter about logging now
- Clojure 1.11.1
- Unilog 0.7.30
- SSHJ 0.33.0
- Fipp 0.6.26
- Elle 0.1.5
- HTTP-kit 2.6.0