Liftbridge provides lightweight, fault-tolerant message streams by implementing a durable stream augmentation for the NATS messaging system. It extends NATS with a Kafka-like publish-subscribe log API that is highly available and horizontally scalable. Use Liftbridge as a simpler and lighter alternative to systems like Kafka and Pulsar or use it to add streaming semantics to an existing NATS deployment.
See the introduction post on Liftbridge and this post for more context and some of the inspiration behind it.
- Log-based API for NATS
- Replicated for fault-tolerance
- Horizontally scalable
- Wildcard subscription support
- At-least-once delivery support
- Message key-value support
- Log compaction by key
- Single static binary (~16MB)
- Designed to be high-throughput (more on this to come)
- Supremely simple
Liftbridge is a server that implements a durable, replicated message log for NATS. Clients create a named stream which is attached to a NATS subject. The stream then records messages on that subject to a replicated write-ahead log. Multiple consumers can read back from the same stream, and multiple streams can be attached to the same subject. Liftbridge provides a Kafka-like API in front of NATS.
Liftbridge was designed to bridge the gap between sophisticated log-based messaging systems like Apacha Kafka and Apache Pulsar and simpler, cloud-native systems. There is no ZooKeeper or other unwieldy dependencies, no JVM, no complicated API, and client libraries are just gRPC. More importantly, Liftbridge aims to extend NATS with a durable, at-least-once delivery mechanism that upholds the NATS tenets of simplicity, performance, and scalability. Unlike NATS Streaming, it uses the core NATS protocol with optional extensions. This means it can be added to an existing NATS deployment to provide message durability with no code changes.
NATS Streaming provides a similar log-based messaging solution. However, it is an entirely separate protocol built on top of NATS. NATS is simply the transport for NATS Streaming. This means there is no "cross-talk" between messages published to NATS and messages published to NATS Streaming.
Liftbridge was built to augment NATS with durability rather than providing a completely separate system. NATS Streaming also provides a broader set of features such as durable subscriptions, queue groups, pluggable storage backends, and multiple fault-tolerance modes. Liftbridge aims to have a small API surface area.
The key features that differentiate Liftbridge are the shared message namespace, wildcards, log compaction, and horizontal scalability. NATS Streaming replicates channels to the entire cluster through a single Raft group. Liftbridge allows replicating to a subset of the cluster, and each stream is replicated independently. This allows the cluster to scale horizontally.
Liftbridge scales horizontally by adding more brokers to the cluster and creating more streams which are distributed among the cluster. In effect, this splits out message routing from storage and consumption, which allows Liftbridge to scale independently and eschew subject partitioning. Alternatively, streams can join a load-balance group, which effectively load balances a NATS subject among the streams in the group without affecting delivery to other streams.
High availability is achieved by replicating the streams. When a stream is
created, the client specifies a replicationFactor
, which determines the
number of brokers to replicate the stream. Each stream has a leader who is
responsible for handling reads and writes. Followers then replicate the log
from the leader. If the leader fails, one of the followers can set up to
replace it. The replication protocol closely resembles that of Kafka, so there
is much more nuance to avoid data consistency problems. See the
replication protocol documentation for
more details.
Benchmarks soon to come...
No, this project is early and still evolving.
$ go get github.com/liftbridge-io/liftbridge
Liftbridge currently relies on an externally running
NATS server. By default, it will connect
to a NATS server running on localhost. The --nats-servers
flag allows
configuring the NATS server(s) to connect to.
Also note that Liftbridge is clustered by default and relies on Raft for
coordination. This means a cluster of three or more servers is normally run
for high availability, and Raft manages electing a leader. A single server is
actually a cluster of size 1. For safety purposes, the server cannot elect
itself as leader without using the --raft-bootstrap-seed
flag, which will
indicate to the server to elect itself as leader. This will start a single
server that can begin handling requests. Use this flag with caution as it should
only be set on one server when bootstrapping a cluster.
$ liftbridge --raft-bootstrap-seed
INFO[2018-07-05 16:29:44] Server ID: kn3MGwCL3TKRNyGS9bZLgH
INFO[2018-07-05 16:29:44] Namespace: liftbridge-default
INFO[2018-07-05 16:29:44] Starting server on :9292...
INFO[2018-07-05 16:29:46] Server became metadata leader, performing leader promotion actions
Once a leader has been elected, other servers will automatically join the cluster.
We set the --data-dir
and --port
flags to avoid clobbering the first server.
$ liftbridge --data-dir /tmp/liftbridge/server-2 --port=9293
INFO[2018-07-05 16:39:21] Server ID: 32CpplyaA031EFEW1DQzx6
INFO[2018-07-05 16:39:21] Namespace: liftbridge-default
INFO[2018-07-05 16:39:21] Starting server on :9293...
We can also bootstrap a cluster by providing the explicit cluster configuration.
To do this, we provide the IDs of the participating peers in the cluster using the
--raft-bootstrap-peers
flag. Raft will then handle electing a leader.
$ liftbridge --raft-bootstrap-peers server-2,server-3
In addition to the command-line flags, Liftbridge can be fully configured using
a configuration file which is passed in using the --config
flag.
$ liftbridge --config liftbridge.conf
An example configuration file is shown below.
listen: localhost:9293
data.dir: /tmp/liftbridge/server-2
log.level: debug
# Define NATS cluster to connect to.
nats {
servers: ["nats://localhost:4300", "nats://localhost:4301"]
}
# Specify message log settings.
log {
retention.max.age: "24h"
}
# Specify cluster settings.
clustering {
server.id: server-2
raft.logging: true
raft.bootstrap.seed: true
replica.max.lag.time: "20s"
}
See the configuration documentation for full details on server configuration.
Currently, there is only a high-level Go client library available. However, Liftbridge uses gRPC for its client API, so client libraries can be generated quite easily using the Liftbridge protobuf definitions.
- Basic documentation
- Overview
- FAQ
- Config
- Cluster bootstrapping
- Core concepts
- Replication protocol
- Configuring for HA and consistency
- Feature matrix
- Setup CI
- Production-hardening
- TLS support
- Configurable acks
- Log retention by message age
- Log retention by number of messages
- Log compaction by key
- Consumer-offset checkpointing in the log
- Minimum ISR support
- Additional subscribe semantics
- Oldest
- Newest
- New messages only
- By timestamp
- By time delta
- Single-stream fanout
- Opt-in ISR replica reads
- Read-replica support
- Authentication and authorization
- Embedded NATS server option
- Better instrumentation/observability
- Derek Collison and NATS team for building NATS and NATS Streaming and providing lots of inspiration.
- Travis Jeffery for building Jocko, a Go implementation of Kafka. The Liftbridge log implementation builds heavily upon the commit log from Jocko.
- Apache Kafka for inspiring large parts of the design, particularly around replication.