Brooklin Architecture
Brooklin is a Java server application that is typically deployed to a cluster of machines. You can have multiple instances of Brooklin deployed on each machine or just a single instance per machine. All instances of Brooklin offer the exact same set of capabilities.
- The most fundamental concept in Brooklin is the Datastream.
  - A Datastream is a description of a data pipe between two systems: a source system from which data is streamed and a destination system to which this data is delivered.
  - Brooklin allows us to create as many Datastreams as we need to set up independent data pipes between source and destination systems.
  - To support high scalability, Brooklin expects the data streamed between source and destination systems to be partitioned. If the data is not partitioned, however, Brooklin considers it to be composed of a single partition.
- Also to support high scalability, Brooklin breaks every Datastream whose data is partitioned into multiple DatastreamTasks. Each DatastreamTask is limited to a subset of the partitions, and all DatastreamTasks are processed concurrently for higher throughput.
  - Brooklin uses ZooKeeper to store Datastream and DatastreamTask information.
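As a rough illustration of how partitions might be divided among DatastreamTasks, here is a minimal sketch. The class name `TaskSplitSketch`, the method `splitPartitions`, the `maxTasks` parameter, and the round-robin grouping are all assumptions made for this example; they are not Brooklin's actual classes or partitioning logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Brooklin's actual classes.
final class TaskSplitSketch {

    /**
     * Groups partitions 0..partitionCount-1 into at most maxTasks groups,
     * round-robin. Each group stands in for one DatastreamTask.
     */
    static List<List<Integer>> splitPartitions(int partitionCount, int maxTasks) {
        // Unpartitioned data is treated as a single partition.
        int partitions = Math.max(1, partitionCount);
        // Never create more tasks than there are partitions.
        int taskCount = Math.min(partitions, Math.max(1, maxTasks));

        List<List<Integer>> tasks = new ArrayList<>();
        for (int t = 0; t < taskCount; t++) {
            tasks.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            tasks.get(p % taskCount).add(p);
        }
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(splitPartitions(8, 3)); // [[0, 3, 6], [1, 4, 7], [2, 5]]
        System.out.println(splitPartitions(0, 3)); // [[0]]
    }
}
```

Because each group owns a disjoint subset of the partitions, the groups can be processed concurrently without coordinating on individual partitions.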
- Connector is the abstraction that represents modules that carry out the data streaming.
  - Different Connector implementations can be written to support consuming data from different source systems.
  - To support producing the consumed data to different destinations, Connectors employ a different abstraction: TransportProviders.
  - An example Connector implementation Brooklin offers is KafkaConnector, which is intended for consuming data from Kafka.
- TransportProvider is the abstraction that represents modules that produce data to destination systems.
  - Different TransportProvider implementations can be written to support producing data to different destination systems.
  - An example TransportProvider implementation Brooklin offers is KafkaTransportProvider, which is intended for producing data to Kafka.
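The split between the two abstractions can be sketched with a pair of deliberately simplified interfaces. Everything below (`SketchConnector`, `SketchTransportProvider`, and the in-memory implementations) is hypothetical and exists only to show the division of responsibility; Brooklin's real Connector and TransportProvider APIs have different, richer signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified interfaces; not Brooklin's real APIs.
interface SketchTransportProvider {
    /** Produces one record to the destination system. */
    void send(String destination, String record);
}

interface SketchConnector {
    /** Consumes records from the source and hands each one to the provider. */
    int stream(String destination, SketchTransportProvider provider);
}

/** Stand-in source side: "consumes" from an in-memory list. */
final class InMemoryConnector implements SketchConnector {
    private final List<String> sourceRecords;

    InMemoryConnector(List<String> sourceRecords) {
        this.sourceRecords = sourceRecords;
    }

    @Override
    public int stream(String destination, SketchTransportProvider provider) {
        for (String record : sourceRecords) {
            provider.send(destination, record);
        }
        return sourceRecords.size();
    }
}

/** Stand-in destination side: "produces" into an in-memory list. */
final class CollectingTransportProvider implements SketchTransportProvider {
    final List<String> delivered = new ArrayList<>();

    @Override
    public void send(String destination, String record) {
        delivered.add(destination + ":" + record);
    }
}

final class ConnectorSketchDemo {
    static List<String> run() {
        CollectingTransportProvider provider = new CollectingTransportProvider();
        new InMemoryConnector(List.of("a", "b")).stream("topic-out", provider);
        return provider.delivered;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [topic-out:a, topic-out:b]
    }
}
```

Keeping the source side and the destination side behind separate interfaces means that, in this sketch, any connector can be paired with any transport provider.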
- The Brooklin Coordinator is the module responsible for managing the different Connector implementations, e.g. starting and stopping Connectors.
  - There is only a single Coordinator object in every Brooklin server app instance.
  - A Coordinator can either be leader or non-leader.
  - In a Brooklin cluster, only one Coordinator is designated leader while the rest remain non-leaders.
  - Brooklin employs the ZooKeeper election recipe for electing the leader Coordinator.
  - In addition to managing Connectors, the leader Coordinator is responsible for monitoring the other Coordinators and dividing the work among the different Coordinators by assigning DatastreamTasks to them.
  - The leader Coordinator can be configured to do DatastreamTask assignment using different strategies (implementations of AssignmentStrategy).
  - An example AssignmentStrategy offered by Brooklin is the LoadbalancingStrategy, which causes the leader Coordinator to evenly distribute all available DatastreamTasks across all Coordinator instances.
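The two leader-related ideas above, electing a leader and spreading tasks evenly, can be sketched without a running ZooKeeper. In the standard ZooKeeper election recipe each participant creates a sequential ephemeral znode, and the participant holding the lowest sequence number is the leader. The class `CoordinatorSketch`, the znode name format, and the round-robin `assignEvenly` method are assumptions for illustration; they are not Brooklin's LoadbalancingStrategy implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Brooklin's actual classes.
final class CoordinatorSketch {

    /** Election by lowest sequence suffix, as in the ZooKeeper election recipe. */
    static String electLeader(List<String> memberZnodes) {
        return memberZnodes.stream()
                .min(Comparator.comparingLong(CoordinatorSketch::sequenceOf))
                .orElseThrow(() -> new IllegalArgumentException("no members"));
    }

    /** Parses the numeric suffix after the last '-' (assumed name format). */
    private static long sequenceOf(String znode) {
        return Long.parseLong(znode.substring(znode.lastIndexOf('-') + 1));
    }

    /** Round-robin assignment: task counts per instance differ by at most one. */
    static Map<String, List<String>> assignEvenly(List<String> instances, List<String> tasks) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("no instances to assign to");
        }
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String instance : instances) {
            assignment.put(instance, new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            String owner = instances.get(i % instances.size());
            assignment.get(owner).add(tasks.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(electLeader(
                List.of("host-b-0000000002", "host-a-0000000001", "host-c-0000000003")));
        // host-a-0000000001
        System.out.println(assignEvenly(List.of("i1", "i2"), List.of("t0", "t1", "t2")));
        // {i1=[t0, t2], i2=[t1]}
    }
}
```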
- The Brooklin server application is typically deployed to one or more machines, all using ZooKeeper as the source of truth for Datastream and DatastreamTask metadata.
  - Information about the different instances of the Brooklin server app, as well as their DatastreamTask assignments, is also stored in ZooKeeper.
  - Every Brooklin instance exposes a REST endpoint, known as the Datastream Management Service (DMS), that enables CRUD operations on Datastreams over HTTP.
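To make "CRUD operations over HTTP" concrete, the sketch below builds (but does not send) requests with the JDK's `java.net.http.HttpRequest`. The base URL, including host, port, and path, and the JSON body are placeholders assumed for this example; this page does not specify them, so consult the REST endpoint documentation for the actual values.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical illustration only: the URL and payload are placeholders.
final class DmsRequestSketch {

    // Assumed base URL; host, port, and path are not confirmed by this page.
    static final String BASE = "http://localhost:32311/datastream";

    /** Create: POST a JSON description of the Datastream. */
    static HttpRequest create(String json) {
        return HttpRequest.newBuilder(URI.create(BASE))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** Read: GET a Datastream by name. */
    static HttpRequest read(String name) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + name)).GET().build();
    }

    /** Delete: DELETE a Datastream by name. */
    static HttpRequest delete(String name) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + name)).DELETE().build();
    }

    public static void main(String[] args) {
        // The JSON field below is a placeholder, not the real Datastream schema.
        HttpRequest request = create("{\"name\": \"my-datastream\"}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(read("my-datastream").method() + " " + read("my-datastream").uri());
    }
}
```

Since every instance exposes the same endpoint, a client could direct such a request at any instance in the cluster.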
A good way to understand the architecture of Brooklin is to go through an example workflow of creating a new Datastream.
The figure below illustrates the main steps of Datastream creation.
- A Brooklin client sends a Datastream creation request to a Brooklin cluster.
- The request is routed to the Datastream Management Service REST endpoint of any instance of the Brooklin server app.
- The Datastream data is verified and written to ZooKeeper under a certain znode that the leader Coordinator is watching for changes.
- The leader Coordinator gets notified of the new Datastream znode creation.
- The leader Coordinator reads the metadata of the newly created Datastream and breaks it down into one or more DatastreamTasks. It also uses the AssignmentStrategy of the Connector specified in the Datastream to assign the different DatastreamTasks to the available instances. This assignment is also persisted in ZooKeeper.
- The affected Coordinators get notified of the new DatastreamTask assignments created under their respective znodes, which they read and start processing immediately.
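The creation workflow above can be simulated with a small in-memory stand-in for ZooKeeper. This is a sketch under stated assumptions: `FakeZk`, the znode paths (`/datastreams/...`, `/assignments/...`), the one-task-per-partition split, and the round-robin assignment are all invented for illustration and do not reflect Brooklin's real znode layout or its ZooKeeper client code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** In-memory stand-in for ZooKeeper: a path-to-data map with prefix watchers. */
final class FakeZk {
    final Map<String, String> nodes = new TreeMap<>();
    private final List<Map.Entry<String, BiConsumer<String, String>>> watchers = new ArrayList<>();

    void watch(String prefix, BiConsumer<String, String> callback) {
        watchers.add(Map.entry(prefix, callback));
    }

    void create(String path, String data) {
        nodes.put(path, data);
        // Iterate over a copy so callbacks may safely call create() themselves.
        List<Map.Entry<String, BiConsumer<String, String>>> snapshot = new ArrayList<>(watchers);
        for (Map.Entry<String, BiConsumer<String, String>> watcher : snapshot) {
            if (path.startsWith(watcher.getKey())) {
                watcher.getValue().accept(path, data);
            }
        }
    }
}

// Hypothetical illustration only; paths and splitting logic are assumptions.
final class DatastreamCreationSketch {
    static List<String> run() {
        FakeZk zk = new FakeZk();
        List<String> log = new ArrayList<>();
        List<String> instances = List.of("instance-1", "instance-2");

        // Every Coordinator watches its own assignment znode.
        for (String instance : instances) {
            zk.watch("/assignments/" + instance + "/",
                    (path, task) -> log.add(instance + " starts " + task));
        }

        // The leader Coordinator watches for new Datastream znodes, breaks
        // each Datastream into tasks, and persists the assignment.
        zk.watch("/datastreams/", (path, partitionCount) -> {
            String name = path.substring("/datastreams/".length());
            int partitions = Integer.parseInt(partitionCount);
            for (int p = 0; p < partitions; p++) {
                String owner = instances.get(p % instances.size());
                String task = name + "-task-" + p;
                zk.create("/assignments/" + owner + "/" + task, task);
            }
        });

        // The DMS endpoint writes the verified Datastream (3 partitions here).
        zk.create("/datastreams/orders", "3");
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
        // instance-1 starts orders-task-0
        // instance-2 starts orders-task-1
        // instance-1 starts orders-task-2
    }
}
```

The point of the sketch is that no component calls another directly: the DMS endpoint, the leader, and the other Coordinators interact only through znode writes and watch notifications.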
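- Brooklin uses ZooKeeper to store information about Datastreams, DatastreamTasks, the instances of the Brooklin server app, and their DatastreamTask assignments.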