
Brooklin Architecture


Introduction

Brooklin is a Java server application that is typically deployed to a cluster of machines. Each machine can run a single Brooklin instance or several, and all instances offer the exact same set of capabilities.

Key Concepts

Datastream

  • The most fundamental concept in Brooklin is the Datastream.
  • A Datastream is a description of a data pipe between two systems: a source system from which data is streamed and a destination system to which that data is delivered.
  • Brooklin allows us to create as many Datastreams as we need to set up independent data pipes between source and destination systems.
  • To support high scalability, Brooklin expects the data streamed between source and destination systems to be partitioned. If the data is not partitioned, however, Brooklin considers it to be composed of a single partition.
  • Also to support high scalability, Brooklin breaks every Datastream whose data is partitioned into multiple DatastreamTasks, each limited to a subset of the partitions, all of which are processed concurrently for higher throughput (a simplified sketch of this breakdown follows this list).
  • Brooklin uses ZooKeeper to store Datastream and DatastreamTask information.
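
The snippet below is a minimal sketch in plain Java, using hypothetical types rather than Brooklin's actual classes, of the idea behind this breakdown: a Datastream over a partitioned source is split into several DatastreamTasks, each owning a disjoint subset of the partitions so they can be processed concurrently.

```java
import java.util.ArrayList;
import java.util.List;

public class DatastreamTaskSplitSketch {

    /** Hypothetical task holding the partitions it is responsible for. */
    public record Task(String datastreamName, List<Integer> partitions) {}

    /** Spread the Datastream's partitions evenly across the requested number of tasks. */
    public static List<Task> split(String datastreamName, int partitionCount, int taskCount) {
        List<Task> tasks = new ArrayList<>();
        for (int t = 0; t < taskCount; t++) {
            List<Integer> owned = new ArrayList<>();
            for (int p = t; p < partitionCount; p += taskCount) {
                owned.add(p);  // round-robin partition ownership
            }
            tasks.add(new Task(datastreamName, owned));
        }
        return tasks;
    }

    public static void main(String[] args) {
        // A Datastream over 8 partitions split into 3 concurrent tasks.
        split("example-datastream", 8, 3).forEach(System.out::println);
    }
}
```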

Connector

  • Connector is the abstraction that represents the modules that carry out the data streaming; a simplified sketch of this contract appears after this list.
  • Different Connector implementations can be written to support consuming data from different source systems.
  • To support producing the consumed data to different destinations, Connectors employ a different abstraction: TransportProviders.
  • An example Connector implementation Brooklin offers is KafkaConnector, which is intended for consuming data from Kafka.
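
Here is a simplified, hypothetical version of the Connector contract. The method names and signatures are illustrative assumptions, not the exact interface in the Brooklin codebase, but they capture the role the abstraction plays.

```java
import java.util.List;

/** Placeholder for Brooklin's DatastreamTask, included only so this sketch compiles. */
interface DatastreamTask {
}

/**
 * A simplified, hypothetical Connector contract. A Connector consumes data from a
 * source system on behalf of the DatastreamTasks assigned to this Brooklin instance.
 */
interface Connector {
    /** Begin consuming from the source system. */
    void start();

    /** Stop consuming and release any resources held. */
    void stop();

    /** Called by the Coordinator whenever this instance's DatastreamTask assignment changes. */
    void onAssignmentChange(List<DatastreamTask> assignedTasks);
}
```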

TransportProvider

  • TransportProvider is the abstraction that represents modules that produce data to destination systems.
  • Different TransportProvider implementations can be written to support producing data to different destination systems; a simplified sketch of this contract appears after this list.
  • An example TransportProvider implementation Brooklin offers is KafkaTransportProvider, which is intended for producing data to Kafka.
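
Similarly, the sketch below is a simplified, hypothetical TransportProvider contract; the method signatures are illustrative assumptions rather than Brooklin's exact interface.

```java
/**
 * A simplified, hypothetical TransportProvider contract. A TransportProvider
 * produces the records consumed by a Connector to the destination system.
 */
interface TransportProvider {
    /** Produce a single record (key/value bytes) to the named destination, e.g. a topic. */
    void send(String destination, byte[] key, byte[] value);

    /** Block until all previously sent records have been acknowledged by the destination. */
    void flush();

    /** Release any producer resources. */
    void close();
}
```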

Coordinator

  • The Brooklin Coordinator is the module responsible for managing the different Connector implementations, e.g. starting and stopping Connectors.
  • There is only a single Coordinator object in every Brooklin server app instance.
  • A Coordinator can be either a leader or a non-leader.
  • In a Brooklin cluster, only one Coordinator is designated leader while the rest remain as non-leaders.
  • Brooklin employs the ZooKeeper leader election recipe for electing the leader Coordinator.
  • In addition to managing Connectors, the leader Coordinator is responsible for monitoring other Coordinators and dividing the work among the different Coordinators by assigning the DatastreamTasks to them.
  • The leader Coordinator can be configured to do DatastreamTask assignment using different strategies (implementations of AssignmentStrategy).
  • An example AssignmentStrategy offered by Brooklin is the LoadbalancingStrategy, which causes the leader Coordinator to evenly distribute all available DatastreamTasks across all Coordinator instances; a minimal sketch of this idea follows this list.
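
As a rough illustration of what a load-balancing strategy does, the sketch below (hypothetical types and method, not Brooklin's AssignmentStrategy interface) spreads DatastreamTask names evenly across the live instances in round-robin order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinAssignmentSketch {

    /** Map each live instance to the DatastreamTasks (by name) it should process. */
    public static Map<String, List<String>> assign(List<String> taskNames, List<String> instances) {
        Map<String, List<String>> assignment = new HashMap<>();
        instances.forEach(instance -> assignment.put(instance, new ArrayList<>()));
        for (int i = 0; i < taskNames.size(); i++) {
            // Round-robin: task i goes to instance (i modulo the number of instances).
            assignment.get(instances.get(i % instances.size())).add(taskNames.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(
            List.of("task-0", "task-1", "task-2", "task-3", "task-4"),
            List.of("instance-A", "instance-B")));
    }
}
```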

Architecture

  • The Brooklin server application is typically deployed to one or more machines, all of which use ZooKeeper as the source of truth for Datastream and DatastreamTask metadata.
  • Information about the different instances of Brooklin server app as well as their DatastreamTask assignments is also stored in ZooKeeper.
  • Every Brooklin instance exposes a REST endpoint — aka Datastream Management Service (DMS) — that enables CRUD operations on Datastreams over HTTP; a hedged request example follows this list.
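
For example, creating a Datastream through the DMS could look roughly like the following. The port (32311), the /datastream path, and the JSON field names are assumptions made for illustration; consult the Brooklin documentation and your deployment's configuration for the exact request format.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateDatastreamExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical Datastream creation payload; field names are illustrative.
        String body = """
            {
              "name": "example-datastream",
              "connectorName": "kafka",
              "transportProviderName": "kafkaTransportProvider",
              "source": { "connectionString": "kafka://localhost:9092/example-topic" }
            }
            """;

        // POST the request to the DMS endpoint of any Brooklin instance in the cluster.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:32311/datastream"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```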

A good way to understand the architecture of Brooklin is to go through an example workflow of creating a new Datastream.

Datastream Creation Workflow

The figure below illustrates the main steps of Datastream creation.

[Figure: Datastream creation workflow]

  1. A Brooklin client sends a Datastream creation request to a Brooklin cluster.

  2. The request is routed to the Datastream Management Service REST endpoint of any instance of the Brooklin server app.

  3. The Datastream data is verified and written to ZooKeeper under a designated znode that the leader Coordinator watches for changes.

  4. The leader Coordinator gets notified of the new Datastream znode creation (a sketch of this ZooKeeper watch mechanism follows these steps).

  5. The leader Coordinator reads the metadata of the newly created Datastream and breaks it down into one or more DatastreamTasks. It also uses the AssignmentStrategy of the Connector specified in the Datastream to assign the different DatastreamTasks to the available instances. This assignment is also persisted in ZooKeeper.

  6. The affected Coordinators get notified of the new DatastreamTask assignments created under their respective znodes, which they read and start processing immediately.
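
The watch-and-notify mechanism in steps 3 through 6 is plain ZooKeeper functionality. The sketch below is not Brooklin's actual code — the connection string and the znode path are placeholders, and Brooklin layers its own coordination logic on top of this primitive — but it shows how a leader Coordinator can learn about newly created Datastream znodes.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

import java.util.List;

public class DatastreamWatchSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string and znode path, for illustration only.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });
        String datastreamsPath = "/brooklin/datastreams";

        Watcher onChange = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getType() == Event.EventType.NodeChildrenChanged) {
                    try {
                        // Re-read the children (the Datastreams) and re-register the watch.
                        List<String> datastreams = zk.getChildren(datastreamsPath, this);
                        System.out.println("Datastreams changed: " + datastreams);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        };

        // Register the initial watch; from now on, a new Datastream znode
        // created under this path triggers a notification.
        zk.getChildren(datastreamsPath, onChange);
        Thread.sleep(Long.MAX_VALUE);
    }
}
```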

ZooKeeper Data

  • Brooklin uses ZooKeeper to store information about Datastreams, DatastreamTasks, the live Brooklin server instances, and the DatastreamTask assignments of those instances.

[Figure: ZooKeeper data stored by Brooklin]