Kafka MirrorMaker Connector

Ahmed Elbahtemy edited this page Apr 24, 2019 · 11 revisions

Overview

  • KafkaMirrorMakerConnector is intended for consuming multiple topics in a Kafka cluster via regular expression patterns.

  • It can also mirror the consumed data into multiple topics in the destination Kafka cluster.
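As a rough illustration of pattern-based topic selection, the sketch below filters a list of topic names with a regular expression. The function name `select_topics` and the topic names are hypothetical, and Python's `re` syntax is used here, which may differ in details from the pattern syntax the connector accepts. Whole-name matching is an assumption of this sketch.

```python
import re
from typing import Iterable, List


def select_topics(pattern: str, topics: Iterable[str]) -> List[str]:
    """Return, in sorted order, the topics whose whole name matches the pattern."""
    compiled = re.compile(pattern)
    return sorted(t for t in topics if compiled.fullmatch(t))


# Hypothetical topic names in a source cluster.
source_topics = ["tracking.pageviews", "tracking.clicks", "billing.invoices"]
matched = select_topics(r"tracking\..*", source_topics)
```

Here `matched` contains only the two `tracking.` topics, so a single pattern can cover any number of source topics.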

Summary

Supports Datastream Updates: Y

Checkpoint Type: Kafka consumer offsets are regularly committed to Kafka (see commitIntervalMs)

Diagnostics Aware: Y

Flushless Produce

Since KafkaMirrorMakerConnector consumes data from the source Kafka cluster using a Kafka consumer, it needs to periodically commit the offsets of the records it has retrieved and processed. To guarantee that committing is safe, it must first flush all the data it consumed to the DatastreamEventProducer. In Kafka mirroring scenarios, this causes the transport provider producing data to the destination Kafka cluster (i.e. KafkaTransportProvider) to flush the Kafka producer(s) it encapsulates. This flush is a synchronous operation that may reduce throughput under high load.

To address this concern, KafkaMirrorMakerConnector features an optional flushless produce mode, in which it avoids flushing the DatastreamEventProducer altogether. Instead, it relies on the asynchronous ack notifications it receives from the DatastreamEventProducer and commits only the offsets of records that have been delivered.
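One way to picture flushless offset tracking is the minimal sketch below. It is not Brooklin's implementation: the class `PartitionAckTracker` and its methods are hypothetical, it covers a single source topic partition, and it assumes records are sent in increasing offset order. It follows Kafka's convention that the committed value is the next offset to consume.

```python
from typing import Optional, Set


class PartitionAckTracker:
    """Tracks sent-but-unacknowledged offsets for one source topic partition."""

    def __init__(self) -> None:
        self._in_flight: Set[int] = set()
        self._last_sent: Optional[int] = None

    def on_send(self, offset: int) -> None:
        # Called when a consumed record is handed to the producer.
        self._in_flight.add(offset)
        self._last_sent = offset

    def on_ack(self, offset: int) -> None:
        # Called from the asynchronous delivery acknowledgement.
        self._in_flight.discard(offset)

    def committable_position(self) -> Optional[int]:
        """Next offset to consume such that every earlier sent record is acked."""
        if self._last_sent is None:
            return None  # nothing sent yet, so nothing to commit
        if self._in_flight:
            # Never commit past the oldest record still awaiting an ack.
            return min(self._in_flight)
        return self._last_sent + 1


tracker = PartitionAckTracker()
for offset in (10, 11, 12):
    tracker.on_send(offset)
tracker.on_ack(12)
tracker.on_ack(10)
position_with_gap = tracker.committable_position()
tracker.on_ack(11)
position_after_all_acks = tracker.committable_position()
```

While offset 11 is still in flight, the committable position stays at 11 even though 12 has already been acked. Once 11 is acked, it advances to 13. No flush is needed at any point, because the position only moves past records whose delivery has been confirmed.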

Configuration

Shared Configuration

Configuration properties shared among all Kafka connectors in Brooklin are documented on the Kafka Connectors Shared Configuration and Data Models page.

Connector-Specific Configuration

Configuration properties specific to KafkaMirrorMakerConnector are listed below.

Property Description Default

isFlushlessModeEnabled

Indicates whether Flushless Produce mode is enabled

false

flowControlEnabled

  • Indicates whether flow control is enabled

  • Enabling flow control allows topic partitions to be auto-paused when their in-flight messages (i.e. messages sent but not yet ack’ed) exceed maxInFlightMessagesThreshold

  • Such topic partitions are auto-resumed when their in-flight message counts are less than or equal to minInFlightMessagesThreshold

  • Enabling flow control requires setting isFlushlessModeEnabled to true

false

maxInFlightMessagesThreshold

  • The maximum number of in-flight messages allowed for a topic partition; once this number is exceeded, the partition is auto-paused

  • Requires flowControlEnabled to be set to true

5000

minInFlightMessagesThreshold

  • The in-flight message count that an auto-paused topic partition must drop to (or below) before it is auto-resumed

  • Requires flowControlEnabled to be set to true

1000

topicManagerFactory

  • Fully-qualified class name of a TopicManagerFactory implementation

  • The factory is used to create TopicManager implementations, which handle all topic management tasks

topicManager.*

TopicManager configuration properties

(None)
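The interaction between the flow control properties above can be sketched as a small state machine with two thresholds. This is an illustrative sketch only: the names `FlowControlConfig` and `PartitionFlowController` are hypothetical, and only the defaults and the pause/resume conditions are taken from the property descriptions.

```python
from dataclasses import dataclass


@dataclass
class FlowControlConfig:
    """Mirrors the connector properties by name; defaults are those listed above."""
    is_flushless_mode_enabled: bool = False
    flow_control_enabled: bool = False
    max_in_flight_messages_threshold: int = 5000
    min_in_flight_messages_threshold: int = 1000

    def validate(self) -> None:
        if self.flow_control_enabled and not self.is_flushless_mode_enabled:
            raise ValueError(
                "flowControlEnabled requires isFlushlessModeEnabled to be true")


class PartitionFlowController:
    """Decides whether one topic partition should currently be paused."""

    def __init__(self, config: FlowControlConfig) -> None:
        config.validate()
        self._config = config
        self.paused = False

    def update(self, in_flight_count: int) -> bool:
        """Record the latest in-flight count and return the paused state."""
        if not self._config.flow_control_enabled:
            return False
        if not self.paused:
            # Pause only when the count exceeds the maximum threshold.
            if in_flight_count > self._config.max_in_flight_messages_threshold:
                self.paused = True
        elif in_flight_count <= self._config.min_in_flight_messages_threshold:
            # Resume once the count has dropped to the minimum threshold or below.
            self.paused = False
        return self.paused


controller = PartitionFlowController(
    FlowControlConfig(is_flushless_mode_enabled=True, flow_control_enabled=True))
states = [controller.update(n) for n in (5000, 5001, 3000, 1000)]
```

With the default thresholds, a partition is paused at 5001 in-flight messages, stays paused at 3000, and resumes at 1000. The gap between the two thresholds keeps a partition from flapping between paused and resumed when its in-flight count hovers near a single limit.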

Datastream-Specific Configuration

Property Description Default

system.destination.identityPartitioningEnabled

  • This property can be defined in datastream metadata

  • If set to true, it enables identity partitioning, i.e. every consumed Kafka record is dispatched to the partition of the destination topic that has the same number as the partition it belonged to in the source topic

false
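The effect of identity partitioning can be sketched as follows. The helper names are hypothetical, datastream metadata is assumed to be a map of string keys to string values, and returning `None` when the property is off (leaving the partition choice to the producer) is an assumption of this sketch rather than documented behavior.

```python
from typing import Mapping, Optional

IDENTITY_PARTITIONING_KEY = "system.destination.identityPartitioningEnabled"


def is_identity_partitioning_enabled(metadata: Mapping[str, str]) -> bool:
    """Read the flag from datastream metadata; it defaults to false."""
    value = metadata.get(IDENTITY_PARTITIONING_KEY, "false")
    return value.strip().lower() == "true"


def choose_destination_partition(
        source_partition: int, metadata: Mapping[str, str]) -> Optional[int]:
    """Return the destination partition, or None to let the producer decide."""
    if is_identity_partitioning_enabled(metadata):
        # Identity partitioning: keep the source partition number.
        return source_partition
    return None


with_identity = choose_destination_partition(
    3, {IDENTITY_PARTITIONING_KEY: "true"})
without_identity = choose_destination_partition(3, {})
```

Under these assumptions, a record read from source partition 3 is produced to destination partition 3 when the flag is set. This presupposes that the destination topic has at least as many partitions as the source topic.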

Metrics

All Kafka connectors share the common metrics documented at Kafka Connectors Shared Configuration and Data Models.