Skip to content

Kafka Connector

Ahmed Elbahtemy edited this page Apr 23, 2019 · 19 revisions

Contents

Overview

KafkaConnector is intended for consuming a single Kafka topic

Summary

Supports Datastream Updates

Y

Checkpoint Type

Kafka consumer offsets are regularly committed to Kafka

(see commitIntervalMs)

Diagnostics Aware

Y

Configuration

Shared Configuration

Configuration properties shared among all Kafka connectors in Brooklin are documented on this page.

Connector-Specific Configuration

Configuration properties specific to KafkaConnector are listed below.

Property Description Default

isGroupIdHashingEnabled

  • A flag indicating whether Kafka consumer group ID should be hashed

  • If true, the consumer group ID for a datastream is set to <clusterName>.<groupIdMD5Hash>, where:

  • If false, the consumer group ID for a datastream is set to <sourceConnectionString>-to-<destConnectionString>, where:

false

whiteListedClusters

  • A comma-separated list of Kafka brokers, e.g. host1:port1,host2:port2

  • Any datastream using this connector must specify one of these Kafka brokers in its source.connectionString

(None)

Diagnostics

URL

GET /diag?q=:query&type=:component&scope=:scope&content=:componentQuery

URL Params

Required

  • query

    • Possible values are: status or allStatus

    • status retrieves data for a single Brooklin instance

    • allStatus retrieves aggregated data for all Brooklin instances in the cluster

  • type: only supported value is connector

  • scope: Name of the connector to query, as specified in brooklin.server.connectorNames

  • content

    • Represents a subquery to the component (connector) in question

    • Possible values are: datastream_state and position

      Subquery

      datastream_state

      Subquery Params

      Required

      datastreamName: Name of datastream to query

      Example

      datastream_state?datastream=:datastreamName

      Constraints

      • Datastream must exist

      Response

      Content-Type

      application/json

      Schema
      {
        "elements": [
        ],
        "paging": {
          "count": "[int]",
          "start": "[int]",
          "links": "[array]"
        }
      }

      Subquery

      position

      Subquery Params

      Optional

      • offsets

        • A boolean indicating whether to return time-based vs. offset-based position data for the retrieved DatastreamTasks

        • Default is false (time-based)

          Example

          position?offsets=true

          Constraints

          Setting it to true requires enableKafkaPositionTracker to be true

      • datastream: Name of datastream to limit results to

        Example

        position?datastream=:datastreamName

        Constraints

        Datastream must exist

      • topic: Name of Kafka topic to limit results to

        Example

        position?topic=:topicName

        Constraints

        Topic must exist

      Response

      Content-Type

      application/json

      Schema
      {
        "elements": [
        ],
        "paging": {
          "count": "[int]",
          "start": "[int]",
          "links": "[array]"
        }
      }

Metrics

All metric names are prefixed with the connector name as it appears in brooklin.server.connectorNames.

General Metrics

General metrics for all Brooklin connectors are documented on this page.

Aggregate Metrics

  • Aggregate metrics cover all datastreams in a single Brooklin instance.

  • Aggregate metrics prefix: <connectorName>.KafkaConnectorTask.aggregate.

Metric Name Description

clientPollOverTimeout

The number of times polling Kafka consumer exceeds pollTimeoutMs in calls to KafkaConsumer::poll(long) calls, by more than 1 sec

errorRate

The rate of errors encountered when data is dispatched for delivery to the destination system

eventsByteProcessedRate

The rate of bytes processed and dispatched for delivery to destination

eventsProcessedRate

The rate of Kafka record consumption

numAutoPausedPartitionsAwaitingDestTopic

The number of auto-paused topic partitions awaiting destination topic creation

numAutoPausedPartitionsOnError

The number of auto-paused topic partitions due to errors encountered during dispatch for delivery

numAutoPausedPartitionsOnInFlightMessages

The number of auto-paused topic partitions due to exceeding their maximum in-flight messages thresholds

numConfigPausedPartitions

The number of topic partitions paused manually

numPartitions

The number of Kafka topic partitions

numProcessingOverThreshold

The number of times dispatching records to destination exceeds processingDelayLogThreshold

numTopics

The number of Kafka topics

pollIntervalOverSessionTimeout

The number of polls exceeding the maximum session timeout <TODO:?>

rebalanceRate

The rate of rebalances seen by the Kafka consumer

stuckPartitions

The number of stuck topic partitions

Datastream-Specific Metrics

  • Datastream-specific metrics prefix: <connectorName>.KafkaConnectorTask.<datastreamName>.

Metric Name Description

errorRate

The rate of errors encountered when data is dispatched for delivery to the destination system

eventCountsPerPoll

The distribution (histogram) of the number of records retrieved from Kafka in every poll

eventsByteProcessedRate

The rate of bytes processed and dispatched for delivery to destination

eventsProcessedRate

The rate of Kafka record consumption

numAutoPausedPartitionsAwaitingDestTopic

The number of auto-paused topic partitions awaiting destination topic creation

numAutoPausedPartitionsOnError

The number of auto-paused topic partitions due to errors encountered during dispatch for delivery

numAutoPausedPartitionsOnInFlightMessages

The number of auto-paused topic partitions due to exceeding their maximum in-flight messages thresholds

numConfigPausedPartitions

The number of topic partitions paused manually

numPartitions

The number of Kafka topic partitions

numPolls

The rate of polls performed using the Kafka consumer

numProcessingOverThreshold

The number of times dispatching records to destination exceeds processingDelayLogThreshold

numTopics

The number of Kafka topics

pollIntervalOverSessionTimeout

The number of polls exceeding the maximum session timeout <TODO:?>

rebalanceRate

The rate of rebalances seen by the Kafka consumer

stuckPartitions

The number of stuck topic partitions

timeSinceLastEventReceivedMs

The time duration (in milliseconds) since the last non-empty ConsumerRecord was fetched using the Kafka consumer