Streaming Text Files to Kafka

Ahmed Elbahtemy edited this page Apr 18, 2019 · 19 revisions

Use Case Overview

In this use case, we create Brooklin datastreams to publish text file contents to a locally deployed instance of Apache Kafka.

Instructions

1. Set up Kafka

  1. Download the latest Kafka tarball and untar it.
    tar -xzf kafka_2.12-2.2.0.tgz
    cd kafka_2.12-2.2.0
  2. Start a ZooKeeper server
    bin/zookeeper-server-start.sh config/zookeeper.properties
  3. Start a Kafka server
    bin/kafka-server-start.sh config/server.properties

2. Set up Brooklin

  1. Download the latest tarball (tgz) from Brooklin releases.
  2. Untar the Brooklin tarball
    tar -xzf brooklin-1.0.0.tgz
    cd brooklin-1.0.0 
  3. Run Brooklin
    bin/brooklin-server-start.sh config/server.properties
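
If you would rather not stream Brooklin's NOTICE file in the next step, you can generate a small text file of your own first (`sample.txt` is just an assumed name; any text file works):

```shell
# Create a small sample text file to stream to Kafka.
# sample.txt is an arbitrary name chosen for this example.
printf 'hello brooklin\nhello kafka\nhello zookeeper\n' > sample.txt

# Sanity-check the file contents.
cat sample.txt
```

You would then pass `sample.txt` to the `-s` option in the next step instead of `NOTICE`.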

3. Create a Datastream

  1. Create a datastream to stream the contents of any file of your choice to Kafka.

    # Replace NOTICE below with a file path of your choice or leave it as 
    # is if you would like to use the NOTICE file as an example text file
    bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n first-file-datastream -s NOTICE -c file -p 1 -t kafka -m '{"owner":"test-user"}'

    Here are the options we used to create this datastream:

    -o CREATE                      The operation is datastream creation
-u http://localhost:32311/     Datastream Management Service URI
    -n first-file-datastream       Datastream name
    -s NOTICE                      Datastream source URI (source file path in this case)
    -c file                        Connector name ("file" refers to FileConnector)
    -p 1                           Number of source partitions
    -t kafka                       Transport provider name ("kafka" refers to KafkaTransportProvider)
    -m '{"owner":"test-user"}'     Datastream metadata (specifying datastream owner is mandatory)
    
  2. Verify the datastream creation by requesting all datastream metadata from Brooklin using the command line REST client.

    bin/brooklin-rest-client.sh -o READALL -u http://localhost:32311/
  3. You can also view the streaming progress by querying the diagnostics REST endpoint of the Datastream Management Service.

    curl -s "http://localhost:32311/diag?scope=file&type=connector&q=status&content=position"
  4. Additionally, you can view more information about the datastreams and their DatastreamTasks by querying the health monitoring REST endpoint of the Datastream Management Service.

    curl -s "http://localhost:32311/health"
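
Both endpoints return JSON, which is easier to read when piped through Python's standard-library `json.tool` pretty-printer. Since the real responses require a running Brooklin server, the example below uses a stand-in payload (the field names are illustrative, not Brooklin's actual response schema):

```shell
# Pretty-print JSON with Python's stdlib json.tool. Once Brooklin is
# running, pipe the real response through it the same way, e.g.:
#   curl -s "http://localhost:32311/health" | python3 -m json.tool
# The echoed payload below is a stand-in for illustration only.
echo '{"datastream": "first-file-datastream", "status": "READY"}' | python3 -m json.tool
```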

4. Verify the Data Transfer to Kafka

  1. Verify that a Kafka topic has been created to hold the data of your newly created datastream. The topic name is prefixed with the datastream name (first-file-datastream in this case).

    cd <kafka-dir>  # Replace with Kafka directory
    bin/kafka-topics.sh --list --bootstrap-server localhost:9092
  2. Print the Kafka topic contents

    # Replace <topic-name> below with the name of the Kafka topic
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topic-name> --from-beginning

5. Create More Datastreams

Feel free to create more datastreams to publish more files to Kafka.
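
For example, assuming Brooklin is still running and you have another text file to stream (sample.txt is an assumed name here, as is the datastream name), only the `-n` and `-s` options need to change:

```shell
# Hypothetical second datastream; the datastream name and source file
# are placeholders chosen for this example.
# Requires Brooklin, Kafka, and ZooKeeper to be running.
bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ \
  -n second-file-datastream -s sample.txt -c file -p 1 -t kafka \
  -m '{"owner":"test-user"}'
```

Each new datastream gets its own Kafka topic, again prefixed with the datastream name.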

6. Stop Brooklin, Kafka, and ZooKeeper

When you are done, run the following commands to stop all running apps.

# Replace <brooklin-dir> and <kafka-dir> with Brooklin and Kafka directories, respectively
<brooklin-dir>/bin/brooklin-server-stop.sh
<kafka-dir>/bin/kafka-server-stop.sh
<kafka-dir>/bin/zookeeper-server-stop.sh