# Streaming Text Files to Kafka
In this use case, we use Brooklin to create datastreams to publish text file content to a locally deployed instance of Apache Kafka.
- Source: File System
- Destination: Kafka
- Connector: `FileConnector`
- Transport Provider: `KafkaTransportProvider`
- Download the latest Kafka tarball and untar it.

  ```shell
  tar -xzf kafka_2.12-2.2.0.tgz
  cd kafka_2.12-2.2.0
  ```
- Start a ZooKeeper server.

  ```shell
  bin/zookeeper-server-start.sh config/zookeeper.properties
  ```
- Start a Kafka server.

  ```shell
  bin/kafka-server-start.sh config/server.properties
  ```
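Before moving on, it can help to confirm the broker is up. One quick check (assuming the default broker port 9092) is to list topics; on a fresh install the list is empty:

```shell
# Optional sanity check: list topics on the local broker (default port 9092).
# On a fresh install this prints nothing; an error here means the broker
# is not reachable yet.
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
```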
- Download the latest tarball (tgz) from Brooklin releases to a convenient location on your computer.
- Untar the Brooklin tarball.

  ```shell
  tar -xzf brooklin-1.0.0.tgz
  cd brooklin-1.0.0
  ```
- Run Brooklin.

  ```shell
  bin/brooklin-server-start.sh config/server.properties
  ```
- Create a datastream to stream the contents of any file of your choice to Kafka.

  ```shell
  # Replace NOTICE below with a file path of your choice, or leave it as
  # is if you would like to use the NOTICE file as an example text file
  bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n first-file-datastream -s NOTICE -c file -p 1 -t kafka -m '{"owner":"test-user"}'
  ```
  Here are the options we used to create this datastream:

  | Option | Description |
  |--------|-------------|
  | `-o CREATE` | The operation is datastream creation |
  | `-u http://localhost:32311/` | Datastream Management Service URI |
  | `-n first-file-datastream` | Datastream name |
  | `-s NOTICE` | Datastream source URI (source file path in this case) |
  | `-c file` | Connector name (`file` refers to `FileConnector`) |
  | `-p 1` | Number of source partitions |
  | `-t kafka` | Transport provider name (`kafka` refers to `KafkaTransportProvider`) |
  | `-m '{"owner":"test-user"}'` | Datastream metadata (specifying the datastream owner is mandatory) |
- Verify the datastream creation by requesting all datastream metadata from Brooklin.

  ```shell
  bin/brooklin-rest-client.sh -o READALL -u http://localhost:32311/
  ```
- You can also view the streaming progress by querying the diagnostics REST endpoint of the Datastream Management Service.

  ```shell
  curl -s "http://localhost:32311/diag?scope=file&type=connector&q=status&content=position"
  ```
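The diagnostics endpoint responds with JSON, which arrives unformatted; piping it through a pretty-printer makes the position information easier to read. A sketch, assuming `python3` is available on your machine:

```shell
# Pretty-print the diagnostics response (assumes the Brooklin server from
# the steps above is still running on localhost:32311)
curl -s "http://localhost:32311/diag?scope=file&type=connector&q=status&content=position" | python3 -m json.tool
```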
- Verify a Kafka topic has been created to hold the data of your newly created datastream. The topic name will have the datastream name (i.e. `first-file-datastream`) as a prefix.

  ```shell
  cd <kafka-dir>  # Replace with Kafka directory
  bin/kafka-topics.sh --list --bootstrap-server localhost:9092
  ```
- Print the Kafka topic contents.

  ```shell
  # Replace <topic-name> below with the name of the Kafka topic
  bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topic-name> --from-beginning
  ```
Feel free to create more datastreams to publish more files to Kafka.
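If you want to remove a datastream you no longer need, the same REST client can do that too; the sketch below assumes your client version supports the `DELETE` operation and that the server is still running:

```shell
# Delete the example datastream created above (assumes the Brooklin server
# is still listening on localhost:32311)
bin/brooklin-rest-client.sh -o DELETE -u http://localhost:32311/ -n first-file-datastream
```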
When you are done, run the following commands to stop all running apps.
```shell
# Replace <brooklin-dir> and <kafka-dir> with Brooklin and Kafka directories, respectively
<brooklin-dir>/bin/brooklin-server-stop.sh
<kafka-dir>/bin/kafka-server-stop.sh
<kafka-dir>/bin/zookeeper-server-stop.sh
```