Streaming Text Files to Kafka
In this use case, we create Brooklin datastreams to publish text file contents to a locally deployed instance of Kafka.
- Source: File System
- Destination: Kafka
- Connector: FileConnector
- Transport Provider: KafkaTransportProvider
Brooklin requires Java Development Kit 8+, so make sure a suitable JDK is installed before proceeding.
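If you are not sure which JDK you have, a quick check from the shell (assuming `java` is already on your `PATH`) is:

```
# Prints the installed Java version; it should report 1.8 or newer
java -version
```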
- Download the latest Kafka tarball and untar it.

```
tar -xzf kafka_2.12-2.2.0.tgz
cd kafka_2.12-2.2.0
```
- Start a ZooKeeper server.

```
bin/zookeeper-server-start.sh config/zookeeper.properties >/dev/null &
```
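Optionally, you can confirm ZooKeeper is accepting connections before starting Kafka. This is just a sketch that assumes `nc` (netcat) is installed and that ZooKeeper is listening on its default client port 2181; a healthy server answers the `ruok` probe with `imok`.

```
# Send ZooKeeper's "ruok" four-letter command; a healthy server replies "imok"
echo ruok | nc localhost 2181
```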
- Start a Kafka server.

```
bin/kafka-server-start.sh config/server.properties >/dev/null &
```
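Before moving on, you may want to verify the broker is reachable. One simple check, assuming the default listener on localhost:9092, is to list topics; a fresh broker returns an empty list rather than a connection error.

```
# Succeeds (with empty output on a fresh broker) if Kafka is up on localhost:9092
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
```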
- Download the latest tarball (tgz) from Brooklin releases.
- Untar the Brooklin tarball.

```
tar -xzf brooklin-1.0.0.tgz
cd brooklin-1.0.0
```
- Run Brooklin.

```
bin/brooklin-server-start.sh config/server.properties >/dev/null 2>&1 &
```
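To confirm the Brooklin server came up, you can query the health endpoint of its Datastream Management Service, which listens on port 32311 in this setup (the same endpoint is used again later in this guide):

```
# Returns health information if the Datastream Management Service is running
curl -s "http://localhost:32311/health"
```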
- Create a datastream to stream the contents of any file of your choice to Kafka.

```
# Replace NOTICE below with a file path of your choice, or leave it as
# is if you would like to use the NOTICE file as an example text file
bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n first-file-datastream -s NOTICE -c file -p 1 -t kafkaTransportProvider -m '{"owner":"test-user"}'
```
Here are the options we used to create this datastream:

| Option | Description |
| --- | --- |
| `-o CREATE` | The operation is datastream creation |
| `-u http://localhost:32311/` | Datastream Management Service URI |
| `-n first-file-datastream` | Datastream name |
| `-s NOTICE` | Datastream source URI (the source file path in this case) |
| `-c file` | Connector name ("file" refers to `FileConnector`) |
| `-p 1` | Number of source partitions |
| `-t kafkaTransportProvider` | Transport provider name |
| `-m '{"owner":"test-user"}'` | Datastream metadata (specifying the datastream owner is mandatory) |
- Verify the datastream creation by requesting all datastream metadata from Brooklin using the command line REST client.

```
bin/brooklin-rest-client.sh -o READALL -u http://localhost:32311/
```
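If you end up with several datastreams, you can narrow the output down to the one you just created. This sketch simply greps the REST client's output and assumes the datastream name appears verbatim in the response.

```
# Show only the lines of the READALL response that mention first-file-datastream
bin/brooklin-rest-client.sh -o READALL -u http://localhost:32311/ | grep first-file-datastream
```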
- You can also view the streaming progress by querying the diagnostics REST endpoint of the Datastream Management Service.

```
curl -s "http://localhost:32311/diag?scope=file&type=connector&q=status&content=position"
```
- Additionally, you can view more information about the different `Datastream`s and `DatastreamTask`s by querying the health monitoring REST endpoint of the Datastream Management Service.

```
curl -s "http://localhost:32311/health"
```
- Verify a Kafka topic has been created to hold the data of your newly created datastream. The topic name will have the datastream name (i.e., `first-file-datastream`) as a prefix.

```
cd kafka_2.12-2.2.0
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
```
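To see just the topic(s) backing this datastream, you can filter the listing by that prefix:

```
# List only topics whose names start with the datastream name
bin/kafka-topics.sh --list --bootstrap-server localhost:9092 | grep '^first-file-datastream'
```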
- Print the Kafka topic contents.

```
# Replace <topic-name> below with the name of the Kafka topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topic-name> --from-beginning
```
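If the source file is large and you only want a quick peek at the data, the console consumer can be told to stop after a fixed number of records:

```
# Print only the first 10 records of the topic and then exit
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topic-name> --from-beginning --max-messages 10
```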
Feel free to create more datastreams to publish more files to Kafka.
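For example, a second datastream only needs a different name and source path in the same CREATE call. The `LICENSE` path and `second-file-datastream` name below are just illustrative; substitute any text file that exists on your machine.

```
# Hypothetical example: publish a file named LICENSE under a new datastream name
bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n second-file-datastream -s LICENSE -c file -p 1 -t kafkaTransportProvider -m '{"owner":"test-user"}'
```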
- If you wish to delete the datastream you created, you can do so by running:

```
bin/brooklin-rest-client.sh -o DELETE -u http://localhost:32311/ -n first-file-datastream
```
- You can also explore the various operations you can perform on datastreams using the REST client utility.

```
bin/brooklin-rest-client.sh --help
```
When you are done, run the following commands to stop all running apps.

```
# In the directory where you extracted the Brooklin tarball
cd brooklin-1.0.0
bin/brooklin-server-stop.sh
```

```
# In the directory where you extracted the Kafka tarball
cd kafka_2.12-2.2.0
bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh
```
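If you want to double-check that everything shut down, one option is the JDK's `jps` tool, which lists running JVMs; after the stop scripts finish, the Kafka, ZooKeeper, and Brooklin server processes should no longer appear in the listing.

```
# List running Java processes; the Kafka, ZooKeeper, and Brooklin JVMs should be gone
jps -l
```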