doc: add documentation for firehose local setup (#181)
* Add documentation for firehose local setup

* Update development.md docs

* Review comment changes
eyeofvinay authored Jul 19, 2022
1 parent af0a8b7 commit c3a3f91
Showing 3 changed files with 67 additions and 17 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -108,4 +108,4 @@ This project exists thanks to all the [contributors](https://github.com/odpf/fir

## License

Firehose is [Apache 2.0](LICENSE) licensed.
Firehose is [Apache 2.0](LICENSE) licensed.
52 changes: 50 additions & 2 deletions docs/docs/contribute/development.md
@@ -51,6 +51,9 @@ Firehose sends critical metrics via StatsD client. Refer the[ Monitoring](../con

## Running locally

- The following guide provides a simple way to run Firehose with a log sink locally.
- It uses the `TestMessage` proto schema (`src/test/proto/TestMessage.proto`), which is already provided for testing purposes.

```bash
# Clone the repo
$ git clone https://github.com/odpf/firehose.git
@@ -60,11 +63,56 @@ $ ./gradlew clean build

# Configure env variables
$ cat env/local.properties
```
### Configure env/local.properties

Set the generic variables in the `env/local.properties` file.

```text
KAFKA_RECORD_PARSER_MODE = message
SINK_TYPE = log
INPUT_SCHEMA_PROTO_CLASS = io.odpf.firehose.consumer.TestMessage
```
Set the variables that specify the Kafka broker, the topic name, and the group ID of the Kafka consumer; the standard local values are used here.
```text
SOURCE_KAFKA_BROKERS = localhost:9092
SOURCE_KAFKA_TOPIC = test-topic
SOURCE_KAFKA_CONSUMER_GROUP_ID = sample-group-id
```

### Stencil Workaround
Firehose uses [Stencil](https://github.com/odpf/stencil) as its schema registry, which enables dynamic proto schemas. For this quick-setup guide, we can avoid a full Stencil deployment by running a simple local HTTP server that serves the static descriptor for the TestMessage schema.


- Install a static HTTP server, such as [http-server](https://github.com/http-party/http-server).

- Generate the descriptor for TestMessage by running the following command in a terminal:
```shell
./gradlew generateTestProto
```
- The above generates a descriptor file at `src/test/resources/__files/descriptors.bin`. Move this file to a new folder at a separate location and start the HTTP server there, so that the file can be fetched at runtime.
- If you are using [http-server](https://github.com/http-party/http-server), run the following command from that folder to start the server on the default port 8080 (a combined sketch of these steps follows the command).
```shell
http-server
```
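
Putting the steps together, the workaround might look like the sketch below; `/tmp/firehose-descriptors` is only an illustrative location, and any separate folder works.
```shell
# Copy the generated descriptor to a separate folder (the path is illustrative)
mkdir -p /tmp/firehose-descriptors
cp src/test/resources/__files/descriptors.bin /tmp/firehose-descriptors/

# Serve that folder over HTTP; http-server listens on port 8080 by default
cd /tmp/firehose-descriptors
http-server
```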
- Because we are not using the schema registry in its default mode, also add the following lines to `env/local.properties` to specify the new location to fetch the descriptor from; a quick reachability check is shown after the block.
```text
SCHEMA_REGISTRY_STENCIL_ENABLE = true
SCHEMA_REGISTRY_STENCIL_URLS = http://localhost:8080/descriptors.bin
SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH = false
SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY = LONG_POLLING
```
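
Before starting Firehose, it can help to confirm that the descriptor is reachable at the configured URL, for example with a plain HTTP request:
```shell
# Expect an HTTP 200 response if the server is up and the file is in place
curl -I http://localhost:8080/descriptors.bin
```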

### Run Firehose Log Sink

- Make sure that your Kafka server and the local HTTP server serving the descriptor are up and running.
- Run the Firehose consumer through the Gradle task (a quick verification sketch follows the command):
```shell
./gradlew runConsumer
```
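
Once the consumer is up, one way to check that it has joined the group and is assigned the topic partitions (again assuming Kafka's CLI scripts are available) is to describe the consumer group:
```shell
# Shows assigned partitions, current offsets, and lag for the sample group
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group sample-group-id
```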


**Note:** Sample configurations for other sinks, along with some advanced configurations, can be found [here](../advance/generic/).

### Running tests
30 changes: 16 additions & 14 deletions env/local.properties
@@ -8,9 +8,9 @@
#
## Generic
#
# KAFKA_RECORD_PARSER_MODE=message
# SINK_TYPE=log
# INPUT_SCHEMA_PROTO_CLASS=com.tests.TestMessage
KAFKA_RECORD_PARSER_MODE=message
SINK_TYPE=log
INPUT_SCHEMA_PROTO_CLASS=io.odpf.firehose.consumer.TestMessage
# INPUT_SCHEMA_PROTO_TO_COLUMN_MAPPING={"1":"order_number","2":"event_timestamp","3":"driver_id"}
# METRIC_STATSD_HOST=localhost
# METRIC_STATSD_PORT=8125
@@ -25,28 +25,29 @@
#
## Stencil Client
#
# SCHEMA_REGISTRY_STENCIL_ENABLE=true
# SCHEMA_REGISTRY_STENCIL_URLS=http://localhost:8000/v1/namespaces/quickstart/descriptors/example/versions/latest
SCHEMA_REGISTRY_STENCIL_ENABLE=true
SCHEMA_REGISTRY_STENCIL_URLS=http://localhost:8081/descriptors.bin
# SCHEMA_REGISTRY_STENCIL_FETCH_TIMEOUT_MS=10000
# SCHEMA_REGISTRY_STENCIL_FETCH_RETRIES=3
# SCHEMA_REGISTRY_STENCIL_FETCH_BACKOFF_MIN_MS=60000
# SCHEMA_REGISTRY_STENCIL_FETCH_AUTH_BEARER_TOKEN=tcDpw34J8d1
# SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH=false
SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH=false
SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY=LONG_POLLING
# SCHEMA_REGISTRY_STENCIL_CACHE_TTL_MS=900000
#
#
#
#############################################
#
## Kafka Consumer
#
# SOURCE_KAFKA_BROKERS=localhost:9092
# SOURCE_KAFKA_TOPIC=test-topic
SOURCE_KAFKA_BROKERS=localhost:9092
SOURCE_KAFKA_TOPIC=test-topic
# SOURCE_KAFKA_CONSUMER_CONFIG_MAX_POLL_RECORDS=500
# SOURCE_KAFKA_ASYNC_COMMIT_ENABLE=true
# SOURCE_KAFKA_CONSUMER_CONFIG_SESSION_TIMEOUT_MS=10000
# SOURCE_KAFKA_COMMIT_ONLY_CURRENT_PARTITIONS_ENABLE=true
# SOURCE_KAFKA_CONSUMER_CONFIG_AUTO_COMMIT_ENABLE=true
# SOURCE_KAFKA_CONSUMER_GROUP_ID=sample-group-id
SOURCE_KAFKA_CONSUMER_GROUP_ID=sample-group-id
# SOURCE_KAFKA_POLL_TIMEOUT_MS=9223372036854775807
# SOURCE_KAFKA_CONSUMER_CONFIG_METADATA_MAX_AGE_MS=500
#
@@ -191,11 +192,12 @@
#
## Redis Sink
#
# SINK_REDIS_URLS=localhos:6379,localhost:6380
# SINK_REDIS_DATA_TYPE=List
# SINK_REDIS_URLS=localhost:6379
# SINK_REDIS_DATA_TYPE=KEYVALUE
# SINK_REDIS_KEY_TEMPLATE=Service\_%%s,1
# INPUT_SCHEMA_PROTO_TO_COLUMN_MAPPING={"6":"customer_id", "2":"order_num"}
# SINK_REDIS_LIST_DATA_PROTO_INDEX=6
# INPUT_SCHEMA_PROTO_TO_COLUMN_MAPPING={"1":"orderID", "2":"orderURL"}
# SINK_REDIS_LIST_DATA_PROTO_INDEX=2
# SINK_REDIS_KEY_VALUE_DATA_PROTO_INDEX=2
# SINK_REDIS_TTL_TYPE=DISABLE
# SINK_REDIS_TTL_VALUE=0
# SINK_REDIS_DEPLOYMENT_TYPE=Standalone
