[HUDI-6191][DOCS] Improve passing the debezium checkpoint values to start job from offset (#11690)

Co-authored-by: Vova Kolmakov <[email protected]>
wombatu-kun and Vova Kolmakov authored Jul 27, 2024
1 parent 7a03fa2 commit 0803284
Showing 2 changed files with 3 additions and 3 deletions.
website/docs/configurations.md: 4 changes (2 additions, 2 deletions)
@@ -5,7 +5,7 @@ permalink: /docs/configurations.html
summary: This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at few levels.
toc_min_heading_level: 2
toc_max_heading_level: 4
last_modified_at: 2024-07-01T15:09:57.588
last_modified_at: 2024-07-26T12:54:21.684
---


@@ -1996,7 +1996,7 @@ Configurations controlling the behavior of Kafka source in Hudi Streamer.
| [hoodie.streamer.source.kafka.value.deserializer.schema](#hoodiestreamersourcekafkavaluedeserializerschema) | (N/A) | Schema to deserialize the records.<br />`Config Param: KAFKA_VALUE_DESERIALIZER_SCHEMA` |
| [auto.offset.reset](#autooffsetreset) | LATEST | Kafka consumer strategy for reading data.<br />`Config Param: KAFKA_AUTO_OFFSET_RESET` |
| [hoodie.streamer.kafka.source.maxEvents](#hoodiestreamerkafkasourcemaxEvents) | 5000000 | Maximum number of records obtained in each batch.<br />`Config Param: MAX_EVENTS_FROM_KAFKA_SOURCE` |
| [hoodie.streamer.source.kafka.checkpoint.type](#hoodiestreamersourcekafkacheckpointtype) | string | Kafka checkpoint type.<br />`Config Param: KAFKA_CHECKPOINT_TYPE` |
| [hoodie.streamer.source.kafka.checkpoint.type](#hoodiestreamersourcekafkacheckpointtype) | string | Kafka checkpoint type. Value must be one of the following: string, timestamp, single_offset. Default type is string. For type string, checkpoint should be provided as: topicName,0:offset0,1:offset1,2:offset2. For type timestamp, checkpoint should be provided as long value of desired timestamp. For type single_offset, we assume that topic consists of a single partition, so checkpoint should be provided as long value of desired offset.<br />`Config Param: KAFKA_CHECKPOINT_TYPE` |
| [hoodie.streamer.source.kafka.enable.commit.offset](#hoodiestreamersourcekafkaenablecommitoffset) | false | Automatically submits offset to kafka.<br />`Config Param: ENABLE_KAFKA_COMMIT_OFFSET` |
| [hoodie.streamer.source.kafka.enable.failOnDataLoss](#hoodiestreamersourcekafkaenablefailOnDataLoss) | false | Fail when checkpoint goes out of bounds instead of seeking to earliest offsets.<br />`Config Param: ENABLE_FAIL_ON_DATA_LOSS` |
| [hoodie.streamer.source.kafka.fetch_partition.time.out](#hoodiestreamersourcekafkafetch_partitiontimeout) | 300000 | Time out for fetching partitions. 5min by default<br />`Config Param: KAFKA_FETCH_PARTITION_TIME_OUT` |
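The three accepted checkpoint-type values described above differ only in how the checkpoint value is interpreted. As a minimal sketch, the shell snippet below pairs each type with the shape of value it expects; the topic name, offsets, and timestamp are illustrative assumptions, not values from this commit.

```sh
# Illustrative checkpoint values for each hoodie.streamer.source.kafka.checkpoint.type.
# Topic name, offsets, and timestamp are placeholders.

# type=string (default): explicit per-partition offsets, "topicName,partition:offset,..."
CHECKPOINT_FOR_STRING_TYPE="impressions,0:8320,1:8150,2:9005"

# type=timestamp: a single long timestamp value (epoch milliseconds assumed here)
CHECKPOINT_FOR_TIMESTAMP_TYPE="1721980800000"

# type=single_offset: the topic is assumed to have exactly one partition; the value is that offset
CHECKPOINT_FOR_SINGLE_OFFSET_TYPE="8320"
```

Whichever type is configured, the matching value is what gets passed as the checkpoint when starting or resetting a job, as the second changed file below describes.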
website/docs/hoodie_streaming_ingestion.md: 2 changes (1 addition, 1 deletion)
@@ -313,7 +313,7 @@ Checkpoints are saved in the .hoodie commit file as `streamer.checkpoint.key`.
If you need to change the checkpoints for reprocessing or replaying data you can use the following options:
- `--checkpoint` will set `streamer.checkpoint.reset_key` in the commit file to overwrite the current checkpoint.
- `--checkpoint` will set `streamer.checkpoint.reset_key` in the commit file to overwrite the current checkpoint. Format of checkpoint depends on [KAFKA_CHECKPOINT_TYPE](/docs/configurations#hoodiestreamersourcekafkacheckpointtype). By default (for type `string`), checkpoint should be provided as: `topicName,0:offset0,1:offset1,2:offset2`. For type `timestamp`, checkpoint should be provided as long value of desired timestamp. For type `single_offset`, we assume that topic consists of a single partition, so checkpoint should be provided as long value of desired offset.
- `--source-limit` will set a maximum amount of data to read from the source. For DFS sources, this is max # of bytes read.
For Kafka, this is the max # of events to read.
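For a concrete picture of how these options fit together, here is a minimal launch sketch that resets the checkpoint to a point in time using the `timestamp` type. The jar path, source class, table paths, topic, and properties file are placeholders for illustration, not values taken from this commit; only `--checkpoint`, `--source-limit`, and the checkpoint-type override come from the documentation above.

```sh
# Minimal sketch: restart ingestion from a timestamp instead of explicit offsets.
# Class and flags as shipped in Hudi's utilities bundle; adjust paths and names for your setup.
spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --props kafka-source.properties \
  --target-base-path file:///tmp/hudi/impressions_cow \
  --target-table impressions_cow \
  --hoodie-conf hoodie.streamer.source.kafka.checkpoint.type=timestamp \
  --checkpoint 1721980800000 \
  --source-limit 5000000
```

Per the note above, the `--checkpoint` value is written to the commit file as `streamer.checkpoint.reset_key`, so the next run resumes from the offsets corresponding to that timestamp, while `--source-limit` caps the batch at 5,000,000 events.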
