Checkpointing Queries
This feature is not recommended in a query system shared by multiple users. It can be used, but it requires updates to a backing DynamoDB table and has not been thoroughly tested. Rather than using checkpointing, we recommend using one of the other exposed session variables to control how records are read from the Kinesis stream. These can also make a difference in the efficiency of your query.
Original explanation left below.
It is a common requirement to be able to query incremental data in Kinesis. For example, you may want to find the number of 500 errors in the last minute. You can schedule a query against the Kinesis stream to run every minute, but you would like each run to process only the data that has arrived since the previous run. The checkpointing feature in Kinesis is built for this use case, and the Presto connector builds on it.
The checkpointing feature uses AWS DynamoDB, so your credentials must allow creating and accessing DynamoDB tables. To enable the checkpointing feature, set this parameter in ${PRESTO_HOME}/etc/catalog/kinesis.properties:
kinesis.checkpoint-enabled=true
This is also exposed as a session variable:
set session kinesis.checkpoint-enabled=true
You'll need to set the following Presto session variables to use the checkpointing feature:
set session kinesis.checkpoint-logical-name="ServerErrorCounts";
set session kinesis.iteration-number=<iteration-number>;
The logical name is used to separate the checkpoints between different sessions.
For the first query in the sequence, set iteration-number to 0. For the second query, set iteration-number to 1, and so on. The second query will read the data that entered the stream after iteration 0 completed. This is best accomplished by a script or tool that increments the iteration number and submits the queries one after another. You can also run a query against an older checkpoint by setting iteration-number to that checkpoint's value.
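The kind of driver script described above could look roughly like the following Python sketch. It only builds the session statements shown earlier and hands them, together with the query, to a submit function. The names `checkpoint_statements`, `run_iterations`, and `submit`, as well as the table name in the example query, are hypothetical and not part of the connector. Here `print` stands in for whatever client you actually use to send statements to Presto.

```python
from typing import Callable, List


def checkpoint_statements(logical_name: str, iteration: int, query: str) -> List[str]:
    """Return the session statements for one iteration, followed by the query."""
    if iteration < 0:
        raise ValueError("iteration must be >= 0")
    return [
        "set session kinesis.checkpoint-enabled=true",
        f'set session kinesis.checkpoint-logical-name="{logical_name}"',
        f"set session kinesis.iteration-number={iteration}",
        query,
    ]


def run_iterations(
    submit: Callable[[str], object],
    logical_name: str,
    query: str,
    start: int,
    count: int,
) -> int:
    """Submit `count` consecutive iterations and return the next iteration number.

    `submit` is a hypothetical callable that sends one statement to Presto
    within a single session; swap in your own client call.
    """
    iteration = start
    for _ in range(count):
        for statement in checkpoint_statements(logical_name, iteration, query):
            submit(statement)
        iteration += 1
    return iteration


if __name__ == "__main__":
    # Hypothetical query and table name, for illustration only.
    example_query = "select count(*) from kinesis.default.server_errors"
    next_iteration = run_iterations(print, "ServerErrorCounts", example_query, 0, 2)
    print(f"next iteration-number: {next_iteration}")
```

A scheduled job would also need to store the returned iteration number somewhere (a file, for instance) so that the next run continues from it rather than starting again at 0.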