Connector Configuration

Overview

To configure the Kinesis connector, create a catalog properties file etc/catalog/kinesis.properties with at least the following contents, replacing the values as appropriate. Most properties have reasonable defaults; the ones below are the most likely to need changing.

connector.name=kinesis
kinesis.default-schema=default
kinesis.access-key=<amazon-access-key> (optional)
kinesis.secret-key=<amazon-secret-key> (optional)
kinesis.table-description-dir=etc/kinesis/  (if reading table definitions locally)
kinesis.table-descriptions-s3=s3://your.bucket/your/folder/path (if reading table definitions from S3)

The access and secret keys are optional. If they are not provided, the default credentials provider chain is used to access the Kinesis streams (for example, instance profile credentials).

The following configuration properties are available:

Property Name	Description
`kinesis.default-schema`	Default schema name for tables
`kinesis.table-description-dir`	Directory containing table description files
`kinesis.table-descriptions-s3`	Amazon S3 bucket URL with table description files. Leave blank to read from the directory on the server.
`kinesis.access-key`	Access key for the AWS account
`kinesis.secret-key`	Secret key for the AWS account
`kinesis.hide-internal-columns`	Controls whether internal columns are part of the table schema or not
`kinesis.aws-region`	AWS region from which to read the Kinesis stream
`kinesis.batch-size`	Maximum number of records to return in one batch (limit: 10000)
`kinesis.fetch-attempts`	Read attempts made when no records are returned and the stream is not caught up
`kinesis.max-batches`	Maximum number of batches to read from Kinesis in a single query
`kinesis.sleep-time`	Time the thread sleeps before the next attempt to fetch a batch
`kinesis.iter-from-timestamp`	Begin iterating from a given timestamp instead of the trim horizon (true by default)
`kinesis.iter-offset-seconds`	Number of seconds before current time to start iterating

kinesis.default-schema

Defines the schema which will contain all tables that were defined without a qualifying schema name.

This property is optional; the default is default.

kinesis.table-description-dir

References a folder within the Presto deployment that holds one or more JSON table description files (file names must end with .json).

This property is optional; the default is etc/kinesis.

kinesis.table-descriptions-s3

An S3 URL giving the location of the JSON table description files. When this is given, S3 will be used as the source of table description files and table-description-dir is ignored. The S3 bucket and folder will be checked every 10 minutes for updates and changed files.

This property is optional; the default is blank, which means table-description-dir will be the source of the table definitions.
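The precedence between the two table description properties can be sketched as follows. This is a minimal illustration, not the connector's code; the function name `table_description_source` and the plain dictionary of properties are assumptions made for the example.

```python
def table_description_source(props):
    """Pick the table description source from a dict of catalog properties.

    Hypothetical helper: a non-blank S3 URL wins, otherwise the local
    directory (default etc/kinesis) is used.
    """
    s3_url = props.get("kinesis.table-descriptions-s3", "").strip()
    if s3_url:
        # When the S3 property is set, table-description-dir is ignored.
        return ("s3", s3_url)
    return ("local", props.get("kinesis.table-description-dir", "etc/kinesis"))
```

With an empty dictionary the sketch falls back to the local default directory; with both properties set, the S3 URL is chosen.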

kinesis.access-key

Defines the access key ID for an AWS root account or IAM role, which is used to sign programmatic requests to AWS Kinesis.

This property is optional; if it is not defined, the connector follows the default credential provider chain provided by AWS, in the following order:

Environment variables: load credentials from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Java system properties: load credentials from the Java system properties aws.accessKeyId and aws.secretKey.
Profile credentials file: load credentials from the file typically located at ~/.aws/credentials.
Instance profile credentials: these credentials can be used on EC2 instances and are delivered through the Amazon EC2 metadata service.
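The lookup order above can be sketched in a few lines. This only illustrates the order of precedence, it is not the AWS SDK implementation; the function name `resolve_credentials`, its parameters, and the returned source labels are hypothetical, and only the `[default]` profile of the credentials file is consulted.

```python
import configparser
from pathlib import Path


def resolve_credentials(env, system_props, credentials_file):
    """Return (source, access_key, secret_key) from the first source that has both keys."""
    # 1. Environment variables.
    access = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if access and secret:
        return ("environment", access, secret)

    # 2. Java system properties (modelled here as a plain dict).
    access = system_props.get("aws.accessKeyId")
    secret = system_props.get("aws.secretKey")
    if access and secret:
        return ("system-properties", access, secret)

    # 3. Profile credentials file, [default] profile only in this sketch.
    path = Path(credentials_file).expanduser()
    if path.is_file():
        parser = configparser.ConfigParser()
        parser.read(path)
        if parser.has_section("default"):
            access = parser.get("default", "aws_access_key_id", fallback=None)
            secret = parser.get("default", "aws_secret_access_key", fallback=None)
            if access and secret:
                return ("profile-file", access, secret)

    # 4. Instance profile: keys would come from the EC2 metadata service,
    #    which this sketch does not call.
    return ("instance-profile", None, None)
```

For example, `resolve_credentials(dict(os.environ), {}, "~/.aws/credentials")` (after `import os`) reports which of the four sources would supply the keys on the current machine.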

kinesis.secret-key

Defines the secret key for an AWS root account or IAM role, which, together with the access key ID, is used to sign programmatic requests to AWS Kinesis.

This property is optional; if it is not defined, the connector follows the default credential provider chain described above.

kinesis.aws-region

Defines the AWS Kinesis regional endpoint. Selecting an appropriate region may reduce latency when fetching data.

This property is optional; the default region is us-east-1, which refers to the endpoint kinesis.us-east-1.amazonaws.com.

Amazon Kinesis Regions

For each Amazon Kinesis account, the following available regions can be used:

Region name	Region	Endpoint
`us-east-1`	US East (N. Virginia)	kinesis.us-east-1.amazonaws.com
`us-west-1`	US West (N. California)	kinesis.us-west-1.amazonaws.com
`us-west-2`	US West (Oregon)	kinesis.us-west-2.amazonaws.com
`eu-west-1`	EU (Ireland)	kinesis.eu-west-1.amazonaws.com
`eu-central-1`	EU (Frankfurt)	kinesis.eu-central-1.amazonaws.com
`ap-southeast-1`	Asia Pacific (Singapore)	kinesis.ap-southeast-1.amazonaws.com
`ap-southeast-2`	Asia Pacific (Sydney)	kinesis.ap-southeast-2.amazonaws.com
`ap-northeast-1`	Asia Pacific (Tokyo)	kinesis.ap-northeast-1.amazonaws.com
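Every endpoint in the table follows the same pattern, so the mapping from region name to endpoint can be sketched as below. The helper `kinesis_endpoint` and the `KNOWN_REGIONS` set are hypothetical names for this example, and the set contains only the regions listed in the table above.

```python
# Regions taken from the table above; this set is illustrative, not exhaustive.
KNOWN_REGIONS = {
    "us-east-1", "us-west-1", "us-west-2",
    "eu-west-1", "eu-central-1",
    "ap-southeast-1", "ap-southeast-2", "ap-northeast-1",
}


def kinesis_endpoint(region="us-east-1"):
    """Build the Kinesis endpoint host name for a region listed in the table."""
    if region not in KNOWN_REGIONS:
        raise ValueError(f"region not in the documented table: {region}")
    return f"kinesis.{region}.amazonaws.com"
```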

kinesis.batch-size

Defines the maximum number of records to return in one request to Kinesis Streams. The maximum is 10000 records; specifying a larger value causes an InvalidArgumentException to be thrown.

This field is optional; the default value is 10000.
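A configuration check for this limit might look like the sketch below, which rejects an out-of-range value before any request is sent. The function name `validate_batch_size` is hypothetical, the lower bound of 1 is an assumption, and a Python `ValueError` stands in for the exception mentioned above.

```python
MAX_BATCH_SIZE = 10000  # documented upper limit for kinesis.batch-size


def validate_batch_size(value=MAX_BATCH_SIZE):
    """Return the batch size if it is within range, otherwise raise ValueError."""
    # Lower bound of 1 is an assumption of this sketch.
    if not 1 <= value <= MAX_BATCH_SIZE:
        raise ValueError(
            f"kinesis.batch-size must be between 1 and {MAX_BATCH_SIZE}, got {value}"
        )
    return value
```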

kinesis.fetch-attempts

Defines the number of attempts made to read a batch from Kinesis Streams when no records are returned and the "millis behind latest" parameter shows that the connector is not yet caught up. When records are returned, no additional attempts are necessary.

It has been observed that GetRecordsResult sometimes contains no records even when the shard is not empty, which is why multiple attempts need to be made.

This field is optional; the default value is 2.
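The retry behaviour described above can be sketched as a small loop. This is an illustration under stated assumptions, not the connector's code: `fetch_batch` and its parameters are hypothetical, `get_records` is a caller-supplied function returning `(records, millis_behind_latest)`, and `fetch_attempts` is interpreted as the total number of attempts.

```python
import time


def fetch_batch(get_records, fetch_attempts=2, sleep_seconds=1.0, sleep=time.sleep):
    """Call get_records() until it returns records, the stream is caught up,
    or fetch_attempts calls have been made."""
    attempts = 0
    while True:
        records, millis_behind_latest = get_records()
        attempts += 1
        # Stop when there is data, when nothing is left to read,
        # or when the attempt budget is used up.
        if records or millis_behind_latest == 0 or attempts >= fetch_attempts:
            return records
        # Empty result but not caught up: wait, then try again.
        sleep(sleep_seconds)
```

The `sleep` parameter is injectable so the loop can be exercised without real delays.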

kinesis.max-batches

The maximum number of batches to read in a single query.

The default value is 1000.

kinesis.sleep-time

Defines how long, in milliseconds, the thread sleeps between attempts to fetch records. The value must be followed by the string 'ms'.

This field is optional; the default value is 1000ms.
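Parsing a value in this format can be sketched as follows. The helper `parse_sleep_time_ms` is hypothetical and accepts only the 'ms' suffix described above; whether the connector also accepts other units is not covered here.

```python
import re


def parse_sleep_time_ms(value="1000ms"):
    """Parse a duration such as '1000ms' into an integer number of milliseconds."""
    match = re.fullmatch(r"(\d+)\s*ms", value.strip())
    if match is None:
        raise ValueError(f"expected a value such as '1000ms', got {value!r}")
    return int(match.group(1))
```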

kinesis.iter-from-timestamp

Use an initial shard iterator type of AT_TIMESTAMP, starting kinesis.iter-offset-seconds before the current time. When this is false, an iterator type of TRIM_HORIZON will be used, meaning it will start from the oldest record in the stream.

The default is true.

kinesis.iter-offset-seconds

When kinesis.iter-from-timestamp is true, the shard iterator will start at kinesis.iter-offset-seconds before the current time.

The default is 86400 seconds or 24 hours.
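How the two iterator properties combine can be sketched as below. The function `initial_iterator` and its return shape are assumptions made for illustration; only the iterator type names and the default values come from the text above.

```python
from datetime import datetime, timedelta, timezone


def initial_iterator(iter_from_timestamp=True, iter_offset_seconds=86400, now=None):
    """Return (iterator_type, start_timestamp) for the first read of a shard."""
    if not iter_from_timestamp:
        # Start from the oldest record still in the stream.
        return ("TRIM_HORIZON", None)
    if now is None:
        now = datetime.now(timezone.utc)
    # Start iter_offset_seconds before the current time.
    return ("AT_TIMESTAMP", now - timedelta(seconds=iter_offset_seconds))
```

With the defaults, a query issued at 2020-01-02 00:00 UTC would start reading from 2020-01-01 00:00 UTC.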

kinesis.hide-internal-columns

In addition to the data columns defined in a table description file, the connector maintains a number of additional columns for each table. If these columns are hidden, they can still be used in queries but do not show up in DESCRIBE <table-name> or SELECT *.

This property is optional; the default is true.

Internal Columns

For each defined table, the connector maintains the following columns:

Column name	Type	Description
`_shard_id`	VARCHAR	ID of the Kinesis stream shard which contains this row
`_shard_sequence_id`	VARCHAR	Sequence id within the Kinesis shard for this row
`_segment_start`	VARCHAR	Lowest sequence id in the segment (inclusive) which contains this row. This sequence id is shard specific
`_segment_end`	VARCHAR	Highest sequence id in the segment (exclusive) which contains this row. The sequence id is shard specific. If the stream is open, then this is not defined
`_segment_count`	BIGINT	Running count for the current row within the segment
`_message_valid`	BOOLEAN	True if the decoder could decode the message successfully for this row. When false, data columns mapped from the message should be treated as invalid
`_message`	VARCHAR	Message bytes as a UTF-8 encoded string
`_message_length`	BIGINT	Number of bytes in the message
`_partition_key`	VARCHAR	Partition key bytes as a UTF-8 encoded string

For tables without a table definition file, the _message_valid column will always be true.
