Table Definitions

Introduction

Kinesis streams stores data as stream of bytes and leaves it to producers and consumers to define how a message should be interpreted. For Presto, this data must be mapped into columns to allow queries against the data. The JSON table definition files take care of this mapping.

These files can be stored in one of two places:

A defined location on Amazon S3
Local filesystem on each node We prefer the first choice, since it's easier to keep individual Presto nodes up to date.

Table Definition

A table definition file consists of a JSON definition for a table, which corresponds to one stream in Kinesis. The name of the file can be arbitrary but must end in .json. The structure of the table definition is as follows:

{
    "tableName": ...,
    "schemaName": ...,
    "streamName": ...,
    "message": {
        "dataFormat": ...,
        "field": [
            ...
        ]
    }
}

Field	Required	Type	Description
`tableName`	required	string	Presto table name defined by this file
`schemaName`	optional	string	Schema which will contain the table. If omitted, the default schema name is used
`streamName`	required	string	Name of the Kinesis Stream that is mapped
`message`	optional	JSON object	Field definitions for data columns mapped to the message itself

Every message in a Kinesis stream can be decoded using the definition provided in the message object. The json object message in the table definition contains two fields:

Field	Required	Type	Description
`dataFormat`	required	string	Selects the decoder for this group of fields (json, csv, or raw)
`fields`	required	JSON array	A list of field definitions. Each field definition creates a new column in the Presto table and maps that column to a part of the Kinesis message

Each field definition is a JSON object. At a minimum, you'll want to provide a name, type, and a mapping. The overall structure looks like this:

{
    "name": ...,
    "type": ...,
    "dataFormat": ...,
    "mapping": ...,
    "formatHint": ...,
    "hidden": ...,
    "comment": ...
}

Field	Required	Type	Description
`name`	required	string	Name of the column in the Presto table
`type`	required	string	Presto type of the column
`dataFormat`	optional	string	Selects the column decoder for this field. Defaults to the default decoder for the message data format and column type
`mapping`	optional	string	Mapping information for the column. This is explained below
`formatHint`	optional	string	Sets a column specific format hint to the column decoder
`hidden`	optional	boolean	Hides the column from `DESCRIBE <table name>` and `SELECT *`. Defaults to `false`
`comment`	optional	string	Add a column comment which is shown with `DESCRIBE <table name>`

There is no limit on the number of field descriptions for message.

The name field is exposed to presto as the column name, while the mapping field is the portion of the message that gets mapped to that column. For JSON object messages, this refers to the field name of an object, and can be a path that drills into the object structure of the message. Additionally, you can map a field of the JSON object to a string column type, and if it is a more complex type (JSON array or JSON object) then the JSON itself will become the field value.

Example Mapping

Following is a complete example using a JSON message.

{
  "tableName": "test_table",
  "schemaName": "otherworld",
  "streamName": "test_kinesis_stream",
  "message": {
    "dataFormat": "json",
    "fields": [
      {
        "name": "client_id",
        "type": "BIGINT",
        "mapping": "body/profile/clientId",
        "comment": "The client ID field"
      },
      {
        "name": "acct_balance",
        "type": "DOUBLE",
        "mapping": "body/acct_balance",
        "comment": "Current account balance"
      },
      {
        "name": "service_type",
        "mapping": "body/profile/service_type",
        "type": "VARCHAR(20)"
      },
      {
        "name": "routing_time",
        "mapping": "header/routing_time",
        "type": "DATE",
        "dataFormat": "iso8601"
      }
    ]
  }
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Table Definitions

Introduction

Table Definition

Example Mapping

Clone this wiki locally