Add the InfluxDB Timestream connector #201

Draft · wants to merge 9 commits into mainline
Conversation

@trevorbonas (Contributor) commented Nov 1, 2024

Issue #, if available:

N/A.

Description of changes:

The InfluxDB Timestream connector has been added.

The InfluxDB Timestream connector is a Rust application that receives line protocol data, translates it, and ingests it into Timestream for LiveAnalytics. The connector is intended to be deployed as a Lambda function within a CloudFormation stack, but it can also be used as a library or run locally as a Lambda function.
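For context on what ingesting into Timestream for LiveAnalytics involves, the sketch below shows one record being written through the AWS SDK for Rust (`aws-sdk-timestreamwrite`). This is a minimal illustration rather than the connector's actual code: the database name, table name, dimension, and measure values are placeholders, and builder signatures can vary slightly between SDK versions.

```rust
use aws_sdk_timestreamwrite::types::{Dimension, MeasureValueType, Record, TimeUnit};

async fn write_one_record() -> Result<(), Box<dyn std::error::Error>> {
    let config = aws_config::load_from_env().await;
    // Timestream Write requires endpoint discovery to be enabled on the client;
    // the exact return shape of this call may differ slightly between SDK versions.
    let (client, _reload) = aws_sdk_timestreamwrite::Client::new(&config)
        .with_endpoint_discovery_enabled()
        .await?;

    // A line protocol point such as `cpu,host=host1 usage=80.5 1700000000000000000`
    // roughly maps to a record: tags become dimensions, a field becomes a measure,
    // and the timestamp becomes the record time.
    let record = Record::builder()
        .dimensions(Dimension::builder().name("host").value("host1").build()?)
        .measure_name("usage")
        .measure_value("80.5")
        .measure_value_type(MeasureValueType::Double)
        .time("1700000000000000000")
        .time_unit(TimeUnit::Nanoseconds)
        .build();

    client
        .write_records()
        .database_name("example_database") // placeholder
        .table_name("cpu")                 // placeholder
        .records(record)
        .send()
        .await?;
    Ok(())
}
```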

The connector includes the following components:

  • A README with documentation for how the connector translates line protocol data to Timestream records, a guide for using the connector, and documentation for different configuration options.
  • A SAM template that deploys a REST API Gateway and the connector as a Lambda function, supporting either synchronous or asynchronous invocation.

A few things to note:

The connector splits up batches and ingests records in chunks of 100 at a time, in parallel. If a record in one of these chunks has an issue, an error occurs and ingestion stops, but other records in the request batch, including those processed before or in parallel with the offending chunk, will already have been ingested successfully. When asynchronous invocation is used, the entire request is added to the Lambda's dead letter queue. Rejected records and AWS SDK for Rust errors are logged, which helps users narrow the problem down to specific line protocol points in their batch.
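As a rough illustration of this chunking strategy, a helper along the following lines splits one table's records into 100-record chunks and writes them concurrently; a failure in one chunk does not roll back chunks that already completed. The function name, error handling, and logging here are illustrative, not the connector's exact implementation.

```rust
use aws_sdk_timestreamwrite::{types::Record, Client};
use futures::future::join_all;

// Illustrative helper: ingest one table's records in concurrent chunks of 100
// (the WriteRecords request limit). Chunks that finish before or alongside a
// failing chunk remain ingested; the error is logged and surfaced to the caller.
async fn ingest_in_chunks(
    client: &Client,
    database: &str,
    table: &str,
    records: Vec<Record>,
) -> Result<(), aws_sdk_timestreamwrite::Error> {
    let writes: Vec<_> = records
        .chunks(100)
        .map(|chunk| {
            client
                .write_records()
                .database_name(database)
                .table_name(table)
                .set_records(Some(chunk.to_vec()))
                .send()
        })
        .collect();

    for result in join_all(writes).await {
        if let Err(err) = result {
            // Rejected records and SDK errors are logged so users can narrow the
            // problem down to specific line protocol points in their batch.
            tracing::error!("write_records failed: {err:?}");
            return Err(err.into());
        }
    }
    Ok(())
}
```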

The connector uses the InfluxDB v3 parser, since InfluxDB v3 is written in Rust and its parser is available as a crate. This parser has a few behavioral inconsistencies compared to the InfluxDB v2 parser.
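For reference, the InfluxDB v3 (IOx) parser is published as the `influxdb_line_protocol` crate; a minimal usage sketch, with a placeholder line protocol string, looks roughly like this:

```rust
use influxdb_line_protocol::parse_lines;

fn parse_example() {
    let lp = "cpu,host=host1 usage=80.5 1700000000000000000";
    for line in parse_lines(lp) {
        match line {
            Ok(parsed) => {
                // The parsed line exposes the measurement, tag set, field set, and
                // optional timestamp, which the connector maps onto Timestream
                // dimensions, measures, and the record time.
                println!("measurement: {}", parsed.series.measurement);
                println!("timestamp:   {:?}", parsed.timestamp);
            }
            Err(e) => eprintln!("invalid line protocol: {e}"),
        }
    }
}
```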

93 tests have been created for the connector: 40 integration tests and 53 unit tests. The following is a list of all tests with a short summary for each:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

forestmvey and others added 9 commits September 25, 2024 10:50
*Issue #, if available:*

N/A.

*Description of changes:*

- A pre-commit hook has been added that uses `aws-secrets` in order to prevent secrets from being committed.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
*Issue #, if available:*

N/A

*Description of changes:*

- SAM template `template.yml` added.
    - Asynchronous and synchronous invocation supported.
- Documentation for how the connector can be deployed using the SAM template added.
- `lambda_runtime` crate added.
- `LambdaEvent<serde_json::Value>` used instead of `lambda_http::Request` in order to support more kinds of requests than just AWS service integrations (a sketch of the resulting handler shape follows below this list).
- Tests added for handling different kinds of `queryParameters` keys in requests.
- DLQ implemented for asynchronous invocation.
- Integration tests changed to check the newly returned `serde_json::Value` struct.
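A minimal sketch of the handler shape this enables, assuming the `lambda_runtime` crate; the payload handling and response fields are illustrative only:

```rust
use lambda_runtime::{service_fn, Error, LambdaEvent};
use serde_json::{json, Value};

// Accepting the raw JSON payload means the same handler can inspect requests
// from API Gateway, direct asynchronous invocations, or local `cargo lambda` runs.
async fn handler(event: LambdaEvent<Value>) -> Result<Value, Error> {
    let (payload, _context) = event.into_parts();
    let body = payload.get("body").and_then(Value::as_str).unwrap_or_default();
    // ... parse `body` as line protocol and ingest it ...
    Ok(json!({ "statusCode": 200, "body": format!("received {} bytes", body.len()) }))
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    lambda_runtime::run(service_fn(handler)).await
}
```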

- [x] Unit tests passed.
- [x] Integration tests passed.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
- Users can now define partition keys for their newly created tables. Partition key configuration is controlled using three environment variables, `custom_partition_key_type`, `custom_partition_key_dimension`, and `enforce_custom_partition_key` (a sketch follows after this commit's checklist).
- Environment variables added for CDPK support.
- SAM template parameters added for CDPK support.
- Documentation added for CDPK support.
- Integration tests added for CDPK support.
- Integration test asserts moved to end of tests, in order to ensure resources are cleaned up.

- [x] Unit tests passed.
- [x] Integration tests passed.
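A hedged sketch of how the three environment variables above could be turned into a Timestream partition-key schema at table creation; the variable semantics and type handling here are illustrative, not the connector's exact logic, and builder signatures may vary between SDK versions.

```rust
use aws_sdk_timestreamwrite::types::{
    PartitionKey, PartitionKeyEnforcementLevel, PartitionKeyType, Schema,
};

// Illustrative only: build an optional composite partition key schema from the
// CDPK environment variables. Returns None when no partition key is configured.
fn partition_key_schema() -> Option<Schema> {
    let key_type = std::env::var("custom_partition_key_type").ok()?;
    let enforce = std::env::var("enforce_custom_partition_key")
        .map(|v| v == "true")
        .unwrap_or(false);

    let mut key = PartitionKey::builder()
        .r#type(match key_type.as_str() {
            "dimension" => PartitionKeyType::Dimension,
            _ => PartitionKeyType::Measure,
        })
        .enforcement_in_record(if enforce {
            PartitionKeyEnforcementLevel::Required
        } else {
            PartitionKeyEnforcementLevel::Optional
        });
    if let Ok(dimension) = std::env::var("custom_partition_key_dimension") {
        key = key.name(dimension);
    }

    Some(Schema::builder().composite_partition_key(key.build().ok()?).build())
}
```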
*Issue #, if available:*

N/A.

*Description of changes:*

- Deployment permissions have been updated to match the permissions found to be required when testing deployment of the connector with its SAM template.
- `samconfig.toml` added with default stack deployment options.
- "Troubleshooting" section added to the README with two known errors users may encounter.
    - Issue with cross-platform Rust compilation.
    - ConflictExceptions occurring with concurrent instances of the connector.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
- The environment variable `local_invocation` has been added and is set to `true` by default when the Lambda function is run with `cargo lambda watch`. This environment variable enables responses to be returned in the format `cargo lambda` expects (the 2.0 payload format). Otherwise, the 1.0 format is returned, which the synchronous API Gateway integration expects (a sketch follows below this list).
- `cargo fmt` run.
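As an illustration of the switch described above; the field choices here are illustrative, not the connector's exact response shape:

```rust
use serde_json::{json, Value};

// Illustrative only: return the response in the payload format the caller expects.
// `cargo lambda watch` expects the 2.0 shape, while the synchronous REST API
// Gateway integration expects the 1.0 shape.
fn build_response(status: u16, body: &str) -> Value {
    let local = std::env::var("local_invocation")
        .map(|v| v == "true")
        .unwrap_or(false);

    if local {
        // Payload format version 2.0
        json!({
            "statusCode": status,
            "body": body,
            "isBase64Encoded": false,
            "cookies": [],
            "headers": { "content-type": "text/plain" }
        })
    } else {
        // Payload format version 1.0
        json!({
            "statusCode": status,
            "body": body,
            "isBase64Encoded": false,
            "multiValueHeaders": {},
            "headers": { "content-type": "text/plain" }
        })
    }
}
```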

- [x] Tested locally.
- [x] Tested with synchronous invocation.
- [x] Tested with asynchronous invocation.
*Issue #, if available:*

N/A.

*Description of changes:*
- Ingestion to multiple tables done in parallel.
- Ingestion of 100 records to a single table done in parallel.
- Chunking of records into batches of 100 done in parallel.
- Limit on maximum possible number of threads added.
- Print statement for each 100 records removed.
- Logging option added to SAM template.
- All `println!` calls changed to `info!`.
- Default logging level for the connector changed to `INFO`.
- Trace statements that measure the execution time of each function added.
- Instructions added to README for how to configure logging levels.
- Database and tables are no longer checked if their corresponding environment variables for creation are not set.
- The check for whether tables exist has been moved inside the asynchronous code block used to ingest records to a table. This check is now done in parallel, and the hashmap is looped through once instead of twice.
- Instructions added to README for how to reduce stack costs.

- [x] Integration tests passed.
- [x] Unit tests passed.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
* Initial implementation for adding single-table mapping.

Signed-off-by: forestmvey <[email protected]>

* Adding environment variables and updating documentation.

Signed-off-by: forestmvey <[email protected]>

* Adding tests and revising single-table mapping.

Signed-off-by: forestmvey <[email protected]>

* Adding single table example to README.

Signed-off-by: forestmvey <[email protected]>

* Fixing measure-name using line protocol metric name for single table mapping.

Signed-off-by: forestmvey <[email protected]>

* Fix typo in README.

Signed-off-by: forestmvey <[email protected]>

* Fixing documentation typos and test setting wrong environment variable.

Signed-off-by: forestmvey <[email protected]>

* Fixing table for single table multi measure records example in README.

Signed-off-by: forestmvey <[email protected]>

* Fix invalid line protocol examples with additional comma separating the timestamp.

Signed-off-by: forestmvey <[email protected]>

---------

Signed-off-by: forestmvey <[email protected]>
* Set default table mapping to multi-table for the InfluxDB Timestream Connector.

Signed-off-by: forestmvey <[email protected]>

* Fixing table formatting in README for InfluxDB Timestream Connector.

Signed-off-by: forestmvey <[email protected]>

* Updating README to reference multi-table for the default table mapping.

Signed-off-by: forestmvey <[email protected]>

---------

Signed-off-by: forestmvey <[email protected]>