
lucienlu-aws/amazon-kinesis-producer


Kinesis Producer Library


Introduction

The Amazon Kinesis Producer Library (KPL) performs many tasks common to creating efficient and reliable producers for Amazon Kinesis. By using the KPL, customers do not need to develop the same logic every time they create a new application for data ingestion.

For detailed information and installation instructions, see the article Developing Producer Applications for Amazon Kinesis Using the Amazon Kinesis Producer Library in the Amazon Kinesis Developer Guide.

Back-pressure

Please see this blog post for details about writing efficient and reliable producers using the KPL. The blog post also covers the KPL's overhead in the various situations in which you might be using it, including back-pressure considerations.

The KPL can consume enough memory to crash itself if it is pushed more records than it has time to process. To protect against this, we ask every customer to implement back-pressure that protects the KPL process. Once the KPL's buffer holds too many records, it spends most of its CPU cycles on record management rather than record processing, which makes the problem worse. How quickly this happens depends heavily on your record sizes and rates, your KPL configuration, and the host's CPU and memory limits.

When deciding the limits for your KPL instance, consider your maximum record size, maximum request-rate spikes, available host memory, and record TTL. If you buffer requests before they reach the KPL, account for that buffer as well, since it also puts memory pressure on the host system. If the KPL buffer grows too large, the KPL process may crash due to memory exhaustion.
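The factors above can be turned into a rough starting point for a back-pressure limit (the MAX_RECORDS_IN_FLIGHT constant in the sample implementation). The sketch below is a back-of-the-envelope model, not part of the KPL: it assumes every buffered record is as large as your maximum record size and ignores the KPL's own per-record management overhead, so treat its results as optimistic bounds to refine through testing. The class and method names are hypothetical, and the numbers in main are illustrative only.

```java
/**
 * Hypothetical back-of-the-envelope helpers for sizing KPL back-pressure limits.
 * Assumes every buffered record is maxRecordBytes in size and ignores the KPL's
 * own per-record overhead, so results are optimistic upper bounds.
 */
final class KplMemoryBudget {
    private KplMemoryBudget() {}

    /**
     * Bytes that could pile up if nothing drains for a full TTL window:
     * every record arriving at the peak rate stays buffered until it expires.
     */
    static long worstCaseBufferedBytes(long peakRecordsPerSecond, long ttlMillis, long maxRecordBytes) {
        long recordsHeld = Math.multiplyExact(peakRecordsPerSecond, ttlMillis) / 1000;
        return Math.multiplyExact(recordsHeld, maxRecordBytes);
    }

    /**
     * Largest outstanding-record count that fits in the memory left for the KPL
     * after subtracting whatever the application buffers before calling it.
     */
    static long maxRecordsInFlight(long hostBudgetBytes, long upstreamBufferBytes, long maxRecordBytes) {
        if (maxRecordBytes <= 0) {
            throw new IllegalArgumentException("maxRecordBytes must be positive");
        }
        long available = hostBudgetBytes - upstreamBufferBytes;
        if (available <= 0) {
            throw new IllegalArgumentException("no memory left for the KPL buffer");
        }
        return available / maxRecordBytes;
    }

    public static void main(String[] args) {
        // 1 GiB budget, 256 MiB buffered upstream, 1 MiB maximum record size.
        System.out.println(maxRecordsInFlight(1L << 30, 256L << 20, 1L << 20)); // 768
        // 5,000 records/s at peak, 30 s TTL, 25,600-byte records.
        System.out.println(worstCaseBufferedBytes(5_000, 30_000, 25_600)); // 3840000000
    }
}
```

Comparing the two results shows whether a full TTL window of peak traffic could exceed the memory you can spare; if it can, the back-pressure limit has to engage well before records start expiring.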

Sample Back-pressure implementation:

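```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

// Inside a method that throws InterruptedException; kpl, inputQueue, recordsPut and the constants are fields.
ClickEvent event = inputQueue.take();
String partitionKey = event.getSessionId();
String payload = event.getPayload();
ByteBuffer data = ByteBuffer.wrap(payload.getBytes(StandardCharsets.UTF_8));
// Back-pressure: wait until the KPL has drained enough of its buffered records.
while (kpl.getOutstandingRecordsCount() > MAX_RECORDS_IN_FLIGHT) {
    Thread.sleep(SLEEP_BACKOFF_IN_MS);
}
recordsPut.getAndIncrement();

ListenableFuture<UserRecordResult> f =
        kpl.addUserRecord(STREAM_NAME, partitionKey, data);
// Recent Guava versions require an explicit Executor argument for addCallback.
Futures.addCallback(f, new FutureCallback<UserRecordResult>() {
    ...
    ...
}, MoreExecutors.directExecutor());
```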

The sample above is provided only as an example implementation. Take your application and use cases into consideration before applying similar logic.
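The polling loop in the sample can also be factored into a small reusable helper. The sketch below is hypothetical (BackPressureGate is not a KPL class): it takes the outstanding-record count as a LongSupplier, which in a real producer could be kpl::getOutstandingRecordsCount, and adds a maximum wait so that a stalled KPL does not block the ingest thread indefinitely. The main method uses a simulated counter rather than a real KPL instance.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper wrapping the back-pressure loop from the sample. */
final class BackPressureGate {
    private final LongSupplier outstandingRecords; // e.g. kpl::getOutstandingRecordsCount
    private final long maxRecordsInFlight;
    private final long sleepBackoffMillis;

    BackPressureGate(LongSupplier outstandingRecords, long maxRecordsInFlight, long sleepBackoffMillis) {
        this.outstandingRecords = outstandingRecords;
        this.maxRecordsInFlight = maxRecordsInFlight;
        this.sleepBackoffMillis = sleepBackoffMillis;
    }

    /**
     * Sleeps while the outstanding count is above the limit. Returns true once there is
     * capacity, or false if the count is still too high after maxWaitMillis, so the caller
     * can decide whether to keep waiting, shed load, or fail.
     */
    boolean awaitCapacity(long maxWaitMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (outstandingRecords.getAsLong() > maxRecordsInFlight) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(sleepBackoffMillis);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated KPL whose outstanding count drops by one each time it is polled.
        AtomicLong simulated = new AtomicLong(12);
        BackPressureGate gate = new BackPressureGate(simulated::getAndDecrement, 10, 1);
        System.out.println(gate.awaitCapacity(1_000)); // true
    }
}
```

Returning a boolean instead of waiting forever is one design choice among several; the original loop, which waits indefinitely, is equally reasonable when the upstream queue can absorb the delay.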

Recommended Upgrade for All Users of 0.15.0 - 0.15.6 Amazon Kinesis Producer

⚠️ Users of versions 0.15.0 through 0.15.6 of the Amazon Kinesis Producer are strongly encouraged to upgrade to version 0.15.7. A bug identified in versions 0.15.0 through 0.15.6 causes a memory leak.

ℹ️ Amazon Kinesis Producer versions prior to 0.15.0 are not impacted.
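If you manage many deployments, a quick check for the affected range can help flag hosts that still need the upgrade. This is a hypothetical helper, not part of the KPL; it assumes plain major.minor.patch version strings and ignores any qualifier after the patch number (for example a -SNAPSHOT suffix).

```java
/** Hypothetical check for the 0.15.0 - 0.15.6 memory-leak advisory. */
final class LeakAdvisory {
    private LeakAdvisory() {}

    static boolean isAffected(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + version);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        // Keep only the leading digits of the patch, so "6-SNAPSHOT" is read as 6.
        int patch = Integer.parseInt(parts[2].replaceAll("\\D.*$", ""));
        return major == 0 && minor == 15 && patch <= 6;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"0.14.13", "0.15.0", "0.15.6", "0.15.7", "0.15.12"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```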

Recommended Settings for Streams larger than 800 shards

The KPL is an application for ingesting data into your Kinesis Data Streams. As your streams grow, you may need to tune the KPL so that it can accommodate the growing needs of your applications. Without optimized configurations, your KPL processes will see inefficient CPU usage and delays in writing records into KDS. For streams larger than 800 shards, we recommend the following settings:

  • ThreadingModel = "POOLED"
  • MetricsGranularity = "stream"
  • ThreadPoolSize = 128

We recommend testing these changes sufficiently before applying them in production, as every customer has different usage patterns.
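If you configure the KPL from a properties file (KinesisProducerConfiguration provides a fromPropertiesFile method for this), the three settings above map directly onto property keys. The sketch below uses only java.util.Properties to write such a file; the class name and default output path are hypothetical, and you should confirm the key names against the configuration reference for your KPL version.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper that writes the large-stream settings to a KPL properties file. */
final class LargeStreamConfig {
    private LargeStreamConfig() {}

    static Properties largeStreamSettings() {
        Properties props = new Properties();
        props.setProperty("ThreadingModel", "POOLED");
        props.setProperty("MetricsGranularity", "stream");
        props.setProperty("ThreadPoolSize", "128");
        return props;
    }

    static void write(Path target) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            largeStreamSettings().store(out, "KPL settings for streams larger than 800 shards");
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get(args.length > 0 ? args[0] : "kpl_large_stream.properties");
        write(target);
        // With the KPL on the classpath, the file could then be loaded with:
        // KinesisProducerConfiguration.fromPropertiesFile(target.toString())
    }
}
```

In a real configuration these keys would sit alongside your other settings, such as the region; the snippet covers only the three recommended values.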

Required KPL Update – v0.15.0

KPL 0.15.0 includes the StreamARN in Kinesis requests such as PutRecords and ListShards, to take advantage of the enhanced availability that Kinesis Data Streams (KDS) gains from service cellularization. Version 0.15.0 adds STS as a new dependency; because the KPL uses STS to construct the StreamARN, customers benefit from it without modifying any code.
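For reference, a Kinesis stream ARN combines the partition, region, account ID, and stream name. The sketch below only formats that string; it assumes the account ID has already been obtained (for example through the STS GetCallerIdentity API) and does not claim to mirror how the KPL builds the ARN internally. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical formatter for Kinesis Data Streams ARNs. */
final class StreamArns {
    private static final Pattern ACCOUNT_ID = Pattern.compile("\\d{12}");

    private StreamArns() {}

    /**
     * Builds arn:{partition}:kinesis:{region}:{accountId}:stream/{streamName}.
     * The partition is "aws" for most regions; China regions such as cn-northwest-1 use "aws-cn".
     */
    static String streamArn(String partition, String region, String accountId, String streamName) {
        if (!ACCOUNT_ID.matcher(accountId).matches()) {
            throw new IllegalArgumentException("account ID must be 12 digits: " + accountId);
        }
        return "arn:" + partition + ":kinesis:" + region + ":" + accountId + ":stream/" + streamName;
    }

    public static void main(String[] args) {
        System.out.println(streamArn("aws", "us-east-1", "123456789012", "my-stream"));
    }
}
```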

Required KPL Update – v0.14.0

KPL 0.14.0 uses the ListShards API, making it easier for your Kinesis Producer applications to scale. Kinesis Data Streams (KDS) lets you scale your stream capacity without any changes to producers and consumers, but after a scaling event, producer applications need to discover the new shard map. Version 0.14.0 replaces DescribeStream with the ListShards API for shard discovery. The ListShards API supports 100 TPS per stream, compared to DescribeStream, which supports 10 TPS per account. For an account with 10 streams, KPL 0.14.0 therefore provides a 100x higher aggregate call rate for shard discovery (10 streams × 100 TPS = 1,000 TPS, versus 10 TPS), eliminating the need to request a DescribeStream API limit increase for scaling. You can find more information on the ListShards API in the Kinesis Data Streams documentation.

Required Upgrade

Starting on February 9, 2018, Amazon Kinesis Data Streams will begin transitioning to certificates issued by Amazon Trust Services (ATS). To continue using the Kinesis Producer Library (KPL), you must upgrade the KPL to version 0.12.6 or later.

If you have further questions, please open a GitHub Issue or create a case with the AWS Support Center.

This is a restatement of the notice published in the Amazon Kinesis Data Streams Developer Guide.

Release Notes

0.15.12

  • #593 Replace all uses of sys_siglist with strsignal, as sys_siglist is deprecated
  • #594 Check if dimension.value is blank before using it in metrics manager
  • #596 Update getOldestRecordTimeMillis to avoid potential NullPointerException
  • #600 Fix build issue after cpp sdk upgrade
  • #597 Bump cpp sdk from 1.11.62 to 1.11.420 and java sdk from 1.12.772 to 1.12.773
  • #570 Bump commons-lang from 2.6 to 3.14.0
  • #598 Bump commons-io:commons-io from 2.13.0 to 2.17.0 in /java/amazon-kinesis-producer to address CVE vulnerability
  • #589 Bump com.google.protobuf:protobuf-java from 3.21.12 to 3.25.5 in /java/amazon-kinesis-producer to address CVE vulnerability
  • #579 Bump com.google.guava:guava from 31.1-jre to 33.3.0-jre in /java/amazon-kinesis-producer to address CVE vulnerability
  • #588 Bump com.amazonaws:aws-java-sdk-core from 1.12.382 to 1.12.772 in /java/amazon-kinesis-producer and java/amazon-kinesis-producer-sample

0.15.11

  • #576 Improve retry logic during stream scaling
  • #571 Upgrade ch.qos.logback:logback-classic from 1.3.0 to 1.3.12

0.15.10

  • #560 Reverting to remove a bug with using Stream ARN. Please stay tuned for a future release before using Stream ARN.
  • #526 Drop dependency on jaxb for converting binary arrays to hex
  • 1.1.18: Update GSR dependency to address a CVE vulnerability

0.15.9

  • #552 Add StreamARN parameter to support CAA
    • The StreamARN parameter can now be used to enable cross-account access for KPL requests.

0.15.8

  • #537 Update to latest version of Glue Schema Registry library

0.15.7

  • #498 Fix some memory leak cases in legacy code
    • Upgrade SDK version to avoid s2n_cleanup related memory leak
    • Fix resource cleanup on KPL end to avoid memory leak

0.15.6

  • #490 Updating aws cpp sdk version

0.15.5

  • #482 Remove the stream arn parameter when the next token is present

0.15.4

0.15.3

  • #478 Update AWS SDK CPP version

0.15.2

  • #471 Upgrade Java dependencies

0.15.1

  • #469 Use AWS CodeBuild to compile C++ binary

0.15.0

  • #465
    • Revert the upgrade of jakarta.xml.bind to be backward-compatible with Java8
    • Add more logs to verify that IMDSv2 is used correctly for getting region info for KPL running in EC2 instances
  • #463
    • Use STS to construct the stream ARN
    • Exit KPL if the STS call fails to avoid dual mode
    • Deprecate IMDSv1 calls for obtaining EC2 metadata
  • #444
    • Update bootstrap.sh to work on three platforms

0.14.13

  • #440
    • Upgrade the dependencies used in bootstrap + Java dependencies
    • Correct the log level discrepancy for the warnings

0.14.12

  • #425 Fix build issues in CI
  • #424 Fix build issues in CI
  • #423 Upgrade GSR version to 1.1.9
  • #420 Fix cpp branch
  • #419 Fix aws-cpp branch
  • #418 Fix travis build
  • #416 Configure dependabot
  • #415 Fix travis build
  • #414 Fix travis build

0.14.11

  • #409 Bump protobuf-java from 3.11.4 to 3.16.1 in /java/amazon-kinesis-producer
  • #408 Update curl version from 7.77 to 7.81
  • #395 Configure dependabot
  • #391 Fixing travis build issues
  • #388 Fixing build issues due to stale CA certs

0.14.10

  • #386 Upgraded Glue schema registry from 1.1.1 to 1.1.5
  • #384 Upgraded logback-classic from 1.2.0 to 1.2.6
  • #323 Upgraded junit from 4.12 to 4.13.1

0.14.9

  • #370 Upgraded build script dependencies
    • Upgraded version of openssl from 1.0.1m to 1.0.2u
    • Upgraded version of boost from 1.61 to 1.76
    • Upgraded version of zlib from 1.2.8 to 1.2.11
  • #377 Added an optimization to filter out closed shards.

0.14.8

  • PR #331 Fixed a typo in README.md
  • PR #363 Upgrading hibernate-validator to 6.0.20.Final
  • PR #365 Upgrading logback-classic to 1.2.0
  • PR #367 Upgrading Glue Schema Registry to 1.1.1

0.14.7

  • PR #350 Upgrading Guava to 29.0-jre
  • PR #352 Upgrading Commons IO to 2.7
  • PR #351 Adding support for proxy configurations
  • PR #356 Fixing build issues in Travis CI

0.14.6

  • [PR #341] Updating Java SDK version in KPL to 1.11.960.

0.14.5

  • [PR #339] Fixing KPL not emitting Kinesis PutRecords call context metrics.

0.14.4

  • [PR #334] Add support for building multiple architectures, specifically arm64.
    • This now supports AWS Graviton based instances.
    • Bumped Boost slightly to a version that includes Arm support and added the architecture to the path for kinesis_producer.
  • [PR #335] Fixed logging in the native layer so that debug/trace logs can be enabled.

0.14.3

  • [PR #327] Adding support for timeouts on user records at the Java layer.
    • New optional KPL config parameter userRecordTimeoutInMillis, which can be used to time out records queued for processing at the Java layer.
  • [PR #328] Changing CloudWatch client retry strategy to use default SDK retry strategy with exponential backoff.
  • [PR #324] Adding a KPL metric to track the time of the oldest user record in processing at the Java layer.
  • [PR #318] Fixing bug where KPL goes into a continuous retry storm if the stream is deleted and re-created.

0.14.2

  • [PR #320] Adding support for Glue Schema Registry.
    • Serialize and send schemas along with records, support for compression and auto-registration of schemas.
  • [PR #316] Bumping junit from 4.12 to 4.13.1
  • [PR #312] Adding new parameter in KPL config to allow cert path to be overridden.
  • [PR #310] Fixing a bug so that the executor service uses 4*num_cores threads.
  • [PR #307] Dependency Upgrade
    • Upgrade Guava to 26.0-jre
    • Update BOOST C++ Libraries link as cert expired on the older link

0.14.1

  • [PR #302] Dependency Upgrade
    • upgrade org.hibernate.validator:hibernate-validator 6.0.2.Final -> 6.0.18.Final
    • upgrade com.google.guava:guava 18.0 -> 24.1.1-jre
  • [PR #300] Fix Travis CI build issues
  • [PR #298] Upgrade google-protobuf to 3.11.4

0.14.0

  • Note: Windows platform will be unsupported going forward for this library.
  • [PR #280] When aggregation is enabled and all the buffer time is consumed for aggregating User records into Kinesis records, allow some additional buffer time for aggregating Kinesis Records into PutRecords calls.
  • [PR #260] Added endpoint for China Ningxia region (cn-northwest-1).
  • [PR #277] Changed mechanism to update the shard map
    • Switched to using ListShards instead of DescribeStream, as this is a more scalable API
    • Reduced the number of unnecessary shard map invalidations
    • Reduced the number of unnecessary update shard map calls
    • Reduced logging noise for aggregated records landing on an unexpected shard
  • [PR #276] Updated AWS SDK from 1.0.5 to 1.7.180
  • [PR #275] Improved the sample code so it can be run without editing the code.
  • [PR #274] Updated bootstrap.sh to build all dependencies and pack binaries into the jar.
  • [PR #273] Added compile flags to enable compiling aws-sdk-cpp with Gcc7.
  • [PR #229] Fixed bootstrap.sh to download dependent libraries directly from source.
  • [PR #246] [PR #264] Various Typos

0.13.1

  • Including Windows binary for Apache 2.0 release.

0.13.0

  • [PR #256] Update KPL to Apache 2.0

0.12.11

Java

  • Bump up the version to 0.12.11.

Older release notes moved to CHANGELOG.md

Supported Platforms and Languages

The KPL is written in C++ and runs as a child process to the main user process. Precompiled native binaries are bundled with the Java release and are managed by the Java wrapper.

The Java package should run without the need to install any additional native libraries on the following operating systems:

  • Linux distributions with glibc 2.9 or later
  • Apple OS X 10.13 and later

Note that the release is 64-bit only.

Sample Code

A sample Java project is available in java/amazon-kinesis-producer-sample.

Compiling the Native Code

Rather than compiling from source, Java developers are encouraged to use the KPL release in Maven, which includes pre-compiled native binaries for Linux and macOS.

To build the native components and bundle them into the jar, run ./bootstrap.sh, which downloads the dependencies, builds them, builds the native binaries, bundles them into the Java resources folder, and then builds the Java packages. This must be done on the platform where you plan to run the jars.

Using the Java Wrapper with the Compiled Native Binaries

There are two options. You can either pack the binaries into the jar, as is done for the official release, or deploy the native binaries separately and point the Java code at them.

Pointing the Java wrapper at a Custom Binary

The KinesisProducerConfiguration class provides a setNativeExecutable(String val) option, which you can use to provide the path to the kinesis_producer[.exe] executable you have built. On Windows, paths given as string literals must be delimited with backslashes (escaped as \\ inside a Java string literal).
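Before handing a path to setNativeExecutable, it can help to fail fast if the binary is missing or not executable, rather than finding out only when the KPL tries to start its child process. The helper below is a hypothetical, standard-library-only sketch; only the setNativeExecutable call (shown as a comment, since it needs the KPL on the classpath) comes from the KPL.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check for a custom kinesis_producer binary. */
final class NativeExecutableCheck {
    private NativeExecutableCheck() {}

    /** Returns the absolute path if it names an executable regular file, else throws. */
    static String resolve(String candidate) {
        Path path = Paths.get(candidate).toAbsolutePath().normalize();
        if (!Files.isRegularFile(path)) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        if (!Files.isExecutable(path)) {
            throw new IllegalArgumentException("not executable: " + path);
        }
        return path.toString();
    }

    public static void main(String[] args) {
        String binary = resolve(args.length > 0 ? args[0] : "./kinesis_producer");
        System.out.println("Using native binary at " + binary);
        // With the KPL on the classpath:
        // new KinesisProducerConfiguration().setNativeExecutable(binary);
    }
}
```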
