kinesis -> lambda -> ES only posting some records #1
Comments
Found that switching to the bulk API in the lambda script was enough to fix it. Do you folks have any estimates of this sample's performance? Is it meant to work at scale, or just to connect the dots?
Thanks for trying out the sample! The code is meant to demonstrate how Lambda could be used for ES data ingestion. It is simplified for the purposes of clarity and is not tuned to work at scale.
Cool, I figured as much. It would be super helpful to see a sample that ingests aggregated KPL records. I see the records come through as base64-encoded JSON, delimited by... some hex values? I can't figure out what they are from casually reading the KPL code. Is there anything like that coming? Thanks again!
I realized that this thread was left hanging. Could you elaborate on what you mean by aggregated KPL records?
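For context on the question above: the KPL can pack multiple user records into a single Kinesis record inside a protobuf container, and the "hex values" noticed earlier are, as far as I can tell, the container's magic-number prefix and MD5 checksum trailer rather than a simple delimiter. Rather than parsing that format by hand, a deaggregation helper can expand the batch. Below is a minimal Python sketch, not part of this sample, assuming the third-party aws_kinesis_agg package is bundled with the Lambda function:

```python
# Sketch only: expanding KPL-aggregated Kinesis records in a Lambda handler.
# Assumes the third-party aws_kinesis_agg package is packaged with the function.
import base64
import json

from aws_kinesis_agg.deaggregator import deaggregate_records


def handler(event, context):
    # Expand any KPL-aggregated records into individual user records;
    # records that were not aggregated pass through unchanged.
    user_records = deaggregate_records(event['Records'])

    docs = []
    for record in user_records:
        payload = base64.b64decode(record['kinesis']['data'])
        docs.append(json.loads(payload))

    print('Deaggregated %d user records' % len(docs))
    return len(docs)
```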
I appreciate these code samples. I'm interested in a version of this lambda function that utilises the ES bulk API as well. I assume what is referred to above is that the existing function iterates through the collection of Kinesis records and makes a separate HTTP request to ES to index each document, which would be very expensive at scale in terms of I/O on the ES cluster. Iterating through the Kinesis record collection and building a single ES bulk API request within the Lambda function would be an improved alternative; a sketch of that approach follows.
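To make the bulk suggestion concrete, here is a minimal Python sketch (not the sample's actual code) that collapses a whole batch of Kinesis records into one _bulk request. The endpoint, index, and type names are placeholders, and SigV4 request signing and retries are omitted for brevity:

```python
# Sketch only: one Elasticsearch _bulk request per Lambda invocation instead of
# one HTTP request per Kinesis record. Endpoint/index/type are placeholders;
# request signing and error handling are omitted.
import base64
import json
import urllib.request

ES_ENDPOINT = 'https://search-example.us-east-1.es.amazonaws.com'  # placeholder
INDEX = 'logstash'  # placeholder index name
DOC_TYPE = 'logs'   # placeholder type name


def handler(event, context):
    lines = []
    for record in event['Records']:
        payload = base64.b64decode(record['kinesis']['data'])
        doc = json.loads(payload)  # assumes each record is a JSON document
        # Action metadata line followed by the document source line.
        lines.append(json.dumps({'index': {'_index': INDEX, '_type': DOC_TYPE}}))
        lines.append(json.dumps(doc))

    # The bulk body is newline-delimited JSON and must end with a newline.
    body = '\n'.join(lines) + '\n'
    req = urllib.request.Request(
        ES_ENDPOINT + '/_bulk',
        data=body.encode('utf-8'),
        headers={'Content-Type': 'application/x-ndjson'},
        method='POST')
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())

    # A 200 response can still report per-document failures via the errors flag.
    if result.get('errors'):
        raise Exception('some documents failed to index')
    return len(event['Records'])
```

Batching this way means the cluster accepts one connection per invocation rather than one per record, which is in line with the first comment's report that switching to the bulk API fixed the dropped-record symptom.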
I setup a simple config to test this out:
logstash-forwarder -> logstash -> kinesis -> lambda -> ES
For batches of size 1, this works fine; log lines are pushed to ES within a few seconds.
For batch size > 1, I am seeing inconsistent behavior, with only some of my records being pushed. Often about 5 records make it through, but it's not consistent.
In my lambda logs I sometimes see
Error: Error: socket hang up
a number of times, corresponding to the number of missing records, but this too is inconsistent. I've confirmed that the expected number of records appears in the kinesis stream and is transmitted to the lambda function.
What am I doing wrong? Is my kinesis stream too small? My lambda timeout too short?
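(For anyone reproducing this, one way to confirm how many records actually landed in the stream, as mentioned above, is to read the shards directly. A rough Python sketch using boto3 follows; the stream name is a placeholder.)

```python
# Sketch only: count the records currently readable from a Kinesis stream.
# The stream name is a placeholder; assumes default AWS credentials/region.
import boto3

STREAM = 'my-logstash-stream'  # placeholder
kinesis = boto3.client('kinesis')

total = 0
shards = kinesis.describe_stream(StreamName=STREAM)['StreamDescription']['Shards']
for shard in shards:
    iterator = kinesis.get_shard_iterator(
        StreamName=STREAM,
        ShardId=shard['ShardId'],
        ShardIteratorType='TRIM_HORIZON')['ShardIterator']
    while iterator:
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        total += len(resp['Records'])
        iterator = resp.get('NextShardIterator')
        if resp['MillisBehindLatest'] == 0:
            break  # caught up to the tip of the shard

print('Records readable from the stream:', total)
```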