kinesis -> lambda -> ES only posting some records #1

Open
csakoda opened this issue Nov 3, 2015 · 5 comments

Comments

@csakoda

csakoda commented Nov 3, 2015

I set up a simple config to test this out:

logstash-forwarder -> logstash -> kinesis -> lambda -> ES

For batches of size 1, this works fine: log lines are pushed to ES within a few seconds.

For batch sizes > 1, I am seeing inconsistent behavior, with only some of my records being pushed. Often about 5 records make it through, but the number varies.

In my Lambda logs I sometimes see "Error: Error: socket hang up" repeated as many times as there are missing records, but this too is inconsistent.

I've confirmed that the expected number of records appears in the Kinesis stream and is delivered to the Lambda function.

What am I doing wrong? Is my kinesis stream too small? My lambda timeout too short?
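
One way to run the check described above (that every record reaches the function) is to log a per-invocation summary of the incoming batch. This is a minimal sketch in Python, not the sample's own code; `summarize_batch` is a hypothetical helper, and it assumes the standard Kinesis event shape that Lambda delivers (`Records[].kinesis.data` as base64, plus `kinesis.sequenceNumber`).

```python
import base64


def summarize_batch(event):
    """Summarize a Kinesis-triggered Lambda event for logging (hypothetical helper)."""
    records = event.get("Records", [])
    # Each payload arrives base64-encoded under record["kinesis"]["data"].
    sizes = [len(base64.b64decode(r["kinesis"]["data"])) for r in records]
    return {
        "count": len(records),
        "bytes": sum(sizes),
        "sequence_numbers": [r["kinesis"]["sequenceNumber"] for r in records],
    }
```

Comparing `count` against the number of documents that later show up in ES narrows the loss down to the Lambda-to-ES hop.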

@csakoda
Author

csakoda commented Nov 3, 2015

Found that switching to the bulk API in the lambda script was enough to fix it.
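
A bulk request can still succeed as a whole while individual documents are rejected, so it may be worth checking the response per item. The sketch below is an illustration in Python, not the code used here; `failed_items` is a hypothetical helper, and it assumes the parsed JSON body of an ES `_bulk` response, which carries a top-level `errors` flag and one entry per action under `items`.

```python
def failed_items(bulk_response):
    """Return (id, status, error) for every item in a _bulk response that failed."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action name, e.g. {"index": {...}}.
        for result in item.values():
            status = result.get("status")
            if status is None or status >= 300:
                failures.append((result.get("_id"), status, result.get("error")))
    return failures
```

An empty list means every document in the batch was accepted.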

Do you folks have any performance estimates for this sample? Is it meant to work at scale, or just to connect the dots?

@srisub-amzn
Contributor

Thanks for trying out the sample!

The code is meant to demonstrate how Lambda could be used for ES data ingestion. It is simplified for the purposes of clarity and is not tuned to work at scale.

@csakoda
Author

csakoda commented Nov 4, 2015

Cool, I figured as much.

It would be super helpful to see a sample that ingests aggregated KPL records. I see the records come through as base64-encoded JSON, delimited by... some hex values? I can't figure out which ones from casually reading the KPL code.

Is there anything like that coming?

Thanks again!
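
For reference, the KPL aggregation format wraps several user records in one Kinesis record: a 4-byte magic prefix (`F3 89 9A C2`), a protobuf `AggregatedRecord` message, and a trailing 16-byte MD5 of that message. The bytes seen between the JSON payloads are protobuf field tags and lengths. The following is a rough Python sketch of deaggregation without a protobuf library, not an official implementation; `read_varint`, `iter_fields` and `deaggregate` are hypothetical names, and the field numbers (`records = 3`, `data = 3`) are assumed from the published KPL message definition.

```python
import hashlib

KPL_MAGIC = b"\xf3\x89\x9a\xc2"  # 4-byte prefix of a KPL aggregated record
MD5_LEN = 16                      # trailing MD5 digest of the protobuf payload


def read_varint(buf, pos):
    """Decode a protobuf varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, value) for varint and length-delimited fields."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:        # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:      # length-delimited: bytes, string or message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire_type}")
        yield field, value


def deaggregate(blob):
    """Return the user payloads inside one base64-decoded Kinesis record."""
    if not blob.startswith(KPL_MAGIC) or len(blob) < len(KPL_MAGIC) + MD5_LEN:
        return [blob]             # not aggregated: a single plain record
    body, digest = blob[len(KPL_MAGIC):-MD5_LEN], blob[-MD5_LEN:]
    if hashlib.md5(body).digest() != digest:
        return [blob]             # checksum mismatch: treat as plain data
    payloads = []
    for field, value in iter_fields(body):
        if field == 3:            # AggregatedRecord.records (assumed field number)
            for sub_field, sub_value in iter_fields(value):
                if sub_field == 3:    # Record.data (assumed field number)
                    payloads.append(sub_value)
    return payloads
```

In a Lambda handler, the input would be `base64.b64decode(record["kinesis"]["data"])`, and each returned payload is one original log line.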

@srisub-amzn
Contributor

I realized that this thread was left hanging. Could you elaborate on what you mean by aggregated KPL records?

@deanjez

deanjez commented Dec 24, 2015

I appreciate these code samples. I'm also interested in a version of this Lambda function that uses the ES bulk API. I assume the point made above is that the existing function iterates through the collection of Kinesis records and makes a separate HTTP request to ES to index each document. At scale, that would be very expensive in terms of I/O on the ES cluster. Iterating through the Kinesis records and building a single ES bulk API request within the Lambda function would be a better alternative.
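
The approach described here could look roughly like the sketch below. It is written in Python for illustration and is not the sample's code; `build_bulk_body` and the default index name `logs` are hypothetical, each Kinesis payload is assumed to be a single JSON document, and the record's `eventID` is reused as the document `_id` so that a retried batch overwrites documents instead of duplicating them.

```python
import base64
import json


def build_bulk_body(event, index="logs"):
    """Turn a Kinesis-triggered Lambda event into one newline-delimited _bulk body."""
    lines = []
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded under record["kinesis"]["data"].
        raw = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        # Assumption: older ES versions also expect a "_type" in the action line
        # (or in the request path); it is omitted here.
        action = {"index": {"_index": index, "_id": record["eventID"]}}
        lines.append(json.dumps(action))
        # Re-serialize so each document occupies exactly one line.
        lines.append(json.dumps(json.loads(raw)))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The returned string is sent as the body of a single `POST /_bulk` request, so a batch of N records costs one HTTP round trip instead of N.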
