kinesis -> lambda -> ES only posting some records #1
Comments
Found that switching to the bulk API in the lambda script was enough to fix it. Do you folks have any estimates of this sample's performance? Is it meant to work at scale, or just to connect the dots?
Thanks for trying out the sample! The code is meant to demonstrate how Lambda could be used for ES data ingestion. It is simplified for the purposes of clarity and is not tuned to work at scale.
Cool, I figured as much. It would be super helpful to see a sample that ingests aggregated KPL records. I see the records come through as base64-encoded JSON, delimited by... some hex values? I can't figure out what they are from casually reading the KPL code. Is there anything like that coming? Thanks again!
I realized that this thread was left hanging. Could you elaborate on what you mean by aggregated KPL records?
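For context on the question above: the KPL can pack multiple user records into a single Kinesis record inside a protobuf container, and the "hex values" noticed earlier are, as far as I can tell, the container's magic-number prefix and MD5 checksum trailer rather than a simple delimiter. Rather than parsing that format by hand, a deaggregation helper can expand the batch. Below is a minimal Python sketch, not part of this sample, assuming the third-party aws_kinesis_agg package is bundled with the Lambda function:

```python
# Sketch only: expanding KPL-aggregated Kinesis records in a Lambda handler.
# Assumes the third-party aws_kinesis_agg package is packaged with the function.
import base64
import json

from aws_kinesis_agg.deaggregator import deaggregate_records


def handler(event, context):
    # Expand any KPL-aggregated records into individual user records;
    # records that were not aggregated pass through unchanged.
    user_records = deaggregate_records(event['Records'])

    docs = []
    for record in user_records:
        payload = base64.b64decode(record['kinesis']['data'])
        docs.append(json.loads(payload))

    print('Deaggregated %d user records' % len(docs))
    return len(docs)
```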
I appreciate these code samples. I'm interested in a version of this lambda function that utilises the ES bulk API as well. I assume what is referred to above is that the existing function iterates through the collection of Kinesis records and makes a separate HTTP request to ES to index each document, which would be very expensive at scale in terms of I/O on the ES cluster. Iterating through the Kinesis record collection and building a single ES bulk API request within the Lambda function would be an improved alternative; a sketch of that approach follows.
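To make the bulk suggestion concrete, here is a minimal Python sketch (not the sample's actual code) that collapses a whole batch of Kinesis records into one _bulk request. The endpoint, index, and type names are placeholders, and SigV4 request signing and retries are omitted for brevity:

```python
# Sketch only: one Elasticsearch _bulk request per Lambda invocation instead of
# one HTTP request per Kinesis record. Endpoint/index/type are placeholders;
# request signing and error handling are omitted.
import base64
import json
import urllib.request

ES_ENDPOINT = 'https://search-example.us-east-1.es.amazonaws.com'  # placeholder
INDEX = 'logstash'  # placeholder index name
DOC_TYPE = 'logs'   # placeholder type name


def handler(event, context):
    lines = []
    for record in event['Records']:
        payload = base64.b64decode(record['kinesis']['data'])
        doc = json.loads(payload)  # assumes each record is a JSON document
        # Action metadata line followed by the document source line.
        lines.append(json.dumps({'index': {'_index': INDEX, '_type': DOC_TYPE}}))
        lines.append(json.dumps(doc))

    # The bulk body is newline-delimited JSON and must end with a newline.
    body = '\n'.join(lines) + '\n'
    req = urllib.request.Request(
        ES_ENDPOINT + '/_bulk',
        data=body.encode('utf-8'),
        headers={'Content-Type': 'application/x-ndjson'},
        method='POST')
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())

    # A 200 response can still report per-document failures via the errors flag.
    if result.get('errors'):
        raise Exception('some documents failed to index')
    return len(event['Records'])
```

Batching this way means the cluster accepts one connection per invocation rather than one per record, which is in line with the first comment's report that switching to the bulk API fixed the dropped-record symptom.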
I setup a simple config to test this out:
logstash-forwarder -> logstash -> kinesis -> lambda -> ES
For batches of size 1, this works fine; log lines are pushed to ES within a few seconds.
For batch size > 1, I am seeing inconsistent behavior, with only some of my records being pushed. Often about 5 records make it through, but it's not consistent.
In my lambda logs I sometimes see
Error: Error: socket hang up
a number of times, corresponding to the number of missing records, but this too is inconsistent. I've confirmed that the expected number of records appears in the kinesis stream and is transmitted to the lambda function.
What am I doing wrong? Is my kinesis stream too small? My lambda timeout too short?
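(For anyone reproducing this, one way to confirm how many records actually landed in the stream, as mentioned above, is to read the shards directly. A rough Python sketch using boto3 follows; the stream name is a placeholder.)

```python
# Sketch only: count the records currently readable from a Kinesis stream.
# The stream name is a placeholder; assumes default AWS credentials/region.
import boto3

STREAM = 'my-logstash-stream'  # placeholder
kinesis = boto3.client('kinesis')

total = 0
shards = kinesis.describe_stream(StreamName=STREAM)['StreamDescription']['Shards']
for shard in shards:
    iterator = kinesis.get_shard_iterator(
        StreamName=STREAM,
        ShardId=shard['ShardId'],
        ShardIteratorType='TRIM_HORIZON')['ShardIterator']
    while iterator:
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        total += len(resp['Records'])
        iterator = resp.get('NextShardIterator')
        if resp['MillisBehindLatest'] == 0:
            break  # caught up to the tip of the shard

print('Records readable from the stream:', total)
```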