accept references to payloads in put requests #1218

fommil · 2024-12-09T12:10:21Z

Describe the feature

many builders require their input to be owned, which means that in order to handle a possible retry under failure, the caller must make a copy that is most likely going to be thrown away. This is very wasteful, especially in high throughput applications and when writing large payloads.

Use Case

here's an example of writing records to kinesis. The batch here has to be fully owned, containing the payloads. Which means we need to clone it just incase we need to retry it.

                PutRecordsRequestEntry::builder()
                    .data(Blob::new(data))
                    .partition_key(common_partition_key)
                    .build()
                    .unwrap(), // test code

...

        let putter = kinesis
            .put_records()
            .stream_name(&name)
            .set_records(Option::Some(batch));

        let mut success = false;
        let mut attempts = 0;
        while !success {
            // annoying that we have to clone the data to be able to retry
            let results = putter.clone().send().await.unwrap(); // test code
            let failures = results.failed_record_count;
            if failures.unwrap_or(0) == 0 {
                success = true;
            } else {
                tracing::debug!("RETRYING A WRITE TO THE KINESIS STREAM {name}");
                // we may have dupe records now, how exciting! it is possible to
                // only resend the ones that failed, but it's not worth it here.
                attempts += 1;
                sleep(Duration::from_secs(1)).await;
                if attempts >= 10 {
                    panic!("Too many retry attempts writing to kinesis stream {name}");
                }
            }
        }

Proposed Solution

I don't think builders are a very idiomatic way of doing a lot of this in the rust ecosystem, although it seems to be the AWS way. If there were no builders, or struct objects, we would be able to say that the methods could simply take referencse rather than owned objects. However, I doubt we'll see builders done away with... so it would be good enough if select responses returned the input parameter instead of completely consuming. That behaves effectively the same as taking a reference.

Or take Rc wrappers of everything in the builders... or push that to the smithy layer and use Rc wrapped bytes in the Blob.

Other Information

No response

Acknowledgements

I may be able to implement this feature request
This feature might incur a breaking change

A note for the community

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue, please leave a comment

The text was updated successfully, but these errors were encountered:

aajtodd · 2024-12-09T16:18:44Z

Thanks for the issue.

I don't think builders are a very idiomatic way of doing a lot of this in the rust ecosystem, although it seems to be the AWS way.

I'd argue that builders are fairly common in Rust but that's neither here nor there. The use of builders in the Rust SDK stems in part from dealing with model evolutions and being able to provide strong backwards compatibility guarantees.

I don't think there is much we can do here but I would ask why you are implementing retries on top of the SDK which already retries requests?

fommil · 2024-12-13T10:48:26Z

Unfortunately, the PutRecords batch call cannot be retried automatically, because it supports partial success... so the retry you're @aajtodd talking about probably only works up until a partial success. I would imagine all endpoints with partial success are therefore susceptible to this same problem.

In this particular case, I think what is happening is that within the batch, the puts to shards that are being rescaled are failing, but the puts to shards that are fine are succeeding. A more efficient retry user-level logic would retry only the puts that failed the first time, but it's sufficient to retry everything. Regardless, to be able to avoid data loss, one must copy all the data and throw it away in most cases, because the builder consumes the input. The total payload can be up to 5MB (see PutRecords), broken into as many as 500 individual blobs, so it's not an insignificant cost for a high throughput application. (And remember this is all happening in parallel, consider how much data that is for a stream with 1,000 shards, albeit across many producers...)

There probably is something that can be done here, though... adding the request payloads to the failed responses would for sure solve this particular issue. That way there's no need to clone every 5MB upload before sending it, we know we will get back the requests that failed. That would be a PutRecordsResultEntry and adding an optional field failed_request: Option<PutRecordsRequestEntry> would be great, only populated on failure.

fommil added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Dec 9, 2024

aajtodd added p2 This is a standard priority issue response-requested Waiting on additional info and feedback. Will move to 'closing-soon' in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Dec 9, 2024

github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to 'closing-soon' in 7 days. label Dec 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

accept references to payloads in put requests #1218

accept references to payloads in put requests #1218

fommil commented Dec 9, 2024 •

edited

Loading

aajtodd commented Dec 9, 2024

fommil commented Dec 13, 2024 •

edited

Loading

accept references to payloads in put requests #1218

accept references to payloads in put requests #1218

Comments

fommil commented Dec 9, 2024 • edited Loading

Describe the feature

Use Case

Proposed Solution

Other Information

Acknowledgements

A note for the community

Community Note

aajtodd commented Dec 9, 2024

fommil commented Dec 13, 2024 • edited Loading

fommil commented Dec 9, 2024 •

edited

Loading

fommil commented Dec 13, 2024 •

edited

Loading