Unable to fetch files from S3 #1200
Comments
The above suggests that the SDK is failing to communicate with EC2's IMDS to get your credentials. This likely has nothing to do with calls to S3 at all, since the error is returned well before the service would actually be called. Are you sure that IMDS is correctly configured to provide credentials, and that it is the credential provider you want?
Here is some info on troubleshooting IMDS: https://repost.aws/knowledge-center/ec2-linux-metadata-retrieval
Here is our guide on credential retrieval and the order in which credentials are resolved: https://docs.aws.amazon.com/sdk-for-rust/latest/dg/credproviders.html
I don't think it's a credentials issue, because if it were, the very first request would fail and I wouldn't see successful responses in the server-side logs. I am seeing ~800-900 completed requests out of the 2369.
According to the offline discussion, this was due to requests to S3 being throttled. The solution was to add a semaphore around spawning the read tasks to limit concurrency, and to switch back to the default client config. IMDS and stalled-stream protection were irrelevant to the observed failure.
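The fix described above, bounding how many read tasks are in flight with a semaphore, follows a standard pattern. In the reporter's async code this would be `tokio::sync::Semaphore` (with `acquire_owned` on an `Arc<Semaphore>` before each `spawn`); the sketch below shows the same bounded-concurrency idea using only the standard library so it runs standalone. The permit count, task count, and simulated read are illustrative values, not taken from the issue.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Minimal counting semaphore (std has none; tokio provides tokio::sync::Semaphore).
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    fn new(n: usize) -> Self {
        Semaphore { permits: Mutex::new(n), cv: Condvar::new() }
    }
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

// Run `tasks` simulated reads with at most `limit` in flight; returns the
// peak number of reads that were actually concurrent.
fn run_bounded(tasks: usize, limit: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let (sem, in_flight, peak) = (sem.clone(), in_flight.clone(), peak.clone());
            thread::spawn(move || {
                sem.acquire(); // blocks until a permit is free
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(1)); // stand-in for the S3 read
                in_flight.fetch_sub(1, Ordering::SeqCst);
                sem.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}

fn main() {
    println!("peak in-flight reads: {}", run_bounded(200, 16));
}
```

Without the semaphore, all 2369 tasks would be spawned at once and race to open connections, which is consistent with the throttling behavior reported above.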
Comments on closed issues are hard for our team to see. |
Describe the bug
I am trying to read many small files from S3 in parallel but am experiencing timeout errors. My timeout configuration consists only of an operation timeout of 25 seconds, which is more than enough time to complete my workload (explanation below). The “timed out” requests never reach S3, as the keys are not present in the server-side logs. It seems the SDK is throttling requests, causing them to time out before they are executed.
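For context, an operation timeout like the one described can be set through the SDK's `TimeoutConfig`. This is a minimal configuration sketch, assuming recent `aws-config`/`aws-sdk-s3` crates; the function name is hypothetical and the snippet mirrors the reported setup rather than the reporter's actual code:

```rust
use std::time::Duration;
use aws_config::{timeout::TimeoutConfig, BehaviorVersion};

// Sketch: an S3 client whose only non-default setting is a 25s operation timeout.
async fn make_client() -> aws_sdk_s3::Client {
    let timeouts = TimeoutConfig::builder()
        .operation_timeout(Duration::from_secs(25))
        .build();
    let config = aws_config::defaults(BehaviorVersion::latest())
        .timeout_config(timeouts)
        .load()
        .await;
    aws_sdk_s3::Client::new(&config)
}
```

Note that the operation timeout covers the entire operation, including any time the request spends queued in the client before a connection is available.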
Regression Issue
Expected Behavior
My workload is 2369 files where each file is <=8MB. My environment consists of a p4de EC2 instance in the same account as the bucket being accessed.
Reading a single 8MB file should take <=1ms. A single NIC on a p4de achieves 100 Gbps of throughput, which means:
8MB * 8 / 1000 = 0.064Gb
0.064Gb / 100Gbps = 0.00064s = 0.64ms = ~1ms
Furthermore, a p4de has 96 cores and the tokio runtime is configured to use all of them, so at most 96 requests are processed at a time. Theoretically, then, all 2369 requests should complete in about 25ms:
2369 tasks / 96 cores * 1ms = 24.677ms = ~25ms.
Operation timeouts of 25ms and 25s both produce timeout errors. I expect 25s to be more than enough to never experience a timeout.
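The back-of-the-envelope numbers above can be checked directly. The sketch below just encodes the same arithmetic (8MB file, 100 Gbps link, 2369 tasks over 96 cores at ~1ms each); the function names are illustrative:

```rust
// Transfer time in ms for one file of `file_mb` megabytes over a `link_gbps` link.
fn per_file_ms(file_mb: f64, link_gbps: f64) -> f64 {
    file_mb * 8.0 / 1000.0 / link_gbps * 1000.0
}

// Total time in ms for `files` tasks spread over `cores` workers at `ms_each` per task.
fn total_ms(files: f64, cores: f64, ms_each: f64) -> f64 {
    files / cores * ms_each
}

fn main() {
    println!("{:.2} ms per file", per_file_ms(8.0, 100.0));   // 0.64 ms
    println!("{:.2} ms in total", total_ms(2369.0, 96.0, 1.0)); // ~24.68 ms
}
```

Both figures agree with the estimates in the report, so even with generous slack a 25s operation timeout should never fire if requests were actually being dispatched.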
Current Behavior
Error from application layer
From the debug logs, nothing stands out prior to the error, except for a large number of these:
Reproduction Steps
Client initialization
Reading function
Driver code
Possible Solution
No response
Additional Information/Context
No response
Version
Environment details (OS name and version, etc.)
AL2 5.10.214-202.855.amzn2.x86_64
Logs
No response