
Time not attributed to boto3 #1230

Open
daveisfera opened this issue Oct 9, 2024 · 6 comments

Comments

@daveisfera

The New Relic profile incorrectly shows only a few milliseconds for a file download/upload that actually takes several seconds.

Description
boto3 was instrumented by the fix in #1178, but the time spent downloading/uploading a file from/to S3 is attributed to the Python code that makes the call rather than to the boto3 function itself.

Expected Behavior
Time would be correctly attributed to the boto3 code rather than the caller

Steps to Reproduce

  1. Run Python code that uses boto3 to download/upload a file from/to S3
  2. View profile information in New Relic
  3. Observe that the times shown for the GET and HEAD calls to S3 are only a few milliseconds rather than the multiple seconds they actually take
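A minimal script for step 1 might look like this (a sketch: the bucket, key, and filename values are placeholders, and boto3 plus an attached New Relic agent are needed to actually see the misattributed profile):

```python
# Repro sketch: download a large object from S3 while the New Relic
# agent profiles the app. Guarded so the snippet imports without boto3.
try:
    import boto3
except ImportError:
    boto3 = None  # boto3 not installed; treat this as a sketch only


def download_large_object(bucket, key, filename):
    # This call can take several seconds for a large object, yet the
    # reported profile attributes almost none of that time to boto3.
    client = boto3.client("s3")
    client.download_file(Bucket=bucket, Key=key, Filename=filename)
```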

Your Environment
Python 3.12.7 on Debian Bookworm

@hmstepanek
Contributor

Hey @daveisfera, I was looking at this, and I'm not seeing the huge amounts of missing time that you are, though my test environment might be a little closer to the S3 bucket than yours. If you add the following function trace around the S3 download/upload call, what do you see in terms of the time breakdown? Does the function trace capture the missing time or not?

    import newrelic.agent

    with newrelic.agent.FunctionTrace("download_file", "Botocore/s3"):
        client.download_file(Bucket=ARN, Key="aiobotocore/dummy.bin", Filename="/tmp/dummy.bin")

@hmstepanek
Contributor

I did a little more digging on this today with a larger file, and I started to see the large growth in unaccounted-for seconds. I believe this time is file I/O time, which is captured if you add a function trace around the whole call, as I suggested in my previous comment. Given that we did not intend to capture file I/O time in our instrumentation, I'm marking this as a feature request instead of a bug.

I've consulted with the other language agent teams at New Relic, and none of them capture file I/O time either. I can think of some cases where capturing it would be helpful: if the I/O could be reduced by compressing the written data into a zip file, or if your system had some sort of file-handle contention. I'll bring this up in our team discussion and see how people feel about it. It's not hard to implement, but neither is adding a decorator; the question is whether having it by default would be helpful for everyone, since it is more data to capture and send.
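The decorator approach mentioned above can be sketched like this (a minimal sketch: `fetch_object` and `_StubClient` are hypothetical names, and the ImportError fallback exists only so the example runs without the agent installed; the real `newrelic.agent.function_trace` decorator is a no-op when no transaction is active):

```python
# Hedged sketch: wrap the whole S3 call in a function trace so file
# I/O time is attributed to a single trace node.
try:
    from newrelic.agent import function_trace
except ImportError:
    def function_trace(name=None, group=None):
        # No-op stand-in for newrelic.agent.function_trace
        def decorator(fn):
            return fn
        return decorator


@function_trace(name="download_file", group="Botocore/s3")
def fetch_object(client, bucket, key, filename):
    # Everything inside this call, including the threaded transfer and
    # the file writes, lands under the "download_file" trace node.
    client.download_file(Bucket=bucket, Key=key, Filename=filename)


# Demo with a stub client (no network); the decorator passes the call through.
class _StubClient:
    def __init__(self):
        self.calls = []

    def download_file(self, Bucket, Key, Filename):
        self.calls.append((Bucket, Key, Filename))


stub = _StubClient()
fetch_object(stub, "my-bucket", "dummy.bin", "/tmp/dummy.bin")
```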

@hmstepanek hmstepanek removed the bug label Jan 23, 2025
@daveisfera
Author

The download happens in a separate thread, so I assumed that's what is actually causing it. But regardless of why the time isn't being accounted for, it's really confusing in the profile and makes it hard to see what's actually going on from a performance perspective.
For example, I was just looking at data in New Relic for an endpoint that was taking 2-5 seconds. New Relic says that ~80% of the time is "in the function", but almost all of that is actually spent downloading files from S3, and it's not telling me that. boto3 is the second-biggest consumer at ~19%, so ~99% of those 2-5 seconds is actually boto3-related, yet most of it is nebulously assigned to the calling function. I now know to look for this issue, but it took me a while to figure that out, and it means the data being shown isn't correct or helpful.

@hmstepanek
Contributor

I understand how that would be confusing. I've added this to our feature backlog, but you are also welcome to open a PR to fix it; you'd just need to add a function trace around the whole S3 function. I'm not sure whether we have a file I/O trace type or something similar, but we could create one; whoever implements this will have to investigate that. There are lots of examples of other wrappers around various AWS SDK functions in newrelic/hooks/external_botocore.py. We might be able to fit it into this quarter's work; we'll have to see if we have time.
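For reference, an agent-side hook along the lines of the existing wrappers in newrelic/hooks/external_botocore.py might look roughly like this. This is a sketch under assumptions: `instrument_boto3_s3_transfer` is a hypothetical hook name, and `boto3.s3.inject` (where boto3 defines the `download_file`/`upload_file` helpers it attaches to clients) is an assumed target, not necessarily where the agent would actually hook.

```python
import types  # used only for the stub-module demo below


def instrument_boto3_s3_transfer(module):
    # Hypothetical hook: wrap boto3's injected S3 transfer helpers so
    # the whole call (threaded transfer plus file I/O) is one trace node.
    try:
        import newrelic.agent
    except ImportError:
        return  # agent not installed; leave the helpers untraced
    for helper in ("download_file", "upload_file"):
        if hasattr(module, helper):
            newrelic.agent.wrap_function_trace(
                module, helper, name=helper, group="Botocore/s3")


# Demo against a stub module, so this runs without newrelic or boto3:
stub_module = types.SimpleNamespace(download_file=lambda **kw: None)
instrument_boto3_s3_transfer(stub_module)
```

In a real agent this would be registered against the target module via the agent's module-import hooks rather than called by hand.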

@daveisfera
Author

I'm not familiar with the code, so I don't even know where I'd make a change to support this, but I'm pretty sure it worked in the past, so I'm almost positive it's caused by the threading changes that boto3/botocore made rather than anything to do with I/O.
But S3 operations are a pretty core piece of functionality, so I have to imagine this is impacting quite a few New Relic users running Python code.
