Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid copy in SentryEnvelopeItem.envelopeItemStream #2417

Merged
merged 17 commits into from
Nov 26, 2024

Conversation

denrase
Copy link
Collaborator

@denrase denrase commented Nov 18, 2024

#skip-changelog

📜 Description

Removed SentryEnevlopeItem.envelopeItemStream and do proper yielding of data chunks in envelope.

💡 Motivation and Context

Did some CPU profiling related to #2379 and one of the top functions is SentryEnevlopeItem.envelopeItemStream. Noticed the TODO from @vaind and checked an alternative implementation where we pre-allocate the array size and just set the data.

As far as i could tell with a test, this approach is faster (see test below).

Relates to #2379
Closes #1666

💚 How did you test it?

Looped an event 100 times in the sample app. I did see a difference in the CPU profiler, but could not pinpoint if this indeed was an improvement, as the percentages were not really comparable with consecutive runs.

y7dJD Es9C8

So I also added a test comparing the two methods with an 1MB large attachment iterating the methods 10.000 times with an avg.

Bildschirmfoto 2024-11-18 um 15 18 08

📝 Checklist

  • I reviewed submitted code
  • I added tests to verify changes
  • No new PII added or SDK only sends newly added PII if sendDefaultPii is enabled
  • I updated the docs if needed
  • All tests passing
  • No breaking changes

Copy link

codecov bot commented Nov 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.90%. Comparing base (4c13d97) to head (4170431).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2417      +/-   ##
==========================================
- Coverage   87.01%   86.90%   -0.12%     
==========================================
  Files         259      259              
  Lines        9233     9224       -9     
==========================================
- Hits         8034     8016      -18     
- Misses       1199     1208       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@denrase denrase changed the title Avoid copy inSentryEnevlopeItem.envelopeItemStream Avoid copy in SentryEnvelopeItem.envelopeItemStream Nov 18, 2024
Copy link
Contributor

github-actions bot commented Nov 18, 2024

Android Performance metrics 🚀

  Plain With Sentry Diff
Startup time 523.61 ms 560.65 ms 37.04 ms
Size 6.49 MiB 7.56 MiB 1.07 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
1ac008b 370.04 ms 435.58 ms 65.54 ms
4ad2751 374.58 ms 462.00 ms 87.42 ms
43760f9 321.78 ms 366.77 ms 44.99 ms
fec92cc 399.96 ms 479.26 ms 79.30 ms
c9d3212 301.34 ms 361.58 ms 60.24 ms
5603ab2 309.84 ms 345.20 ms 35.36 ms
3e9fb0e 329.14 ms 359.16 ms 30.02 ms
6d317ea 303.46 ms 356.06 ms 52.60 ms
1a93825 347.31 ms 424.54 ms 77.23 ms
bffc2c5 348.00 ms 399.89 ms 51.89 ms

App size

Revision Plain With Sentry Diff
1ac008b 6.33 MiB 7.27 MiB 954.13 KiB
4ad2751 6.16 MiB 7.14 MiB 1010.86 KiB
43760f9 6.15 MiB 7.13 MiB 1000.49 KiB
fec92cc 6.35 MiB 7.33 MiB 1005.56 KiB
c9d3212 6.16 MiB 7.14 MiB 1010.90 KiB
5603ab2 5.94 MiB 6.95 MiB 1.01 MiB
3e9fb0e 5.94 MiB 6.95 MiB 1.01 MiB
6d317ea 5.94 MiB 6.92 MiB 1001.74 KiB
1a93825 6.27 MiB 7.20 MiB 956.36 KiB
bffc2c5 6.34 MiB 7.28 MiB 966.31 KiB

Previous results on branch: enha/performance-improvements

Startup times

Revision Plain With Sentry Diff
4397f58 456.42 ms 496.91 ms 40.50 ms
dbc646a 466.50 ms 510.43 ms 43.93 ms
49cb755 433.67 ms 489.27 ms 55.60 ms
1fc275d 459.56 ms 491.37 ms 31.80 ms
5b905e0 464.42 ms 500.12 ms 35.71 ms

App size

Revision Plain With Sentry Diff
4397f58 6.49 MiB 7.56 MiB 1.07 MiB
dbc646a 6.49 MiB 7.56 MiB 1.07 MiB
49cb755 6.49 MiB 7.56 MiB 1.07 MiB
1fc275d 6.49 MiB 7.56 MiB 1.07 MiB
5b905e0 6.49 MiB 7.56 MiB 1.07 MiB

Copy link
Contributor

github-actions bot commented Nov 18, 2024

iOS Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1256.96 ms 1281.58 ms 24.63 ms
Size 8.38 MiB 9.78 MiB 1.40 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
6f3717a 1259.84 ms 1280.90 ms 21.06 ms
e6b16cd 1226.20 ms 1246.52 ms 20.32 ms
8fa3934 1252.79 ms 1272.41 ms 19.62 ms
cc80714 1205.53 ms 1223.90 ms 18.37 ms
0a82a1e 1233.56 ms 1244.56 ms 11.00 ms
3334ac1 1259.22 ms 1275.40 ms 16.17 ms
256df44 1252.49 ms 1274.27 ms 21.78 ms
cf91c9d 1217.08 ms 1233.00 ms 15.92 ms
b728df4 1287.43 ms 1293.94 ms 6.51 ms
61e71ec 1243.14 ms 1257.21 ms 14.07 ms

App size

Revision Plain With Sentry Diff
6f3717a 8.33 MiB 9.62 MiB 1.29 MiB
e6b16cd 8.33 MiB 9.54 MiB 1.22 MiB
8fa3934 8.09 MiB 9.07 MiB 1000.86 KiB
cc80714 8.33 MiB 9.40 MiB 1.07 MiB
0a82a1e 8.29 MiB 9.37 MiB 1.08 MiB
3334ac1 8.10 MiB 9.17 MiB 1.08 MiB
256df44 8.38 MiB 9.71 MiB 1.33 MiB
cf91c9d 8.33 MiB 9.40 MiB 1.07 MiB
b728df4 8.15 MiB 9.15 MiB 1020.72 KiB
61e71ec 8.33 MiB 9.40 MiB 1.07 MiB

Previous results on branch: enha/performance-improvements

Startup times

Revision Plain With Sentry Diff
5b905e0 1236.26 ms 1261.00 ms 24.74 ms
dbc646a 1246.50 ms 1274.39 ms 27.89 ms
4397f58 1256.04 ms 1281.84 ms 25.80 ms
49cb755 1250.18 ms 1279.17 ms 28.99 ms
1fc275d 1236.58 ms 1255.40 ms 18.81 ms

App size

Revision Plain With Sentry Diff
5b905e0 8.38 MiB 9.78 MiB 1.40 MiB
dbc646a 8.38 MiB 9.77 MiB 1.39 MiB
4397f58 8.38 MiB 9.78 MiB 1.40 MiB
49cb755 8.38 MiB 9.77 MiB 1.40 MiB
1fc275d 8.38 MiB 9.78 MiB 1.40 MiB

@denrase denrase marked this pull request as ready for review November 18, 2024 16:55
@denrase denrase requested review from vaind and buenaflor and removed request for stefanosiano, buenaflor, martinhaintz and krystofwoldrich November 18, 2024 16:56
Copy link
Contributor

@buenaflor buenaflor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the existing tests valid for this change or does it make sense to add additional ones?

Comment on lines 137 to 144
final result = Uint8List(totalLength);
result.setRange(0, itemHeader.length, itemHeader);
result.setRange(
itemHeader.length,
itemHeader.length + newLine.length,
newLine,
);
result.setRange(itemHeader.length + newLine.length, totalLength, data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a copy though.

How about we restructure the whole thing and get rid of envelopeItemStream() altogether? IIRC it wasn't necessary last time I looked at it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could even get rid of the SentryEnvelopeItemHeader class, which would get rid of the awkward relationship between SentryEnvelopeItemHeader and SentryEnvelopeItem, and probably simplifies the code quite a bit.

I think the C# SDK also doesn't have two seperate classes for it.

Copy link
Collaborator Author

@denrase denrase Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and we did use yield here before. This was changed in this PR @ueman Do you remember if the issue described in this comment did occur?

// Each item needs to be encoded as one unit.
// Otherwise the header already got yielded if the content throws
// an exception.

Because the header, in order to get the data length, it would access have to access the dataFactory, so any error would occur here, but we don't check for invalid length and we swallow this error. I will update this.

/// Stream binary data of `Envelope` item.
Stream<List<int>> envelopeItemStream() async* {
  yield utf8.encode(jsonEncode(await header.toJson())); // accessing _CachedItem.getDataLength, therefore pre-caching getData
  yield utf8.encode('\n');
  yield await dataFactory(); // dataFactory == _CachedItem.getData
}
   
class _CachedItem {
  _CachedItem(this._dataFactory);

  final Future<List<int>> Function() _dataFactory;
  List<int>? _data;

  Future<List<int>> getData() async {
    _data ??= await _dataFactory();
    return _data!;
  }

  Future<int> getDataLength() async {
    try {
      return (await getData()).length;
    } catch (_) {
      return -1;
    }
  }
}

I checked sentry-dotnet. The header it just an dictionary in EnvelopeItem.cs, where on iOS and Android we have classes that mirror the JSON structure.

@ueman Could you elaborate on what you meant by akward and how the C# approach would be beneficial here? Just want to understand what you meant so we can decide if this refactor is worth it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the envelopeItemStream returning a concatenated list and moved the actual yielding of data into the envelope. Also removed the complexity with _CachedItem and the length closure in the SentryEnvelopeItemHeader, we don't need this.

Copy link
Collaborator

@ueman ueman Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember if the issue described in this comment did occur?

// Each item needs to be encoded as one unit.
// Otherwise the header already got yielded if the content throws
// an exception.

I did run into the issue where I could create the header, but not the item itself, which then resulted in a broken envelop. I don't know of a specific example though. It's probably more relevant for sync than for async calls.

Could you elaborate on what you meant by akward and how the C# approach would be beneficial here? Just want to understand what you meant so we can decide if this refactor is worth it.

What I mean by that is that you can probably get rid of the _CachedItem and friends, since you no longer need to read size etc twice, since you could do all of that in just one class. I can't really tell whether it's worth it though.

@denrase denrase requested a review from buenaflor November 19, 2024 13:49
@denrase denrase requested review from ueman and vaind November 19, 2024 13:49
@denrase
Copy link
Collaborator Author

denrase commented Nov 19, 2024

@buenaflor Added a test for the "new" behaviour where we skip items that throw in their respective data factory.

Copy link
Collaborator

@vaind vaind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some suggestions here and in #2444

How do you feel about taking this one step further and changing FileSystemTransport.send() to preallocate a single uint8list for the whole envelope instead of creating multiple in-between lists with addAll()?

https://github.com/getsentry/sentry-dart/blob/main/flutter/lib/src/file_system_transport.dart

CHANGELOG.md Outdated Show resolved Hide resolved
dart/lib/src/sentry_envelope.dart Outdated Show resolved Hide resolved
@denrase
Copy link
Collaborator Author

denrase commented Nov 25, 2024

@vaind Let's merge your improvements in #2444 into her, than we could merge.

We don't know the size of the list for the whole envelope in advance., and we need to iterate the items with their data factories for that. Would you suggest instead having some kind of fixed size buffer that we increase if needed, so that don't need an additional copy step?

@vaind
Copy link
Collaborator

vaind commented Nov 25, 2024

@vaind Let's merge your improvements in #2444 into her, than we could merge.

We don't know the size of the list for the whole envelope in advance., and we need to iterate the items with their data factories for that. Would you suggest instead having some kind of fixed size buffer that we increase if needed, so that don't need an additional copy step?

let's leave the transport for now, i'll follow up

CHANGELOG.md Outdated Show resolved Hide resolved
@denrase denrase merged commit 5896109 into main Nov 26, 2024
133 checks passed
@denrase denrase deleted the enha/performance-improvements branch November 26, 2024 10:02
@denrase
Copy link
Collaborator Author

denrase commented Nov 26, 2024

@vaind @ueman Thank you both for insights & contributions here! 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid data copies in envelope serialization
4 participants