Improve memory usage for computing hybrid results in the clear #1488
This operation was OOMing on 500M inputs.
The reason was that we were keeping all the information from the original reports, including the AAD and the match key.
The improved algorithm keeps only the breakdown keys and values needed to perform attribution.
My back-of-the-envelope calculation says this improvement is enough to get us to 1 billion reports: each entry takes ~8 bytes, so for 1B entries we are looking at ~8 GB of memory, which should be doable.
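As a rough sketch of why the per-entry footprint drops to ~8 bytes (type and field names below are hypothetical illustrations, not the actual types in the codebase):

```rust
/// Hypothetical picture of what was retained per report before this change:
/// the match key, the AAD, and other report metadata.
#[allow(dead_code)]
struct FullReport {
    match_key: u64,       // 8 bytes on its own
    aad: Vec<u8>,         // variable-length additional authenticated data
    breakdown_key: u32,
    trigger_value: u32,
    // plus timestamps and other metadata
}

/// What the improved algorithm keeps per report: just the breakdown key
/// and the value, which is all that attribution needs.
#[derive(Clone, Copy)]
struct AttributionEntry {
    breakdown_key: u32, // 4 bytes
    value: u32,         // 4 bytes
}

fn main() {
    // ~8 bytes per entry, so 1B entries come out to ~8 GB.
    assert_eq!(std::mem::size_of::<AttributionEntry>(), 8);
    let total = 1_000_000_000usize * std::mem::size_of::<AttributionEntry>();
    println!("1B entries ≈ {} GB", total / 1_000_000_000);
}
```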
I ran it on 500M reports and it finished successfully.