Improve memory usage for computing hybrid results in the clear #1488
This operation was OOMing on 500M inputs.
The reason was that we were keeping all the information from the original reports, including the AAD and the match key.
The improved algorithm keeps only the breakdown keys and values needed to perform attribution.
My back-of-the-envelope calculation says this improvement is enough to get us to 1 billion reports: each entry takes ~8 bytes, so for 1B entries we are looking at ~8 GB of memory, which should be doable.
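As a rough sketch of why the per-entry footprint drops to ~8 bytes (type and field names below are hypothetical illustrations, not the actual types in the codebase):

```rust
/// Hypothetical picture of what was retained per report before this change:
/// the match key, the AAD, and other report metadata.
#[allow(dead_code)]
struct FullReport {
    match_key: u64,       // 8 bytes on its own
    aad: Vec<u8>,         // variable-length additional authenticated data
    breakdown_key: u32,
    trigger_value: u32,
    // plus timestamps and other metadata
}

/// What the improved algorithm keeps per report: just the breakdown key
/// and the value, which is all that attribution needs.
#[derive(Clone, Copy)]
struct AttributionEntry {
    breakdown_key: u32, // 4 bytes
    value: u32,         // 4 bytes
}

fn main() {
    // ~8 bytes per entry, so 1B entries come out to ~8 GB.
    assert_eq!(std::mem::size_of::<AttributionEntry>(), 8);
    let total = 1_000_000_000usize * std::mem::size_of::<AttributionEntry>();
    println!("1B entries ≈ {} GB", total / 1_000_000_000);
}
```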
I ran it on 500M reports and it finished successfully.