-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cookie Monster #2
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DelphianCalamity, I just left a few comments or clarification questions. It's amazing that you made that work :) The diff is huge so I couldn't review everything yet, I'm rather trying to probe if I understand ARA correctly, and hopefully that will help me have a data model that's close enough to the code. I'll take another pass once I have the theory ready to check that the optimizations really spend the right thing!
content/browser/attribution_reporting/aggregatable_attribution_utils.cc
Outdated
Show resolved
Hide resolved
// Populate partition.report_value_pairs[*].report for all source_keys | ||
for (auto& outer : source_counts_per_sourcekey) { | ||
auto source_key = outer.first; | ||
auto& report_value_pair = partition.report_value_pairs[source_key]; | ||
auto& trigger_keypieces = trigger_keypieces_per_source[source_key]; | ||
|
||
for (auto& inner : outer.second) { | ||
auto source_keypiece = inner.first; | ||
auto source_count = inner.second; | ||
|
||
// Extend the source key_pieces for source_key | ||
for (auto& trigger_keypiece : trigger_keypieces) { | ||
source_keypiece |= trigger_keypiece; | ||
} | ||
|
||
double contribution_value = (source_count / total_sources_count) * report_value_pair.value; | ||
report_value_pair.report.emplace_back(source_keypiece, contribution_value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so first we build source_counts_per_sourcekey
, that is a list (actually dict) of sources, for each source we have a histogram that maps each keypiece to the number of times it appears in the whole attribution window.
And next we browse all the sources. If there is a trigger associated to that source we concatenate the trigger_keypiece and map it to the right contribution value. But what if there is no trigger associated to that source? E.g. source_key
does not appear in trigger_keypieces_per_source
. Should we still give some value to the untouched source_keypiece
? Maybe yes, it's a sort of unattributed impression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should always be a value for a given source_key. There should be a trigger for every existing source. I do some sanity check to catch some errors here
// Sanity check: every source_key from trigger_data must exist in source aggregation_keys too |
but I should do some more to ensure this doesn't happen.
I was actually hoping ARA does that already tbh earlier but will check
double Partition::compute_sensitivity(const char* sensitivity_metric) { | ||
double individual_sensitivity = 0; | ||
if (std::strcmp(sensitivity_metric, "L1") == 0) { | ||
for (auto& pair : report_value_pairs) { | ||
for (auto& bucket : pair.second.report) { | ||
individual_sensitivity += bucket.value(); | ||
} | ||
} | ||
return individual_sensitivity; | ||
} else if (std::strcmp(sensitivity_metric, "L2") == 0) { | ||
// TODO(kelly) | ||
// return AggregatableResult::kInternalError; | ||
} else { | ||
// return AggregatableResult::kInternalError; | ||
} | ||
return -1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That one looks good for now, but I'll keep an eye on it until I have the full proof written. It's fine for impressions but I'm a bit worried about the sensitivity for conversions
} else if (std::strcmp(kSensitivityMetric, "L2") == 0) { | ||
// TODO(kelly) | ||
// sum up histograms, raise each bucket value to power of 2 sum them up and sqrt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but let's wait for that one to be sure that we don't mess up with a non-linearity due to the square/sqrt
No description provided.