
Avoid using floating point arithmetic for privacy budget #77

Open
csharrison opened this issue Feb 3, 2025 · 4 comments
Labels
discuss Needs working group discussion

Comments

@csharrison
Contributor

This is a fairly minor issue, but it has been bugging me since reading the spec. We capture the value of epsilon in PrivateAttributionConversionOptions as a double, which implies (though it is not specified) that we will be doing accounting in double space.

Since we will be doing repeated addition / subtraction here, there is a chance for (small) errors to accumulate, meaning that our guarantee of epsilon may be (very) slightly off. My preference instead would be to have the interface be in milliEpsilons and do accounting only with integers.1
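
For concreteness, a minimal sketch of what integer-only accounting could look like (all of the names here are illustrative, not from the spec):

```ts
// Sketch only: budget accounting in integer milliEpsilons (1000 milliEpsilons = an epsilon of 1).
// Nothing here is from the spec; it just illustrates integer-only accounting.
class MilliEpsilonBudget {
  private remaining: number;

  constructor(totalMilliEpsilon: number) {
    if (!Number.isInteger(totalMilliEpsilon) || totalMilliEpsilon < 0) {
      throw new RangeError("budget must be a non-negative integer of milliEpsilons");
    }
    this.remaining = totalMilliEpsilon;
  }

  // Integer addition and subtraction are exact, so repeated spends accumulate no error.
  trySpend(milliEpsilon: number): boolean {
    if (!Number.isInteger(milliEpsilon) || milliEpsilon <= 0) {
      throw new RangeError("spend must be a positive integer of milliEpsilons");
    }
    if (milliEpsilon > this.remaining) return false;
    this.remaining -= milliEpsilon;
    return true;
  }
}

// A budget of epsilon = 1 split three ways exhausts exactly, with no residual error.
const budget = new MilliEpsilonBudget(1000);
budget.trySpend(333); // true
budget.trySpend(500); // true
budget.trySpend(167); // true; budget is now exactly 0
```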

Note: an alternative would be to round epsilon to a fixed precision first, and then do error-free epsilon computations using the rounded values. This has the benefit of enabling a possibly "cleaner" interface for developers, but is worse in my opinion because the rounding is not very transparent.

Footnotes

  1. I actually wish we had a rational type, but unfortunately it doesn't exist.

@martinthomson
Member

Good question. I've had similar thoughts, though I'd managed to suppress them. You have to shave things pretty finely to end up with problems here. Sure, a three-way split is going to accumulate an error, but it is so small as to be a non-event. In many cases, you won't even have an error.

Is dividing by 1k going to give people the resolution they require? Or would microEpsilons be necessary? Obviously no one is going to want to spend just one of those, but it makes things even less prone to accumulated errors.

If we accept some amount of floating point error, then that might push more toward less absolute actions at the enforcement end: reducing contributions proportionally, rather than dropping them, for example. Let's say that you had someone spend 1/3 then 1/2 of their budget. The 1/3 will have rounding errors, but I've observed that spending the remaining 1/6 doesn't result in problems. You have to get pretty exotic with your budget computations before you are even off by the (floating point) epsilon of 2^-52 or whatever that is.
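
For what it's worth, that claim is easy to check directly; here is a sketch of the kind of comparison one could run (the split values are just the ones from the example above):

```ts
// Sketch: accumulate a sequence of floating point spends and compare against the
// nominal budget. The splits mirror the 1/3, 1/2, remaining-1/6 example above.
const nominal = 1.0;
const spends = [nominal / 3, nominal / 2, nominal / 6];

let spent = 0;
for (const s of spends) {
  spent += s;
}

// Whatever is printed here is the accumulated rounding error for this particular sequence.
console.log(spent, nominal - spent);
```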

@csharrison
Contributor Author

Actually, I realized another issue that will need some thought: the budget calculation is computed as epsilon * value / maxValue. Even if these are all integers, the quotient isn't necessarily an integer. This makes me think the real options are basically:

  1. Use floating point
  2. Fixed precision + conservative rounding
  3. Implement rational arithmetic

(3) is obviously not practical. Unfortunately, for (2) we don't have fixed-precision primitives, but they can be simulated by rounding to integral values and using something like microEpsilons.
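
As a sketch of what (2) could look like, assuming integer microEpsilons and rounding the charge up so the accountant can only over-charge (the function and parameter names are hypothetical):

```ts
// Sketch of option (2): fixed precision via integer microEpsilons plus conservative rounding.
// Rounding the charge up means the stated epsilon can never be exceeded, only under-used.
const MICRO_EPSILON_PER_EPSILON = 1_000_000;

function chargeMicroEpsilon(
  epsilonMicro: number, // per-query epsilon, as an integer number of microEpsilons
  value: number,        // integer contribution value
  maxValue: number      // integer maximum contribution value
): number {
  // All inputs are integers; the only inexact step is the division, which is rounded up.
  return Math.ceil((epsilonMicro * value) / maxValue);
}

// Example: epsilon = 1, value = 1, maxValue = 3 charges ceil(333333.33...) = 333334
// microEpsilons, slightly over-counting rather than under-counting the spend.
chargeMicroEpsilon(1 * MICRO_EPSILON_PER_EPSILON, 1, 3); // => 333334
```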

but I've observed that spending the remaining 1/6 doesn't result in problems

Can you expand on this? What do you mean by problems? Is it related to #16?

@martinthomson
Member

but I've observed that spending the remaining 1/6 doesn't result in problems

Can you expand on this? What do you mean by problems? Is it related to #16?

This is just a general observation that - in a lot of cases anyway - the errors involved in multiple floating point additions amount to zero. Not close to zero, but zero. That's likely because the rounding either cancels out or it never ends up being large enough to tickle that 52nd bit of the mantissa. That is, maybe it isn't worth the stress involved with rationals, or even microEpsilons.

It might be worth quantifying risks.

  • On the budgeting end, there is a risk that the sum of budget spends is rounded down, leading to additional privacy loss equivalent to the error.
  • Conversely, if the sum of spends is rounded up, there is a risk that a value is lost completely when the budget overflows by an infinitesimal amount. (I'm coming around to your view on partial values here.)

This is all testable though. I seriously doubt that any budget would result in a significant privacy loss if split multiple ways, as long as there was some minimum spend on each query. (That's a fun exercise: work out what sequence of floating point values produces the greatest difference between the actual and realized sum. In practice, as long as you don't allow spending in increments of less than $2^{-53}$, then I think we'll manage.)
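
One way to make that testable (a sketch, assuming spends are restricted to integer multiples of $2^{-53}$ as suggested above): keep an exact reference sum in those units and measure how far a plain double accumulator drifts from it.

```ts
// Sketch: measure floating point drift against an exact reference, assuming every
// spend is an integer multiple of 2^-53 (the restriction suggested above).
const UNIT = 2 ** -53;

function drift(spendUnits: bigint[]): number {
  let floatSum = 0;    // what a double-based accountant would track
  let exactUnits = 0n; // exact accounting, in units of 2^-53
  for (const units of spendUnits) {
    floatSum += Number(units) * UNIT;
    exactUnits += units;
  }
  // The difference is the accumulated rounding error for this spending sequence.
  return floatSum - Number(exactUnits) * UNIT;
}
```

Sweeping something like this over adversarially chosen sequences would put a concrete number on the "greatest difference" question above.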

Similarly, the consequences of rounding errors leading to budget overflow and loss of data are:

a. Manageable (it's possible to compute how different spending patterns lead to bad rounding and avoid that)
b. Probably far less of a concern than the other sources of bias we already have

There is also a risk that contributions to the histogram need to be split in ways that result in non-integer contributions. There, I think our best recourse is Math.floor(), rather than rounding, though we might also define integer-based algorithms.
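
A floor-based split might look like the following sketch (whether the dropped remainder should instead be redistributed is exactly the open design question):

```ts
// Sketch: split an integer contribution across weighted parts with Math.floor,
// as suggested above. The remainder is simply dropped here; an integer-based
// alternative would redistribute it (e.g. largest-remainder apportionment).
function splitContribution(value: number, weights: number[]): number[] {
  const total = weights.reduce((a, b) => a + b, 0);
  return weights.map((w) => Math.floor((value * w) / total));
}

// Splitting a contribution of 10 three ways under-contributes by 1 unit
// rather than ever over-contributing.
splitContribution(10, [1, 1, 1]); // => [3, 3, 3]
```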

Of course, after writing all this, I'm still of the "don't overthink it" persuasion, despite having thoroughly overthought it.

@martinthomson martinthomson added the discuss Needs working group discussion label Feb 4, 2025
@martinthomson
Member

This popped up just today: https://github.com/tc39/proposal-math-sum

(To be clear, I'm not proposing that we use that. I'm suggesting that we don't.)
