
Avoid using floating point arithmetic for privacy budget #77

Open
csharrison opened this issue Feb 3, 2025 · 4 comments
Labels
discuss Needs working group discussion

Comments

@csharrison
Contributor

This is a fairly minor issue, but it has been bugging me since reading the spec. We capture the value of epsilon in PrivateAttributionConversionOptions as a double, which implies (though it is not specified) that we will be doing accounting in double space.

Since we will be doing repeated addition / subtraction here, there is a chance for (small) errors to accumulate, meaning that our guarantee of epsilon may be (very) slightly off. My preference instead would be to have the interface be in milliEpsilons and do accounting only with integers.1
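
For concreteness, a minimal sketch of what integer-only accounting could look like (all of the names here are illustrative, not from the spec):

```ts
// Sketch only: budget accounting in integer milliEpsilons (1000 milliEpsilons = an epsilon of 1).
// Nothing here is from the spec; it just illustrates integer-only accounting.
class MilliEpsilonBudget {
  private remaining: number;

  constructor(totalMilliEpsilon: number) {
    if (!Number.isInteger(totalMilliEpsilon) || totalMilliEpsilon < 0) {
      throw new RangeError("budget must be a non-negative integer of milliEpsilons");
    }
    this.remaining = totalMilliEpsilon;
  }

  // Integer addition and subtraction are exact, so repeated spends accumulate no error.
  trySpend(milliEpsilon: number): boolean {
    if (!Number.isInteger(milliEpsilon) || milliEpsilon <= 0) {
      throw new RangeError("spend must be a positive integer of milliEpsilons");
    }
    if (milliEpsilon > this.remaining) return false;
    this.remaining -= milliEpsilon;
    return true;
  }
}

// A budget of epsilon = 1 split three ways exhausts exactly, with no residual error.
const budget = new MilliEpsilonBudget(1000);
budget.trySpend(333); // true
budget.trySpend(500); // true
budget.trySpend(167); // true; budget is now exactly 0
```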

Note: an alternative would be to round epsilon to a fixed precision first, and then do error-free epsilon computations using the rounded values. This has the benefit of enabling a possibly "cleaner" interface for developers, but is worse in my opinion because the rounding is not very transparent.

Footnotes

  1. I actually wish we had a rational type, but unfortunately it doesn't exist.

@martinthomson
Member

Good question. I've had similar thoughts, though I'd managed to suppress them. You have to shave things pretty finely to end up with problems here. Sure, a three-way split is going to accumulate an error, but it is so small as to be a non-event. In many cases, you won't even have an error.

Is dividing by 1k going to give people the resolution they require? Or would microEpsilons be necessary? Obviously no one is going to want to spend just one of those, but it makes things even less prone to accumulated errors.

If we accept some amount of floating point error, then that might push more toward less absolute actions at the enforcement end: reducing contributions proportionally, rather than dropping them, for example. Let's say that you had someone spend 1/3 then 1/2 of their budget. The 1/3 will have rounding errors, but I've observed that spending the remaining 1/6 doesn't result in problems. You have to get pretty exotic with your budget computations before you are even off by the (floating point) epsilon of 2^-52 or whatever that is.
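
For what it's worth, that claim is easy to check directly; here is a sketch of the kind of comparison one could run (the split values are just the ones from the example above):

```ts
// Sketch: accumulate a sequence of floating point spends and compare against the
// nominal budget. The splits mirror the 1/3, 1/2, remaining-1/6 example above.
const nominal = 1.0;
const spends = [nominal / 3, nominal / 2, nominal / 6];

let spent = 0;
for (const s of spends) {
  spent += s;
}

// Whatever is printed here is the accumulated rounding error for this particular sequence.
console.log(spent, nominal - spent);
```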

@csharrison
Contributor Author

Actually, I realized another issue that will need some thought: the budget calculation is computed as epsilon * value / maxValue. Even if these are all integers, the quotient isn't necessarily an integer. This makes me think the real options are basically:

  1. Use floating point
  2. Fixed precision + conservative rounding
  3. Implement rational arithmetic

(3) is obviously not practical. Unfortunately, for (2) we don't have fixed-precision primitives, but they can be simulated by rounding to integral values and using something like microEpsilons.
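
As a sketch of what (2) could look like, assuming integer microEpsilons and rounding the charge up so the accountant can only over-charge (the function and parameter names are hypothetical):

```ts
// Sketch of option (2): fixed precision via integer microEpsilons plus conservative rounding.
// Rounding the charge up means the stated epsilon can never be exceeded, only under-used.
const MICRO_EPSILON_PER_EPSILON = 1_000_000;

function chargeMicroEpsilon(
  epsilonMicro: number, // per-query epsilon, as an integer number of microEpsilons
  value: number,        // integer contribution value
  maxValue: number      // integer maximum contribution value
): number {
  // All inputs are integers; the only inexact step is the division, which is rounded up.
  return Math.ceil((epsilonMicro * value) / maxValue);
}

// Example: epsilon = 1, value = 1, maxValue = 3 charges ceil(333333.33...) = 333334
// microEpsilons, slightly over-counting rather than under-counting the spend.
chargeMicroEpsilon(1 * MICRO_EPSILON_PER_EPSILON, 1, 3); // => 333334
```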

but I've observed that spending the remaining 1/6 doesn't result in problems

Can you expand on this? What do you mean by problems? Is it related to #16?

@martinthomson
Member

but I've observed that spending the remaining 1/6 doesn't result in problems

Can you expand on this? What do you mean by problems? Is it related to #16?

This is just a general observation that - in a lot of cases anyway - the errors involved in multiple floating point additions amount to zero. Not close to zero, but zero. That's likely because the rounding either cancels out or it never ends up being large enough to tickle that 52nd bit of the mantissa. That is, maybe it isn't worth the stress involved with rationals, or even microEpsilons.

It might be worth quantifying risks.

  • On the budgeting end, there is a risk that the sum of budget spends is rounded down, leading to additional privacy loss equivalent to the error.
  • Conversely, if the sum of spends is rounded up, there is a risk that a value is lost completely when the budget overflows by an infinitesimal amount. (I'm coming around to your view on partial values here.)

This is all testable though. I seriously doubt that any budget would result in a significant privacy loss if split multiple ways, as long as there was some minimum spend on each query. (That's a fun exercise: work out what sequence of floating point values produces the greatest difference between the actual and realized sum. In practice, as long as you don't allow spending in increments of less than $2^{-53}$, then I think we'll manage.)
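
One way to make that testable (a sketch, assuming spends are restricted to integer multiples of $2^{-53}$ as suggested above): keep an exact reference sum in those units and measure how far a plain double accumulator drifts from it.

```ts
// Sketch: measure floating point drift against an exact reference, assuming every
// spend is an integer multiple of 2^-53 (the restriction suggested above).
const UNIT = 2 ** -53;

function drift(spendUnits: bigint[]): number {
  let floatSum = 0;    // what a double-based accountant would track
  let exactUnits = 0n; // exact accounting, in units of 2^-53
  for (const units of spendUnits) {
    floatSum += Number(units) * UNIT;
    exactUnits += units;
  }
  // The difference is the accumulated rounding error for this spending sequence.
  return floatSum - Number(exactUnits) * UNIT;
}
```

Sweeping something like this over adversarially chosen sequences would put a concrete number on the "greatest difference" question above.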

Similarly, the consequences of rounding errors leading to budget overflow and loss of data are:

a. Manageable (it's possible to compute how different spending patterns lead to bad rounding and avoid that)
b. Probably far less of a concern than the other sources of bias we already have

There is also a risk that contributions to the histogram need to be split in ways that result in non-integer contributions. There, I think our best recourse is Math.floor(), rather than rounding, though we might also define integer-based algorithms.
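
A floor-based split might look like the following sketch (whether the dropped remainder should instead be redistributed is exactly the open design question):

```ts
// Sketch: split an integer contribution across weighted parts with Math.floor,
// as suggested above. The remainder is simply dropped here; an integer-based
// alternative would redistribute it (e.g. largest-remainder apportionment).
function splitContribution(value: number, weights: number[]): number[] {
  const total = weights.reduce((a, b) => a + b, 0);
  return weights.map((w) => Math.floor((value * w) / total));
}

// Splitting a contribution of 10 three ways under-contributes by 1 unit
// rather than ever over-contributing.
splitContribution(10, [1, 1, 1]); // => [3, 3, 3]
```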

Of course, after writing all this, I'm still of the "don't overthink it" persuasion, despite having thoroughly overthought it.

@martinthomson martinthomson added the discuss Needs working group discussion label Feb 4, 2025
@martinthomson
Member

This popped up just today: https://github.com/tc39/proposal-math-sum

(To be clear, I'm not proposing that we use that. I'm suggesting that we don't.)
