
Attribute Attack should report confidence that training set is not more vulnerable than test #166

Open
jim-smith opened this issue May 31, 2023 · 0 comments
Labels: enhancement (New feature or request); waiting (This issue is waiting for something else to be completed, see issue for details)

Comments

jim-smith commented May 31, 2023

At the moment we effectively run a worst-case attack: a simulated attacker has the model (which outputs probabilities) and has a record with the target label present and the value of just one feature missing.
A 'competently' published model may still increase the likelihood that an attacker can estimate the missing value for a record more reliably than they could without the model.

So the question is: is this risk different for items that were in the training set than it is for the general population?

We assess this risk separately for each attribute - assuming the TRE may set a different risk appetite for each.

Procedure:

  1. Compute the number of vulnerable train and test records ($v_{tr}, v_{te}$ respectively)
  2. Assess the proportion $p_{tr}$ of 'vulnerable' training set items: $p_{tr} = v_{tr}/n_{tr}$
  3. Assess the proportion of 'vulnerable' test set items $p_{te} = v_{te}/n_{te}$

Currently we report the ratio of the two fractions, $\frac{p_{tr}}{p_{te}}$.
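As a minimal sketch of the current behaviour (the counts below are illustrative; in practice $v_{tr}$ and $v_{te}$ come from the attribute-inference attack and $n_{tr}$, $n_{te}$ are the set sizes):

```python
# Illustrative counts; v_* come from the attack, n_* are the set sizes.
v_tr, n_tr = 40, 100   # vulnerable / total training records
v_te, n_te = 25, 100   # vulnerable / total test records

p_tr = v_tr / n_tr     # step 2: proportion of vulnerable training items
p_te = v_te / n_te     # step 3: proportion of vulnerable test items

ratio = p_tr / p_te    # currently reported risk ratio
print(ratio)           # 1.6
```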

We should report the probability that the observed difference of proportions is significant

  • using a one-tailed test, i.e. is the training data more vulnerable?

-- some code examples in metrics.py for pdf, or description here

  • Null hypothesis $p_{tr} \le p_{te}$; alternative hypothesis $p_{tr} > p_{te}$
  • pooled proportion $p = \frac{v_{tr} + v_{te}}{n_{tr} + n_{te}}$
  • standard error $SE = \sqrt{ p ( 1 - p ) [ (1/n_{tr}) + (1/n_{te}) ] }$
  • test statistic $z = (p_{tr} - p_{te}) / SE$
  • the p-value is the probability that a standard normal variable exceeds $z$, i.e. $P(Z \ge z)$

using norm from scipy.stats (note $z$ is already standardised, so the default scale of 1 applies, and the one-tailed p-value is the upper tail):

p_value = norm.sf(z)  # equivalent to 1 - norm.cdf(z)

Then, for the report, we have to decide whether to use 95% or 99% confidence.
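The whole procedure could be sketched as below. Counts and the function name are illustrative only; the counting of 'vulnerable' records is assumed to be done elsewhere by the attack:

```python
from math import sqrt

from scipy.stats import norm


def train_vs_test_z_test(v_tr, n_tr, v_te, n_te):
    """One-tailed two-proportion z-test; H1: training set is more vulnerable.

    Returns (z, p_value); a small p_value is evidence that the training
    set is more vulnerable than the test set.
    """
    p_tr = v_tr / n_tr
    p_te = v_te / n_te
    p = (v_tr + v_te) / (n_tr + n_te)                  # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n_tr + 1 / n_te))     # standard error
    z = (p_tr - p_te) / se                             # test statistic
    p_value = norm.sf(z)                               # P(Z >= z), one tail
    return z, p_value


z, p_value = train_vs_test_z_test(v_tr=40, n_tr=100, v_te=25, n_te=100)
alpha = 0.05  # or 0.01 for 99% confidence
print(f"z = {z:.3f}, p = {p_value:.4f}, significant at 95%: {p_value < alpha}")
```

With these illustrative counts the training set looks significantly more vulnerable at the 95% level but not at 99%, which shows why the choice of confidence level matters for the report.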

@jim-smith jim-smith changed the title Attributte Attribute Attack should report confidence that training set is not more vulnerable than test May 31, 2023
@jim-smith jim-smith added the enhancement New feature or request label May 31, 2023
@jim-smith jim-smith added this to the Release 1.0.6 milestone May 31, 2023
@jim-smith jim-smith removed this from the Release 1.0.6 milestone Jul 14, 2023
@jim-smith jim-smith added the waiting This issue is waiting for something else to be completed (see issue for details) label Jul 14, 2023