Hello,
I have a question about the mechanism proposed in your paper.
In the design of the membership inference attack, a requirement is that the adversary has access to a reference dataset drawn from the same distribution as the target model's training data. So in the implementation, for any dataset available in SDGym (e.g. adult, insurance), one sample is used as the adversary's prior information and another is used as the training set for the generative model that produces the synthetic data, with the size of each controlled by the config parameters `sizeRawT` and `sizeRawA`.
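For context, here is roughly how I read that sampling step. This is only a minimal sketch under my own assumptions (pandas DataFrames, a hypothetical `split_raw_data` helper, and disjoint samples), not the repo's actual code, so please correct me if I've misunderstood it:

```python
import pandas as pd


def split_raw_data(raw_df: pd.DataFrame, size_raw_t: int, size_raw_a: int, seed: int = 0):
    """Draw two samples from the raw dataset (my reading of the setup):
    one of size `size_raw_t` to train the target generative model,
    and one of size `size_raw_a` as the adversary's reference data.
    Helper name, disjointness, and the seed handling are my assumptions."""
    shuffled = raw_df.sample(frac=1, random_state=seed).reset_index(drop=True)
    raw_t = shuffled.iloc[:size_raw_t]                          # target model's training sample
    raw_a = shuffled.iloc[size_raw_t:size_raw_t + size_raw_a]   # adversary's reference sample
    return raw_t, raw_a


# Hypothetical usage with the SDGym 'adult' dataset and example sizes:
# raw_t, raw_a = split_raw_data(adult_df, size_raw_t=1000, size_raw_a=1000)
```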
In practice, however, when building a generative model it is usually beneficial to train on the entire available dataset so that the model can better learn the underlying distribution. It seems that the proposed mechanism therefore hinges on either a) using generative models that do not require large training sets, such as those listed in the configs (BayesNet, PrivBayes), or b) having a dataset large enough that GANs or large language models still have sufficient data for stable training after the sub-sampling. I'd love to hear your thoughts on this.
Would you also be able to share the config parameters used to generate results for CTGAN and PATEGAN in your paper? Thanks!