DP Federated Learning with different samples each round #688

Open
agarciall opened this issue Nov 14, 2024 · 1 comment

@agarciall

Issue Description

Hello everyone, I hope you're doing well. I'm trying to implement Differential Privacy (DP) with Opacus in a federated learning training structure where, in each round, each client has a different part of the original dataset. This simulates a scenario where clients receive different information over time, for example an intrusion detection system (IDS) working with IoT data.

I call make_private() in each client round:

# Imports needed for this snippet
import time
from torch.utils.data import DataLoader, Subset

# Create DataLoaders for each partition (partitions, x_train, y_train are defined earlier)
train_loaders = []
for partition in partitions:
    train_loader = DataLoader(Subset(list(zip(x_train, y_train)), partition), batch_size=64, shuffle=True)
    train_loaders.append(train_loader)

# Training the model with differential privacy for multiple rounds
train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []
epsilon_values = []

start_time = time.time()

for round in range(num_rounds):
    # Get the DataLoader for the current round
    train_loader = train_loaders[round]
    
    # Ensure the model is in training mode
    model.train()

    # Make the model, optimizer, and DataLoader private
    model, optimizer, train_loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        noise_multiplier=1.0,
        max_grad_norm=1.0,
    )   
    
    # ... [rest of training and validation code, nothing important happens here]

    epsilon = privacy_engine.get_epsilon(1e-5)
    epsilon_values.append(epsilon)
    
    print(f'Round [{round+1}/{num_rounds}], Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}, Train Acc: {train_accuracies[-1]:.4f}, Val Acc: {val_accuracies[-1]:.4f}, Epsilon: {epsilon:.2f}')

When I run the code, it works, but I get this warning:

Round [2/10], Loss: 3.4578, Val Loss: 2.5558, Train Acc: 0.6266, Val Acc: 0.8665, Epsilon: 0.53
/srv/jupyterhub/jupyter-env/env/lib/python3.12/site-packages/opacus/privacy_engine.py:151: UserWarning: PrivacyEngine detected new dataset object. Was: <torch.utils.data.dataset.Subset object at 0x785883f61520>, got: <torch.utils.data.dataset.Subset object at 0x78587810a690>. Privacy accounting works per dataset, please initialize new PrivacyEngine if you're using different dataset. You can ignore this warning if two datasets above represent the same logical dataset
warnings.warn(

Questions

- Does this mean I can ignore the warning? The data is actually part of the same original dataset, but I'm not sure what it refers to as the "logical dataset".
- Do you have any idea how to solve the "using different data each round" problem?

Thank you very much for your help!

@EnayatUllah
Contributor

Thanks for your question! I am a little confused by the implementation and the intended federated learning setup: it seems that each round uses a different dataset, whereas what you want is for each client to have a different dataset, right? Does each round correspond to a round of Federated Averaging?

Setting that aside, regarding your questions about using different datasets in each round:

  1. You can ignore the warning. However, how do you plan to report the final epsilon from the per-round epsilon_values? Also, you should get an error, since the model you pass to make_private is an instance of GradSampleModule (i.e., it was the output of the privacy engine in the previous round). Do you have additional logic in which you do not use that model directly? (See the sketch after this list.)

  2. About different data in each round -- I think what you have is fine; you can get rid of the warning by setting privacy_engine.dataset = train_loader in each round if you want.
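
Below is a minimal sketch of how the round loop could be restructured along these lines. It keeps a single PrivacyEngine (and therefore a single accountant) across rounds, unwraps the GradSampleModule before re-calling make_private, and points the engine at each round's dataset to suppress the warning. The privacy_engine.dataset attribute and GradSampleModule.to_standard_module() may behave differently across Opacus versions, and make_optimizer and train_one_round are hypothetical placeholders for the poster's own optimizer setup and training/validation code.

from opacus import PrivacyEngine
from opacus.grad_sample import GradSampleModule

privacy_engine = PrivacyEngine()  # one engine -> one accountant across all rounds
epsilon_values = []

for round_idx in range(num_rounds):
    # If the model was wrapped by make_private in the previous round, unwrap it
    # so make_private always receives a plain nn.Module.
    if isinstance(model, GradSampleModule):
        model = model.to_standard_module()

    # Fresh (non-DP) optimizer over the unwrapped parameters each round;
    # make_optimizer is a placeholder for however the optimizer is built.
    optimizer = make_optimizer(model)

    train_loader = train_loaders[round_idx]

    # Point the engine at this round's dataset before re-wrapping (adapting the
    # suggestion above), so the "new dataset object" warning is not triggered.
    privacy_engine.dataset = train_loader.dataset

    model, optimizer, train_loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        noise_multiplier=1.0,
        max_grad_norm=1.0,
    )

    train_one_round(model, optimizer, train_loader)  # placeholder for the training/validation code

    # get_epsilon composes over everything this engine's accountant has seen,
    # so the latest value already covers all rounds so far.
    epsilon_values.append(privacy_engine.get_epsilon(delta=1e-5))

With a single engine kept across rounds like this, epsilon_values[-1] is already the cumulative epsilon over all rounds, which is one way to report a final epsilon rather than trying to combine the per-round values afterwards.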
