Add pFedHN baseline #2377
base: main
Conversation
Hi @achiverram28,
Thanks for adding all the new content to the PR. You'll see I made a commit just before my review; it mostly addressed some formatting issues in the README.md.
Please find below a few comments. They are mostly about how to set the device for clients and server. I wasn't able to run the code (python3 -m pFedHN.main) due to a device mismatch. My suggestion is to let the client device be set in a simpler way, and to let the user specify the server device from the config (a small sketch of what I mean is included below).
Please ping me when you have a chance to implement the changes. They shouldn't take long.
Also, as discussed earlier with you, for this baseline the preferable way of showing the results is by means of tables (as in the paper). Including line plots is also welcome.
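A minimal sketch of the simpler client-side device handling being suggested (purely illustrative; the helper name is not from the PR):

```python
import torch


def get_client_device() -> torch.device:
    # Each simulated client just uses the GPU if one is visible to it,
    # otherwise the CPU; nothing needs to be threaded through from the
    # server or the config for this.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")
```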
fraction_fit: 0.1
min_fit_clients: 5
min_available_clients: 5
server_device: cpu
Other baselines introduce this to set the device used by the server. Use it in your server.py instead of utils.get_client() (which I recommend removing).
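A minimal sketch of how `server_device` could be read from the config inside server.py (it assumes the usual Hydra `DictConfig` named `cfg`; the function name is hypothetical):

```python
import torch
from omegaconf import DictConfig


def get_server_device(cfg: DictConfig) -> torch.device:
    # `server_device` comes straight from the config (e.g. "cpu" or "cuda:0"),
    # replacing the previous utils.get_client() helper.
    return torch.device(cfg.server_device)
```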
The code should be ready to support the server running on either CPU or GPU, regardless of the device the clients use. Currently the code doesn't work if clients don't use the GPU (it fails in the strategy), and when a client does use the GPU my attempt to run python3 -m pFedHN.main failed on the client side.
Here I have set the device as torch.device("cuda:0" if torch.cuda.is_available() else "cpu") and handled the rest of server.py accordingly.
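For context, a sketch of the kind of device handling being discussed (the `client_ndarrays` name is an assumption, not the PR's actual code): parameters reach the server as NumPy arrays, so they carry no device and must be placed explicitly on the server device before the hypernetwork update.

```python
import numpy as np
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Weights received from a client (e.g. via flwr.common.parameters_to_ndarrays)
# are plain NumPy arrays with no device attached.
client_ndarrays = [np.zeros((10, 10), dtype=np.float32)]  # placeholder payload

# Move them onto the server device before using them in the strategy/server.
delta_theta = [torch.from_numpy(arr).to(device) for arr in client_ndarrays]
```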
Hi @achiverram28.
You'll see I've pushed some changes. They are mostly quite minor. This is the summary of changes:
- Moved `num_rounds` and `num_nodes` outside of `client` in the config, then renamed `num_nodes` to `num_clients`. This primarily has implications in `main.py` and `server.py`.
- It's better to set the periodicity at which the federated evaluation (i.e. `configure_evaluate()` and `aggregate_evaluate()`) takes place via the config. Please see the new `evaluate_every` passed as an input argument to your custom strategy (see the sketch after this comment). Note that it's redundant to have this type of check in `aggregate_evaluate()`, since it's never called if `configure_evaluate()` doesn't return instructions; so I have removed that if-statement from inside `aggregate_evaluate()`.
- Formatting fixes.
Please double check that it looks good to you.
For the next review could you:
- Please ensure all types are defined in public methods and functions.
- I understand you need a custom server class for your baseline. Could you document `pFedHNServer` better? Especially the additional lines added to `fit_round()` and `evaluate_round()`?
- Please include the `FedAvg` results when you have some time. It should be as simple as doing the below, but could you do this reusing your top-level `main.py` instead of having it all (effectively a new project) in `comparison_experiments/`?
  - Define a new config file (maybe `conf/fedavg.yaml`?)
  - Launch the simulation using the default server.
  - The idea (as with other baselines already merged) would be to run the `FedAvg` experiments as:
python -m pFedHN.main --config-name fedavg # which will point to a file in conf/fedavg
Also, I've run the CIFAR-10 experiment four times:
python -m pFedHN.main model.local=True model.variant=1 server.lr=5e-2
and observed quite different results sometimes (attempts 1 and 3 ran on one machine, 2 and 4 on another):
- Attempt 1: best seen `avg_acc` is 0.8926 @ round 4380
- Attempt 2: best seen `avg_acc` is 0.7406 @ round 1350 -- later rounds go as low as ~0.3
- Attempt 3: best seen `avg_acc` is 0.8831 @ round 4530
- Attempt 4: best seen `avg_acc` is 0.8841 @ round 4920
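To make the `evaluate_every` mechanism concrete, here is a minimal sketch of a Flower 1.x FedAvg-style strategy that only runs federated evaluation every `evaluate_every` rounds; the class name and default value are illustrative, not the PR's actual pFedHN strategy.

```python
from flwr.server.strategy import FedAvg


class PeriodicEvalFedAvg(FedAvg):
    """Illustrative strategy: federated evaluation only every `evaluate_every` rounds."""

    def __init__(self, evaluate_every: int = 25, **kwargs):
        super().__init__(**kwargs)
        self.evaluate_every = evaluate_every

    def configure_evaluate(self, server_round, parameters, client_manager):
        # Returning an empty list means no clients are asked to evaluate, and
        # consequently aggregate_evaluate() is never called for this round --
        # hence no extra round check is needed inside aggregate_evaluate().
        if server_round % self.evaluate_every != 0:
            return []
        return super().configure_evaluate(server_round, parameters, client_manager)
```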
| Batch size | 64 |
| Classes per client | 2 |
| Total clients | 50 |
| Total epochs (client-side) | 50 |
I think it's worth clarifying that this actually means "number of batches". Looking at your train() function in trainer.py, each "epoch" uses a single batch of the trainloader. This seems to be exactly what Algorithm 1 in the paper indicates. Maybe it would be better if you renamed it to "local step"? (And make that change in the code + config as well.)
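In case it helps, a self-contained sketch of what "one local step = one batch" would look like; the model, data, and the `num_local_steps` name are toy stand-ins, not the PR's actual code.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the baseline's model and data (the real ones live in the PR).
net = nn.Linear(10, 2)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
trainloader = DataLoader(dataset, batch_size=8, shuffle=True)
optimizer = optim.SGD(net.parameters(), lr=5e-3)
criterion = nn.CrossEntropyLoss()

num_local_steps = 50  # would replace the current "epochs" entry in the config

net.train()
batch_iter = iter(trainloader)
for _ in range(num_local_steps):
    try:
        images, labels = next(batch_iter)   # one batch per local step
    except StopIteration:                   # restart the loader when exhausted
        batch_iter = iter(trainloader)
        images, labels = next(batch_iter)
    optimizer.zero_grad()
    loss = criterion(net(images), labels)
    loss.backward()
    optimizer.step()
```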
baselines/pFedHN/pFedHN/trainer.py (Outdated)

# inner updates -> obtaining theta_tilda
for _i in range(epochs):
    net.train()
You can move this outside the `for` loop.
Yup, will do that.
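A structural sketch of the agreed change (toy stand-ins; the real loop body is the single-batch update sketched earlier):

```python
from torch import nn

net = nn.Linear(10, 2)   # toy stand-in for the target network
num_local_steps = 50     # see the earlier note about renaming "epochs"

net.train()  # moved out of the loop: train mode only needs to be set once
# inner updates -> obtaining theta_tilda
for _i in range(num_local_steps):
    ...  # single-batch inner update, as in the earlier sketch
```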
Hi @achiverram28, just checking in here. Are you still eager to complete this baseline? Best regards.
Description
Creating a new baseline called pFedHN by reproducing the original paper.
Checklist
#contributions
Any other comments?