
Official tensorflow and keras model support #194

Merged (9 commits, Jan 25, 2023)

Conversation

@ewenw (Contributor) commented Jan 13, 2023

  • Adds official TensorFlow/Keras model training and testing support
  • Refactors the aggregator to be framework-independent: TF and Torch now share the same aggregator, and all aggregation logic is performed in numpy
  • Creates a model zoo for Keras models
  • Adds a torch-to-tf dataset converter
  • Adds CIFAR-10 and FEMNIST TensorFlow FL benchmarks
  • Misc code style improvements
  • Removes the current async implementation because it is broken by the aggregator change. Our next contribution will re-write the async simulator and restore the correct functionality.
  • Adds a unit test for the main gradient aggregation logic
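The framework-independent numpy aggregation described above can be sketched roughly as follows (function and argument names here are illustrative, not FedScale's actual API):

```python
import numpy as np

def aggregate(client_updates, client_weights):
    """FedAvg-style weighted average over per-layer numpy arrays.

    client_updates: list of updates, one per client; each update is a
                    list of numpy arrays (one per layer), produced by
                    either a Torch or a TF/Keras client after conversion.
    client_weights: per-client scalar weights (e.g. local sample counts).
    """
    total = float(sum(client_weights))
    aggregated = []
    for layer_idx in range(len(client_updates[0])):
        acc = np.zeros_like(client_updates[0][layer_idx], dtype=np.float64)
        for update, weight in zip(client_updates, client_weights):
            acc += (weight / total) * update[layer_idx]
        aggregated.append(acc)
    return aggregated
```

Because everything is plain numpy, Torch clients can hand over `tensor.numpy()` arrays and Keras clients `model.get_weights()`, and the same aggregator serves both.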

Checks

  • I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
  • I've made sure the following tests are passing.
  • Testing Configurations
    • Dry Run (20 training rounds & 1 evaluation round)
    • Cifar 10 (20 training rounds & 1 evaluation round)
    • Femnist (20 training rounds & 1 evaluation round)

@ewenw ewenw marked this pull request as ready for review January 13, 2023 22:26
@fanlai0990 (Member)

Hi, Ewen. Thank you VERY much for your contribution! I have done my first-round review. Please take a look so that we can unblock other PRs. @AmberLJC @IKACE Thank you.

@AmberLJC (Member) commented Jan 20, 2023

Thanks for your contribution, Ewen~ This is awesome!

I just have 3 small comments.

  1. Why was async aggregation removed?
  2. I need to pin the package version overrides==3.1.0 to avoid a TypeError (does anyone else have this problem, or is it just me?)
  3. There is an issue with converting numpy arrays to torch tensors (comments left in the code)

@ewenw (Contributor, Author) commented Jan 20, 2023

> Thanks for your contribution, Ewen~ This is awesome!
>
> I just have 3 small comments.
>
> 1. Why was async aggregation removed?
> 2. I need to pin the package version overrides==3.1.0 to avoid a TypeError (does anyone else have this problem, or is it just me?)
> 3. There is an issue with converting numpy arrays to torch tensors (comments left in the code)

Thanks for your review Amber!

  1. There are two reasons: due to the functional changes to the aggregator, the existing async aggregator would require quite a few changes to remain compatible. Also, in our next contribution we will implement new async simulation functionality following [Async simulation] Implementation idea for task scheduling #174. There will be minor changes to the aggregator and the client to support async.
  2. I did not notice this (perhaps my env setup is a bit different from the recommendation), but I will add overrides==3.1.0.
  3. I don't see your comment on this; did you publish it?
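Until the pin lands in the repo requirements, a minimal environment workaround (version number taken from this thread) is:

```shell
# Pin overrides to the version reported in this thread to
# avoid the TypeError seen with newer releases.
pip install overrides==3.1.0
```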

@AmberLJC (Member) commented Jan 20, 2023

> Thanks for your review Amber!
>
> 1. There are two reasons: due to the functional changes to the aggregator, the existing async aggregator would require quite a few changes to remain compatible. Also, in our next contribution we will implement new async simulation functionality following [Async simulation] Implementation idea for task scheduling #174.
> 2. I did not notice this (perhaps my env setup is a bit different from the recommendation), but I will add overrides==3.1.0.
> 3. I don't see your comment on this; did you publish it?
  1. Makes sense.
  2. Yes, please.
  3. Please check now. (small bug)

@IKACE (Contributor) commented Jan 24, 2023

Hi Ewen, thank you so much for the contribution to FedScale!! I have two minor pieces of feedback:

  1. As Amber mentioned, a version issue with the overrides package seems to cause some problems, and I can verify that installing overrides==3.1.0 fixes this.
  2. When running the tf_femnist.yml config I run into ValueError: Input 0 of layer "resnet50" is incompatible with the layer: expected shape=(None, 32, 32, 3), found shape=(None, 28, 28, 3). I think the issue is that the model input size is hardcoded to [32, 32, 3] in tensorflow_model_provider.py, while the FEMNIST dataset input size is [28, 28, 3]. Maybe we should not hardcode the model input size. Please refer to my comment in the code.

@ewenw (Contributor, Author) commented Jan 24, 2023

> 1. As Amber mentioned, a version issue with the overrides package seems to cause some problems, and I can verify that installing overrides==3.1.0 fixes this.
> 2. When running the tf_femnist.yml config I run into ValueError: Input 0 of layer "resnet50" is incompatible with the layer: expected shape=(None, 32, 32, 3), found shape=(None, 28, 28, 3). I think the issue is that the model input size is hardcoded to [32, 32, 3] in tensorflow_model_provider.py, while the FEMNIST dataset input size is [28, 28, 3]. Maybe we should not hardcode the model input size.

Hi @IKACE, this is a great point! I've made the TF model input shapes configurable based on the params. Please check my latest commit.
Just to let you know, I've had to change the dataloader rescaling of FEMNIST to 32x32 instead of 28x28, because many Keras models (e.g. MobileNet and ResNet) do not support dimensions smaller than 32.

The overrides package version has been addressed in an earlier commit.
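For reference, making the input shape a parameter rather than a constant could look roughly like the sketch below (the function and dict names are illustrative, not the actual tensorflow_model_provider.py code):

```python
import tensorflow as tf

def build_model(model_name, input_shape, num_classes):
    """Build a Keras model with a caller-supplied input shape.

    input_shape comes from configuration (e.g. --input_shape 32 32 3)
    instead of being hardcoded to [32, 32, 3].
    """
    providers = {
        "resnet50": tf.keras.applications.ResNet50,
        "mobilenet": tf.keras.applications.MobileNet,
    }
    return providers[model_name](
        input_shape=tuple(input_shape),  # e.g. (32, 32, 3)
        weights=None,                    # train from scratch in FL
        classes=num_classes,
    )
```

Keras application models such as ResNet50 and MobileNet reject spatial dimensions below 32, which is why the FEMNIST loader now rescales 28x28 images to 32x32.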

@ewenw ewenw requested review from IKACE and AmberLJC and removed request for fanlai0990 and IKACE January 24, 2023 17:10
@IKACE (Contributor) commented Jan 25, 2023

> I've made the TF model input shapes configurable based on the params. Please check my latest commit. Just to let you know, I've had to change the dataloader rescaling of FEMNIST to 32x32 instead of 28x28, because many Keras models (e.g. MobileNet and ResNet) do not support dimensions smaller than 32.
>
> The overrides package version has been addressed in an earlier commit.

Hi Ewen, thank you so much for the update!!

Just one small thing: I think the parser only recognizes the first integer of the new input_shape, which seems to cause a problem. driver.py may need a minor change for it to work (replace "=" with whitespace on driver.py#L93).

@ewenw (Contributor, Author) commented Jan 25, 2023

> Just one small thing: I think the parser only recognizes the first integer of the new input_shape, which seems to cause a problem. driver.py may need a minor change for it to work (replace "=" with whitespace on driver.py#L93).

I just made the change in driver.py. That's strange, though, because this command works for me and the parser recognizes all the dimensions: `python $FEDSCALE_HOME/fedscale/cloud/execution/executor.py --ps_ip=localhost ... --input_shape 32 32 3 --this_rank=1 --num_executors=1`
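The behavior both of you are seeing is consistent with how Python's argparse handles multi-value options: with nargs, values must be whitespace-separated, while the --flag=value form binds only the single token after the "=". A minimal stand-alone reproduction (hypothetical parser, not FedScale's actual config code):

```python
import argparse

parser = argparse.ArgumentParser()
# nargs="+" collects one or more whitespace-separated values.
parser.add_argument("--input_shape", type=int, nargs="+")

# Whitespace-separated form: all three dimensions are captured.
ok = parser.parse_args(["--input_shape", "32", "32", "3"])
print(ok.input_shape)  # [32, 32, 3]

# "=" form, as driver.py was emitting: only the first integer binds
# to the option; the trailing "32" and "3" are left unconsumed.
partial, leftover = parser.parse_known_args(["--input_shape=32", "32", "3"])
print(partial.input_shape, leftover)  # [32] ['32', '3']
```

This is why replacing "=" with whitespace in driver.py's generated command line fixes the issue.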

@IKACE (Contributor) commented Jan 25, 2023

> I just made the change in driver.py. That's strange, because this command works for me and the parser recognizes all the dimensions: `python $FEDSCALE_HOME/fedscale/cloud/execution/executor.py --ps_ip=localhost ... --input_shape 32 32 3 --this_rank=1 --num_executors=1`

Thanks for the change! I think previously driver.py was outputting --input_shape=32 32 3, which caused the problem. But now everything looks perfect on my side!!

@fanlai0990 (Member)

Thank you so much all! I think we can merge this PR now.

@ewenw (Contributor, Author) commented Jan 25, 2023

> Thank you so much all! I think we can merge this PR now.

Great! Feel free to merge it (I do not have merge access).

@fanlai0990 (Member)

Again, thank you so much for your tremendous support!

@fanlai0990 fanlai0990 merged commit ce64266 into SymbioticLab:master Jan 25, 2023