Hello TensorFlow

Example of using NVIDIA FLARE to train an image classifier using federated averaging (FedAvg) and TensorFlow as the deep learning training framework.

NOTE: This example uses the MNIST handwritten digits dataset and will load its data within the trainer code.

See the Hello TensorFlow example documentation page for details on this example.

To run this example with the FLARE API, you can follow the hello_world notebook, or you can quickly get started with the following:

1. Install NVIDIA FLARE

Follow the Installation instructions to install NVFlare.

Install the additional requirements (if a specific version of nvflare is already installed in your environment, you may want to remove nvflare from the requirements to avoid reinstalling it):

pip3 install tensorflow

2. Run the experiment

Run the script, which uses the job API to create the job and run it with the simulator:

python3 fedavg_script_runner_tf.py

3. Access the logs and results

You can find the running logs and results inside the simulator's workspace:

$ ls /tmp/nvflare/jobs/workdir

Notes on running with GPUs

To run with GPUs, we recommend using the NVIDIA TensorFlow Docker container.

If you run the example on GPUs, note that by default TensorFlow attempts to allocate all available GPU memory at startup. When multiple clients share a GPU, you must prevent this by setting the following environment variables:

TF_FORCE_GPU_ALLOW_GROWTH=true TF_GPU_ALLOCATOR=cuda_malloc_async
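If setting these on the command line is inconvenient, the same two values can be placed in the process environment from Python. The sketch below assumes this happens before TensorFlow is first imported; `configure_tf_gpu_env` is a hypothetical helper, not part of NVFlare or TensorFlow. It uses `setdefault`, so values already exported in the shell take precedence.

```python
import os


def configure_tf_gpu_env(env=None):
    """Set the two variables above unless already defined (hypothetical helper)."""
    env = os.environ if env is None else env
    env.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    env.setdefault("TF_GPU_ALLOCATOR", "cuda_malloc_async")
    return env


# Assumption: this runs before the first `import tensorflow` in the process,
# so the settings are in place when TensorFlow initializes the GPU.
configure_tf_gpu_env()
```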

If you have more GPUs than clients, a good strategy is to run one client on each GPU. When using the nvflare simulator command, this can be done with the -gpu argument, e.g., nvflare simulator -n 2 -gpu 0,1 [job].
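As a rough illustration of the one-client-per-GPU strategy, the following sketch assembles that command line from a client count and a list of GPU ids. The function `build_simulator_command` and the job folder name `my_job` are hypothetical; only the `-n` and `-gpu` arguments shown above are used.

```python
import shlex


def build_simulator_command(job_dir, n_clients, gpu_ids):
    """Build an `nvflare simulator` argument list with one GPU per client."""
    if len(gpu_ids) < n_clients:
        raise ValueError("need at least one GPU per client")
    # Use only as many GPUs as there are clients, one GPU per client.
    gpu_arg = ",".join(str(g) for g in gpu_ids[:n_clients])
    return ["nvflare", "simulator", "-n", str(n_clients), "-gpu", gpu_arg, job_dir]


if __name__ == "__main__":
    # "my_job" is a placeholder for the job folder.
    print(shlex.join(build_simulator_command("my_job", 2, [0, 1, 2])))
```

The returned list can be passed directly to `subprocess.run` if you prefer to launch the simulator from a script.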