diff --git a/docs/_static/css/custom.css b/docs/_static/css/custom.css new file mode 100644 index 0000000000..098d13475f --- /dev/null +++ b/docs/_static/css/custom.css @@ -0,0 +1,3 @@ +.toctree-expand { + display: none; +} diff --git a/docs/about/blogs_publications.md b/docs/about/blogs_publications.md new file mode 100644 index 0000000000..5ef2c27ce8 --- /dev/null +++ b/docs/about/blogs_publications.md @@ -0,0 +1,11 @@ +Blogs & Publications +==================== + +* [Federated learning enables big data for rare cancer boundary detection, Dec 2022](https://www.nature.com/articles/s41467-022-33407-5) +* [How OpenFL Can Boost Your Federated Learning Project, 2022](https://www.intel.com/content/www/us/en/developer/articles/technical/how-openfl-boost-your-federated-learning-project.html) +* [OpenFL: the open federated learning library, Oct 2022](https://iopscience.iop.org/article/10.1088/1361-6560/ac97d9/pdf) +* [Federated Learning With OpenFL for Microservices Applications, Aug 2022](https://blogs.vmware.com/opensource/2022/08/31/federated-learning-with-openfl-for-microservices-applications-2/) +* [A Path Towards Secure Federated Learning, Apr 2022](https://medium.com/openfl/a-path-towards-secure-federated-learning-c2fb16d5e66e) +* [Go Federated with OpenFL: Put your Deep Learning pipeline on Federated rails, Oct 2021](https://towardsdatascience.com/go-federated-with-openfl-8bc145a5ead1) +* [Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data, Jul 2020](https://www.nature.com/articles/s41598-020-69250-1) + diff --git a/docs/about/features.rst b/docs/about/features.rst new file mode 100644 index 0000000000..622fff24a6 --- /dev/null +++ b/docs/about/features.rst @@ -0,0 +1,113 @@ +.. # Copyright (C) 2020-2024 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +========== +Features +========== + +.. 
_running_a_federation: + +--------------------- +Running a Federation +--------------------- + +|productName| has multiple options for setting up a federation and running experiments, depending on the user's needs. + +Task Runner + Define an experiment and distribute it manually. All participants can verify the model code and FL plan prior to execution. + The federation is terminated when the experiment is finished. Formerly known as the aggregator-based workflow. + `For more info `_ + + .. toctree:: + :hidden: + + features_index/taskrunner + +Interactive + Set up long-lived components to run many experiments in series. Recommended for FL research when many changes to model, dataloader, or hyperparameters are expected. + Formerly known as the director-based workflow. + `For more info `_ + + .. toctree:: + :hidden: + + features_index/interactive + +Workflow Interface (Experimental) + Formulate the experiment as a series of tasks, or a flow. Every flow begins with the start task and concludes with the end task. + Heavily influenced by the interface and design of Netflix's Metaflow, the popular framework for data scientists. + `For more info `_ + + .. toctree:: + :hidden: + + features_index/workflowinterface + +.. _aggregation_algorithms: + +----------------------- +Aggregation Algorithms +----------------------- + +FedAvg + Paper: `McMahan et al., 2017 `_ + Default aggregation algorithm in |productName|. Weights each collaborator's local model weights by its relative data size and sums the results (a weighted average). + +FedProx + Paper: `Li et al., 2020 `_ + + FedProx in |productName| is implemented as a custom optimizer for PyTorch/TensorFlow. In order to use FedProx, do the following: + + 1. PyTorch: + + - replace your optimizer with SGD-based :class:`openfl.utilities.optimizers.torch.FedProxOptimizer` + or Adam-based :class:`openfl.utilities.optimizers.torch.FedProxAdam`. 
+ Also, you should save model weights for the next round by calling the optimizer's `.set_old_weights()` method + before the training epoch. + + 2. TensorFlow: + + - replace your optimizer with SGD-based :py:class:`openfl.utilities.optimizers.keras.FedProxOptimizer`. + + For more details, see :code:`../openfl-tutorials/Federated_FedProx_*_MNIST_Tutorial.ipynb` where * is the framework name. + +FedOpt + Paper: `Reddi et al., 2020 `_ + + FedOpt in |productName|: :ref:`adaptive_aggregation_functions` + +FedCurv + Paper: `Shoham et al., 2019 `_ + + Requires PyTorch >= 1.9.0. Other frameworks are not supported yet. + + Use :py:class:`openfl.utilities.fedcurv.torch.FedCurv` to override the train function using the :code:`.get_penalty()`, :code:`.on_train_begin()`, and :code:`.on_train_end()` methods. + In addition, you should override the default :code:`AggregationFunction` of the train task with :class:`openfl.interface.aggregation_functions.FedCurvWeightedAverage`. + See the :code:`PyTorch_Histology_FedCurv` tutorial in the :code:`../openfl-tutorials/interactive_api` directory for more details. + +.. _federated_evaluation: + +--------------------- +Federated Evaluation +--------------------- + +Evaluate the accuracy and performance of your model on data distributed across decentralized nodes without compromising data privacy and security. `For more info `_ + +.. toctree:: + :hidden: + + features_index/fed_eval + +.. _privacy_meter: + +--------------------- +Privacy Meter +--------------------- + +Quantitatively audit data privacy in statistical and machine learning algorithms. `For more info `_ + +.. 
toctree:: + :hidden: + + features_index/privacy_meter + \ No newline at end of file diff --git a/docs/federated_evaluation.rst b/docs/about/features_index/fed_eval.rst similarity index 52% rename from docs/federated_evaluation.rst rename to docs/about/features_index/fed_eval.rst index b38687d269..89507cd8fd 100644 --- a/docs/federated_evaluation.rst +++ b/docs/about/features_index/fed_eval.rst @@ -1,15 +1,18 @@ -Federated Evaluation with OpenFL -================================= +.. # Copyright (C) 2020-2024 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +Federated Evaluation with |productName| +======================================= Introduction to Federated Evaluation ------------------------------------- Model evaluation is an essential part of the machine learning development cycle. In a traditional centralized learning system, all evaluation data is collected on a localized server. Because of this, centralized evaluation of machine learning models is a fairly straightforward task. However, in a federated learning system, data is distributed across multiple decentralized devices or nodes. In an effort to preserve the security and privacy of the distributed data, it is infeasible to simply aggregate all the data into a centralized system. Federated evaluation offers a solution by assessing the model at the client side and aggregating the accuracy without ever having to share the data. This is crucial for ensuring the model's effectiveness and reliability in diverse and real-world environments while respecting privacy and data locality -OpenFL's Support for Federated Evaluation ------------------------------------------ +|productName|'s Support for Federated Evaluation +------------------------------------------------- -OpenFL, a flexible framework for Federated Learning, has the capability to perform federated evaluation by modifying the federation plan. 
In this document, we will show how OpenFL can facilitate this process through its task runner API (aggregator-based workflow), where the model evaluation is distributed across various collaborators before being sent to the aggregator. For the task runner API, this involves minor modifications to the ``plan.yaml`` file, which defines the workflow and tasks for the federation. In particular, the federation plan should be defined to run for one forward pass and perform only aggregated model validation +|productName|, a flexible framework for Federated Learning, has the capability to perform federated evaluation by modifying the federation plan. In this document, we will show how OpenFL can facilitate this process through its task runner API (aggregator-based workflow), where the model evaluation is distributed across various collaborators before being sent to the aggregator. For the task runner API, this involves minor modifications to the ``plan.yaml`` file, which defines the workflow and tasks for the federation. In particular, the federation plan should be defined to run for one forward pass and perform only aggregated model validation In general pipeline is as follows: @@ -19,7 +22,7 @@ In general pipeline is as follows: 4. **Aggregation**: The aggregator collects and aggregates these metrics to assess overall model performance. Example Using the Task Runner API (Aggregator-based Workflow) -------------------------------------------------------------------- +-------------------------------------------------------------- To demonstrate usage of the task runner API (aggregator-based workflow) for federated evaluation, consider the `Hello Federation example `_. This sample script creates a simple federation with two collaborator nodes and one aggregator node, and executes based on a user specified workspace template. We provide a ``torch_cnn_mnist_fed_eval`` template, which is a federated evaluation template adapted from ``torch_cnn_mnist``. 
@@ -28,72 +31,10 @@ This script can be directly executed as follows: .. code-block:: console python test_hello_federation.py --template torch_cnn_mnist_fed_eval - + In order to adapt this template for federated evaluation, the following modifications were made to ``plan.yaml``: -.. code-block:: yaml - - # Copyright (C) 2020-2023 Intel Corporation - # Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. - - aggregator : - defaults : plan/defaults/aggregator.yaml - template : openfl.component.Aggregator - settings : - init_state_path : save/torch_cnn_mnist_init.pbuf - best_state_path : save/torch_cnn_mnist_best.pbuf - last_state_path : save/torch_cnn_mnist_last.pbuf - ######################## - rounds_to_train : 1 - ######################## - log_metric_callback : - template : src.mnist_utils.write_metric - - collaborator : - defaults : plan/defaults/collaborator.yaml - template : openfl.component.Collaborator - settings : - delta_updates : false - opt_treatment : RESET - - data_loader : - defaults : plan/defaults/data_loader.yaml - template : src.ptmnist_inmemory.PyTorchMNISTInMemory - settings : - collaborator_count : 2 - data_group_name : mnist - batch_size : 256 - - task_runner : - defaults : plan/defaults/task_runner.yaml - template : src.pt_cnn.PyTorchCNN - - network : - defaults : plan/defaults/network.yaml - - assigner : - ######################## - template : openfl.component.RandomGroupedAssigner - settings : - task_groups : - - name : validate - percentage : 1.0 - tasks : - - aggregated_model_validation - ######################## - - tasks : - ######################## - aggregated_model_validation: - function : validate - kwargs : - apply : global - metrics : - - acc - ######################## - - compression_pipeline : - defaults : plan/defaults/compression_pipeline.yaml +.. 
literalinclude:: ../../../openfl-workspace/torch_cnn_mnist_fed_eval/plan/plan.yaml Key Changes for Federated Evaluation: @@ -107,4 +48,4 @@ This sample script will create a federation based on the `torch_cnn_mnist_fed_ev --- -Congratulations, you have successfully performed federated evaluation across two decentralized collaborator nodes. +Congratulations, you have successfully performed federated evaluation across two decentralized collaborator nodes. \ No newline at end of file diff --git a/docs/running_the_federation.rst b/docs/about/features_index/interactive.rst similarity index 51% rename from docs/running_the_federation.rst rename to docs/about/features_index/interactive.rst index efd2a0fe40..cc01f67c58 100644 --- a/docs/running_the_federation.rst +++ b/docs/about/features_index/interactive.rst @@ -1,1257 +1,673 @@ -.. # Copyright (C) 2020-2023 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - -.. _running_the_federation: - -****************** -Run the Federation -****************** - -OpenFL currently offers two ways to set up and run experiments with a federation: the Director-based workflow and Aggregator-based workflow. The Director-based workflow introduces a new and more convenient way to set up a federation and brings "long-lived" components in a federation ("Director" and "Envoy"), while the Aggregator-based workflow is advised for scenarios where the workload needs to be verified prior to execution. - -`Director-Based Workflow`_ - Setup long-lived components to run many experiments in series. Recommended for FL research when many changes to model, dataloader, or hyperparameters are expected - - -`Aggregator-Based Workflow`_ - Define an experiment and distribute it manually. All participants can verify model code and :ref:`FL plan ` prior to execution. The federation is terminated when the experiment is finished - - -.. 
_director_workflow: - - -Director-Based Workflow -======================= - -A director-based workflow uses long-lived components in a federation. These components continue to be available to distribute more experiments in the federation. - -- The *Director* is the central node of the federation. This component starts an *Aggregator* for each experiment, sends data to connected collaborator nodes, and provides updates on the status. -- The *Envoy* runs on collaborator nodes connected to the *Director*. When the *Director* starts an experiment, the *Envoy* starts the *Collaborator* to train the global model. - - -The director-based workflow comprises the following roles and their tasks: - - - `Director Manager: Set Up the Director`_ - - `Collaborator Manager: Set Up the Envoy`_ - - `Experiment Manager: Describe an Experiment`_ - -Follow the procedure in the director-based workflow to become familiar with the setup required and APIs provided for each role in the federation: *Experiment manager (Data scientist)*, *Director manager*, and *Collaborator manager*. - -- *Experiment manager* (or Data scientist) is a person or group of people using OpenFL. -- *Director Manager* is ML model creator's representative controlling Director. -- *Collaborator manager* is Data owner's representative controlling Envoy. - -.. note:: - The Open Federated Learning (|productName|) interactive Python API enables the Experiment manager (data scientists) to define and start a federated learning experiment from a single entry point: a Jupyter\*\ notebook or a Python\*\ script. - - See `Interactive Python API (Beta)`_ for details. - -An overview of this workflow is shown below. - -.. figure:: ./source/openfl/director_workflow.svg - -.. centered:: Overview of the Director-Based Workflow - - -.. # Copyright (C) 2020-2023 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - - -.. 
_establishing_federation_director: - -Director Manager: Set Up the Director -------------------------------------- - -The *Director manager* sets up the *Director*, which is the central node of the federation. - - - :ref:`plan_agreement_director` - - :ref:`optional_step_create_pki_using_step_ca` - - :ref:`step0_install_director_prerequisites` - - :ref:`step1_start_the_director` - -.. _plan_agreement_director: - -OPTIONAL STEP: Director's Plan Agreement -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -In order to carry out a secure federation, the Director must approve the FL Plan before starting the experiment. This check could be enforced with the use of the setting :code:`review_experiment: True` in director config. Refer to **director_config_review_exp.yaml** file under **PyTorch_Histology** interactive API example. -After the Director approves the experiment, it starts the aggregator and sends the experiment archive to all the participanting Envoys for review. -On the other hand, if the Director rejects the experiment, the experiment is aborted right away, no aggregator is started and the Envoys don't receive the experiment archive at all. - -.. _optional_step_create_pki_using_step_ca: - -OPTIONAL STEP: Create PKI Certificates Using Step-CA -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The use of mutual Transport Layer Security (mTLS) is recommended for deployments in untrusted environments to establish participant identity and to encrypt communication. You may either import certificates provided by your organization or generate certificates with the :ref:`semi-automatic PKI ` provided by |productName|. - -.. _step0_install_director_prerequisites: - -STEP 1: Install Open Federated Learning (|productName|) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Install |productName| in a virtual Python\*\ environment. See :ref:`install_package` for details. - -.. 
_step1_start_the_director: - -STEP 2: Start the Director -^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Start the Director on a node with at least two open ports. See :ref:`openfl_ll_components` to learn more about the Director entity. - -1. Create a Director workspace with a default config file. - - .. code-block:: console - - fx director create-workspace -p path/to/director_workspace_dir - - This workspace will contain received experiments and supplementary files (Director config file and certificates). - -2. Modify the Director config file according to your federation setup. - - The default config file contains the Director node FQDN, an open port, path of certificates, and :code:`sample_shape` and :code:`target_shape` fields with string representation of the unified data interface in the federation. - -3. Start the Director. - - If mTLS protection is not set up, run this command. - - .. code-block:: console - - fx director start --disable-tls -c director_config.yaml - - If you have a federation with PKI certificates, run this command. - - .. code-block:: console - - fx director start -c director_config.yaml \ - -rc cert/root_ca.crt \ - -pk cert/priv.key \ - -oc cert/open.crt - - - -.. _establishing_federation_envoy: - -Collaborator Manager: Set Up the Envoy --------------------------------------- - -The *Collaborator manager* sets up the *Envoys*, which are long-lived components on collaborator nodes. When started, Envoys will try to connect to the Director. Envoys receive an experiment archive and provide access to local data. - - - :ref:`plan_agreement_envoy` - - :ref:`optional_step_sign_pki_envoy` - - :ref:`step0_install_envoy_prerequisites` - - :ref:`step1_start_the_envoy` - -.. _plan_agreement_envoy: - -OPTIONAL STEP: Envoy's Plan Agreement -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -In order to carry out a secure federation, each of the Envoys must approve the experiment before it is started, after the Director's approval. 
This check could be enforced with the use of the parameter :code:`review_experiment: True` in envoy config. Refer to **envoy_config_review_exp.yaml** file under **PyTorch_Histology** interactive API example. -If any of the Envoys rejects the experiment, a :code:`set_experiment_failed` request is sent to the Director to stop the aggregator. - -.. _optional_step_sign_pki_envoy: - -OPTIONAL STEP: Sign PKI Certificates (Optional) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The use of mTLS is recommended for deployments in untrusted environments to establish participant identity and to encrypt communication. You may either import certificates provided by your organization or use the :ref:`semi-automatic PKI certificate ` provided by |productName|. - - -.. _step0_install_envoy_prerequisites: - -STEP 1: Install |productName| -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Install |productName| in a Python\*\ virtual environment. See :ref:`install_package` for details. - - -.. _step1_start_the_envoy: - -STEP 2: Start the Envoy -^^^^^^^^^^^^^^^^^^^^^^^ - -1. Create an Envoy workspace with a default config file and shard descriptor Python\*\ script. - - .. code-block:: console - - fx envoy create-workspace -p path/to/envoy_workspace_dir - -2. Modify the Envoy config file and local shard descriptor template. - - - Provide the settings field with the arbitrary settings required to initialize the shard descriptor. - - Complete the shard descriptor template field with the address of the local shard descriptor class. - - .. note:: - The shard descriptor is an object to provide a unified data interface for FL experiments. - The shard descriptor implements :code:`get_dataset()` method as well as several additional - methods to access **sample shape**, **target shape**, and **shard description** that may be used to identify - participants during experiment definition and execution. 
- - :code:`get_dataset()` method accepts the dataset_type (for instance train, validation, query, gallery) and returns - an iterable object with samples and targets. - - User's implementation of ShardDescriptor should be inherented from :code:`openfl.interface.interactive_api.shard_descriptor.ShardDescriptor`. It should implement :code:`get_dataset`, :code:`sample_shape` and :code:`target_shape` methods to describe the way data samples and labels will be loaded from disk during training. - -3. Start the Envoy. - - If mTLS protection is not set up, run this command. - - .. code-block:: console - - ENVOY_NAME=envoy_example_name - - fx envoy start \ - -n "$ENVOY_NAME" \ - --disable-tls \ - --envoy-config-path envoy_config.yaml \ - -dh director_fqdn \ - -dp port - - If you have a federation with PKI certificates, run this command. - - .. code-block:: console - - ENVOY_NAME=envoy_example_name - - fx envoy start \ - -n "$ENVOY_NAME" \ - --envoy-config-path envoy_config.yaml \ - -dh director_fqdn \ - -dp port \ - -rc cert/root_ca.crt \ - -pk cert/"$ENVOY_NAME".key \ - -oc cert/"$ENVOY_NAME".crt - - -.. _establishing_federation_experiment_manager: - -Experiment Manager: Describe an Experiment ------------------------------------------- - -The process of defining an experiment is decoupled from the process of establishing a federation. -The Experiment manager (or data scientist) is able to prepare an experiment in a Python environment. -Then the Experiment manager registers experiments into the federation using `Interactive Python API (Beta)`_ -that is allow to communicate with the Director using a gRPC client. - - -.. _interactive_python_api: - -Interactive Python API (Beta) ------------------------------ - -The Open Federated Learning (|productName|) interactive Python API enables the Experiment manager (data scientists) to define and start a federated learning experiment from a single entry point: a Jupyter\*\ notebook or a Python script. 
- - - `Prerequisites`_ - - `Define a Federated Learning Experiment`_ - - `Federation API`_ - - `Experiment API`_ - - `Start an FL Experiment`_ - - -.. _prerequisites: - -Prerequisites -^^^^^^^^^^^^^ - -The Experiment manager requires the following: - -Python Intepreter - Create a virtual Python environment with packages required for conducting the experiment. The Python environment is replicated on collaborator nodes. - -A Local Experiment Workspace - Initialize a workspace by creating an empty directory and placing inside the workspace a Jupyter\*\ notebook or a Python script. - - Items in the workspace may include: - - - source code of objects imported into the notebook from local modules - - local test data stored in a **data** directory - - certificates stored in a **cert** directory - - .. note:: - - This workspace will be archived and transferred to collaborator nodes. Ensure only relevant source code or resources are stored in the workspace. - **data** and **cert** directories will not be included in the archive. - - -.. _federation_api_define_fl_experiment: - -Define a Federated Learning Experiment -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The definition process of a federated learning experiment uses the interactive Python API to set up several interface entities and experiment parameters. - -The following are the interactive Python API to define an experiment: - - - `Federation API`_ - - `Experiment API`_ - - `Start an FL Experiment`_ - - `Observe the Experiment Execution`_ - -.. note:: - Each federation is bound to some Machine Learning problem in a sense that all collaborators dataset shards should allow to solve the same data science problem. - For example object detection and semantic segmentation problems should be solved in different federations. \ - - -.. _federation_api: - -Federation API -"""""""""""""" - -The *Federation* entity is designed to be a bridge between a notebook and *Director*. - - -1. 
Import the Federation class from openfl package - - .. code-block:: python - - from openfl.interface.interactive_api.federation import Federation - - -2. Initialize the Federation object with the Director node network address and encryption settings. - - .. code-block:: python - - federation = Federation( - client_id: str, director_node_fqdn: str, director_port: str - tls: bool, cert_chain: str, api_cert: str, api_private_key: str) - - .. note:: - You may disable mTLS in trusted environments or enable mTLS by providing paths to the certificate chain of the API authority, aggregator certificate, and a private key. - - -.. note:: - Methods available in the Federation API: - - - :code:`get_dummy_shard_descriptor`: creates a dummy shard descriptor for debugging the experiment pipeline - - :code:`get_shard_registry`: returns information about the Envoys connected to the Director and their shard descriptors - -.. _experiment_api: - -Experiment API -"""""""""""""" - -The *Experiment* entity registers training-related objects, federated learning (FL) tasks, and settings. - -1. Import the FLExperiment class from openfl package - - .. code-block:: python - - from openfl.interface.interactive_api.experiment import FLExperiment - -2. Initialize the experiment with the following parameters: a federation object and a unique experiment name. - - .. code-block:: python - - fl_experiment = FLExperiment(federation: Federation, experiment_name: str) - -3. Import these supplementary interface classes: :code:`TaskInterface`, :code:`DataInterface`, and :code:`ModelInterface`. - - .. code-block:: python - - from openfl.interface.interactive_api.experiment import TaskInterface, DataInterface, ModelInterface - - -.. _experiment_api_modelinterface: - -Register the Model and Optimizer ( :code:`ModelInterface` ) - -Instantiate and initialize a model and optimizer in your preferred deep learning framework. - - .. 
code-block:: python - - from openfl.interface.interactive_api.experiment import ModelInterface - MI = ModelInterface(model, optimizer, framework_plugin: str) - -The initialized model and optimizer objects should be passed to the :code:`ModelInterface` along with the path to correct Framework Adapter plugin inside the |productName| package -or from local workspace. - -.. note:: - The |productName| interactive API supports *TensorFlow* and *PyTorch* frameworks via existing plugins. - User can add support for other deep learning frameworks via the plugin interface and point to your implementation of a :code:`framework_plugin` in :code:`ModelInterface`. - - -.. _experiment_api_taskinterface: - -Register FL Tasks ( :code:`TaskInterface` ) - -An FL task accepts the following objects: - - - :code:`model` - will be rebuilt with relevant weights for every task by `TaskRunner` - - :code:`data_loader` - data loader that will provide local data - - :code:`device` - a device to be used for execution on collaborator machines - - :code:`optimizer` (optional) - model optimizer; only for training tasks - -Register an FL task and accompanying information. - - .. code-block:: python - - TI = TaskInterface() - - task_settings = { - 'batch_size': 32, - 'some_arg': 228, - } - @TI.add_kwargs(**task_settings) - @TI.register_fl_task(model='my_model', data_loader='train_loader', - device='device', optimizer='my_Adam_opt') - def foo(my_model, train_loader, my_Adam_opt, device, batch_size, some_arg=356): - # training or validation logic - ... - -FL tasks return a dictionary object with metrics: :code:`{metric name: metric value for this task}`. - -.. note:: - The |productName| interactive API currently allows registering only standalone functions defined in the main module or imported from other modules inside the workspace. - - The :code:`TaskInterface` class must be instantiated before you can use its methods to register FL tasks. 
- - - :code:`@TI.register_fl_task()` needs tasks argument names for :code:`model`, :code:`data_loader`, :code:`device` , and :code:`optimizer` (optional) that constitute a *task contract*. This method adds the callable and the task contract to the task registry. - - :code:`@TI.add_kwargs()` should be used to set up arguments that are not included in the contract. - - -.. _experiment_api_datainterface: - -Register Federated Data Loader ( :code:`DataInterface` ) - -A *shard descriptor* defines how to read and format the local data. Therefore, the *data loader* contains the batching and augmenting data logic, which are common for all collaborators. - -Subclass :code:`DataInterface` and implement the following methods. - - .. code-block:: python - - class CustomDataLoader(DataInterface): - def __init__(self, **kwargs): - # Initialize superclass with kwargs: this array will be passed - # to get_data_loader methods - super().__init__(**kwargs) - # Set up augmentation, save required parameters, - # use it as you regular dataset class - validation_fraction = kwargs.get('validation_fraction', 0.5) - ... - - @property - def shard_descriptor(self): - return self._shard_descriptor - - @shard_descriptor.setter - def shard_descriptor(self, shard_descriptor): - self._shard_descriptor = shard_descriptor - # You can implement data splitting logic here - # Or update your data set according to local Shard Descriptor atributes if required - - def get_train_loader(self, **kwargs): - # these are the same kwargs you provided to __init__, - # But passed on a collaborator machine - bs = kwargs.get('train_batch_size', 32) - return foo_loader() - - # so on, see the full list of methods below - - -The following are shard descriptor setter and getter methods: - - - :code:`shard_descriptor(self, shard_descriptor)` is called during the *Collaborator* initialization procedure with the local shard descriptor. Include in this method any logic that is triggered with the shard descriptor replacement. 
- - :code:`get_train_loader(self, **kwargs)` is called before the execution of training tasks. This method returns the outcome of the training task according to the :code:`data_loader` contract argument. The :code:`kwargs` dict returns the same information that was provided during the :code:`DataInterface` initialization. - - :code:`get_valid_loader(self, **kwargs)` is called before the execution of validation tasks. This method returns the outcome of the validation task according to the :code:`data_loader` contract argument. The :code:`kwargs` dict returns the same information that was provided during the :code:`DataInterface` initialization. - - :code:`get_train_data_size(self)` returns the number of samples in the local dataset for training. Use the information provided by the shard descriptor to determine how to split your training and validation tasks. - - :code:`get_valid_data_size(self)` returns the number of samples in the local dataset for validation. - - -.. note:: - - - The *User Dataset* class should be instantiated to pass further to the *Experiment* object. - - Dummy *shard descriptor* (or a custom local one) may be set up to test the augmentation or batching pipeline. - - Keyword arguments used during initialization on the frontend node may be used during dataloaders construction on collaborator machines. - - - -.. _federation_api_start_fl_experiment: - -Start an FL Experiment -^^^^^^^^^^^^^^^^^^^^^^ - -Use the Experiment API to prepare a workspace archive to transfer to the *Director*. - - .. code-block:: python - - FLExperiment.start() - - .. note:: - Instances of interface classes :code:`(TaskInterface, DataInterface, ModelInterface)` must be passed to :code:`FLExperiment.start()` method along with other parameters. - - This method: - - - Compiles all provided settings to a Plan object. The Plan is the central place where all actors in federation look up their parameters. - - Saves **plan.yaml** to the :code:`plan` folder inside the workspace. 
- - Serializes interface objects on the disk. - - Prepares **requirements.txt** for remote Python environment setup. - - Compresses the whole workspace to an archive. - - Sends the experiment archive to the *Director* so it may distribute the archive across the federation and start the *Aggregator*. - -FLExperiment :code:`start()` Method Parameters -"""""""""""""""""""""""""""""""""""""""""""""" - -The following are parameters of the :code:`start()` method in FLExperiment: - -:code:`model_provider` - This parameter is defined earlier by the :code:`ModelInterface` object. - -:code:`task_keeper` - This parameter is defined earlier by the :code:`TaskInterface` object. - -:code:`data_loader` - This parameter is defined earlier by the :code:`DataInterface` object. - -:code:`task_assigner` - This parameter is optional. You can pass a `Custom task assigner function`_. - -:code:`rounds_to_train` - This parameter defines the number of aggregation rounds needed to be conducted before the experiment is considered finished. - -:code:`delta_updates` - This parameter sets up the aggregation to use calculated gradients instead of model checkpoints. - -:code:`opt_treatment` - This parameter defines the optimizer state treatment in the federation. The following are available values: - - - **RESET**: the optimizer state is initialized each round from noise - - **CONTINUE_LOCAL**: the optimizer state will be reused locally by every collaborator - - **CONTINUE_GLOBAL**: the optimizer's state will be aggregated - -:code:`device_assignment_policy` - The following are available values: - - - **CPU_ONLY**: the :code:`device` parameter (which is a part of a task contract) that is passed to an FL task each round will be **cpu** - - **CUDA_PREFERRED**: the :code:`device` parameter will be **cuda:{index}** if CUDA devices are enabled in the Envoy config and **cpu** otherwise. - - -..
_federation_api_observe_fl_experiment: - -Observe the Experiment Execution -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -If the experiment was accepted by the *Director*, you can oversee its execution with the :code:`FLExperiment.stream_metrics()` method. This method prints metrics from the FL tasks (and saves TensorBoard logs). - -.. _federation_api_get_fl_experiment_status: - -Get Experiment Status -^^^^^^^^^^^^^^^^^^^^^ - -You can get the current experiment status with the :code:`FLExperiment.get_experiment_status()` method. The status could be pending, in progress, finished, rejected or failed. - -.. _federation_api_complete_fl_experiment: - -Complete the Experiment -^^^^^^^^^^^^^^^^^^^^^^^ - -When the experiment has completed: - - - retrieve trained models in the native format using :code:`FLExperiment.get_best_model()` and :code:`FLExperiment.get_last_model()`. - - erase experiment artifacts from the Director with :code:`FLExperiment.remove_experiment_data()`. - - -You may use the same federation object to report another experiment or even schedule several experiments that will be executed in series. - -Custom task assigner function -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -OpenFL has an entity named Task Assigner that is responsible for assigning aggregator tasks to collaborators. -There are three default tasks that are used: :code:`train`, :code:`locally_tuned_model_validate`, -:code:`aggregated_model_validate`. -When you register a train function and pass an optimizer, it generates a train task: - - .. code-block:: python - - task_keeper = TaskInterface() - - - @task_keeper.register_fl_task(model='net_model', data_loader='train_loader', - device='device', optimizer='optimizer') - def train(net_model, train_loader, optimizer, device, loss_fn=cross_entropy, some_parameter=None): - torch.manual_seed(0) - ... - -When you register a validate function, it generates two tasks: :code:`locally_tuned_model_validate` and -:code:`aggregated_model_validate`.
-:code:`locally_tuned_model_validate` is applied by the collaborator to the locally trained model, -:code:`aggregated_model_validate` - to the globally aggregated model. -If there is no train task, only :code:`aggregated_model_validate` is generated. - -Since version 1.3, it is possible to create a custom task assigner function to implement your own task assigning logic. -You can get the registered tasks from :code:`task_keeper` by calling the :code:`get_registered_tasks` method: - - .. code-block:: python - - tasks = task_keeper.get_registered_tasks() - - -And then implement your own assigner function: - - .. code-block:: python - - def random_assigner(collaborators, round_number, **kwargs): - """Assigning task groups randomly while ensuring target distribution""" - import random - random.shuffle(collaborators) - collaborator_task_map = {} - for idx, col in enumerate(collaborators): - # select 70% of collaborators for training and validation, the remaining 30% for validation only - if (idx+1)/len(collaborators) <= 0.7: - collaborator_task_map[col] = tasks.values() # all three tasks - else: - collaborator_task_map[col] = [tasks['aggregated_model_validate']] - return collaborator_task_map - -And then pass that function to the fl_experiment start method: - - .. code-block:: python - - fl_experiment.start( - model_provider=model_interface, - task_keeper=task_keeper, - data_loader=fed_dataset, - task_assigner=random_assigner, - rounds_to_train=50, - opt_treatment='CONTINUE_GLOBAL', - device_assignment_policy='CUDA_PREFERRED' - ) - - -The function is passed to the assigner, which uses it to assign tasks to collaborators. - -As another example, -if you only want to exclude some collaborators from the experiment, you can define the following assigner function: - - .. 
code-block:: python - - def filter_assigner(collaborators, round_number, **kwargs): - collaborator_task_map = {} - exclude_collaborators = ['env_two', 'env_three'] - for collaborator_name in collaborators: - if collaborator_name in exclude_collaborators: - continue - collaborator_task_map[collaborator_name] = [ - tasks['train'], - tasks['locally_tuned_model_validate'], - tasks['aggregated_model_validate'] - ] - return collaborator_task_map - - -You can also use static shard information to exclude any collaborators without CUDA devices from training: - - .. code-block:: python - - shard_registry = federation.get_shard_registry() - def filter_by_shard_registry_assigner(collaborators, round_number, **kwargs): - collaborator_task_map = {} - for collaborator in collaborators: - col_status = shard_registry.get(collaborator) - if not col_status or not col_status['is_online']: - continue - node_info = col_status['shard_info'].node_info - # Assign train task if collaborator has GPU with total memory more than 8 GB - if len(node_info.cuda_devices) > 0 and node_info.cuda_devices[0].memory_total > 8 * 1024**3: - collaborator_task_map[collaborator] = [ - tasks['train'], - tasks['locally_tuned_model_validate'], - tasks['aggregated_model_validate'], - ] - else: - collaborator_task_map[collaborator] = [ - tasks['aggregated_model_validate'], - ] - return collaborator_task_map - - -Assigner with additional validation round: - - .. code-block:: python - - rounds_to_train = 3 - total_rounds = rounds_to_train + 1 # use fl_experiment.start(..., rounds_to_train=total_rounds,...)
- - def assigner_with_last_round_validation(collaborators, round_number, **kwargs): - collaborator_task_map = {} - for collaborator in collaborators: - if round_number == total_rounds - 1: - collaborator_task_map[collaborator] = [ - tasks['aggregated_model_validate'], - ] - else: - collaborator_task_map[collaborator] = [ - tasks['train'], - tasks['locally_tuned_model_validate'], - tasks['aggregated_model_validate'] - ] - return collaborator_task_map - - -.. _running_the_federation_aggregator_based: - -Aggregator-Based Workflow -========================= - -An overview of this workflow is shown below. - -.. figure:: /images/openfl_flow.png - -.. centered:: Overview of the Aggregator-Based Workflow - -There are two ways to run federation without Director: - -- `Bare Metal Approach`_ -- `Docker Approach`_ - - -This workflow uses short-lived components in a federation, which is terminated when the experiment is finished. The components are as follows: - -- The *Collaborator* uses a local dataset to train a global model and the *Aggregator* receives model updates from *Collaborators* and aggregates them to create the new global model. -- The *Aggregator* is framework-agnostic, while the *Collaborator* can use any deep learning frameworks, such as `TensorFlow `_\* \ or `PyTorch `_\*\. - - -For this workflow, you modify the federation workspace to your requirements by editing the Federated Learning plan (FL plan) along with the Python\*\ code that defines the model and the data loader. The FL plan is a `YAML `_ file that defines the collaborators, aggregator, connections, models, data, and any other parameters that describe the training. - - -.. _plan_settings: - - -Federated Learning Plan (FL Plan) Settings ------------------------------------------- - -.. note:: - Use the Federated Learning plan (FL plan) to modify the federation workspace to your requirements in an **aggregator-based workflow**. 
- - -In order for participants to agree to take part in an experiment, everyone should know ahead of time both what code is going to run on their infrastructure and exactly what information on their system will be accessed. The federated learning (FL) plan aims to capture all of this information needed to decide whether to participate in an experiment, in addition to runtime details needed to load the code and make remote connections. -The FL plan is described by the **plan.yaml** file located in the **plan** directory of the workspace. - -Configurable Settings -^^^^^^^^^^^^^^^^^^^^^ - -- :class:`Aggregator ` - `openfl.component.Aggregator `_ - Defines the settings for the aggregator which is the model-owner in the experiment. While models can be trained from scratch, in many cases the federation performs fine-tuning of a previously trained model. For this reason, pre-trained weights for the model are stored in protobuf files on the aggregator node and passed to collaborator nodes during initialization. The settings for aggregator include: - - - :code:`init_state_path`: (str:path) Defines the weight protobuf file path where the experiment's initial weights will be loaded from. These weights will be generated with the `fx plan initialize` command. - - :code:`best_state_path`: (str:path) Defines the weight protobuf file path that will be saved to for the highest accuracy model during the experiment. - - :code:`last_state_path`: (str:path) Defines the weight protobuf file path that will be saved to during the last round completed in each experiment. - - :code:`rounds_to_train`: (int) Specifies the number of rounds in a federation. A federated learning round is defined as one complete iteration when the collaborators train the model and send the updated model weights back to the aggregator to form a new global model. Within a round, collaborators can train the model for multiple iterations called epochs. 
- - :code:`write_logs`: (boolean) Metric logging callback feature. By default, logging is done through `tensorboard `_ but users can also use a custom metric logging function for each task. - - -- :class:`Collaborator ` - `openfl.component.Collaborator `_ - Defines the settings for the collaborator which is the data owner in the experiment. The settings for collaborator include: - - - :code:`delta_updates`: (boolean) Determines whether the difference in model weights between the current and previous round will be sent (True), or if whole checkpoints will be sent (False). Setting :code:`delta_updates` to True leads to higher sparsity in model weights sent across, which may improve compression ratios. - - :code:`opt_treatment`: (str) Defines the optimizer state treatment policy. Valid options are: 'RESET' - reinitialize optimizer for every round (default), 'CONTINUE_LOCAL' - keep local optimizer state for every round, 'CONTINUE_GLOBAL' - aggregate optimizer state for every round. - - -- :class:`Data Loader ` - `openfl.federated.data.loader.DataLoader `_ - Defines the data loader class that provides access to the local dataset. It implements a train loader and a validation loader that take in the train dataset and the validation dataset respectively. The settings for the dataloader include: - - - :code:`collaborator_count`: (int) The number of collaborators participating in the federation - - :code:`data_group_name`: (str) The name of the dataset - - :code:`batch_size`: (int) The size of the training or validation batch - - -- :class:`Task Runner ` - `openfl.federated.task.runner.TaskRunner `_ - Defines the model, training/validation functions, and how to extract and set the tensors from model weights and optimizer dictionary. Depending on the AI framework, such as PyTorch or TensorFlow, users can select pre-defined task runner methods. - - -- :class:`Assigner ` - `openfl.component.Assigner `_ - Defines the tasks that are sent to the collaborators from the aggregator.
There are three default tasks that could be given to each Collaborator: - - - :code:`aggregated_model_validation`: (str) Perform validation on aggregated global model sent by the aggregator. - - :code:`train`: (str) Perform training on the global model. - - :code:`locally_tuned_model_validation`: (str) Perform validation on the model that was locally trained by the collaborator. - - -Each YAML top-level section contains the following subsections: - -- ``template``: The name of the class including top-level packages names. An instance of this class is created when the plan gets initialized. -- ``settings``: The arguments that are passed to the class constructor. -- ``defaults``: The file that contains default settings for this subsection. - Any setting from defaults file can be overridden in the **plan.yaml** file. - -The following is an example of a **plan.yaml**: - -.. literalinclude:: ../openfl-workspace/torch_cnn_mnist/plan/plan.yaml - :language: yaml - - -Tasks -^^^^^ - -Each task subsection contains the following: - -- ``function``: The function name to call. - The function must be the one defined in :class:`TaskRunner ` class. -- ``kwargs``: kwargs passed to the ``function``. - -.. note:: - See an `example `_ of the :class:`TaskRunner ` class for details. - - -.. _running_the_federation_manual: - - -.. _interactive_api: - - - -Bare Metal Approach -------------------- - -.. note:: - - Ensure you have installed the |productName| package on every node (aggregator and collaborators) in the federation. - - See :ref:`install_package` for details. - - -You can use the `"Hello Federation" python script `_ to quickly create a federation (an aggregator node and two collaborator nodes) to test the project pipeline. - -.. literalinclude:: ../tests/github/test_hello_federation.py - :language: python - -However, continue with the following procedure for details in creating a federation with an aggregator-based workflow. 
- - `STEP 1: Create a Workspace`_ - - - Creates a federated learning workspace on one of the nodes. - - - `STEP 2: Configure the Federation`_ - - - Ensures each node in the federation has a valid public key infrastructure (PKI) certificate. - - Distributes the workspace from the aggregator node to the other collaborator nodes. - - - `STEP 3: Start the Federation`_ - - -.. _creating_workspaces: - - -STEP 1: Create a Workspace -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -1. Start a Python 3.8 (>=3.6, <3.11) virtual environment and confirm |productName| is available. - - .. code-block:: python - - fx - - -2. This example uses the :code:`keras_cnn_mnist` template. - - Set the environment variables to use the :code:`keras_cnn_mnist` as the template and :code:`${HOME}/my_federation` as the path to the workspace directory. - - .. code-block:: console - - export WORKSPACE_TEMPLATE=keras_cnn_mnist - export WORKSPACE_PATH=${HOME}/my_federation - -3. Decide a workspace template, which are end-to-end federated learning training demonstrations. The following is a sample of available templates: - - - :code:`keras_cnn_mnist`: a workspace with a simple `Keras `__ CNN model that will download the `MNIST `_ dataset and train in a federation. - - :code:`tf_2dunet`: a workspace with a simple `TensorFlow `__ CNN model that will use the `BraTS `_ dataset and train in a federation. - - :code:`tf_cnn_histology`: a workspace with a simple `TensorFlow `__ CNN model that will download the `Colorectal Histology `_ dataset and train in a federation. - - :code:`torch_cnn_histology`: a workspace with a simple `PyTorch `__ CNN model that will download the `Colorectal Histology `_ dataset and train in a federation. - - :code:`torch_cnn_mnist`: a workspace with a simple `PyTorch `__ CNN model that will download the `MNIST `_ dataset and train in a federation. - - See the complete list of available templates. - - .. code-block:: console - - fx workspace create --prefix ${WORKSPACE_PATH} - - -4. 
Create a workspace directory for the new federation project. - - .. code-block:: console - - fx workspace create --prefix ${WORKSPACE_PATH} --template ${WORKSPACE_TEMPLATE} - - - .. note:: - - You can use your own models by overwriting the Python scripts in the **src** subdirectory in the workspace directory. - -5. Change to the workspace directory. - - .. code-block:: console - - cd ${WORKSPACE_PATH} - -6. Install the workspace requirements: - - .. code-block:: console - - pip install -r requirements.txt - - -7. Create an initial set of random model weights. - - .. note:: - - While models can be trained from scratch, in many cases the federation performs fine-tuning of a previously trained model. For this reason, pre-trained weights for the model are stored in protobuf files on the aggregator node and passed to collaborator nodes during initialization. - - The protobuf file with the initial weights is found in **${WORKSPACE_TEMPLATE}_init.pbuf**. - - - .. code-block:: console - - fx plan initialize - - - This command initializes the FL plan and auto populates the `fully qualified domain name (FQDN) `_ of the aggregator node. This FQDN is embedded within the FL plan so the collaborator nodes know the address of the externally accessible aggregator server to connect to. - - If you have connection issues with the auto populated FQDN in the FL plan, you can do **one of the following**: - - - OPTION 1: override the auto populated FQDN value with the :code:`-a` flag. - - .. code-block:: console - - fx plan initialize -a aggregator-hostname.internal-domain.com - - - OPTION 2: override the apparent FQDN of the system by setting an FQDN environment variable. - - .. code-block:: console - - export FQDN=x.x.x.x - - and initializing the FL plan - - .. code-block:: console - - fx plan initialize - - -.. note:: - - Each workspace may have multiple FL plans and multiple collaborator lists associated with it. 
Therefore, :code:`fx plan initialize` has the following optional parameters. - - +-------------------------+---------------------------------------------------------+ - | Optional Parameters | Description | - +=========================+=========================================================+ - | -p, --plan_config PATH | Federated Learning plan [default = plan/plan.yaml] | - +-------------------------+---------------------------------------------------------+ - | -c, --cols_config PATH | Authorized collaborator list [default = plan/cols.yaml] | - +-------------------------+---------------------------------------------------------+ - | -d, --data_config PATH | The data set/shard configuration file | - +-------------------------+---------------------------------------------------------+ - - - -.. _configure_the_federation: - - -STEP 2: Configure the Federation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The objectives in this step: - - - Ensure each node in the federation has a valid public key infrastructure (PKI) certificate. See :doc:`/source/utilities/pki` for details on available workflows. - - Distribute the workspace from the aggregator node to the other collaborator nodes. - - -.. _install_certs_agg: - -**On the Aggregator Node:** - -Setting Up the Certificate Authority - -1. Change to the path of your workspace: - - .. code-block:: console - - cd WORKSPACE_PATH - -2. Set up the aggregator node as the `certificate authority `_ for the federation. - - All certificates will be signed by the aggregator node. Follow the instructions and enter the information as prompted. The command will create a simple database file to keep track of all issued certificates. - - .. code-block:: console - - fx workspace certify - -3. Run the aggregator certificate creation command, replacing :code:`AFQDN` with the actual `fully qualified domain name (FQDN) `_ for the aggregator node. - - .. code-block:: console - - fx aggregator generate-cert-request --fqdn AFQDN - - .. 
note:: - - On Linux\*\, you can discover the FQDN with this command: - - .. code-block:: console - - hostname --all-fqdns | awk '{print $1}' - - .. note:: - - You can override the apparent FQDN of the system by setting an FQDN environment variable before creating the certificate. - - .. code-block:: console - - export FQDN=x.x.x.x - fx aggregator generate-cert-request - - If you omit the :code:`--fqdn` parameter, then :code:`fx` will automatically use the FQDN of the current node assuming the node has been correctly set with a static address. - - .. code-block:: console - - fx aggregator generate-cert-request - -4. Run the aggregator certificate signing command, replacing :code:`AFQDN` with the actual `fully qualified domain name (FQDN) `_ for the aggregator node. - - .. code-block:: console - - fx aggregator certify --fqdn AFQDN - - - .. note:: - - You can override the apparent FQDN of the system by setting an FQDN environment variable (:code:`export FQDN=x.x.x.x`) before signing the certificate. - - .. code-block:: console - - export FQDN=x.x.x.x - fx aggregator certify - -5. This node now has a signed security certificate as the aggregator for this new federation. You should have the following files. - - +---------------------------+--------------------------------------------------+ - | File Type | Filename | - +===========================+==================================================+ - | Certificate chain | WORKSPACE.PATH/cert/cert_chain.crt | - +---------------------------+--------------------------------------------------+ - | Aggregator certificate | WORKSPACE.PATH/cert/server/agg_{AFQDN}.crt | - +---------------------------+--------------------------------------------------+ - | Aggregator key | WORKSPACE.PATH/cert/server/agg_{AFQDN}.key | - +---------------------------+--------------------------------------------------+ - - where **AFQDN** is the fully-qualified domain name of the aggregator node. - -..
_workspace_export: - -Exporting the Workspace - - -1. Export the workspace so that it can be imported to the collaborator nodes. - - .. code-block:: console - - fx workspace export - - The :code:`export` command will archive the current workspace (with a :code:`zip` file extension) and create a **requirements.txt** of the current Python\*\ packages in the virtual environment. - -2. The next step is to transfer this workspace archive to each collaborator node. - - -.. _install_certs_colab: - -**On the Collaborator Node**: - -Importing the Workspace - -1. Copy the :ref:`workspace archive ` from the aggregator node to the collaborator nodes. - -2. Import the workspace archive. - - .. code-block:: console - - fx workspace import --archive WORKSPACE.zip - - where **WORKSPACE.zip** is the name of the workspace archive. This will unzip the workspace to the current directory and install the required Python packages within the current virtual environment. - -3. For each test machine you want to run as collaborator nodes, create a collaborator certificate request to be signed by the certificate authority. - - Replace :code:`COL_LABEL` with the label you assigned to the collaborator. This label does not have to be the FQDN; it can be any unique alphanumeric label. - - .. code-block:: console - - fx collaborator create -n {COL_LABEL} -d {DATA_PATH:optional} - fx collaborator generate-cert-request -n {COL_LABEL} - - - The creation script will also ask you to specify the path to the data. For this example, enter the integer that represents which MNIST shard to use on this collaborator node. For the first collaborator node enter **1**. For the second collaborator node enter **2**. 
- - This will create the following files: - - +-----------------------------+--------------------------------------------------------+ - | File Type | Filename | - +=============================+========================================================+ - | Collaborator CSR | WORKSPACE.PATH/cert/client/col_{COL_LABEL}.csr | - +-----------------------------+--------------------------------------------------------+ - | Collaborator key | WORKSPACE.PATH/cert/client/col_{COL_LABEL}.key | - +-----------------------------+--------------------------------------------------------+ - | Collaborator CSR Package | WORKSPACE.PATH/col_{COL_LABEL}_to_agg_cert_request.zip | - +-----------------------------+--------------------------------------------------------+ - - -4. On the aggregator node (i.e., the certificate authority in this example), sign the Collaborator CSR Package from the collaborator nodes. - - .. code-block:: console - - fx collaborator certify --request-pkg /PATH/TO/col_{COL_LABEL}_to_agg_cert_request.zip - - where :code:`/PATH/TO/col_{COL_LABEL}_to_agg_cert_request.zip` is the path to the Collaborator CSR Package containing the :code:`.csr` file from the collaborator node. The certificate authority will sign this certificate for use in the federation. - - The command packages the signed collaborator certificate, along with the **cert_chain.crt** file needed to verify certificate signatures, for transport back to the collaborator node: - - +---------------------------------+------------------------------------------------------------+ - | File Type | Filename | - +=================================+============================================================+ - | Certificate and Chain Package | WORKSPACE.PATH/agg_to_col_{COL_LABEL}_signed_cert.zip | - +---------------------------------+------------------------------------------------------------+ - -5. On the collaborator node, import the signed certificate and certificate chain into your workspace. - - .. 
code-block:: console - - fx collaborator certify --import /PATH/TO/agg_to_col_{COL_LABEL}_signed_cert.zip - - - -.. _running_the_federation.start_nodes: - - -STEP 3: Start the Federation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -**On the Aggregator Node:** - -1. Start the Aggregator. - - .. code-block:: console - - fx aggregator start - - Now, the Aggregator is running and waiting for Collaborators to connect. - -.. _running_collaborators: - -**On the Collaborator Nodes:** - -1. Open a new terminal, change the directory to the workspace, and activate the virtual environment. - -2. Run the Collaborator. - - .. code-block:: console - - fx collaborator start -n {COLLABORATOR_LABEL} - - where :code:`COLLABORATOR_LABEL` is the label for this Collaborator. - - .. note:: - - Each workspace may have multiple FL plans and multiple collaborator lists associated with it. - Therefore, :code:`fx collaborator start` has the following optional parameters. - - +-------------------------+---------------------------------------------------------+ - | Optional Parameters | Description | - +=========================+=========================================================+ - | -p, --plan_config PATH | Federated Learning plan [default = plan/plan.yaml] | - +-------------------------+---------------------------------------------------------+ - | -d, --data_config PATH | The data set/shard configuration file | - +-------------------------+---------------------------------------------------------+ - -3. Repeat the earlier steps for each collaborator node in the federation. - - When all of the Collaborators connect, the Aggregator starts training. You will see log messages describing the progress of the federated training. - - When the last round of training is completed, the Aggregator stores the final weights in the protobuf file that was specified in the YAML file, which in this example is located at **save/${WORKSPACE_TEMPLATE}_latest.pbuf**. 
- - -Post Experiment -^^^^^^^^^^^^^^^ - -Experiment owners may access the final model in its native format. -Among other training artifacts, the aggregator creates the last and best aggregated (highest validation score) model snapshots. One may convert a snapshot to the native format and save the model to disk by calling the following command from the workspace: - -.. code-block:: console - - fx model save -i model_protobuf_path.pth -o save_model_path - -In order for this command to succeed, the **TaskRunner** used in the experiment must implement a :code:`save_native()` method. - -Another way to access the trained model is by calling the API command directly from a Python script: - -.. code-block:: python - - from openfl import get_model - model = get_model(plan_config, cols_config, data_config, model_protobuf_path) - -In fact, the :code:`get_model()` method returns a **TaskRunner** object loaded with the chosen model snapshot. Users may utilize the linked model as a regular Python object. - - -.. _running_the_federation_docker: - - -Docker Approach ---------------- - -There are two ways you can run |productName| with Docker\*\. - -- `Option 1: Deploy a Federation in a Docker Container`_ -- `Option 2: Deploy Your Workspace in a Docker Container`_ - - -.. _running_the_federation_docker_base_image: - -Option 1: Deploy a Federation in a Docker Container -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -.. note:: - You have to build an |productName| image. See :ref:`install_docker` for details. - - -1. Run the |productName| image. - - .. code-block:: console - - docker run -it --network host openfl - - -You can now experiment with |productName| in the container. For example, you can test the project pipeline with the `"Hello Federation" bash script `_. - - -.. _running_the_federation_docker_workspace: - -Option 2: Deploy Your Workspace in a Docker Container -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -..
note:: - You have to set up a TaskRunner and run :code:`fx plan initialize` in the workspace directory. See `STEP 1: Create a Workspace`_ for details. - - -1. Build an image with the workspace you created. - - .. code-block:: console - - fx workspace dockerize - - - By default, the image is saved as **WORKSPACE_NAME_image.tar** in the workspace directory. - -2. The image can be distributed and run on other nodes without any environment preparation. - - .. parsed-literal:: - - docker run -it --rm \\ - --network host \\ - -v user_data_folder:/home/user/workspace/data \\ - ${WORKSPACE_IMAGE_NAME} \\ - bash - - - .. note:: - - The FL plan should be initialized with the FQDN of the node where the aggregator container will be running. - -3. Generate public key infrastructure (PKI) certificates for all collaborators and the aggregator. See :doc:`/source/utilities/pki` for details. - -4. `STEP 3: Start the Federation`_. +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +.. _running_interactive: + +================ +Interactive API +================ + +A director-based workflow uses long-lived components in a federation. These components continue to be available to distribute more experiments in the federation. + +- The *Director* is the central node of the federation. This component starts an *Aggregator* for each experiment, sends data to connected collaborator nodes, and provides updates on the status. +- The *Envoy* runs on collaborator nodes connected to the *Director*. When the *Director* starts an experiment, the *Envoy* starts the *Collaborator* to train the global model. 
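The split between long-lived and per-experiment components can be pictured with a toy model. This is only an illustrative sketch: the classes below are hypothetical stand-ins that share names with the roles described above, not the actual |productName| classes, and the ``start_experiment`` and ``run`` methods are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Collaborator:
    """Hypothetical short-lived component: exists for one experiment only."""
    envoy_name: str
    experiment: str


@dataclass
class Envoy:
    """Hypothetical long-lived component: stays connected between experiments."""
    name: str
    history: list = field(default_factory=list)

    def run(self, experiment: str) -> Collaborator:
        # A fresh Collaborator is started for every experiment.
        self.history.append(experiment)
        return Collaborator(self.name, experiment)


@dataclass
class Aggregator:
    """Hypothetical short-lived component: one instance per experiment."""
    experiment: str
    collaborators: list


@dataclass
class Director:
    """Hypothetical long-lived central node that accepts experiments in series."""
    envoys: list = field(default_factory=list)

    def start_experiment(self, experiment: str) -> Aggregator:
        # Each experiment gets its own Aggregator, and every connected
        # Envoy starts its own Collaborator for that experiment.
        collaborators = [envoy.run(experiment) for envoy in self.envoys]
        return Aggregator(experiment, collaborators)


director = Director(envoys=[Envoy("env_one"), Envoy("env_two")])
first = director.start_experiment("exp_1")
second = director.start_experiment("exp_2")
```

In this sketch the same ``director`` and ``Envoy`` objects serve both experiments, while ``first`` and ``second`` are separate ``Aggregator`` instances with their own ``Collaborator`` lists.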
+ + +The director-based workflow comprises the following roles and their tasks: + + - `Director Manager: Set Up the Director`_ + - `Collaborator Manager: Set Up the Envoy`_ + - `Experiment Manager: Describe an Experiment`_ + +Follow the procedure in the director-based workflow to become familiar with the setup required and APIs provided for each role in the federation: *Experiment manager (Data scientist)*, *Director manager*, and *Collaborator manager*. + +- *Experiment manager* (or Data scientist) is a person or group of people using OpenFL. +- *Director Manager* is ML model creator's representative controlling Director. +- *Collaborator manager* is Data owner's representative controlling Envoy. + +.. note:: + The Open Federated Learning (|productName|) interactive Python API enables the Experiment manager (data scientists) to define and start a federated learning experiment from a single entry point: a Jupyter\*\ notebook or a Python\*\ script. + + See `Interactive Python API (Beta)`_ for details. + +An overview of this workflow is shown below. + +.. figure:: ../../source/openfl/director_workflow.svg + +.. centered:: Overview of the Director-Based Workflow + + +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + + +.. _establishing_federation_director: + +Director Manager: Set Up the Director +------------------------------------- + +The *Director manager* sets up the *Director*, which is the central node of the federation. + + - :ref:`plan_agreement_director` + - :ref:`optional_step_create_pki_using_step_ca` + - :ref:`step0_install_director_prerequisites` + - :ref:`step1_start_the_director` + +.. _plan_agreement_director: + +OPTIONAL STEP: Director's Plan Agreement +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +In order to carry out a secure federation, the Director must approve the FL Plan before starting the experiment. 
This check can be enforced by setting :code:`review_experiment: True` in the director config. Refer to the **director_config_review_exp.yaml** file under the **PyTorch_Histology** interactive API example. +After the Director approves the experiment, it starts the aggregator and sends the experiment archive to all the participating Envoys for review. +On the other hand, if the Director rejects the experiment, the experiment is aborted right away, no aggregator is started and the Envoys don't receive the experiment archive at all. + +.. _optional_step_create_pki_using_step_ca: + +OPTIONAL STEP: Create PKI Certificates Using Step-CA +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The use of mutual Transport Layer Security (mTLS) is recommended for deployments in untrusted environments to establish participant identity and to encrypt communication. You may either import certificates provided by your organization or generate certificates with the :ref:`semi-automatic PKI ` provided by |productName|. + +.. _step0_install_director_prerequisites: + +STEP 1: Install Open Federated Learning (|productName|) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Install |productName| in a virtual Python\*\ environment. See :ref:`install_package` for details. + +.. _step1_start_the_director: + +STEP 2: Start the Director +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Start the Director on a node with at least two open ports. See :ref:`openfl_ll_components` to learn more about the Director entity. + +1. Create a Director workspace with a default config file. + + .. code-block:: console + + fx director create-workspace -p path/to/director_workspace_dir + + This workspace will contain received experiments and supplementary files (Director config file and certificates). + +2. Modify the Director config file according to your federation setup.
+ + The default config file contains the Director node FQDN, an open port, path of certificates, and :code:`sample_shape` and :code:`target_shape` fields with string representation of the unified data interface in the federation. + +3. Start the Director. + + If mTLS protection is not set up, run this command. + + .. code-block:: console + + fx director start --disable-tls -c director_config.yaml + + If you have a federation with PKI certificates, run this command. + + .. code-block:: console + + fx director start -c director_config.yaml \ + -rc cert/root_ca.crt \ + -pk cert/priv.key \ + -oc cert/open.crt + + + +.. _establishing_federation_envoy: + +Collaborator Manager: Set Up the Envoy +-------------------------------------- + +The *Collaborator manager* sets up the *Envoys*, which are long-lived components on collaborator nodes. When started, Envoys will try to connect to the Director. Envoys receive an experiment archive and provide access to local data. + + - :ref:`plan_agreement_envoy` + - :ref:`optional_step_sign_pki_envoy` + - :ref:`step0_install_envoy_prerequisites` + - :ref:`step1_start_the_envoy` + +.. _plan_agreement_envoy: + +OPTIONAL STEP: Envoy's Plan Agreement +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +In order to carry out a secure federation, each of the Envoys must approve the experiment before it is started, after the Director's approval. This check could be enforced with the use of the parameter :code:`review_experiment: True` in envoy config. Refer to **envoy_config_review_exp.yaml** file under **PyTorch_Histology** interactive API example. +If any of the Envoys rejects the experiment, a :code:`set_experiment_failed` request is sent to the Director to stop the aggregator. + +.. 
_optional_step_sign_pki_envoy: + +OPTIONAL STEP: Sign PKI Certificates (Optional) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The use of mTLS is recommended for deployments in untrusted environments to establish participant identity and to encrypt communication. You may either import certificates provided by your organization or use the :ref:`semi-automatic PKI certificate ` provided by |productName|. + + +.. _step0_install_envoy_prerequisites: + +STEP 1: Install |productName| +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Install |productName| in a Python\*\ virtual environment. See :ref:`install_package` for details. + + +.. _step1_start_the_envoy: + +STEP 2: Start the Envoy +^^^^^^^^^^^^^^^^^^^^^^^ + +1. Create an Envoy workspace with a default config file and shard descriptor Python\*\ script. + + .. code-block:: console + + fx envoy create-workspace -p path/to/envoy_workspace_dir + +2. Modify the Envoy config file and local shard descriptor template. + + - Provide the settings field with the arbitrary settings required to initialize the shard descriptor. + - Complete the shard descriptor template field with the address of the local shard descriptor class. + + .. note:: + The shard descriptor is an object that provides a unified data interface for FL experiments. + The shard descriptor implements the :code:`get_dataset()` method as well as several additional + methods to access **sample shape**, **target shape**, and **shard description** that may be used to identify + participants during experiment definition and execution. + + The :code:`get_dataset()` method accepts the dataset_type (for instance train, validation, query, gallery) and returns + an iterable object with samples and targets. + + A user's implementation of ShardDescriptor should inherit from :code:`openfl.interface.interactive_api.shard_descriptor.ShardDescriptor`.
It should implement :code:`get_dataset`, :code:`sample_shape` and :code:`target_shape` methods to describe the way data samples and labels will be loaded from disk during training. + +3. Start the Envoy. + + If mTLS protection is not set up, run this command. + + .. code-block:: console + + ENVOY_NAME=envoy_example_name + + fx envoy start \ + -n "$ENVOY_NAME" \ + --disable-tls \ + --envoy-config-path envoy_config.yaml \ + -dh director_fqdn \ + -dp port + + If you have a federation with PKI certificates, run this command. + + .. code-block:: console + + ENVOY_NAME=envoy_example_name + + fx envoy start \ + -n "$ENVOY_NAME" \ + --envoy-config-path envoy_config.yaml \ + -dh director_fqdn \ + -dp port \ + -rc cert/root_ca.crt \ + -pk cert/"$ENVOY_NAME".key \ + -oc cert/"$ENVOY_NAME".crt + + +.. _establishing_federation_experiment_manager: + +Experiment Manager: Describe an Experiment +------------------------------------------ + +The process of defining an experiment is decoupled from the process of establishing a federation. +The Experiment manager (or data scientist) is able to prepare an experiment in a Python environment. +Then the Experiment manager registers experiments into the federation using `Interactive Python API (Beta)`_ +that is allow to communicate with the Director using a gRPC client. + + +.. _interactive_python_api: + +Interactive Python API (Beta) +----------------------------- + +The Open Federated Learning (|productName|) interactive Python API enables the Experiment manager (data scientists) to define and start a federated learning experiment from a single entry point: a Jupyter\*\ notebook or a Python script. + + - `Prerequisites`_ + - `Define a Federated Learning Experiment`_ + - `Federation API`_ + - `Experiment API`_ + - `Start an FL Experiment`_ + + +.. 
_prerequisites: + +Prerequisites +^^^^^^^^^^^^^ + +The Experiment manager requires the following: + +Python Interpreter + Create a virtual Python environment with packages required for conducting the experiment. The Python environment is replicated on collaborator nodes. + +A Local Experiment Workspace + Initialize a workspace by creating an empty directory and placing inside the workspace a Jupyter\*\ notebook or a Python script. + + Items in the workspace may include: + + - source code of objects imported into the notebook from local modules + - local test data stored in a **data** directory + - certificates stored in a **cert** directory + + .. note:: + + This workspace will be archived and transferred to collaborator nodes. Ensure only relevant source code or resources are stored in the workspace. + **data** and **cert** directories will not be included in the archive. + + +.. _federation_api_define_fl_experiment: + +Define a Federated Learning Experiment +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The definition process of a federated learning experiment uses the interactive Python API to set up several interface entities and experiment parameters. + +The following interactive Python APIs are used to define an experiment: + + - `Federation API`_ + - `Experiment API`_ + - `Start an FL Experiment`_ + - `Observe the Experiment Execution`_ + +.. note:: + Each federation is bound to a single machine learning problem, in the sense that the dataset shards of all collaborators should allow solving the same data science problem. + For example, object detection and semantic segmentation problems should be solved in different federations. \ + + +.. _federation_api: + +Federation API +"""""""""""""" + +The *Federation* entity is designed to be a bridge between a notebook and the *Director*. + + +1. Import the Federation class from the openfl package. + + .. code-block:: python + + from openfl.interface.interactive_api.federation import Federation + + +2. 
Initialize the Federation object with the Director node network address and encryption settings. + + .. code-block:: python + + federation = Federation( + client_id: str, director_node_fqdn: str, director_port: str, + tls: bool, cert_chain: str, api_cert: str, api_private_key: str) + + .. note:: + You may disable mTLS in trusted environments or enable mTLS by providing paths to the certificate chain of the API authority, aggregator certificate, and a private key. + + +.. note:: + Methods available in the Federation API: + + - :code:`get_dummy_shard_descriptor`: creates a dummy shard descriptor for debugging the experiment pipeline + - :code:`get_shard_registry`: returns information about the Envoys connected to the Director and their shard descriptors + +.. _experiment_api: + +Experiment API +"""""""""""""" + +The *Experiment* entity registers training-related objects, federated learning (FL) tasks, and settings. + +1. Import the FLExperiment class from the openfl package. + + .. code-block:: python + + from openfl.interface.interactive_api.experiment import FLExperiment + +2. Initialize the experiment with the following parameters: a federation object and a unique experiment name. + + .. code-block:: python + + fl_experiment = FLExperiment(federation: Federation, experiment_name: str) + +3. Import these supplementary interface classes: :code:`TaskInterface`, :code:`DataInterface`, and :code:`ModelInterface`. + + .. code-block:: python + + from openfl.interface.interactive_api.experiment import TaskInterface, DataInterface, ModelInterface + + +.. _experiment_api_modelinterface: + +Register the Model and Optimizer ( :code:`ModelInterface` ) + +Instantiate and initialize a model and optimizer in your preferred deep learning framework. + + .. 
code-block:: python + + from openfl.interface.interactive_api.experiment import ModelInterface + MI = ModelInterface(model, optimizer, framework_plugin: str) + +The initialized model and optimizer objects should be passed to the :code:`ModelInterface` along with the path to correct Framework Adapter plugin inside the |productName| package +or from local workspace. + +.. note:: + The |productName| interactive API supports *TensorFlow* and *PyTorch* frameworks via existing plugins. + User can add support for other deep learning frameworks via the plugin interface and point to your implementation of a :code:`framework_plugin` in :code:`ModelInterface`. + + +.. _experiment_api_taskinterface: + +Register FL Tasks ( :code:`TaskInterface` ) + +An FL task accepts the following objects: + + - :code:`model` - will be rebuilt with relevant weights for every task by `TaskRunner` + - :code:`data_loader` - data loader that will provide local data + - :code:`device` - a device to be used for execution on collaborator machines + - :code:`optimizer` (optional) - model optimizer; only for training tasks + +Register an FL task and accompanying information. + + .. code-block:: python + + TI = TaskInterface() + + task_settings = { + 'batch_size': 32, + 'some_arg': 228, + } + @TI.add_kwargs(**task_settings) + @TI.register_fl_task(model='my_model', data_loader='train_loader', + device='device', optimizer='my_Adam_opt') + def foo(my_model, train_loader, my_Adam_opt, device, batch_size, some_arg=356): + # training or validation logic + ... + +FL tasks return a dictionary object with metrics: :code:`{metric name: metric value for this task}`. + +.. note:: + The |productName| interactive API currently allows registering only standalone functions defined in the main module or imported from other modules inside the workspace. + + The :code:`TaskInterface` class must be instantiated before you can use its methods to register FL tasks. 
+ + - :code:`@TI.register_fl_task()` needs the task's argument names for :code:`model`, :code:`data_loader`, :code:`device` , and :code:`optimizer` (optional) that constitute a *task contract*. This method adds the callable and the task contract to the task registry. + - :code:`@TI.add_kwargs()` should be used to set up arguments that are not included in the contract. + + +.. _experiment_api_datainterface: + +Register Federated Data Loader ( :code:`DataInterface` ) + +A *shard descriptor* defines how to read and format the local data. Therefore, the *data loader* contains the batching and augmenting data logic, which are common for all collaborators. + +Subclass :code:`DataInterface` and implement the following methods. + + .. code-block:: python + + class CustomDataLoader(DataInterface): + def __init__(self, **kwargs): + # Initialize superclass with kwargs: these kwargs will be passed + # to get_data_loader methods + super().__init__(**kwargs) + # Set up augmentation, save required parameters, + # use it as your regular dataset class + validation_fraction = kwargs.get('validation_fraction', 0.5) + ... + + @property + def shard_descriptor(self): + return self._shard_descriptor + + @shard_descriptor.setter + def shard_descriptor(self, shard_descriptor): + self._shard_descriptor = shard_descriptor + # You can implement data splitting logic here + # Or update your data set according to local Shard Descriptor attributes if required + + def get_train_loader(self, **kwargs): + # these are the same kwargs you provided to __init__, + # But passed on a collaborator machine + bs = kwargs.get('train_batch_size', 32) + return foo_loader() + + # so on, see the full list of methods below + + +The following are shard descriptor setter and getter methods: + + - :code:`shard_descriptor(self, shard_descriptor)` is called during the *Collaborator* initialization procedure with the local shard descriptor. Include in this method any logic that is triggered with the shard descriptor replacement.
+ - :code:`get_train_loader(self, **kwargs)` is called before the execution of training tasks. This method returns the outcome of the training task according to the :code:`data_loader` contract argument. The :code:`kwargs` dict returns the same information that was provided during the :code:`DataInterface` initialization. + - :code:`get_valid_loader(self, **kwargs)` is called before the execution of validation tasks. This method returns the outcome of the validation task according to the :code:`data_loader` contract argument. The :code:`kwargs` dict returns the same information that was provided during the :code:`DataInterface` initialization. + - :code:`get_train_data_size(self)` returns the number of samples in the local dataset for training. Use the information provided by the shard descriptor to determine how to split your training and validation tasks. + - :code:`get_valid_data_size(self)` returns the number of samples in the local dataset for validation. + + +.. note:: + + - The *User Dataset* class should be instantiated to pass further to the *Experiment* object. + - Dummy *shard descriptor* (or a custom local one) may be set up to test the augmentation or batching pipeline. + - Keyword arguments used during initialization on the frontend node may be used during dataloaders construction on collaborator machines. + + + +.. _federation_api_start_fl_experiment: + +Start an FL Experiment +^^^^^^^^^^^^^^^^^^^^^^ + +Use the Experiment API to prepare a workspace archive to transfer to the *Director*. + + .. code-block:: python + + FLExperiment.start() + + .. note:: + Instances of interface classes :code:`(TaskInterface, DataInterface, ModelInterface)` must be passed to :code:`FLExperiment.start()` method along with other parameters. + + This method: + + - Compiles all provided settings to a Plan object. The Plan is the central place where all actors in federation look up their parameters. + - Saves **plan.yaml** to the :code:`plan` folder inside the workspace. 
+ - Serializes interface objects on the disk. + - Prepares **requirements.txt** for remote Python environment setup. + - Compresses the whole workspace to an archive. + - Sends the experiment archive to the *Director* so it may distribute the archive across the federation and start the *Aggregator*. + +FLExperiment :code:`start()` Method Parameters +"""""""""""""""""""""""""""""""""""""""""""""" + +The following are parameters of the :code:`start()` method in FLExperiment: + +:code:`model_provider` + This parameter is defined earlier by the :code:`ModelInterface` object. + +:code:`task_keeper` + This parameter is defined earlier by the :code:`TaskInterface` object. + +:code:`data_loader` + This parameter is defined earlier by the :code:`DataInterface` object. + +:code:`task_assigner` + This parameter is optional. You can pass a `Custom task assigner function`_. + +:code:`rounds_to_train` + This parameter defines the number of aggregation rounds needed to be conducted before the experiment is considered finished. + +:code:`delta_updates` + This parameter sets up the aggregation to use calculated gradients instead of model checkpoints. + +:code:`opt_treatment` + This parameter defines the optimizer state treatment in the federation. The following are available values: + + - **RESET**: the optimizer state is initialized each round from noise + - **CONTINUE_LOCAL**: the optimizer state will be reused locally by every collaborator + - **CONTINUE_GLOBAL**: the optimizer's state will be aggregated + +:code:`device_assignment_policy` + The following are available values: + + - **CPU_ONLY**: the :code:`device` parameter (which is a part of a task contract) that is passed to an FL task each round will be **cpu** + - **CUDA_PREFERRED**: the :code:`device` parameter will be **cuda:{index}** if CUDA devices are enabled in the Envoy config and **cpu** otherwise. + + +.. 
_federation_api_observe_fl_experiment: + +Observe the Experiment Execution +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If the experiment was accepted by the *Director*, you can oversee its execution with the :code:`FLExperiment.stream_metrics()` method. This method prints metrics from the FL tasks (and saves TensorBoard logs). + +.. _federation_api_get_fl_experiment_status: + +Get Experiment Status +^^^^^^^^^^^^^^^^^^^^^ + +You can get the current experiment status with the :code:`FLExperiment.get_experiment_status()` method. The status could be pending, in progress, finished, rejected or failed. + +.. _federation_api_complete_fl_experiment: + +Complete the Experiment +^^^^^^^^^^^^^^^^^^^^^^^ + +When the experiment has completed: + + - retrieve trained models in the native format using :code:`FLExperiment.get_best_model()` and :code:`FLExperiment.get_last_model()`. + - erase experiment artifacts from the Director with :code:`FLExperiment.remove_experiment_data()`. + + +You may use the same federation object to report another experiment or even schedule several experiments that will be executed in series. + +Custom task assigner function +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +OpenFL has an entity named Task Assigner that is responsible for assigning aggregator tasks to collaborators. +There are three default tasks that are used: :code:`train`, :code:`locally_tuned_model_validate`, +:code:`aggregated_model_validate`. +When you register a train function and pass an optimizer, it generates a train task: + + .. code-block:: python + + task_keeper = TaskInterface() + + + @task_keeper.register_fl_task(model='net_model', data_loader='train_loader', + device='device', optimizer='optimizer') + def train(net_model, train_loader, optimizer, device, loss_fn=cross_entropy, some_parameter=None): + torch.manual_seed(0) + ... + +When you register a validate function, it generates two tasks: :code:`locally_tuned_model_validate` and +:code:`aggregated_model_validate`.
+:code:`locally_tuned_model_validate` is applied by the collaborator to the locally trained model, +:code:`aggregated_model_validate` - to the globally aggregated model. +If there is no train task, only :code:`aggregated_model_validate` is generated. + +Since version 1.3, it is possible to create a custom task assigner function to implement your own task assigning logic. +You can get the registered tasks from :code:`task_keeper` by calling the :code:`get_registered_tasks` method: + + .. code-block:: python + + tasks = task_keeper.get_registered_tasks() + + +And then implement your own assigner function: + + .. code-block:: python + + def random_assigner(collaborators, round_number, **kwargs): + """Assigning task groups randomly while ensuring target distribution""" + import random + random.shuffle(collaborators) + collaborator_task_map = {} + for idx, col in enumerate(collaborators): + # assign all tasks to about 70% of collaborators; the remaining 30% only validate + if (idx+1)/len(collaborators) <= 0.7: + collaborator_task_map[col] = tasks.values() # all three tasks + else: + collaborator_task_map[col] = [tasks['aggregated_model_validate']] + return collaborator_task_map + +And then pass that function to the fl_experiment start method: + + .. code-block:: python + + fl_experiment.start( + model_provider=model_interface, + task_keeper=task_keeper, + data_loader=fed_dataset, + task_assigner=random_assigner, + rounds_to_train=50, + opt_treatment='CONTINUE_GLOBAL', + device_assignment_policy='CUDA_PREFERRED' + ) + + +The function will be passed to the assigner, and tasks will be assigned to collaborators by using this function. + +Another example. +If you only want to exclude some collaborators from the experiment, you can define the following assigner function: + + .. 
code-block:: python + + def filter_assigner(collaborators, round_number, **kwargs): + collaborator_task_map = {} + exclude_collaborators = ['env_two', 'env_three'] + for collaborator_name in collaborators: + if collaborator_name in exclude_collaborators: + continue + collaborator_task_map[collaborator_name] = [ + tasks['train'], + tasks['locally_tuned_model_validate'], + tasks['aggregated_model_validate'] + ] + return collaborator_task_map + + +Also you can use static shard information to exclude any collaborators without cuda devices from training: + + .. code-block:: python + + shard_registry = federation.get_shard_registry() + def filter_by_shard_registry_assigner(collaborators, round_number, **kwargs): + collaborator_task_map = {} + for collaborator in collaborators: + col_status = shard_registry.get(collaborator) + if not col_status or not col_status['is_online']: + continue + node_info = col_status['shard_info'].node_info + # Assign train task if collaborator has GPU with total memory more that 8 GB + if len(node_info.cuda_devices) > 0 and node_info.cuda_devices[0].memory_total > 8 * 1024**3: + collaborator_task_map[collaborator] = [ + tasks['train'], + tasks['locally_tuned_model_validate'], + tasks['aggregated_model_validate'], + ] + else: + collaborator_task_map[collaborator] = [ + tasks['aggregated_model_validate'], + ] + return collaborator_task_map + + +Assigner with additional validation round: + + .. code-block:: python + + rounds_to_train = 3 + total_rounds = rounds_to_train + 1 # use fl_experiment.start(..., rounds_to_train=total_rounds,...) 
+ + def assigner_with_last_round_validation(collaborators, round_number, **kwargs): + collaborator_task_map = {} + for collaborator in collaborators: + if round_number == total_rounds - 1: + collaborator_task_map[collaborator] = [ + tasks['aggregated_model_validate'], + ] + else: + collaborator_task_map[collaborator] = [ + tasks['train'], + tasks['locally_tuned_model_validate'], + tasks['aggregated_model_validate'] + ] + return collaborator_task_map + + +.. toctree +.. overview.how_can_intel_protect_federated_learning +.. overview.what_is_intel_federated_learning \ No newline at end of file diff --git a/docs/about/features_index/privacy_meter.rst b/docs/about/features_index/privacy_meter.rst new file mode 100644 index 0000000000..f668960338 --- /dev/null +++ b/docs/about/features_index/privacy_meter.rst @@ -0,0 +1,39 @@ +.. # Copyright (C) 2020-2024 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +Privacy Meter +============== + +On the Integration of Privacy and |productName| +----------------------------------------------- +Federated learning (FL) enables parties to learn from each other without sharing their data. In FL, parties share the local update about a global model in each round with a server. The server aggregates the local updates from all parties to produce the next version of the global model, which will be used by all parties as the initialization for training in the next round. + +Although each party's data remains local, the shared local updates and aggregate global model each round can leak significant information about the private local training datasets. Specifically, the server can infer information about (even potentially reconstruct) the private data from each party based on their shared local update. Even when the server is trusted, collaborating parties of FL can infer other parties' sensitive data based on the updated global model in each round due to the fact that it is influenced by all local model updates. 
Due to this serious privacy issue, enabling parties to audit their privacy loss becomes a compelling need. + +Privacy meter, based on state-of-the-art membership inference attacks, provides a tool to quantitatively audit data privacy in statistical and machine learning algorithms. The objective of a membership inference attack is to determine whether a given data record was in the training dataset of the target model. Measures of success (accuracy, area under the ROC curve, true positive rate at a given false positive rate ...) for particular membership inference attacks against a target model are used to estimate privacy loss for that model (how much information a target model leaks about its training data). Since stronger attacks may be possible, these measures serve as lower bounds of the actual privacy loss. We have integrated the ML Privacy Meter library into |productName|, generating privacy loss reports for all parties' local model updates as well as the global models throughout all rounds of the FL training. + +Threat Model +----------------------------------------------- +Following this, we consider two threat models. +- Server is trusted, and other parties are honest-but-curious (follow the protocol, but try to learn as much as possible from what information they have access to) +In this threat model, each party can audit the privacy loss of the global model, quantifying how much information will be leaked to other parties via the global model. +- Everyone, including the server, is honest-but-curious +In this threat model, each party can audit the privacy loss of the local and global models, quantifying how much information will be leaked to the aggregator via the local model and to the other parties via the global model. + +Workflow +----------------------------------------------- +We provide demo code in `cifar10_PM.py `_. Here, we briefly describe its workflow.
+In each round of FL, parties train, starting with the current global model as initialization, using their local dataset. Then, the current global model and updated local model will be passed to the privacy auditing module (See `audit` function in `cifar10_PM.py`) to produce a privacy loss report. The local model update will then be shared with the server and all such updates aggregated to form the next global model. Though this is a simulation so that no network sharing of models is involved, these reports could be used in a fully distributed setting to trigger actions when the loss is too high. These actions could include not sharing local updates with the aggregator, not +allowing the FL system to release the model to other outside entities, or potentially re-running local training in a differentially private mode and re-auditing in an attempt to reduce the leakage before sharing occurs. + +Methodology +----------------------------------------------- +We integrate the population attack from ML Privacy Meter into |productName|. In the population attack, the adversary first computes the signal (e.g., loss, logits) on all samples in a population dataset using the target model. The population dataset is sampled from the same distribution as the train and test datasets, but is non-overlapping with both. The population dataset signals are then used to determine (using the fact that all population data are known not to be target training samples) a signal threshold for which false positives (samples whose signal against the threshold would be erroneously identified as target training samples) would occur at a rate below a provided false positive rate tolerance. Known positives (target training samples) as well as known negatives (target test samples) are tested against the threshold to determine how well this threshold does at classifying training set membership.
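The thresholding step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the ML Privacy Meter API: the helper names ``population_threshold`` and ``flagged_rate`` are hypothetical, the signal is assumed to be a per-sample loss (lower loss suggests membership), and the losses are synthetic.

```python
import random


def population_threshold(population_signals, fpr_tolerance):
    """Return a loss threshold calibrated on known non-members (hypothetical helper).

    A sample is flagged as a training member when its signal is strictly
    below the threshold. The threshold is chosen so that at most
    `fpr_tolerance` of the population samples are flagged.
    """
    ordered = sorted(population_signals)
    # Number of population samples we may wrongly flag as members.
    allowed = min(int(fpr_tolerance * len(ordered)), len(ordered) - 1)
    return ordered[allowed]


def flagged_rate(signals, threshold):
    """Fraction of samples whose signal falls strictly below the threshold."""
    return sum(s < threshold for s in signals) / len(signals)


# Synthetic losses (assumption): members tend to have lower loss than non-members.
rng = random.Random(0)
population_losses = [rng.gauss(2.0, 0.5) for _ in range(1000)]  # known non-members
train_losses = [rng.gauss(1.0, 0.5) for _ in range(1000)]       # known members
test_losses = [rng.gauss(2.0, 0.5) for _ in range(1000)]        # known non-members

threshold = population_threshold(population_losses, fpr_tolerance=0.1)
tpr = flagged_rate(train_losses, threshold)  # members correctly flagged
fpr = flagged_rate(test_losses, threshold)   # non-members wrongly flagged
print(f"threshold={threshold:.3f} TPR={tpr:.3f} FPR={fpr:.3f}")
```

The threshold is calibrated only on the population set, so the false positive rate measured on the held-out test samples is an estimate and may differ slightly from the tolerance.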
+ +Therefore, to use this attack for auditing privacy, we assume there is a set of data points used for auditing which is not overlapped with the training dataset. The size of the auditing dataset is indicated by `audit_dataset_ratio` argument. In addition, we also need to define which signal will be used to distinguish members and non-members. Currently, we support loss, logits and gradient norm. When the gradient norm is used for inferring the membership information, we need to specify which layer of the model we would like to compute the gradient with respect to. For instance, if we want to measure the gradient norm with respect to the 10th layer of the representation (before the fully connected layers), we can pass the following argument `--is_feature True` and `--layer_number 10` to the `cifar10_PM.py`. + +To measure the success of the attack (privacy loss), we generate the ROC of the attack and the dynamic of the AUC during the training. In addition, parties can also indicate the false positive rate tolerance, and the privacy loss report will show the maximal true positive rate (fraction of members which is correctly identified) during the training. This false positive rate tolerance is passed to `fpr_tolerance` argument. The privacy loss report will be saved in the folder indicated by `log_dir` argument. + +Examples +----------------------------------------------- +`Here `_, we give a few commands and the results for each of them. \ No newline at end of file diff --git a/docs/about/features_index/pynative.rst b/docs/about/features_index/pynative.rst new file mode 100644 index 0000000000..7106af305d --- /dev/null +++ b/docs/about/features_index/pynative.rst @@ -0,0 +1,12 @@ +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +================= +Python Native API +================= + +TODO + +.. toctree +.. overview.how_can_intel_protect_federated_learning +.. 
overview.what_is_intel_federated_learning \ No newline at end of file diff --git a/docs/about/features_index/taskrunner.rst b/docs/about/features_index/taskrunner.rst new file mode 100644 index 0000000000..064df64a14 --- /dev/null +++ b/docs/about/features_index/taskrunner.rst @@ -0,0 +1,574 @@ +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +.. _running_the_task_runner: + +================ +Task Runner API +================ + + +An overview of this workflow is shown below. + +.. figure:: ../../images/openfl_flow.png + +.. centered:: Overview of the Aggregator-Based Workflow + +There are two ways to run federation without Director: + +- `Bare Metal Approach`_ +- `Docker Approach`_ + + +This workflow uses short-lived components in a federation, which is terminated when the experiment is finished. The components are as follows: + +- The *Collaborator* uses a local dataset to train a global model and the *Aggregator* receives model updates from *Collaborators* and aggregates them to create the new global model. +- The *Aggregator* is framework-agnostic, while the *Collaborator* can use any deep learning frameworks, such as `TensorFlow `_\* \ or `PyTorch `_\*\. + + +For this workflow, you modify the federation workspace to your requirements by editing the Federated Learning plan (FL plan) along with the Python\*\ code that defines the model and the data loader. The FL plan is a `YAML `_ file that defines the collaborators, aggregator, connections, models, data, and any other parameters that describe the training. + + +.. _plan_settings: + + +Federated Learning Plan (FL Plan) Settings +------------------------------------------ + +.. note:: + Use the Federated Learning plan (FL plan) to modify the federation workspace to your requirements in an **aggregator-based workflow**. 
+ + +In order for participants to agree to take part in an experiment, everyone should know ahead of time both what code is going to run on their infrastructure and exactly what information on their system will be accessed. The federated learning (FL) plan aims to capture all of this information needed to decide whether to participate in an experiment, in addition to runtime details needed to load the code and make remote connections. +The FL plan is described by the **plan.yaml** file located in the **plan** directory of the workspace. + +Configurable Settings +^^^^^^^^^^^^^^^^^^^^^ + +- :class:`Aggregator ` + `openfl.component.Aggregator `_ + Defines the settings for the aggregator which is the model-owner in the experiment. While models can be trained from scratch, in many cases the federation performs fine-tuning of a previously trained model. For this reason, pre-trained weights for the model are stored in protobuf files on the aggregator node and passed to collaborator nodes during initialization. The settings for aggregator include: + + - :code:`init_state_path`: (str:path) Defines the weight protobuf file path where the experiment's initial weights will be loaded from. These weights will be generated with the `fx plan initialize` command. + - :code:`best_state_path`: (str:path) Defines the weight protobuf file path that will be saved to for the highest accuracy model during the experiment. + - :code:`last_state_path`: (str:path) Defines the weight protobuf file path that will be saved to during the last round completed in each experiment. + - :code:`rounds_to_train`: (int) Specifies the number of rounds in a federation. A federated learning round is defined as one complete iteration when the collaborators train the model and send the updated model weights back to the aggregator to form a new global model. Within a round, collaborators can train the model for multiple iterations called epochs. 
+ - :code:`write_logs`: (boolean) Metric logging callback feature. By default, logging is done through `tensorboard `_ but users can also use a custom metric logging function for each task. + + +- :class:`Collaborator ` + `openfl.component.Collaborator `_ + Defines the settings for the collaborator which is the data owner in the experiment. The settings for collaborator include: + + - :code:`delta_updates`: (boolean) Determines whether the difference in model weights between the current and previous round will be sent (True), or if whole checkpoints will be sent (False). Setting delta_updates to True leads to higher sparsity in model weights sent across, which may improve compression ratios. + - :code:`opt_treatment`: (str) Defines the optimizer state treatment policy. Valid options are: 'RESET' - reinitialize optimizer for every round (default), 'CONTINUE_LOCAL' - keep local optimizer state for every round, 'CONTINUE_GLOBAL' - aggregate optimizer state for every round. + + +- :class:`Data Loader ` + `openfl.federated.data.loader.DataLoader `_ + Defines the data loader class that provides access to the local dataset. It implements a train loader and a validation loader that take in the train dataset and the validation dataset respectively. The settings for the dataloader include: + + - :code:`collaborator_count`: (int) The number of collaborators participating in the federation + - :code:`data_group_name`: (str) The name of the dataset + - :code:`batch_size`: (int) The size of the training or validation batch + + +- :class:`Task Runner ` + `openfl.federated.task.runner.TaskRunner `_ + Defines the model, training/validation functions, and how to extract and set the tensors from model weights and optimizer dictionary. Depending on the AI framework in use, such as PyTorch or TensorFlow, users can select pre-defined task runner methods. + + +- :class:`Assigner ` + `openfl.component.Assigner `_ + Defines the tasks that are sent to the collaborators from the aggregator.
There are three default tasks that could be given to each Collaborator: + + - :code:`aggregated_model_validation`: (str) Perform validation on aggregated global model sent by the aggregator. + - :code:`train`: (str) Perform training on the global model. + - :code:`locally_tuned_model_validation`: (str) Perform validation on the model that was locally trained by the collaborator. + + +Each YAML top-level section contains the following subsections: + +- ``template``: The name of the class including top-level packages names. An instance of this class is created when the plan gets initialized. +- ``settings``: The arguments that are passed to the class constructor. +- ``defaults``: The file that contains default settings for this subsection. + Any setting from defaults file can be overridden in the **plan.yaml** file. + +The following is an example of a **plan.yaml**: + +.. literalinclude:: ../../../openfl-workspace/torch_cnn_mnist/plan/plan.yaml + :language: yaml + + +Tasks +^^^^^ + +Each task subsection contains the following: + +- ``function``: The function name to call. + The function must be the one defined in :class:`TaskRunner ` class. +- ``kwargs``: kwargs passed to the ``function``. + +.. note:: + See an `example `_ of the :class:`TaskRunner ` class for details. + + +.. _running_the_federation_manual: + + +.. _interactive_api: + + + +Bare Metal Approach +------------------- + +.. note:: + + Ensure you have installed the |productName| package on every node (aggregator and collaborators) in the federation. + + See :ref:`install_package` for details. + + + + `STEP 1: Create a Workspace`_ + + - Creates a federated learning workspace on one of the nodes. + + + `STEP 2: Configure the Federation`_ + + - Ensures each node in the federation has a valid public key infrastructure (PKI) certificate. + - Distributes the workspace from the aggregator node to the other collaborator nodes. + + + `STEP 3: Start the Federation`_ + + +.. 
_creating_workspaces: + + +STEP 1: Create a Workspace +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +1. Start a Python 3.8 (>=3.6, <3.11) virtual environment and confirm |productName| is available. + + .. code-block:: console + + fx + + +2. This example uses the :code:`keras_cnn_mnist` template. + + Set the environment variables to use :code:`keras_cnn_mnist` as the template and :code:`${HOME}/my_federation` as the path to the workspace directory. + + .. code-block:: console + + export WORKSPACE_TEMPLATE=keras_cnn_mnist + export WORKSPACE_PATH=${HOME}/my_federation + +3. Choose a workspace template. Templates are end-to-end federated learning training demonstrations. The following is a sample of available templates: + + - :code:`keras_cnn_mnist`: a workspace with a simple `Keras `__ CNN model that will download the `MNIST `_ dataset and train in a federation. + - :code:`tf_2dunet`: a workspace with a simple `TensorFlow `__ CNN model that will use the `BraTS `_ dataset and train in a federation. + - :code:`tf_cnn_histology`: a workspace with a simple `TensorFlow `__ CNN model that will download the `Colorectal Histology `_ dataset and train in a federation. + - :code:`torch_cnn_histology`: a workspace with a simple `PyTorch `__ CNN model that will download the `Colorectal Histology `_ dataset and train in a federation. + - :code:`torch_cnn_mnist`: a workspace with a simple `PyTorch `__ CNN model that will download the `MNIST `_ dataset and train in a federation. + + See the complete list of available templates. + + .. code-block:: console + + fx workspace create --prefix ${WORKSPACE_PATH} + + +4. Create a workspace directory for the new federation project. + + .. code-block:: console + + fx workspace create --prefix ${WORKSPACE_PATH} --template ${WORKSPACE_TEMPLATE} + + + .. note:: + + You can use your own models by overwriting the Python scripts in the **src** subdirectory in the workspace directory. + +5. Change to the workspace directory. + + ..
code-block:: console + + cd ${WORKSPACE_PATH} + +6. Install the workspace requirements: + + .. code-block:: console + + pip install -r requirements.txt + + +7. Create an initial set of random model weights. + + .. note:: + + While models can be trained from scratch, in many cases the federation performs fine-tuning of a previously trained model. For this reason, pre-trained weights for the model are stored in protobuf files on the aggregator node and passed to collaborator nodes during initialization. + + The protobuf file with the initial weights is found in **${WORKSPACE_TEMPLATE}_init.pbuf**. + + + .. code-block:: console + + fx plan initialize + + + This command initializes the FL plan and auto populates the `fully qualified domain name (FQDN) `_ of the aggregator node. This FQDN is embedded within the FL plan so the collaborator nodes know the address of the externally accessible aggregator server to connect to. + + If you have connection issues with the auto populated FQDN in the FL plan, you can do **one of the following**: + + - OPTION 1: override the auto populated FQDN value with the :code:`-a` flag. + + .. code-block:: console + + fx plan initialize -a aggregator-hostname.internal-domain.com + + - OPTION 2: override the apparent FQDN of the system by setting an FQDN environment variable. + + .. code-block:: console + + export FQDN=x.x.x.x + + and initializing the FL plan + + .. code-block:: console + + fx plan initialize + + +.. note:: + + Each workspace may have multiple FL plans and multiple collaborator lists associated with it. Therefore, :code:`fx plan initialize` has the following optional parameters. 
+ + +-------------------------+---------------------------------------------------------+ + | Optional Parameters | Description | + +=========================+=========================================================+ + | -p, --plan_config PATH | Federated Learning plan [default = plan/plan.yaml] | + +-------------------------+---------------------------------------------------------+ + | -c, --cols_config PATH | Authorized collaborator list [default = plan/cols.yaml] | + +-------------------------+---------------------------------------------------------+ + | -d, --data_config PATH | The data set/shard configuration file | + +-------------------------+---------------------------------------------------------+ + + + +.. _configure_the_federation: + + +STEP 2: Configure the Federation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The objectives in this step: + + - Ensure each node in the federation has a valid public key infrastructure (PKI) certificate. See :doc:`../../developer_guide/utilities/pki` for details on available workflows. + - Distribute the workspace from the aggregator node to the other collaborator nodes. + + +.. _install_certs_agg: + +**On the Aggregator Node:** + +Setting Up the Certificate Authority + +1. Change to the path of your workspace: + + .. code-block:: console + + cd WORKSPACE_PATH + +2. Set up the aggregator node as the `certificate authority `_ for the federation. + + All certificates will be signed by the aggregator node. Follow the instructions and enter the information as prompted. The command will create a simple database file to keep track of all issued certificates. + + .. code-block:: console + + fx workspace certify + +3. Run the aggregator certificate creation command, replacing :code:`AFQDN` with the actual `fully qualified domain name (FQDN) `_ for the aggregator node. + + .. code-block:: console + + fx aggregator generate-cert-request --fqdn AFQDN + + .. note:: + + On Linux\*\, you can discover the FQDN with this command: + + .. 
code-block:: console + + hostname --all-fqdns | awk '{print $1}' + + .. note:: + + You can override the apparent FQDN of the system by setting an FQDN environment variable before creating the certificate. + + .. code-block:: console + + export FQDN=x.x.x.x + fx aggregator generate-cert-request + + If you omit the :code:`--fqdn` parameter, then :code:`fx` will automatically use the FQDN of the current node, assuming the node has been correctly set with a static address. + + .. code-block:: console + + fx aggregator generate-cert-request + +4. Run the aggregator certificate signing command, replacing :code:`AFQDN` with the actual `fully qualified domain name (FQDN) `_ for the aggregator node. + + .. code-block:: console + + fx aggregator certify --fqdn AFQDN + + + .. note:: + + You can override the apparent FQDN of the system by setting an FQDN environment variable (:code:`export FQDN=x.x.x.x`) before signing the certificate. + + .. code-block:: console + + export FQDN=x.x.x.x + fx aggregator certify + +5. This node now has a signed security certificate as the aggregator for this new federation. You should have the following files. + + +---------------------------+--------------------------------------------------+ + | File Type | Filename | + +===========================+==================================================+ + | Certificate chain | WORKSPACE.PATH/cert/cert_chain.crt | + +---------------------------+--------------------------------------------------+ + | Aggregator certificate | WORKSPACE.PATH/cert/server/agg_{AFQDN}.crt | + +---------------------------+--------------------------------------------------+ + | Aggregator key | WORKSPACE.PATH/cert/server/agg_{AFQDN}.key | + +---------------------------+--------------------------------------------------+ + + where **AFQDN** is the fully-qualified domain name of the aggregator node. + +.. _workspace_export: + +Exporting the Workspace + + +1.
Export the workspace so that it can be imported to the collaborator nodes. + + .. code-block:: console + + fx workspace export + + The :code:`export` command will archive the current workspace (with a :code:`zip` file extension) and create a **requirements.txt** of the current Python\*\ packages in the virtual environment. + +2. The next step is to transfer this workspace archive to each collaborator node. + + +.. _install_certs_colab: + +**On the Collaborator Node**: + +Importing the Workspace + +1. Copy the :ref:`workspace archive ` from the aggregator node to the collaborator nodes. + +2. Import the workspace archive. + + .. code-block:: console + + fx workspace import --archive WORKSPACE.zip + + where **WORKSPACE.zip** is the name of the workspace archive. This will unzip the workspace to the current directory and install the required Python packages within the current virtual environment. + +3. For each test machine you want to run as collaborator nodes, create a collaborator certificate request to be signed by the certificate authority. + + Replace :code:`COL_LABEL` with the label you assigned to the collaborator. This label does not have to be the FQDN; it can be any unique alphanumeric label. + + .. code-block:: console + + fx collaborator create -n {COL_LABEL} -d {DATA_PATH:optional} + fx collaborator generate-cert-request -n {COL_LABEL} + + + The creation script will also ask you to specify the path to the data. For this example, enter the integer that represents which MNIST shard to use on this collaborator node. For the first collaborator node enter **1**. For the second collaborator node enter **2**. 
+ + This will create the following files: + + +-----------------------------+--------------------------------------------------------+ + | File Type | Filename | + +=============================+========================================================+ + | Collaborator CSR | WORKSPACE.PATH/cert/client/col_{COL_LABEL}.csr | + +-----------------------------+--------------------------------------------------------+ + | Collaborator key | WORKSPACE.PATH/cert/client/col_{COL_LABEL}.key | + +-----------------------------+--------------------------------------------------------+ + | Collaborator CSR Package | WORKSPACE.PATH/col_{COL_LABEL}_to_agg_cert_request.zip | + +-----------------------------+--------------------------------------------------------+ + + +4. On the aggregator node (i.e., the certificate authority in this example), sign the Collaborator CSR Package from the collaborator nodes. + + .. code-block:: console + + fx collaborator certify --request-pkg /PATH/TO/col_{COL_LABEL}_to_agg_cert_request.zip + + where :code:`/PATH/TO/col_{COL_LABEL}_to_agg_cert_request.zip` is the path to the Collaborator CSR Package containing the :code:`.csr` file from the collaborator node. The certificate authority will sign this certificate for use in the federation. + + The command packages the signed collaborator certificate, along with the **cert_chain.crt** file needed to verify certificate signatures, for transport back to the collaborator node: + + +---------------------------------+------------------------------------------------------------+ + | File Type | Filename | + +=================================+============================================================+ + | Certificate and Chain Package | WORKSPACE.PATH/agg_to_col_{COL_LABEL}_signed_cert.zip | + +---------------------------------+------------------------------------------------------------+ + +5. On the collaborator node, import the signed certificate and certificate chain into your workspace. + + .. 
code-block:: console + + fx collaborator certify --import /PATH/TO/agg_to_col_{COL_LABEL}_signed_cert.zip + + + +.. _running_the_federation.start_nodes: + + +STEP 3: Start the Federation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +**On the Aggregator Node:** + +1. Start the Aggregator. + + .. code-block:: console + + fx aggregator start + + Now, the Aggregator is running and waiting for Collaborators to connect. + +.. _running_collaborators: + +**On the Collaborator Nodes:** + +1. Open a new terminal, change the directory to the workspace, and activate the virtual environment. + +2. Run the Collaborator. + + .. code-block:: console + + fx collaborator start -n {COLLABORATOR_LABEL} + + where :code:`COLLABORATOR_LABEL` is the label for this Collaborator. + + .. note:: + + Each workspace may have multiple FL plans and multiple collaborator lists associated with it. + Therefore, :code:`fx collaborator start` has the following optional parameters. + + +-------------------------+---------------------------------------------------------+ + | Optional Parameters | Description | + +=========================+=========================================================+ + | -p, --plan_config PATH | Federated Learning plan [default = plan/plan.yaml] | + +-------------------------+---------------------------------------------------------+ + | -d, --data_config PATH | The data set/shard configuration file | + +-------------------------+---------------------------------------------------------+ + +3. Repeat the earlier steps for each collaborator node in the federation. + + When all of the Collaborators connect, the Aggregator starts training. You will see log messages describing the progress of the federated training. + + When the last round of training is completed, the Aggregator stores the final weights in the protobuf file that was specified in the YAML file, which in this example is located at **save/${WORKSPACE_TEMPLATE}_latest.pbuf**. 
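As a rough picture of what happens at the end of each round once all Collaborators have reported, the following sketch averages local weights in proportion to each Collaborator's data size, in the spirit of FedAvg. It is a conceptual illustration only and does not reflect how the |productName| Aggregator is implemented; the function name ``federated_average``, the flat weight lists, and the sample counts are all assumptions made for the example.

```python
def federated_average(local_weights, sample_counts):
    """Weighted average of per-collaborator weight vectors.

    local_weights: one flat list of floats per collaborator.
    sample_counts: number of local training samples per collaborator,
                   used as the relative weight of each update.
    """
    total = sum(sample_counts)
    n_params = len(local_weights[0])
    return [
        sum(w[i] * n / total for w, n in zip(local_weights, sample_counts))
        for i in range(n_params)
    ]


# Two hypothetical collaborators with different amounts of local data.
col_one = [1.0, 2.0]
col_two = [3.0, 6.0]
new_global = federated_average([col_one, col_two], [100, 300])
print(new_global)
```

The Collaborator holding more data pulls the new global weights closer to its own update, which is the behavior the weighting is meant to produce.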
+ + +Post Experiment +^^^^^^^^^^^^^^^ + +Experiment owners may access the final model in its native format. +Among other training artifacts, the aggregator creates the last and best aggregated (highest validation score) model snapshots. One may convert a snapshot to the native format and save the model to disk by calling the following command from the workspace: + +.. code-block:: console + + fx model save -i model_protobuf_path.pth -o save_model_path + +In order for this command to succeed, the **TaskRunner** used in the experiment must implement a :code:`save_native()` method. + +Another way to access the trained model is by calling the API command directly from a Python script: + +.. code-block:: python + + from openfl import get_model + model = get_model(plan_config, cols_config, data_config, model_protobuf_path) + +In fact, the :code:`get_model()` method returns a **TaskRunner** object loaded with the chosen model snapshot. Users may utilize the linked model as a regular Python object. + + +.. _running_the_federation_docker: + + +Docker Approach +--------------- + +There are two ways you can run |productName| with Docker\*\. + +- `Option 1: Deploy a Federation in a Docker Container`_ +- `Option 2: Deploy Your Workspace in a Docker Container`_ + + +.. _running_the_federation_docker_base_image: + +Option 1: Deploy a Federation in a Docker Container +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. note:: + You must first build an |productName| image. See :ref:`install_docker` for details. + + +1. Run the |productName| image. + + .. code-block:: console + + docker run -it --network host openfl + + +You can now experiment with |productName| in the container. For example, you can test the project pipeline with the `"Hello Federation" bash script `_. + + +.. _running_the_federation_docker_workspace: + +Option 2: Deploy Your Workspace in a Docker Container +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +..
note:: + You have to set up a TaskRunner and run :code:`fx plan initialize` in the workspace directory. See `STEP 1: Create a Workspace`_ for details. + + +1. Build an image with the workspace you created. + + .. code-block:: console + + fx workspace dockerize + + + By default, the image is saved as **WORKSPACE_NAME_image.tar** in the workspace directory. + +2. The image can be distributed and run on other nodes without any environment preparation. + + .. parsed-literal:: + + docker run -it --rm \\ + --network host \\ + -v user_data_folder:/home/user/workspace/data \\ + ${WORKSPACE_IMAGE_NAME} \\ + bash + + + .. note:: + + The FL plan should be initialized with the FQDN of the node where the aggregator container will be running. + +3. Generate public key infrastructure (PKI) certificates for all collaborators and the aggregator. See :doc:`../../developer_guide/utilities/pki` for details. + +4. `STEP 3: Start the Federation`_. + +.. toctree +.. overview.how_can_intel_protect_federated_learning +.. overview.what_is_intel_federated_learning \ No newline at end of file diff --git a/docs/workflow_interface.rst b/docs/about/features_index/workflowinterface.rst similarity index 99% rename from docs/workflow_interface.rst rename to docs/about/features_index/workflowinterface.rst index 0886aa3fc9..d09655f96c 100644 --- a/docs/workflow_interface.rst +++ b/docs/about/features_index/workflowinterface.rst @@ -9,7 +9,7 @@ Workflow Interface **Important Note** -The OpenFL workflow interface is experimental, subject to change, and is currently limited to single node execution. To setup and launch a real federation, see :doc:`running_the_federation` +The OpenFL workflow interface is experimental, subject to change, and is currently limited to single node execution. To setup and launch a real federation, see :ref:`running_a_federation` What is it? 
=========== @@ -378,4 +378,4 @@ Our goal is to make it a one line change to configure where and how a flow is ex # A future example of how the same flow could be run on distributed infrastructure federated_runtime = FederatedRuntime(...) flow.runtime = federated_runtime - flow.run() + flow.run() \ No newline at end of file diff --git a/docs/about/license.rst b/docs/about/license.rst new file mode 100644 index 0000000000..605ed297b5 --- /dev/null +++ b/docs/about/license.rst @@ -0,0 +1,9 @@ +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +========== +License +========== + +This project is licensed under `Apache License Version 2.0 `_. +By contributing to the project, you agree to the license and copyright terms therein and release your contribution under these terms. \ No newline at end of file diff --git a/docs/notices_and_disclaimers.rst b/docs/about/notices_and_disclaimers.rst similarity index 100% rename from docs/notices_and_disclaimers.rst rename to docs/about/notices_and_disclaimers.rst diff --git a/docs/overview.how_can_intel_protect_federated_learning.rst b/docs/about/overview.how_can_intel_protect_federated_learning.rst similarity index 97% rename from docs/overview.how_can_intel_protect_federated_learning.rst rename to docs/about/overview.how_can_intel_protect_federated_learning.rst index bd90dfd849..40afe5f75e 100644 --- a/docs/overview.how_can_intel_protect_federated_learning.rst +++ b/docs/about/overview.how_can_intel_protect_federated_learning.rst @@ -15,7 +15,7 @@ without a `cryptographic key `_ and `SContain `_. -.. figure:: images/graphene.png +.. 
figure:: ../images/graphene.png :alt: graphene :scale: 40% diff --git a/docs/overview.rst b/docs/about/overview.rst similarity index 97% rename from docs/overview.rst rename to docs/about/overview.rst index a824e9aa34..e0a6eb5134 100644 --- a/docs/overview.rst +++ b/docs/about/overview.rst @@ -12,7 +12,7 @@ Overview Open Federated Learning (OpenFL) is a Python\*\ 3 project developed by Intel Internet of Things Group (IOTG) and Intel Labs. -.. figure:: images/ct_vs_fl.png +.. figure:: ../images/ct_vs_fl.png .. centered:: Federated Learning @@ -30,7 +30,7 @@ or classified secrets (`McMahan, 2016 `_; `Sheller et al., 2020 `_). In federated learning, the model moves to meet the data rather than the data moving to meet the model. The movement of data across the federation are the model parameters and their updates. -.. figure:: images/diagram_fl_new.png +.. figure:: ../images/diagram_fl_new.png .. centered:: Federated Learning diff --git a/docs/overview.what_is_intel_federated_learning.rst b/docs/about/overview.what_is_intel_federated_learning.rst similarity index 95% rename from docs/overview.what_is_intel_federated_learning.rst rename to docs/about/overview.what_is_intel_federated_learning.rst index 1cd8e77832..45932a71fc 100644 --- a/docs/overview.what_is_intel_federated_learning.rst +++ b/docs/about/overview.what_is_intel_federated_learning.rst @@ -11,7 +11,7 @@ attacks that are well documented in the literature. With Intel\ :sup:`®` \ SGX every node in the federation, risks are mitigated even if the nodes are not fully-controlled by the federation owner. -.. figure:: images/trusted_fl.png +.. figure:: ../images/trusted_fl.png .. centered:: Intel\ :sup:`®` \ Federated Learning @@ -31,6 +31,6 @@ ran the expected code within the enclave. Attestation can either be done via a trusted Intel server or by the developers own server. This stops attackers from injecting their own code into the federated training. -.. figure:: images/why_intel_fl.png +.. 
figure:: ../images/why_intel_fl.png .. centered:: Why Intel\ :sup:`®` \ Federated Learning diff --git a/docs/about/releases.md b/docs/about/releases.md new file mode 100644 index 0000000000..1f6772c6d8 --- /dev/null +++ b/docs/about/releases.md @@ -0,0 +1,117 @@ +Releases +========== + +## 1.5.1 +[Full Release Notes](https://github.com/securefederatedai/openfl/releases/tag/v1.5.1) + +We are excited to announce the release of OpenFL 1.5.1 - our first since moving to LF AI & Data! This release brings the following changes. + +### Highlights +- **Documentation accessibility improvements**: As part of our [Global Accessibility Awareness Day](https://www.intel.com/content/www/us/en/developer/articles/community/open-fl-project-improve-accessibility-for-devs.html) (GAAD) Pledge, the OpenFL project is making strides towards more accessible documentation. This release includes the integration of [Intel® One Mono](https://www.intel.com/content/www/us/en/company-overview/one-monospace-font.html) font, contrast color improvements, formatting improvements, and [new accessibility focused issues](https://github.com/securefederatedai/openfl/issues?q=is%3Aissue+is%3Aopen+accessibility) to take up in the future. 
+- **[Documentation to federate a Generally Nuanced Deep Learning Framework (GaNDLF) model with OpenFL](https://openfl.readthedocs.io/en/latest/running_the_federation_with_gandlf.html)** +- **New OpenFL Interactive API Tutorials**: + - [Linear regression with SciKit-Learn](https://github.com/securefederatedai/openfl/tree/develop/openfl-tutorials/interactive_api/scikit_learn_linear_regression) + - [MedMNIST 2D Classification Using FedProx Optimizer](https://github.com/securefederatedai/openfl/blob/develop/openfl-tutorials/interactive_api/PyTorch_FedProx_MNIST/README.md?plain=1) + - [PyTorch Linear Regression Example](https://github.com/securefederatedai/openfl/tree/develop/openfl-tutorials/interactive_api/PyTorch_LinearRegression) +- **Improvements to workspace export and import** +- **Many documentation improvements and updates** +- **Bug fixes** +- **Fixing dependency vulnerabilities** + +## 1.5 +[Full Release Notes](https://github.com/securefederatedai/openfl/releases/tag/v1.5) + +### Highlights +* **New Workflows Interface (Experimental)** - a new way of composing federated learning experiments inspired by [Metaflow](https://github.com/Netflix/metaflow). Enables the creation of custom aggregator and collaborators tasks. This initial release is intended for simulation on a single node (using the LocalRuntime); distributed execution (FederatedRuntime) to be enabled in a future release. +* **New use cases enabled by the workflow interface**: + * **[End-of-round validation with aggregator dataset](https://github.com/intel/openfl/blob/develop/openfl-tutorials/experimental/Workflow_Interface_102_Aggregator_Validation.ipynb)** + * **[Privacy Meter](https://github.com/intel/openfl/tree/develop/openfl-tutorials/experimental/Privacy_Meter)** - Privacy meter, based on state-of-the-art membership inference attacks, provides a tool to quantitatively audit data privacy in statistical and machine learning algorithms. 
The objective of a membership inference attack is to determine whether a given data record was in the training dataset of the target model. Measures of success (accuracy, area under the ROC curve, true positive rate at a given false positive rate ...) for particular membership inference attacks against a target model are used to estimate privacy loss for that model (how much information a target model leaks about its training data). Since stronger attacks may be possible, these measures serve as lower bounds of the actual privacy loss. The Privacy Meter workflow example generates privacy loss reports for each party's local model updates as well as the global models throughout all rounds of the FL training. + * **[Vertical Federated Learning Examples](https://github.com/intel/openfl/tree/develop/openfl-tutorials/experimental/Vertical_FL)** + * **[Federated Model Watermarking](https://github.com/intel/openfl/blob/develop/openfl-tutorials/experimental/Workflow_Interface_301_MNIST_Watermarking.ipynb)** using the [WAFFLE](https://arxiv.org/pdf/2008.07298.pdf) method + * **[Differential Privacy](https://github.com/intel/openfl/tree/develop/openfl-tutorials/experimental/Global_DP)** – Global differentially private federated learning using the Opacus library to achieve a differentially private result with respect to the inclusion or exclusion of any collaborator in the training process. At each round, a subset of collaborators is selected using a Poisson distribution over all collaborators; the selected collaborators perform local training with periodic clipping of their model delta (with respect to the current global model) to bound their contribution to the average of local model updates. Gaussian noise is then added to the average of these local models at the aggregator.
This example is implemented in two different but statistically equivalent ways – the lower level API utilizes RDPAccountant and DPDataloader Opacus objects to perform privacy accounting and collaborator selection respectively, whereas the higher level API uses PrivacyEngine Opacus object for collaborator selection and internally utilizes RDPAccountant for privacy accounting. +* **[Habana Accelerator Support](https://github.com/intel/openfl/tree/develop/openfl-tutorials/interactive_api/HPU/PyTorch_TinyImageNet)** +* **Official support for Python 3.9 and 3.10** +* **[EDEN Compression Pipeline](https://github.com/intel/openfl/blob/develop/openfl/pipelines/eden_pipeline.py)**: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning ([paper link](https://proceedings.mlr.press/v162/vargaftik22a.html)) +* **[FLAX Framework Support](https://github.com/intel/openfl/tree/develop/openfl-tutorials/interactive_api/Flax_CNN_CIFAR)** +* **Improvements to the resiliency and security of the director / envoy infrastructure**: + * Optional notification to plan participants to agree to experiment sent to their infrastructure + * Improved resistance to loss of network connectivity and failure at various stages of execution +* **Windows Support (Experimental)**: Continuous Integration now tests OpenFL on Windows, but certain features may not work as expected. Full Windows support will be added in a future release. 
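
The Differential Privacy highlight above describes three steps per round: sample collaborators, clip each model delta, and add Gaussian noise to the average. The following is a minimal sketch of that idea in plain Python, not the OpenFL or Opacus implementation. The names `clip_delta` and `dp_round` are hypothetical, clipping is applied once per round rather than periodically during local training, the noise scale is an illustrative assumption, and no privacy accounting is performed.

```python
import math
import random


def clip_delta(delta, clip_norm):
    """Scale a model delta so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in delta))
    if norm <= clip_norm:
        return list(delta)
    scale = clip_norm / norm
    return [x * scale for x in delta]


def dp_round(global_weights, local_weights, sample_rate, clip_norm,
             noise_multiplier, rng):
    """Run one illustrative round of global-DP aggregation (hypothetical helper)."""
    # Sampling: each collaborator takes part independently with
    # probability sample_rate.
    selected = [w for w in local_weights if rng.random() < sample_rate]
    if not selected:
        # Nobody was sampled this round, so the global model is unchanged.
        return list(global_weights)

    # Clip each collaborator's delta relative to the current global model.
    clipped = [
        clip_delta([w - g for w, g in zip(weights, global_weights)], clip_norm)
        for weights in selected
    ]

    # Average the clipped deltas coordinate by coordinate.
    count = len(clipped)
    average = [sum(column) / count for column in zip(*clipped)]

    # Add Gaussian noise to the average. This scale is an assumption made
    # for illustration only; it is not a calibrated privacy guarantee.
    sigma = noise_multiplier * clip_norm / count
    noisy = [a + rng.gauss(0.0, sigma) for a in average]

    return [g + d for g, d in zip(global_weights, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    new_global = dp_round(
        global_weights=[0.0, 0.0],
        local_weights=[[3.0, 4.0], [0.0, 0.5], [1.0, -1.0]],
        sample_rate=0.5,
        clip_norm=1.0,
        noise_multiplier=1.0,
        rng=rng,
    )
    print(new_global)
```

Clipping bounds how far any single collaborator can move the average, which is what lets the added noise mask the presence or absence of that collaborator. The linked Global_DP tutorial uses Opacus objects for sampling and accounting.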
+ +## 1.4 +[Full Release Notes](https://github.com/securefederatedai/openfl/releases/tag/v1.4) + +The OpenFL v1.4 release contains the following: + +- [Straggler Handling](https://github.com/intel/openfl/pull/465)​ +- tf.data [Pipeline Example​](https://github.com/intel/openfl/pull/440) +- [`PrivilegedAggregationFunction`](https://github.com/intel/openfl/pull/417) Interface​ +- FeTS Challenge [Task Runner](https://github.com/intel/openfl/pull/419)​ +- [JAX Framework Support](https://github.com/intel/openfl/pull/443) +- Bug fixes and other improvements + +## 1.3 +[Full Release Notes](https://github.com/securefederatedai/openfl/releases/tag/v1.3) + +The OpenFL v1.3 release contains the following updates: + +* [Task Assigner functionality](https://github.com/intel/openfl/pull/343) +* [OpenFL + Gramine to support workloads within SGX](https://github.com/intel/openfl/pull/339) +* [FedCurv aggregation](https://github.com/intel/openfl/pull/167) algorithm +* [HuggingFace/transformers audio classification example using SUPERB dataset](https://github.com/intel/openfl/pull/340) +* [PyTorch Lightning GAN](https://github.com/intel/openfl/pull/287) example +* NumPy Linear Regression example in [Google Colab](https://github.com/intel/openfl/pull/286) +* [Adaptive Federated Optimization ](https://github.com/intel/openfl/issues/281) algorithms implementation: `FedYogi`, `FedAdagrad`, `FedAdam` +* [MXNet landmarks regression example](https://github.com/intel/openfl/pull/349) as a custom plugin to OpenFL +* Migration to [JupyterLab](https://github.com/intel/openfl/pull/307) +* Bug fixes and other improvements + +## 1.2 +[Full Release Notes](https://github.com/securefederatedai/openfl/releases/tag/v1.2) + +The OpenFL v1.2 release contains the following updates: + +- Long-living entities: [Director/Envoy](https://github.com/intel/openfl/issues/120) for supporting multiple experiments within the same `Federation` +- [Scalable PKI](https://github.com/intel/openfl/issues/38): 
semi-automatic mechanism for certificates distribution via step-ca +- Examples with new Interactive API + Director/Envoy: [TensorFlow Next Word Prediction](https://github.com/intel/openfl/pull/183), [PyTorch Re-ID on Market](https://github.com/intel/openfl/pull/156), [PyTorch MobileNet v2 on TinyImageNet](https://github.com/intel/openfl/pull/170) +- [3D U-Net TensorFlow workspace for BraTS 2020 for CLI-based workflow](https://github.com/intel/openfl/pull/108) +- `AggregationFunction` interface for custom aggregation functions in new Interactive API +- Autocomplete of `fx` CLI +- Bug fixes and documentation improvements + + ## 1.1 +[Full Release Notes](https://github.com/securefederatedai/openfl/releases/tag/v1.1) + + The OpenFL v1.1 release contains the following updates: + +- New [Interactive Python API](https://github.com/intel/openfl/blob/develop/openfl-tutorials/interactive_api_tutorials_(experimental)/Pytorch_Kvasir_UNET_workspace/new_python_api_UNET.ipynb) (experimental) +- Example FedProx algorithm implementation for PyTorch and Tensorflow +- `AggregationFunctionInterface` for custom aggregation functions +- Adds a [Keras-based NLP Example](https://github.com/intel/openfl/tree/develop/openfl-workspace/keras_nlp) +- Fixed lossy compression pipelines and added an [example](https://github.com/intel/openfl/tree/develop/openfl-workspace/keras_cnn_with_compression) for usage +- Bug fixes and documentation improvements + + ## 1.0.1 +[Full Release Notes](https://github.com/securefederatedai/openfl/releases/tag/v1.0.1) + +v1.0.1 is a patch release. 
It includes the following updates: + +- New docker CI tests +- New Pytorch UNet Kvasir tutorial +- Cleanup / fixes to other OpenFL tutorials +- Fixed description for Pypi +- Status/documentation/community badges for README.md + + ## 1.0 +[Full Release Notes](https://github.com/securefederatedai/openfl/releases/tag/v1.0.1) + +This release includes: +- The official open source release of OpenFL +- Tensorflow 2.0 and PyTorch support +- Examples for classification, segmentation, and adversarial training +- No-install Docker and Singularity* deployments +- Python native API intended for single node federated learning experiments +- `fx` CLI for multi-node production deployments +- Additional test coverage for OpenFL components + +\* Singularity supported via DockerHub integration: `singularity shell docker://openfl:latest` \ No newline at end of file diff --git a/docs/conf.py b/docs/conf.py index 36e0bdf33d..5c8b93d1a9 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -35,14 +35,18 @@ 'sphinx_rtd_theme', 'sphinx.ext.autosectionlabel', 'sphinx-prompt', + 'sphinx_copybutton', 'sphinx_substitution_extensions', 'sphinx.ext.ifconfig', 'sphinxcontrib.mermaid', 'sphinx.ext.autodoc', - 'sphinx.ext.autosummary' + 'sphinx.ext.autosummary', + 'recommonmark' ] autosummary_generate = True # Turn on sphinx.ext.autosummary +source_suffix = ['.rst', '.md'] + # -- Project information ----------------------------------------------------- # This will replace the |variables| within the rST documents automatically @@ -70,9 +74,9 @@ # Config the returns section to behave like the Args section napoleon_custom_sections = [('Returns', 'params_style')] -# This code extends Sphinx's GoogleDocstring class to support 'Keys', 'Attributes', -# and 'Class Attributes' sections in docstrings. Allows for more detailed and structured -# documentation of Python classes and their attributes. 
+# This code extends Sphinx's GoogleDocstring class to support 'Keys', +# 'Attributes', and 'Class Attributes' sections in docstrings. Allows for more +# detailed and structured documentation of Python classes and their attributes. from sphinx.ext.napoleon.docstring import GoogleDocstring # Define new sections and their corresponding parse methods @@ -87,6 +91,7 @@ setattr(GoogleDocstring, f'_parse_{section}_section', lambda self, section: self._format_fields(title, self._consume_fields())) + # Patch the parse method to include new sections def patched_parse(self): for section in new_sections: @@ -103,8 +108,9 @@ def patched_parse(self): # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This pattern also affects html_static_path and html_extra_path. -exclude_patterns = ['_build', 'Thumbs.db', +exclude_patterns = ['_build', 'Thumbs.db', 'README.md', 'structurizer_dsl/README.md', '.DS_Store', 'tutorials/*', 'graveyard/*', '_templates'] + # add temporary unused files exclude_patterns.extend(['modules.rst', 'install.singularity.rst', @@ -125,3 +131,7 @@ def patched_parse(self): html_static_path = ['_static'] html_style = 'css/Intel_One_Mono_Font_Theme.css' autosectionlabel_prefix_document = True + + +def setup(app): + app.add_css_file('css/custom.css') diff --git a/docs/contributing_guidelines/contributing.md b/docs/contributing_guidelines/contributing.md new file mode 100644 index 0000000000..e060398a91 --- /dev/null +++ b/docs/contributing_guidelines/contributing.md @@ -0,0 +1,102 @@ +Contributing to OpenFL +===================================================================== + +We welcome contributions from the community. We believe that anyone can bring something valuable to OpenFL and help us to improve the project. This document explains how to contribute to OpenFL. 
+ +We accept a range of contributions, from documentation improvements and bug fixes to major feature proposals and [roadmap](https://github.com/intel/openfl/blob/develop/ROADMAP.md) suggestions. + +Documentation improvement: review our [documentation](https://openfl.readthedocs.io/en/latest/install.html) and let us know if something is not clear or not relevant. +Propose your own wording, or even write a new section explaining something you understand but do not see covered in the documentation. +Propose it through GitHub [issues](https://github.com/intel/openfl/issues/new/choose) or [Discussions](https://github.com/intel/openfl/discussions). + +To report bugs or propose new features or other code improvements: + +1. Check open and closed [issues](https://github.com/intel/openfl/issues) and make sure there is no similar proposal. +2. Open a [new issue](https://github.com/intel/openfl/issues/new/choose), select a relevant category (Bug report / Feature request / Report a security vulnerability) and describe your idea using the template. +3. If you want to fix the bug or build the feature yourself, prepare a contribution. + - Format your code following the [flake8 style](https://flake8.pycqa.org/en/latest/). + - Make sure that your code is original and complies with the [OpenFL license](#license). + - Sign your work - [see below](#sign-your-work). + - Create a [pull request](#formatting-of-pull-requests) and wait for feedback. + - Verify that all tests in our [CI/CD pipeline](#Continuous-Integration-and-Continuous-Development) pass. +4. Hurrah! You are a new contributor to OpenFL! You will see your name in the release notes of subsequent releases!😊 + +Join our [Slack](https://join.slack.com/t/openfl/shared_invite/zt-ovzbohvn-T5fApk05~YS_iZhjJ5yaTw) and [Community meetings](https://github.com/intel/openfl#support) and participate in the discussions. + +Are you an expert in Federated Learning and want to contribute to our roadmap?
You can nominate yourself as a member of our Technical Steering Committee and be part of the OpenFL decision-making group. Reach us through our [Slack](https://join.slack.com/t/openfl/shared_invite/zt-ovzbohvn-T5fApk05~YS_iZhjJ5yaTw). + +### Code format and style + +We use [flake8](https://flake8.pycqa.org/en/latest/) for PEP8 style guide enforcement. It runs as part of our CI/CD pipeline and must pass before a merge. + +### Formatting of Pull Requests + +OpenFL follows standard recommendations for PR formatting. Please find more details [here](https://github.blog/2015-01-21-how-to-write-the-perfect-pull-request/). + +### Continuous Integration and Continuous Development + +OpenFL uses GitHub Actions to perform all functional and unit tests. Before your contribution can be merged, make sure that all your tests are passing. +For more information on what failed, click the “details” link next to the pipeline that failed. + +![CI/CD](../images/CI_details.png) + +### Writing the tests + +The OpenFL team recommends including tests with all new feature contributions. Tests can be found in the “tests” directory. +The [Tests/OpenFL folder](https://github.com/intel/openfl/tree/develop/tests/openfl) contains unit tests and the [Tests/GitHub folder](https://github.com/intel/openfl/tree/develop/tests/github) contains end-to-end and functional tests. + +### License + +OpenFL is licensed under the terms of the [Apache 2.0 license](https://github.com/intel/openfl/blob/develop/LICENSE). By contributing to the project, you agree to the license and copyright terms therein and release your contribution under these terms. + +### Sign your work + +Add a sign-off line at the end of each patch. Your signature certifies that you wrote the patch or otherwise have the right to pass it on as an open-source patch.
The rules are pretty simple: if you can certify +the below (from [developercertificate.org](http://developercertificate.org/)): + +``` +Developer Certificate of Origin +Version 1.1 + +Copyright (C) 2004, 2006 The Linux Foundation and its contributors. +660 York Street, Suite 102, +San Francisco, CA 94110 USA + +Everyone is permitted to copy and distribute verbatim copies of this +license document, but changing it is not allowed. + +Developer's Certificate of Origin 1.1 + +By making a contribution to this project, I certify that: + +(a) The contribution was created in whole or in part by me and I + have the right to submit it under the open source license + indicated in the file; or + +(b) The contribution is based upon previous work that, to the best + of my knowledge, is covered under an appropriate open source + license and I have the right under that license to submit that + work with modifications, whether created in whole or in part + by me, under the same open source license (unless I am + permitted to submit under a different license), as indicated + in the file; or + +(c) The contribution was provided directly to me by some other + person who certified (a), (b) or (c) and I have not modified + it. + +(d) I understand and agree that this project and the contribution + are public and that a record of the contribution (including all + personal information I submit with it, including my sign-off) is + maintained indefinitely and may be redistributed consistent with + this project or the open source license(s) involved. +``` + +Then you just add a line to every git commit message: + + Signed-off-by: Joe Smith + +Use your real name (sorry, no pseudonyms or anonymous contributions.) + +If you set your `user.name` and `user.email` git configs, you can sign your +commit automatically with `git commit -s`. 
\ No newline at end of file diff --git a/docs/advanced_topics.rst b/docs/developer_guide/advanced_topics.rst similarity index 52% rename from docs/advanced_topics.rst rename to docs/developer_guide/advanced_topics.rst index c7596dfe61..c3c687d1b2 100644 --- a/docs/advanced_topics.rst +++ b/docs/developer_guide/advanced_topics.rst @@ -10,48 +10,47 @@ Advanced Topics Speed up activating Open Federated Learning (|productName|) commands: - - :doc:`bash_autocomplete_activation` + - :doc:`advanced_topics/bash_autocomplete_activation` **Aggregator-Based Workflow** Learn to manage multiple Federation Learning plans (FL plan) in the same workspace: - - :doc:`multiple_plans` + - :doc:`advanced_topics/multiple_plans` Reduce the amount of data transferred in a federation through compression pipelines available in |productName|: - - :doc:`compression_settings` + - :doc:`advanced_topics/compression_settings` Customize the aggregation function for each task: - - :doc:`overriding_agg_fn` + - :doc:`advanced_topics/overriding_agg_fn` Customize straggler handling function: - - :doc:`straggler_handling_algorithms` + - :doc:`advanced_topics/straggler_handling_algorithms` **Director-Based Workflow** Customize the logging function for each task: - - :doc:`log_metric_callback` + - :doc:`advanced_topics/log_metric_callback` Update plan settings: - - :doc:`overriding_plan_settings` + - :doc:`advanced_topics/overriding_plan_settings` .. 
toctree:: - :maxdepth: 4 + :maxdepth: 1 :hidden: - bash_autocomplete_activation - multiple_plans - compression_settings - overriding_agg_fn - straggler_handling_algorithms - log_metric_callback - supported_aggregation_algorithms - overriding_plan_settings + advanced_topics/bash_autocomplete_activation + advanced_topics/multiple_plans + advanced_topics/compression_settings + advanced_topics/overriding_agg_fn + advanced_topics/straggler_handling_algorithms + advanced_topics/log_metric_callback + advanced_topics/overriding_plan_settings diff --git a/docs/bash_autocomplete_activation.rst b/docs/developer_guide/advanced_topics/bash_autocomplete_activation.rst similarity index 100% rename from docs/bash_autocomplete_activation.rst rename to docs/developer_guide/advanced_topics/bash_autocomplete_activation.rst diff --git a/docs/compression_settings.rst b/docs/developer_guide/advanced_topics/compression_settings.rst similarity index 100% rename from docs/compression_settings.rst rename to docs/developer_guide/advanced_topics/compression_settings.rst diff --git a/docs/log_metric_callback.rst b/docs/developer_guide/advanced_topics/log_metric_callback.rst similarity index 100% rename from docs/log_metric_callback.rst rename to docs/developer_guide/advanced_topics/log_metric_callback.rst diff --git a/docs/multiple_plans.rst b/docs/developer_guide/advanced_topics/multiple_plans.rst similarity index 100% rename from docs/multiple_plans.rst rename to docs/developer_guide/advanced_topics/multiple_plans.rst diff --git a/docs/overriding_agg_fn.rst b/docs/developer_guide/advanced_topics/overriding_agg_fn.rst similarity index 100% rename from docs/overriding_agg_fn.rst rename to docs/developer_guide/advanced_topics/overriding_agg_fn.rst diff --git a/docs/overriding_plan_settings.rst b/docs/developer_guide/advanced_topics/overriding_plan_settings.rst similarity index 100% rename from docs/overriding_plan_settings.rst rename to 
docs/developer_guide/advanced_topics/overriding_plan_settings.rst diff --git a/docs/straggler_handling_algorithms.rst b/docs/developer_guide/advanced_topics/straggler_handling_algorithms.rst similarity index 100% rename from docs/straggler_handling_algorithms.rst rename to docs/developer_guide/advanced_topics/straggler_handling_algorithms.rst diff --git a/docs/experimental_features.rst b/docs/developer_guide/experimental_features.rst similarity index 93% rename from docs/experimental_features.rst rename to docs/developer_guide/experimental_features.rst index 7c68ce5876..8ed19f95a6 100644 --- a/docs/experimental_features.rst +++ b/docs/developer_guide/experimental_features.rst @@ -20,11 +20,10 @@ Experimental features are *not* ready for production. These features are under a - Filter out information that should stay local - Use Metaflow tools to analyze and debug experiments - - :doc:`workflow_interface` + - :doc:`../about/features_index/workflowinterface` .. toctree:: :maxdepth: 4 :hidden: - workflow_interface - + workflow_interface \ No newline at end of file diff --git a/docs/developer_guide/manual.rst b/docs/developer_guide/manual.rst new file mode 100644 index 0000000000..7145f49d9c --- /dev/null +++ b/docs/developer_guide/manual.rst @@ -0,0 +1,33 @@ +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +****** +Manual +****** + +Establish a federation using GaNDLF + +- :doc:`running_the_federation_with_gandlf` + +Customize the federation: + +- :doc:`utilities` +- :doc:`advanced_topics` + +Get familiar with the APIs: + +- `Open Federated Learning (OpenFL) Tutorials `_ + +Explore new and experimental features: + +- :doc:`experimental_features` + +.. 
toctree:: + :maxdepth: 2 + :hidden: + + running_the_federation_with_gandlf + utilities + advanced_topics + running_the_federation.tutorial + experimental_features \ No newline at end of file diff --git a/docs/openfl.rst b/docs/developer_guide/openfl_structure.rst similarity index 73% rename from docs/openfl.rst rename to docs/developer_guide/openfl_structure.rst index e07305d3c4..4b9d126a8a 100644 --- a/docs/openfl.rst +++ b/docs/developer_guide/openfl_structure.rst @@ -1,27 +1,27 @@ -.. # Copyright (C) 2020-2023 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - -************************************************* -|productName| Structure -************************************************* - -Learn about the short-lived and long-lived components that compose Open Federated Learning (|productName|): - -- :doc:`source/openfl/components` - -Understand the procedure calls to the Director service. - -- :doc:`source/openfl/communication` - -Learn about the plugin framework that makes |productName| flexible and extensible for your use: - -- :doc:`source/openfl/plugins` - - -.. toctree:: - :maxdepth: 4 - :hidden: - - source/openfl/components - source/openfl/communication - source/openfl/plugins \ No newline at end of file +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +************************************************* +|productName| Structure +************************************************* + +Learn about the short-lived and long-lived components that compose Open Federated Learning (|productName|): + +- :doc:`structure/components` + +Understand the procedure calls to the Director service. + +- :doc:`structure/communication` + +Learn about the plugin framework that makes |productName| flexible and extensible for your use: + +- :doc:`structure/plugins` + + +.. 
toctree:: + :maxdepth: 4 + :hidden: + + structure/components + structure/communication + structure/plugins \ No newline at end of file diff --git a/docs/source/workflow/running_the_federation.notebook.rst b/docs/developer_guide/running_the_federation.notebook.rst similarity index 96% rename from docs/source/workflow/running_the_federation.notebook.rst rename to docs/developer_guide/running_the_federation.notebook.rst index 2a8d92547c..ce8c75df72 100644 --- a/docs/source/workflow/running_the_federation.notebook.rst +++ b/docs/developer_guide/running_the_federation.notebook.rst @@ -1,219 +1,219 @@ -.. # Copyright (C) 2020-2023 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - -.. _running_notebook: - -********************************** -Aggregator-Based Workflow Tutorial -********************************** - -You will start a Jupyter\* \ lab server and receive a URL you can use to access the tutorials. Jupyter notebooks are provided for PyTorch\* \ and TensorFlow\* \ that simulate a federation on a local machine. - -.. note:: - - Follow the procedure to become familiar with the APIs used in aggregator-based workflow and conventions such as *FL Plans*, *Aggregators*, and *Collaborators*. - - -Start the Tutorials -=================== - -1. Start a Python\* \ 3.8 (>=3.6, <3.9) virtual environment and confirm |productName| is available. - - .. code-block:: python - - fx - - You should see a list of available commands - -2. Start a Jupyter server. This returns a URL to access available tutorials. - - .. code-block:: python - - fx tutorial start - -3. Open the URL (including the token) in your browser. - -4. Choose a tutorial from which to start. Each tutorial is a demonstration of a simulated federated learning. The following are examples of available tutorials: - - - :code:`Federated Keras MNIST Tutorial`: workspace with a simple `Keras `_ CNN model that will download the `MNIST `_ dataset and train in a federation. 
- - :code:`Federated Pytorch MNIST Tutorial`: workspace with a simple `PyTorch `_ CNN model that will download the `MNIST `_ dataset and train in a federation. - - :code:`Federated PyTorch UNET Tutorial`: workspace with a UNET `PyTorch `_ model that will download the `Hyper-Kvasir `_ dataset and train in a federation. - - :code:`Federated PyTorch TinyImageNet`: workspace with a MobileNet-V2 `PyTorch `_ model that will download the `Tiny-ImageNet `_ dataset and train in a federation. - - -Familiarize with the API Concepts in an Aggregator-Based Worklow -================================================================ - -Step 1: Enable the |productName| Python API -------------------------------------------- - -Add the following lines to your Python script. - - .. code-block:: python - - import openfl.native as fx - from openfl.federated import FederatedModel, FederatedDataSet - -This loads the |productName| package and import wrappers that adapt your existing data and models to a (simulated) federated context. - -Step 2: Set Up the Experiment ------------------------------ - -For a basic experiment, run the following command. - - .. code-block:: python - - fx.init() - - -This creates a workspace directory containing default FL plan values for your experiments, and sets up a an experiment with two collaborators (the collaborators are creatively named **one** and **two**). - -For an experiment with more collaborators, run the following command. - - .. code-block:: python - - collaborator_list = [str(i) for i in range(NUM_COLLABORATORS)] - fx.init('keras_cnn_mnist', col_names=collaborator_list) - - -.. note:: - - The following are template recommendations for training models: - - - For Keras models, run :code:`fx.init('keras_cnn_mnist')` to start with the *keras_cnn_mnist* template. - - For PyTorch models, run :code:`fx.init('torch_cnn_mnist')` to start with the *torch_cnn_mnist* template. 
- - -Step 3: Customize the Federated Learning Plan (FL Plan) -------------------------------------------------------- - -For this example, the experiment is set up with the *keras_cnn_mnist* template. - - .. code-block:: python - - fx.init('keras_cnn_mnist') - - -See the FL plan values that can be set with the :code:`fx.get_plan()` command. - - .. code-block:: python - - print(fx.get_plan()) - - { - "aggregator.settings.best_state_path": "save/keras_cnn_mnist_best.pbuf", - "aggregator.settings.init_state_path": "save/keras_cnn_mnist_init.pbuf", - "aggregator.settings.last_state_path": "save/keras_cnn_mnist_last.pbuf", - "aggregator.settings.rounds_to_train": 10, - "aggregator.template": "openfl.component.Aggregator", - ... - } - -Based on this plan values, the experiment will run for 10 rounds. You can customize the experiment to run for 20 rounds either at runtime or ahead of time. - -Set the value at **runtime** with the :code:`override-config` parameter of :code:`fx.run_experiment`. - - .. code-block:: python - - #set values at experiment runtime - fx.run_experiment(experiment_collaborators, override_config={"aggregator.settings.rounds_to_train": 20}) - - -Set the value **ahead of time** with :code:`fx.update_plan()`. - - .. code-block:: python - - #Set values ahead of time with fx.update_plan() - fx.update_plan({"aggregator.settings.rounds_to_train": 20}) - - -Step 4: Wrap the Data and Model -------------------------------- - -Use the :code:`FederatedDataSet` function to wrap in-memory numpy datasets and split the data into N mutually-exclusive chunks for each collaborator participating in the experiment. - - .. code-block:: python - - fl_data = FederatedDataSet(train_images, train_labels, valid_images, valid_labels, batch_size=32, num_classes=classes) - -Similarly, the :code:`FederatedModel` function takes as an argument your model definition. For the first example, you can wrap a Keras model in a function that outputs the compiled model. 
- -**Example 1:** - - .. code-block:: python - - def build_model(feature_shape,classes): - #Defines the MNIST model - model = Sequential() - model.add(Dense(64, input_shape=feature_shape, activation='relu')) - model.add(Dense(64, activation='relu')) - model.add(Dense(classes, activation='softmax')) - - model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy']) - return model - - fl_model = FederatedModel(build_model, data_loader=fl_data) - -For the second example with a PyTorch model, the :code:`FederatedModel` function takes the following parameters: - -- The class that defines the network definition and associated forward function -- The lambda optimizer method that can be set to a newly instantiated network -- The loss function - -**Example 2:** - - .. code-block:: python - - class Net(nn.Module): - def __init__(self): - super(Net, self).__init__() - self.conv1 = nn.Conv2d(1, 16, 3) - self.pool = nn.MaxPool2d(2, 2) - self.conv2 = nn.Conv2d(16, 32, 3) - self.fc1 = nn.Linear(32 * 5 * 5, 32) - self.fc2 = nn.Linear(32, 84) - self.fc3 = nn.Linear(84, 10) - - def forward(self, x): - x = self.pool(F.relu(self.conv1(x))) - x = self.pool(F.relu(self.conv2(x))) - x = x.view(x.size(0),-1) - x = F.relu(self.fc1(x)) - x = F.relu(self.fc2(x)) - x = self.fc3(x) - return F.log_softmax(x, dim=1) - - optimizer = lambda x: optim.Adam(x, lr=1e-4) - - def cross_entropy(output, target): - """Binary cross-entropy metric - """ - return F.binary_cross_entropy_with_logits(input=output,target=target) - - fl_model = FederatedModel(build_model=Net, optimizer=optimizer, loss_fn=cross_entropy, data_loader=fl_data) - - -Step 5: Define the Collaborators --------------------------------- - -Define the collaborators taking part in the experiment. The example below uses the collaborator list, created earlier with the the :code:`fx.init()` command. - - .. 
code-block:: python - - experiment_collaborators = {col_name:col_model for col_name, col_model \ - in zip(collaborator_list, fl_model.setup(len(collaborator_list)))} - -This command creates a model for each collaborator with their data shard. - -.. note:: - - In production deployments of |productName|, each collaborator will have the data on premise. Splitting data into shards is not necessary. - -Step 6: Run the Experiment --------------------------- - -Run the experiment for five rounds and return the final model once completed. - - .. code-block:: python - - final_fl_model = fx.run_experiment(experiment_collaborators, override_config={"aggregator.settings.rounds_to_train": 5}) +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +.. _running_notebook: + +********************************** +Aggregator-Based Workflow Tutorial +********************************** + +You will start a Jupyter\* \ lab server and receive a URL you can use to access the tutorials. Jupyter notebooks are provided for PyTorch\* \ and TensorFlow\* \ that simulate a federation on a local machine. + +.. note:: + + Follow the procedure to become familiar with the APIs used in aggregator-based workflow and conventions such as *FL Plans*, *Aggregators*, and *Collaborators*. + + +Start the Tutorials +=================== + +1. Start a Python\* \ 3.8 (>=3.6, <3.9) virtual environment and confirm |productName| is available. + + .. code-block:: python + + fx + + You should see a list of available commands + +2. Start a Jupyter server. This returns a URL to access available tutorials. + + .. code-block:: python + + fx tutorial start + +3. Open the URL (including the token) in your browser. + +4. Choose a tutorial from which to start. Each tutorial is a demonstration of a simulated federated learning. 
The following are examples of available tutorials: + + - :code:`Federated Keras MNIST Tutorial`: workspace with a simple `Keras `_ CNN model that will download the `MNIST `_ dataset and train in a federation. + - :code:`Federated Pytorch MNIST Tutorial`: workspace with a simple `PyTorch `_ CNN model that will download the `MNIST `_ dataset and train in a federation. + - :code:`Federated PyTorch UNET Tutorial`: workspace with a UNET `PyTorch `_ model that will download the `Hyper-Kvasir `_ dataset and train in a federation. + - :code:`Federated PyTorch TinyImageNet`: workspace with a MobileNet-V2 `PyTorch `_ model that will download the `Tiny-ImageNet `_ dataset and train in a federation. + + +Familiarize with the API Concepts in an Aggregator-Based Workflow +================================================================= + +Step 1: Enable the |productName| Python API +------------------------------------------- + +Add the following lines to your Python script. + + .. code-block:: python + + import openfl.native as fx + from openfl.federated import FederatedModel, FederatedDataSet + +This loads the |productName| package and imports wrappers that adapt your existing data and models to a (simulated) federated context. + +Step 2: Set Up the Experiment +----------------------------- + +For a basic experiment, run the following command. + + .. code-block:: python + + fx.init() + + +This creates a workspace directory containing default FL plan values for your experiments, and sets up an experiment with two collaborators (the collaborators are creatively named **one** and **two**). + +For an experiment with more collaborators, run the following command. + + .. code-block:: python + + collaborator_list = [str(i) for i in range(NUM_COLLABORATORS)] + fx.init('keras_cnn_mnist', col_names=collaborator_list) + + +..
note:: + + The following are template recommendations for training models: + + - For Keras models, run :code:`fx.init('keras_cnn_mnist')` to start with the *keras_cnn_mnist* template. + - For PyTorch models, run :code:`fx.init('torch_cnn_mnist')` to start with the *torch_cnn_mnist* template. + + +Step 3: Customize the Federated Learning Plan (FL Plan) +------------------------------------------------------- + +For this example, the experiment is set up with the *keras_cnn_mnist* template. + + .. code-block:: python + + fx.init('keras_cnn_mnist') + + +See the FL plan values that can be set with the :code:`fx.get_plan()` command. + + .. code-block:: python + + print(fx.get_plan()) + + { + "aggregator.settings.best_state_path": "save/keras_cnn_mnist_best.pbuf", + "aggregator.settings.init_state_path": "save/keras_cnn_mnist_init.pbuf", + "aggregator.settings.last_state_path": "save/keras_cnn_mnist_last.pbuf", + "aggregator.settings.rounds_to_train": 10, + "aggregator.template": "openfl.component.Aggregator", + ... + } + +Based on this plan values, the experiment will run for 10 rounds. You can customize the experiment to run for 20 rounds either at runtime or ahead of time. + +Set the value at **runtime** with the :code:`override-config` parameter of :code:`fx.run_experiment`. + + .. code-block:: python + + #set values at experiment runtime + fx.run_experiment(experiment_collaborators, override_config={"aggregator.settings.rounds_to_train": 20}) + + +Set the value **ahead of time** with :code:`fx.update_plan()`. + + .. code-block:: python + + #Set values ahead of time with fx.update_plan() + fx.update_plan({"aggregator.settings.rounds_to_train": 20}) + + +Step 4: Wrap the Data and Model +------------------------------- + +Use the :code:`FederatedDataSet` function to wrap in-memory numpy datasets and split the data into N mutually-exclusive chunks for each collaborator participating in the experiment. + + .. 
code-block:: python + + fl_data = FederatedDataSet(train_images, train_labels, valid_images, valid_labels, batch_size=32, num_classes=classes) + +Similarly, the :code:`FederatedModel` function takes as an argument your model definition. For the first example, you can wrap a Keras model in a function that outputs the compiled model. + +**Example 1:** + + .. code-block:: python + + def build_model(feature_shape,classes): + #Defines the MNIST model + model = Sequential() + model.add(Dense(64, input_shape=feature_shape, activation='relu')) + model.add(Dense(64, activation='relu')) + model.add(Dense(classes, activation='softmax')) + + model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy']) + return model + + fl_model = FederatedModel(build_model, data_loader=fl_data) + +For the second example with a PyTorch model, the :code:`FederatedModel` function takes the following parameters: + +- The class that defines the network definition and associated forward function +- The lambda optimizer method that can be set to a newly instantiated network +- The loss function + +**Example 2:** + + .. 
code-block:: python + + class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.conv1 = nn.Conv2d(1, 16, 3) + self.pool = nn.MaxPool2d(2, 2) + self.conv2 = nn.Conv2d(16, 32, 3) + self.fc1 = nn.Linear(32 * 5 * 5, 32) + self.fc2 = nn.Linear(32, 84) + self.fc3 = nn.Linear(84, 10) + + def forward(self, x): + x = self.pool(F.relu(self.conv1(x))) + x = self.pool(F.relu(self.conv2(x))) + x = x.view(x.size(0),-1) + x = F.relu(self.fc1(x)) + x = F.relu(self.fc2(x)) + x = self.fc3(x) + return F.log_softmax(x, dim=1) + + optimizer = lambda x: optim.Adam(x, lr=1e-4) + + def cross_entropy(output, target): + """Binary cross-entropy metric + """ + return F.binary_cross_entropy_with_logits(input=output,target=target) + + fl_model = FederatedModel(build_model=Net, optimizer=optimizer, loss_fn=cross_entropy, data_loader=fl_data) + + +Step 5: Define the Collaborators +-------------------------------- + +Define the collaborators taking part in the experiment. The example below uses the collaborator list created earlier with the :code:`fx.init()` command. + + .. code-block:: python + + experiment_collaborators = {col_name:col_model for col_name, col_model \ + in zip(collaborator_list, fl_model.setup(len(collaborator_list)))} + +This command creates a model for each collaborator with their data shard. + +.. note:: + + In production deployments of |productName|, each collaborator will have the data on premise. Splitting data into shards is not necessary. + +Step 6: Run the Experiment +-------------------------- + +Run the experiment for five rounds and return the final model once completed. + + ..
code-block:: python + + final_fl_model = fx.run_experiment(experiment_collaborators, override_config={"aggregator.settings.rounds_to_train": 5}) \ No newline at end of file diff --git a/docs/source/workflow/running_the_federation.singularity.rst b/docs/developer_guide/running_the_federation.singularity.rst similarity index 100% rename from docs/source/workflow/running_the_federation.singularity.rst rename to docs/developer_guide/running_the_federation.singularity.rst diff --git a/docs/source/workflow/running_the_federation.tutorial.rst b/docs/developer_guide/running_the_federation.tutorial.rst similarity index 100% rename from docs/source/workflow/running_the_federation.tutorial.rst rename to docs/developer_guide/running_the_federation.tutorial.rst diff --git a/docs/running_the_federation_with_gandlf.rst b/docs/developer_guide/running_the_federation_with_gandlf.rst similarity index 98% rename from docs/running_the_federation_with_gandlf.rst rename to docs/developer_guide/running_the_federation_with_gandlf.rst index 50c0100dd9..33f9c4ed7a 100644 --- a/docs/running_the_federation_with_gandlf.rst +++ b/docs/developer_guide/running_the_federation_with_gandlf.rst @@ -21,7 +21,7 @@ Aggregator-Based Workflow An overview of this workflow is shown below. -.. figure:: /images/openfl_flow.png +.. figure:: ../images/openfl_flow.png .. centered:: Overview of the Aggregator-Based Workflow @@ -88,7 +88,7 @@ Simulate a federation You can use the `"Hello Federation" bash script `_ to quickly create a federation (an aggregator node and two collaborator nodes) to test the project pipeline. -.. literalinclude:: ../tests/github/test_hello_federation.py +.. literalinclude:: ../../tests/github/test_hello_federation.py :language: bash However, continue with the following procedure for details in creating a federation with an aggregator-based workflow. 
@@ -197,7 +197,7 @@ STEP 1: Install GaNDLF prerequisites and Create a Workspace The following is an example of the GaNDLF Segmentation Test **plan.yaml**. Notice the **task_runner/settings/gandlf_config** block where the GaNDLF configuration file is embedded: - .. literalinclude:: ../openfl-workspace/gandlf_seg_test/plan/plan.yaml + .. literalinclude:: ../../openfl-workspace/gandlf_seg_test/plan/plan.yaml :language: yaml @@ -248,7 +248,7 @@ STEP 2: Configure the Federation The objectives in this step: - - Ensure each node in the federation has a valid public key infrastructure (PKI) certificate. See :doc:`/source/utilities/pki` for details on available workflows. + - Ensure each node in the federation has a valid public key infrastructure (PKI) certificate. See :doc:`utilities/pki` for details on available workflows. - Distribute the workspace from the aggregator node to the other collaborator nodes. diff --git a/docs/source/openfl/communication.rst b/docs/developer_guide/structure/communication.rst similarity index 88% rename from docs/source/openfl/communication.rst rename to docs/developer_guide/structure/communication.rst index 97e270232c..d189a1cfb0 100644 --- a/docs/source/openfl/communication.rst +++ b/docs/developer_guide/structure/communication.rst @@ -14,7 +14,7 @@ Director-Envoy Communication The following diagram depicts a typical process of establishing a Federation and registering an experiment. -.. mermaid:: director_envoy.mmd +.. mermaid:: ../../mermaid/director_envoy.mmd :caption: Basic Scenario of Director-Envoy Communication :align: center @@ -23,7 +23,7 @@ Director Side Envoy Representation and Related Remote Procedure Calls This diagram shows possible interactions with Envoy handles on the Director side. -.. mermaid:: envoy_representation_and_RPCs.mmd +.. 
mermaid:: ../../mermaid/envoy_representation_and_RPCs.mmd :caption: Communications Altering or Requesting Envoy-Related Information :align: center @@ -32,6 +32,6 @@ Director Side Experiment Representation and Related Remote Procedure Calls This diagram shows possible interactions with Experiment handles on the Director side. -.. mermaid:: experiment_representation_and_RPCs.mmd +.. mermaid:: ../../mermaid/experiment_representation_and_RPCs.mmd :caption: Communications Altering or Requesting Experiment-Related Information :align: center diff --git a/docs/source/openfl/components.rst b/docs/developer_guide/structure/components.rst similarity index 94% rename from docs/source/openfl/components.rst rename to docs/developer_guide/structure/components.rst index 1bb3d32663..160c0bb84c 100644 --- a/docs/source/openfl/components.rst +++ b/docs/developer_guide/structure/components.rst @@ -49,8 +49,8 @@ The Collaborator is a short-lived entity that manages training the model on loca - exchanging model parameters with the Aggregator. The Collaborator is created by the :ref:`Envoy ` when a new experiment is submitted -in the :ref:`Director-based workflow `. The Collaborator should be started from CLI if a user follows the -:ref:`Aggregator-based workflow ` +in the :ref:`Director-based workflow `. The Collaborator should be started from CLI if a user follows the +:ref:`Aggregator-based workflow ` Every Collaborator is a unique service. The data loader is loaded with a local *shard descriptor* to perform tasks included in an FL experiment. At the end of the training task, weight tensors are extracted and sent to the central node @@ -67,7 +67,7 @@ they would like see supported in |productName|. Long-Lived Components ====================== -These components were introduced to support the :ref:`Director-based workflow `. +These components were introduced to support the :ref:`Director-based workflow `. - The *Director* is the central node of the federation. 
This component starts an *Aggregator* for each experiment, broadcasts experiment archive to connected collaborator nodes, and provides updates on the status. - The *Envoy* runs on collaborator nodes and is always connected to the *Director*. When the *Director* starts an experiment, the *Envoy* starts the *Collaborator* to train the global model. @@ -110,4 +110,4 @@ regarding collaborator machine resource utilization. Refer to :ref:`device monit Static Diagram ============== -.. figure:: director_workflow.svg +.. figure:: ../../source/openfl/director_workflow.svg diff --git a/docs/source/openfl/plugins.rst b/docs/developer_guide/structure/plugins.rst similarity index 100% rename from docs/source/openfl/plugins.rst rename to docs/developer_guide/structure/plugins.rst diff --git a/docs/source/utilities/utilities.rst b/docs/developer_guide/utilities.rst similarity index 80% rename from docs/source/utilities/utilities.rst rename to docs/developer_guide/utilities.rst index 3516db29d7..13d68acfdf 100644 --- a/docs/source/utilities/utilities.rst +++ b/docs/developer_guide/utilities.rst @@ -7,19 +7,19 @@ Open Federated Learning (|productName|) Utilities The following are utilities available in Open Federated Learning (|productName|). -:doc:`pki` +:doc:`utilities/pki` Use the Public Key Infrastructure (PKI) solution workflows to certify the nodes in your federation. -:doc:`splitters_data` +:doc:`utilities/splitters_data` Split your data to run your federation from a single dataset. -:doc:`timeouts` +:doc:`utilities/timeouts` Decorate methods to enforce timeout on it's execution. .. 
toctree:: :maxdepth: 1 :hidden: - pki - splitters_data - timeouts \ No newline at end of file + utilities/pki + utilities/splitters_data + utilities/timeouts \ No newline at end of file diff --git a/docs/source/utilities/pki.rst b/docs/developer_guide/utilities/pki.rst similarity index 100% rename from docs/source/utilities/pki.rst rename to docs/developer_guide/utilities/pki.rst diff --git a/docs/source/utilities/splitters_data.rst b/docs/developer_guide/utilities/splitters_data.rst similarity index 100% rename from docs/source/utilities/splitters_data.rst rename to docs/developer_guide/utilities/splitters_data.rst diff --git a/docs/source/utilities/timeouts.rst b/docs/developer_guide/utilities/timeouts.rst similarity index 100% rename from docs/source/utilities/timeouts.rst rename to docs/developer_guide/utilities/timeouts.rst diff --git a/docs/developer_ref/api_documentation.rst b/docs/developer_ref/api_documentation.rst new file mode 100644 index 0000000000..d42fc1251d --- /dev/null +++ b/docs/developer_ref/api_documentation.rst @@ -0,0 +1,10 @@ +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +************************************************* +|productName| API +************************************************* + +Welcome to the |productName| API reference: + +TODO \ No newline at end of file diff --git a/docs/troubleshooting.rst b/docs/developer_ref/troubleshooting.rst similarity index 100% rename from docs/troubleshooting.rst rename to docs/developer_ref/troubleshooting.rst diff --git a/docs/get_started/examples.rst b/docs/get_started/examples.rst new file mode 100644 index 0000000000..f0672f3a38 --- /dev/null +++ b/docs/get_started/examples.rst @@ -0,0 +1,58 @@ +.. # Copyright (C) 2020-2024 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +.. 
_openfl_examples: + +================================= +Examples for Running a Federation +================================= + +|productName| currently offers three ways to set up and run experiments with a federation: +the Task Runner API, the Interactive API, and the experimental workflow interface. +The Interactive API introduces a convenient way to set up a federation and brings “long-lived” components in a federation (“Director” and “Envoy”), +while the Task Runner API workflow is advised for scenarios where the workload needs to be verified prior to execution. In contrast, the experimental workflow interface +is introduced to provide significant flexibility to researchers and developers in the construction of federated learning experiments. + +------------------------- +Task Runner API +------------------------- +Formulate the experiment as a series of tasks, or a flow. + +See :ref:`taskrunner_pytorch_mnist` + +.. toctree:: + :hidden: + :maxdepth: 1 + + examples/taskrunner_pytorch_mnist + +------------------------- +Interactive API +------------------------- +Set up long-lived components to run many experiments in series. + +See :ref:`interactive_tensorflow_mnist` + +.. toctree:: + :hidden: + :maxdepth: 1 + + examples/interactive_tensorflow_mnist + +--------------------------------- +Workflow Interface (Experimental) +--------------------------------- +Formulate the experiment as a series of tasks, or a flow. + +See :ref:`workflowinterface_pytorch_mnist` + +.. toctree:: + :hidden: + :maxdepth: 1 + + examples/workflowinterface_pytorch_mnist + + +.. note:: + + Please visit `repository `_ for a full list of tutorials \ No newline at end of file diff --git a/docs/get_started/examples/interactive_tensorflow_mnist.rst b/docs/get_started/examples/interactive_tensorflow_mnist.rst new file mode 100644 index 0000000000..2432b9d01a --- /dev/null +++ b/docs/get_started/examples/interactive_tensorflow_mnist.rst @@ -0,0 +1,374 @@ +.. # Copyright (C) 2020-2023 Intel Corporation +..
# SPDX-License-Identifier: Apache-2.0 + +.. _interactive_tensorflow_mnist: + +Interactive API: MNIST Classification Tutorial +=================================================== + +In this tutorial, we will set up a federation and train a basic TensorFlow model on the MNIST dataset using the interactive API. +See `full tutorial `_. + +**About the dataset** + +It is a dataset of 60,000 small square 28x28 pixel grayscale images of handwritten single digits +between 0 and 9. More info at `wiki `_. + +.. note:: + + This tutorial runs without TLS and is done locally as a simulation. + +----------------------------------- +Step 0: Installation +----------------------------------- +- If you haven't done so already, create a virtual environment, upgrade pip, and install OpenFL (see :ref:`install_package`). + +----------------------------------- +Step 1: Set up environment +----------------------------------- +Open three terminals (one for the director, one for the envoy, and one for the experiment) and activate the virtual environment created in Step 0 in each of them: + +.. code-block:: console + + source venv/bin/activate + +Clone the OpenFL repository: + +.. code-block:: console + + git clone https://github.com/securefederatedai/openfl.git + + +Navigate to the tutorial: + +.. code-block:: console + + cd openfl/openfl-tutorials/interactive_api/Tensorflow_MNIST + +----------------------------------- +Step 2: Setting up Director +----------------------------------- +In the first terminal, run the director: + +.. code-block:: console + + cd director + ./start_director.sh + +----------------------------------- +Step 3: Setting up Envoy +----------------------------------- +In the second terminal, run the envoy: + +.. code-block:: console + + cd envoy + ./start_envoy.sh env_one envoy_config_one.yaml + +Optional: Run a second envoy in an additional terminal: + +- Ensure steps 0 and 1 are complete for this terminal as well. + +- Run the second envoy: + +..
code-block:: console + + cd envoy + ./start_envoy.sh env_two envoy_config_two.yaml + +----------------------------------- +Step 4: Run the federation +----------------------------------- +In the third terminal (or fourth terminal, if you started two envoys), run the ``Tensorflow_MNIST.ipynb`` Jupyter Notebook: + +.. code-block:: console + + cd workspace + jupyter lab Tensorflow_MNIST.ipynb + + +**Notebook walkthrough:** + +Contents of this notebook can be found `here `_. + +Install additional dependencies if not already installed + +.. code-block:: console + + pip install tensorflow==2.8 + +Import: + +.. code-block:: python + + import tensorflow as tf + print('TensorFlow', tf.__version__) + +Connect to the Federation + +Be sure to start Director and Envoy (Steps 2 and 3) before proceeding with this cell. + +This cell connects this notebook to the Federation. + +.. code-block:: python + + from openfl.interface.interactive_api.federation import Federation + + # please use the same identifier that was used in the signed certificate + client_id = 'api' + cert_dir = 'cert' + director_node_fqdn = 'localhost' + director_port = 50051 + + # Run with TLS disabled (trusted environment) + + # Create a Federation + federation = Federation( + client_id=client_id, + director_node_fqdn=director_node_fqdn, + director_port=director_port, + tls=False + ) + +Query Datasets from Shard Registry + +.. code-block:: python + + shard_registry = federation.get_shard_registry() + shard_registry + +.. code-block:: python + + # First, request a dummy_shard_desc that holds information about the federated dataset + dummy_shard_desc = federation.get_dummy_shard_descriptor(size=10) + dummy_shard_dataset = dummy_shard_desc.get_dataset('train') + sample, target = dummy_shard_dataset[0] + f"Sample shape: {sample.shape}, target shape: {target.shape}" + +Describing FL experiment + +..
code-block:: python + + from openfl.interface.interactive_api.experiment import TaskInterface + from openfl.interface.interactive_api.experiment import ModelInterface + from openfl.interface.interactive_api.experiment import FLExperiment + +Register model + +.. code-block:: python + + # Define model + model = tf.keras.Sequential([ + tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)), + tf.keras.layers.MaxPooling2D((2, 2)), + tf.keras.layers.BatchNormalization(), + tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)), + tf.keras.layers.MaxPooling2D((2, 2)), + tf.keras.layers.BatchNormalization(), + tf.keras.layers.Flatten(), + tf.keras.layers.Dense(10, activation=None), + ], name='simplecnn') + model.summary() + + # Define optimizer + optimizer = tf.optimizers.Adam(learning_rate=1e-3) + + # Loss and metrics. These will be used later. + loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True) + train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy() + val_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy() + + # Create ModelInterface + framework_adapter = 'openfl.plugins.frameworks_adapters.keras_adapter.FrameworkAdapterPlugin' + MI = ModelInterface(model=model, optimizer=optimizer, framework_plugin=framework_adapter) + +Register dataset + +.. 
code-block:: python + + import numpy as np + from tensorflow.keras.utils import Sequence + + from openfl.interface.interactive_api.experiment import DataInterface + + + class DataGenerator(Sequence): + + def __init__(self, shard_descriptor, batch_size): + self.shard_descriptor = shard_descriptor + self.batch_size = batch_size + self.indices = np.arange(len(shard_descriptor)) + self.on_epoch_end() + + def __len__(self): + return len(self.indices) // self.batch_size + + def __getitem__(self, index): + index = self.indices[index * self.batch_size:(index + 1) * self.batch_size] + batch = [self.indices[k] for k in index] + + X, y = self.shard_descriptor[batch] + return X, y + + def on_epoch_end(self): + np.random.shuffle(self.indices) + + + class MnistFedDataset(DataInterface): + + def __init__(self, **kwargs): + super().__init__(**kwargs) + + @property + def shard_descriptor(self): + return self._shard_descriptor + + @shard_descriptor.setter + def shard_descriptor(self, shard_descriptor): + """ + Describe per-collaborator procedures or sharding. + + This method will be called during a collaborator initialization. + Local shard_descriptor will be set by Envoy. 
+ """ + self._shard_descriptor = shard_descriptor + + self.train_set = shard_descriptor.get_dataset('train') + self.valid_set = shard_descriptor.get_dataset('val') + + def __getitem__(self, index): + return self.shard_descriptor[index] + + def __len__(self): + return len(self.shard_descriptor) + + def get_train_loader(self): + """ + Output of this method will be provided to tasks with optimizer in contract + """ + if self.kwargs['train_bs']: + batch_size = self.kwargs['train_bs'] + else: + batch_size = 32 + return DataGenerator(self.train_set, batch_size=batch_size) + + def get_valid_loader(self): + """ + Output of this method will be provided to tasks without optimizer in contract + """ + if self.kwargs['valid_bs']: + batch_size = self.kwargs['valid_bs'] + else: + batch_size = 32 + + return DataGenerator(self.valid_set, batch_size=batch_size) + + def get_train_data_size(self): + """ + Information for aggregation + """ + + return len(self.train_set) + + def get_valid_data_size(self): + """ + Information for aggregation + """ + return len(self.valid_set) + +Create Mnist federated dataset + +.. code-block:: python + + fed_dataset = MnistFedDataset(train_bs=64, valid_bs=512) + +Define and register FL tasks + +.. code-block:: python + + import time + + TI = TaskInterface() + + # from openfl.interface.aggregation_functions import AdagradAdaptiveAggregation # Uncomment this lines to use + # agg_fn = AdagradAdaptiveAggregation(model_interface=MI, learning_rate=0.4) # Adaptive Federated Optimization + # @TI.set_aggregation_function(agg_fn) # alghorithm! + # # See details in the: + # # https://arxiv.org/abs/2003.00295 + + @TI.register_fl_task(model='model', data_loader='train_dataset', device='device', optimizer='optimizer') + def train(model, train_dataset, optimizer, device, loss_fn=loss_fn, warmup=False): + start_time = time.time() + + # Iterate over the batches of the dataset. 
+ for step, (x_batch_train, y_batch_train) in enumerate(train_dataset): + with tf.GradientTape() as tape: + logits = model(x_batch_train, training=True) + loss_value = loss_fn(y_batch_train, logits) + grads = tape.gradient(loss_value, model.trainable_weights) + optimizer.apply_gradients(zip(grads, model.trainable_weights)) + + # Update training metric. + train_acc_metric.update_state(y_batch_train, logits) + + # Log every 200 batches. + if step % 200 == 0: + print( + "Training loss (for one batch) at step %d: %.4f" + % (step, float(loss_value)) + ) + print("Seen so far: %d samples" % ((step + 1) * 64)) + if warmup: + break + + # Display metrics at the end of each epoch. + train_acc = train_acc_metric.result() + print("Training acc over epoch: %.4f" % (float(train_acc),)) + + # Reset training metrics at the end of each epoch + train_acc_metric.reset_states() + + + return {'train_acc': train_acc,} + + + @TI.register_fl_task(model='model', data_loader='val_dataset', device='device') + def validate(model, val_dataset, device): + # Run a validation loop at the end of each epoch. + for x_batch_val, y_batch_val in val_dataset: + val_logits = model(x_batch_val, training=False) + # Update val metrics + val_acc_metric.update_state(y_batch_val, val_logits) + val_acc = val_acc_metric.result() + val_acc_metric.reset_states() + print("Validation acc: %.4f" % (float(val_acc),)) + + return {'validation_accuracy': val_acc,} + +Time to start a federated learning experiment + +.. code-block:: python + + # create an experiment in the federation + experiment_name = 'mnist_experiment' + fl_experiment = FLExperiment(federation=federation, experiment_name=experiment_name,serializer_plugin='openfl.plugins.interface_serializer.keras_seri + +.. code-block:: python + + # print the default federated learning plan + import openfl.native as fx + print(fx.get_plan(fl_plan=fl_experiment.plan)) + +..
code-block:: python + + # The following command zips the workspace and Python requirements to be transferred to collaborator nodes + fl_experiment.start(model_provider=MI, + task_keeper=TI, + data_loader=fed_dataset, + rounds_to_train=5, + opt_treatment='CONTINUE_GLOBAL', + override_config={'aggregator.settings.db_store_rounds': 1, 'compression_pipeline.template': 'openfl.pipelines.KCPip + +.. code-block:: python + + fl_experiment.stream_metrics() \ No newline at end of file diff --git a/docs/get_started/examples/taskrunner_pytorch_mnist.rst b/docs/get_started/examples/taskrunner_pytorch_mnist.rst new file mode 100644 index 0000000000..c0d32fde02 --- /dev/null +++ b/docs/get_started/examples/taskrunner_pytorch_mnist.rst @@ -0,0 +1,173 @@ +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +.. _taskrunner_pytorch_mnist: + +========================================== +Task Runner API: Federated PyTorch MNIST +========================================== + +In this tutorial, we will set up a federation and train a basic PyTorch model on the MNIST dataset using the task runner API. +See `full notebook `_. + +.. note:: + + Ensure you have installed the |productName| package. + + See :ref:`install_package` for details. + + +Install additional dependencies if not already installed + +.. code-block:: console + + pip install torch torchvision + +.. code-block:: python + + import numpy as np + import torch + import torch.nn as nn + import torch.nn.functional as F + import torch.optim as optim + + import torchvision + import torchvision.transforms as transforms + import openfl.native as fx + from openfl.federated import FederatedModel,FederatedDataSet + +After importing the required packages, the next step is setting up our |productName| workspace. +To do this, simply run the ``fx.init()`` command as follows: + +.. code-block:: python + + #Setup default workspace, logging, etc.
+ fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log') + +Now we are ready to define our dataset and model to perform federated learning on. +The dataset should be composed of a numpy array. We start with a simple fully connected model that is trained on the MNIST dataset. + +.. code-block:: python + + def one_hot(labels, classes): + return np.eye(classes)[labels] + + transform = transforms.Compose( + [transforms.ToTensor(), + transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) + + trainset = torchvision.datasets.MNIST(root='./data', train=True, + download=True, transform=transform) + + train_images,train_labels = trainset.train_data, np.array(trainset.train_labels) + train_images = torch.from_numpy(np.expand_dims(train_images, axis=1)).float() + + validset = torchvision.datasets.MNIST(root='./data', train=False, + download=True, transform=transform) + + valid_images,valid_labels = validset.test_data, np.array(validset.test_labels) + valid_images = torch.from_numpy(np.expand_dims(valid_images, axis=1)).float() + valid_labels = one_hot(valid_labels,10) + +.. 
code-block:: python + + feature_shape = train_images.shape[1] + classes = 10 + + fl_data = FederatedDataSet(train_images,train_labels,valid_images,valid_labels,batch_size=32,num_classes=classes) + + class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.conv1 = nn.Conv2d(1, 16, 3) + self.pool = nn.MaxPool2d(2, 2) + self.conv2 = nn.Conv2d(16, 32, 3) + self.fc1 = nn.Linear(32 * 5 * 5, 32) + self.fc2 = nn.Linear(32, 84) + self.fc3 = nn.Linear(84, 10) + + def forward(self, x): + x = self.pool(F.relu(self.conv1(x))) + x = self.pool(F.relu(self.conv2(x))) + x = x.view(x.size(0),-1) + x = F.relu(self.fc1(x)) + x = F.relu(self.fc2(x)) + x = self.fc3(x) + return F.log_softmax(x, dim=1) + + optimizer = lambda x: optim.Adam(x, lr=1e-4) + + def cross_entropy(output, target): + """Cross-entropy loss + """ + return F.cross_entropy(input=output,target=target) + + +Here we can define a metric logging function. It should have the signature shown below. You can use it to write metrics to TensorBoard or to another logging backend. + +.. code-block:: python + + from torch.utils.tensorboard import SummaryWriter + + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + + def write_metric(node_name, task_name, metric_name, metric, round_number): + writer.add_scalar("{}/{}/{}".format(node_name, task_name, metric_name), + metric, round_number) + +.. code-block:: python + + #Create a federated model using the pytorch class, lambda optimizer function, and loss function + fl_model = FederatedModel(build_model=Net,optimizer=optimizer,loss_fn=cross_entropy,data_loader=fl_data) + +The ``FederatedModel`` object is a wrapper around your Keras, TensorFlow or PyTorch model that makes it compatible with |productName|. +It provides built-in federated training and validation functions that we will see used below. +Using its setup function, collaborator models and datasets can be automatically defined for the experiment. + +..
code-block:: python + + collaborator_models = fl_model.setup(num_collaborators=2) + collaborators = {'one':collaborator_models[0],'two':collaborator_models[1]}#, 'three':collaborator_models[2]} + +.. code-block:: python + + #Original MNIST dataset + print(f'Original training data size: {len(train_images)}') + print(f'Original validation data size: {len(valid_images)}\n') + + #Collaborator one's data + print(f'Collaborator one\'s training data size: {len(collaborator_models[0].data_loader.X_train)}') + print(f'Collaborator one\'s validation data size: {len(collaborator_models[0].data_loader.X_valid)}\n') + + #Collaborator two's data + print(f'Collaborator two\'s training data size: {len(collaborator_models[1].data_loader.X_train)}') + print(f'Collaborator two\'s validation data size: {len(collaborator_models[1].data_loader.X_valid)}\n') + + #Collaborator three's data + #print(f'Collaborator three\'s training data size: {len(collaborator_models[2].data_loader.X_train)}') + #print(f'Collaborator three\'s validation data size: {len(collaborator_models[2].data_loader.X_valid)}') + +We can see the current plan values by running the ``fx.get_plan()`` function + +.. code-block:: python + + #Get the current values of the plan. Each of these can be overridden + print(fx.get_plan()) + +Now we are ready to run our experiment. +If we want to pass in custom plan settings, we can easily do that with the override_config parameter + +.. code-block:: python + + # Run experiment, return trained FederatedModel + + final_fl_model = fx.run_experiment(collaborators, override_config={ + 'aggregator.settings.rounds_to_train': 5, + 'aggregator.settings.log_metric_callback': write_metric, + }) + +.. 
code-block:: python + + #Save final model + final_fl_model.save_native('final_pytorch_model') \ No newline at end of file diff --git a/docs/get_started/examples/workflowinterface_pytorch_mnist.rst b/docs/get_started/examples/workflowinterface_pytorch_mnist.rst new file mode 100644 index 0000000000..38cc24fddb --- /dev/null +++ b/docs/get_started/examples/workflowinterface_pytorch_mnist.rst @@ -0,0 +1,385 @@ +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +.. _workflowinterface_pytorch_mnist: + +============================================ +Workflow Interface: Federated PyTorch MNIST +============================================ + +This tutorial introduces the API to get up and running with your first horizontal federated learning workflow. This work has the following goals: + +- Simplify the federated workflow representation + +- Help users better understand the steps in federated learning (weight extraction, compression, etc.) + +- Maintain data privacy by design + +- Aim for syntactic consistency with the Netflix Metaflow project, reusing infrastructure where possible + +See `full notebook `_. + +**What is it?** +The workflow interface is a new way of composing federated learning experiments with |productName|. +It grew out of conversations with researchers and existing users who had novel use cases that didn't quite fit the standard horizontal federated learning paradigm. + +**Getting Started** +First, install the necessary dependencies for the workflow interface: + +.. code-block:: console + + pip install git+https://github.com/intel/openfl.git + pip install -r requirements_workflow_interface.txt + pip install torch + pip install torchvision + +We begin with the quintessential example of a small PyTorch CNN model trained on the MNIST dataset. +Let's start by defining our dataloaders, model, optimizer, and some helper functions, as we would for any other deep learning experiment: + +.. 
code-block:: python + + import torch.nn as nn + import torch.nn.functional as F + import torch.optim as optim + import torch + import torchvision + import numpy as np + + n_epochs = 3 + batch_size_train = 64 + batch_size_test = 1000 + learning_rate = 0.01 + momentum = 0.5 + log_interval = 10 + + random_seed = 1 + torch.backends.cudnn.enabled = False + torch.manual_seed(random_seed) + + mnist_train = torchvision.datasets.MNIST( + "./files/", + train=True, + download=True, + transform=torchvision.transforms.Compose( + [ + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize((0.1307,), (0.3081,)), + ] + ), + ) + + mnist_test = torchvision.datasets.MNIST( + "./files/", + train=False, + download=True, + transform=torchvision.transforms.Compose( + [ + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize((0.1307,), (0.3081,)), + ] + ), + ) + + class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.conv1 = nn.Conv2d(1, 10, kernel_size=5) + self.conv2 = nn.Conv2d(10, 20, kernel_size=5) + self.conv2_drop = nn.Dropout2d() + self.fc1 = nn.Linear(320, 50) + self.fc2 = nn.Linear(50, 10) + + def forward(self, x): + x = F.relu(F.max_pool2d(self.conv1(x), 2)) + x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2)) + x = x.view(-1, 320) + x = F.relu(self.fc1(x)) + x = F.dropout(x, training=self.training) + x = self.fc2(x) + return F.log_softmax(x, dim=1) + + def inference(network,test_loader): + network.eval() + test_loss = 0 + correct = 0 + with torch.no_grad(): + for data, target in test_loader: + output = network(data) + test_loss += F.nll_loss(output, target, reduction='sum').item() + pred = output.data.max(1, keepdim=True)[1] + correct += pred.eq(target.data.view_as(pred)).sum() + test_loss /= len(test_loader.dataset) + print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format( + test_loss, correct, len(test_loader.dataset), + 100. 
* correct / len(test_loader.dataset))) + accuracy = float(correct / len(test_loader.dataset)) + return accuracy + +Next we import the FLSpec, LocalRuntime, and placement decorators. + +- FLSpec – Defines the flow specification. User defined flows are subclasses of this. + +- Runtime – Defines where the flow runs, infrastructure for task transitions (how information gets sent). The LocalRuntime runs the flow on a single node. + +- aggregator/collaborator - placement decorators that define where the task will be assigned + +.. code-block:: python + + from copy import deepcopy + + from openfl.experimental.interface import FLSpec, Aggregator, Collaborator + from openfl.experimental.runtime import LocalRuntime + from openfl.experimental.placement import aggregator, collaborator + + + def FedAvg(models, weights=None): + new_model = models[0] + state_dicts = [model.state_dict() for model in models] + state_dict = new_model.state_dict() + for key in models[1].state_dict(): + state_dict[key] = torch.from_numpy(np.average([state[key].numpy() for state in state_dicts], + axis=0, + weights=weights)) + new_model.load_state_dict(state_dict) + return new_model + +Now we come to the flow definition. +The |productName| Workflow Interface adopts the conventions set by Metaflow, that every workflow begins with `start` +and concludes with the `end` task. The aggregator begins with an optionally passed in model and optimizer. +The aggregator begins the flow with the `start` task, +where the list of collaborators is extracted from the runtime (`self.collaborators = self.runtime.collaborators`) +and is then used as the list of participants to run the task listed in `self.next`, `aggregated_model_validation`. +The model, optimizer, and anything that is not explicitly excluded from the next function will be passed from the `start` +function on the aggregator to the `aggregated_model_validation` task on the collaborator. 
Where the tasks run is determined by the placement decorator that precedes each task definition (`@aggregator` or `@collaborator`). Once each of the collaborators (defined in the runtime) completes the `aggregated_model_validation` task, it passes its current state on to the `train` task, then from `train` to `local_model_validation`, and finally to `join` at the aggregator. The model weights are averaged in `join`, after which the next round can begin. + +.. figure:: ../../images/workflow_interface.png + +.. code-block:: python + + class FederatedFlow(FLSpec): + + def __init__(self, model=None, optimizer=None, rounds=3, **kwargs): + super().__init__(**kwargs) + if model is not None: + self.model = model + self.optimizer = optimizer + else: + self.model = Net() + self.optimizer = optim.SGD(self.model.parameters(), lr=learning_rate, + momentum=momentum) + self.rounds = rounds + + @aggregator + def start(self): + print(f'Performing initialization for model') + self.collaborators = self.runtime.collaborators + self.private = 10 + self.current_round = 0 + self.next(self.aggregated_model_validation, foreach='collaborators', exclude=['private']) + + @collaborator + def aggregated_model_validation(self): + print(f'Performing aggregated model validation for collaborator {self.input}') + self.agg_validation_score = inference(self.model, self.test_loader) + print(f'{self.input} value of {self.agg_validation_score}') + self.next(self.train) + + @collaborator + def train(self): + self.model.train() + self.optimizer = optim.SGD(self.model.parameters(), lr=learning_rate, + momentum=momentum) + train_losses = [] + for batch_idx, (data, target) in enumerate(self.train_loader): + self.optimizer.zero_grad() + output = self.model(data) + loss = F.nll_loss(output, target) + loss.backward() + self.optimizer.step() + if batch_idx % log_interval == 0: + print('Train Epoch: 1 [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format( + batch_idx * len(data), len(self.train_loader.dataset), 
+ 100. * batch_idx / len(self.train_loader), loss.item())) + self.loss = loss.item() + torch.save(self.model.state_dict(), 'model.pth') + torch.save(self.optimizer.state_dict(), 'optimizer.pth') + self.training_completed = True + self.next(self.local_model_validation) + + @collaborator + def local_model_validation(self): + self.local_validation_score = inference(self.model, self.test_loader) + print( + f'Doing local model validation for collaborator {self.input}: {self.local_validation_score}') + self.next(self.join, exclude=['training_completed']) + + @aggregator + def join(self, inputs): + self.average_loss = sum(input.loss for input in inputs) / len(inputs) + self.aggregated_model_accuracy = sum( + input.agg_validation_score for input in inputs) / len(inputs) + self.local_model_accuracy = sum( + input.local_validation_score for input in inputs) / len(inputs) + print(f'Average aggregated model validation values = {self.aggregated_model_accuracy}') + print(f'Average training loss = {self.average_loss}') + print(f'Average local model validation values = {self.local_model_accuracy}') + self.model = FedAvg([input.model for input in inputs]) + self.optimizer = [input.optimizer for input in inputs][0] + self.current_round += 1 + if self.current_round < self.rounds: + self.next(self.aggregated_model_validation, + foreach='collaborators', exclude=['private']) + else: + self.next(self.end) + + @aggregator + def end(self): + print(f'This is the end of the flow') + + +You'll notice in the `FederatedFlow` definition above that there were certain attributes that the flow was not initialized with, namely the `train_loader` and `test_loader` for each of the collaborators. These are **private_attributes** that are exposed only through the runtime. Each participant has its own set of private attributes: a dictionary where the key is the attribute name, and the value is the object that will be made accessible through that participant's task. 
+ +Below, we segment shards of the MNIST dataset for **four collaborators**: Portland, Seattle, Chandler, and Bangalore. Each has its own slice of the dataset that's accessible via the `train_loader` or `test_loader` attribute. Note that the private attributes are flexible, and you can choose to pass in a completely different type of object to any of the collaborators or aggregator (with an arbitrary name). These private attributes will always be filtered out of the current state when transferring from collaborator to aggregator, or vice versa. + + +.. code-block:: python + + # Aggregator + aggregator_ = Aggregator() + + collaborator_names = ["Portland", "Seattle", "Chandler", "Bangalore"] + + def callable_to_initialize_collaborator_private_attributes(index, n_collaborators, batch_size, train_dataset, test_dataset): + train = deepcopy(train_dataset) + test = deepcopy(test_dataset) + train.data = train_dataset.data[index::n_collaborators] + train.targets = train_dataset.targets[index::n_collaborators] + test.data = test_dataset.data[index::n_collaborators] + test.targets = test_dataset.targets[index::n_collaborators] + + return { + "train_loader": torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True), + "test_loader": torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=True), + } + + # Setup collaborators private attributes via callable function + collaborators = [] + for idx, collaborator_name in enumerate(collaborator_names): + collaborators.append( + Collaborator( + name=collaborator_name, + private_attributes_callable=callable_to_initialize_collaborator_private_attributes, + index=idx, + n_collaborators=len(collaborator_names), + train_dataset=mnist_train, + test_dataset=mnist_test, + batch_size=64 + ) + ) + + local_runtime = LocalRuntime(aggregator=aggregator_, collaborators=collaborators, + backend="ray") + print(f'Local runtime collaborators = {local_runtime.collaborators}') + +Now that we have our flow and runtime defined, 
let's run the experiment! + +.. code-block:: python + + model = None + best_model = None + optimizer = None + flflow = FederatedFlow(model, optimizer, checkpoint=True) + flflow.runtime = local_runtime + flflow.run() + +Now that the flow has completed, let's get the final model and accuracy: + +.. code-block:: python + + print(f'Sample of the final model weights: {flflow.model.state_dict()["conv1.weight"][0]}') + + print(f'\nFinal aggregated model accuracy for {flflow.rounds} rounds of training: {flflow.aggregated_model_accuracy}') + + +We can get the final model, and all other aggregator attributes after the flow completes. But what if there's an intermediate model task and its specific output that we want to look at in detail? This is where **checkpointing** and reuse of Metaflow tooling come in handy. + +Let's make a tweak to the flow object, and run the experiment one more time (we can even use our previous model / optimizer as a base for the experiment) + +.. code-block:: python + + flflow2 = FederatedFlow(model=flflow.model, optimizer=flflow.optimizer, checkpoint=True) + flflow2.runtime = local_runtime + flflow2.run() + +Now that the flow is complete, let's dig into some of the information captured along the way + +.. code-block:: python + + run_id = flflow2._run_id + +.. code-block:: python + + import metaflow + from metaflow import Metaflow, Flow, Task, Step + +.. code-block:: python + + m = Metaflow() + list(m) + +For existing users of Metaflow, you'll notice this is the same way you would examine a flow after completion. Let's look at the latest run that generated some results: + +.. code-block:: python + + f = Flow('FederatedFlow').latest_run + f + +And its list of steps + +.. code-block:: python + + list(f) + +This matches the list of steps executed in the flow, so far so good... + +.. code-block:: python + + s = Step(f'FederatedFlow/{run_id}/train') + s + +.. 
code-block:: python + + list(s) + +Now we see 12 tasks: each of the 4 collaborators performed 3 rounds of model training + +.. code-block:: python + + t = Task(f'FederatedFlow/{run_id}/train/9') + t + +.. code-block:: python + + t.data + +.. code-block:: python + + t.data.input + +Now let's look at its log output (stdout) and any error logs (stderr) + +.. code-block:: python + + print(t.stdout) + print(t.stderr) + +**Congratulations!** +Now that you've completed your first workflow interface quickstart notebook, + +see some of the more advanced things you can do in our other `tutorials `_, including: + +- Using the LocalRuntime Ray Backend for dedicated GPU access +- Vertical Federated Learning +- Model Watermarking +- Differential Privacy +- And More! diff --git a/docs/install.singularity.rst b/docs/get_started/install.singularity.rst similarity index 100% rename from docs/install.singularity.rst rename to docs/get_started/install.singularity.rst diff --git a/docs/install.rst b/docs/get_started/installation.rst similarity index 93% rename from docs/install.rst rename to docs/get_started/installation.rst index ec4439b135..ac5afb3322 100644 --- a/docs/install.rst +++ b/docs/get_started/installation.rst @@ -1,112 +1,112 @@ -.. # Copyright (C) 2020-2023 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - -.. _install_software_root: - -===================== -Installation -===================== - -Depending on how you want to set up |productName|, choose one of the following installation procedure. - - -.. _install_package: - -********************************* -Install the Package -********************************* - -Follow this procedure to prepare the environment and install the |productName| package. -Perform this procedure on every node in the federation. - -1. Install a Python 3.8 (>=3.6, <3.9) virtual environment using venv. - - See the `Venv installation guide `_ for details. - -2. Create a new Virtualenv environment for the project. - - .. 
code-block:: console - - python3 -m venv venv - -3. Activate the virtual environment. - - .. code-block:: console - - source venv/bin/activate - -4. Install the |productName| package. - - A. Installation from PyPI: - - .. code-block:: console - - python -m pip install openfl - - B. Installation from source: - - #. Clone the |productName| repository: - - .. code-block:: console - - git clone https://github.com/intel/openfl.git - - - #. Install build tools, before installing |productName|: - - .. code-block:: console - - python -m pip install -U pip setuptools wheel - cd openfl/ - python -m pip install . - - - -5. Run the :code:`fx` command in the virtual environment to confirm |productName| is installed. - - .. figure:: images/fx_help.png - :scale: 70 % - -.. centered:: Output of the fx Command - - -.. _install_docker: - -**************************************** -|productName| with Docker\* \ -**************************************** - -Follow this procedure to download or build a Docker\*\ image of |productName|, which you can use to run your federation in an isolated environment. - -.. note:: - - The Docker\* \ version of |productName| is to provide an isolated environment complete with the prerequisites to run a federation. When the execution is over, the container can be destroyed and the results of the computation will be available on a directory on the local host. - -1. Install Docker on all nodes in the federation. - - See the `Docker installation guide `_ for details. - -2. Check that Docker is running properly with the *Hello World* command: - - .. code-block:: console - - $ docker run hello-world - Hello from Docker! - This message shows that your installation appears to be working correctly. - ... - ... - ... - -3. Build an image from the latest official |productName| release: - - .. code-block:: console - - docker pull intel/openfl - - If you prefer to build an image from a specific commit or branch, perform the following commands: - - .. 
code-block:: console - - git clone https://github.com/intel/openfl.git - cd openfl - docker build -f openfl-docker/Dockerfile.base . +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +.. _install_software_root: + +===================== +Installation +===================== + +Depending on how you want to set up |productName|, choose one of the following installation procedures. + + +.. _install_package: + +********************************* +Install the Package +********************************* + +Follow this procedure to prepare the environment and install the |productName| package. +Perform this procedure on every node in the federation. + +1. Install a Python 3.8 (>=3.6, <3.9) virtual environment using venv. + + See the `Venv installation guide `_ for details. + +2. Create a new Virtualenv environment for the project. + + .. code-block:: console + + python3 -m venv venv + +3. Activate the virtual environment. + + .. code-block:: console + + source venv/bin/activate + +4. Install the |productName| package. + + A. Installation from PyPI: + + .. code-block:: console + + python -m pip install openfl + + B. Installation from source: + + #. Clone the |productName| repository: + + .. code-block:: console + + git clone https://github.com/intel/openfl.git + + + #. Install build tools, before installing |productName|: + + .. code-block:: console + + python -m pip install -U pip setuptools wheel + cd openfl/ + python -m pip install . + + + +5. Run the :code:`fx` command in the virtual environment to confirm |productName| is installed. + + .. figure:: ../images/fx_help.png + :scale: 70 % + +.. centered:: Output of the fx Command + + +.. _install_docker: + +**************************************** +|productName| with Docker\* \ +**************************************** + +Follow this procedure to download or build a Docker\*\ image of |productName|, which you can use to run your federation in an isolated environment. + +.. 
note:: + + The Docker\* \ version of |productName| provides an isolated environment complete with the prerequisites to run a federation. When the execution is over, the container can be destroyed, and the results of the computation will be available in a directory on the local host. + +1. Install Docker on all nodes in the federation. + + See the `Docker installation guide `_ for details. + +2. Check that Docker is running properly with the *Hello World* command: + + .. code-block:: console + + $ docker run hello-world + Hello from Docker! + This message shows that your installation appears to be working correctly. + ... + ... + ... + +3. Pull an image of the latest official |productName| release: + + .. code-block:: console + + docker pull intel/openfl + + If you prefer to build an image from a specific commit or branch, perform the following commands: + + .. code-block:: console + + git clone https://github.com/intel/openfl.git + cd openfl + docker build -f openfl-docker/Dockerfile.base . \ No newline at end of file diff --git a/docs/get_started/quickstart.rst b/docs/get_started/quickstart.rst new file mode 100644 index 0000000000..de1687d898 --- /dev/null +++ b/docs/get_started/quickstart.rst @@ -0,0 +1,39 @@ +.. # Copyright (C) 2020-2023 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +.. _quick_start: + +===================== +Quick Start +===================== + +|productName| has a variety of APIs to choose from when setting up and running a federation. +In this quick start guide, we will demonstrate how to run a simple federated learning example using the Task Runner API and the Hello Federation script. + +.. note:: + + The example used in this section is designed primarily to demonstrate functionality of the package and its components. It is not the recommended method for running a real-world federation. + + See :ref:`openfl_examples` for details. + +.. 
_hello_federation: + +********************************* +Hello Federation +********************************* +.. note:: + + Ensure you have installed the |productName| package. + + See :ref:`install_package` for details. + +We will use the `"Hello Federation" python script `_ to quickly create a federation (an aggregator node and two collaborator nodes) to test the project pipeline. + +.. literalinclude:: ../../tests/github/test_hello_federation.py + :language: python + +Run the script + +.. code-block:: console + + python test_hello_federation.py \ No newline at end of file diff --git a/docs/images/workflow_interface.png b/docs/images/workflow_interface.png new file mode 100644 index 0000000000..896270379b Binary files /dev/null and b/docs/images/workflow_interface.png differ diff --git a/docs/index.rst b/docs/index.rst index 7cd39c2caa..4222672b44 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,42 +1,77 @@ -.. # Copyright (C) 2020-2023 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0. - - -.. Documentation master file, created by - sphinx-quickstart on Thu Oct 24 15:07:19 2019. - You can adapt this file completely to your liking, but it should at least - contain the root `toctree` directive. - -********************************************************************* -Welcome to the Open Federated Learning (|productName|) Documentation! -********************************************************************* - -Open Federated Learning (|productName|) is a Python\* \ 3 library for federated learning that enables organizations to collaboratively train a model without sharing sensitive information. - -|productName| is Deep Learning framework-agnostic. -Training of statistical models may be done with any deep learning framework, such as -`TensorFlow `_\* \ or `PyTorch `_\*\, via a plugin mechanism. - - -|productName| is a community supported project, originally developed by Intel Labs and the Intel Internet of Things Group. 
The team would like to encourage any contributions, notes, or requests to improve the documentation. - -Looking for the Open Flash Library project also referred to as OpenFL? Find it [here](https://www.openfl.org/)! - -.. toctree:: - :maxdepth: 2 - :caption: Contents: - - manual - openfl - troubleshooting - notices_and_disclaimers - openfl_api - - -Indices and tables -================== - -* :ref:`genindex` -* :ref:`modindex` - -.. * :ref:`search` +.. # Copyright (C) 2020-2024 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0. + + +.. Documentation master file, created by + sphinx-quickstart on Thu Oct 24 15:07:19 2019. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +===================================================================== +Welcome to the Open Federated Learning (|productName|) Documentation! +===================================================================== + +Open Federated Learning (|productName|) is a Python\* \ 3 library for federated learning that enables organizations to collaboratively train a model without sharing sensitive information. + +|productName| is Deep Learning framework-agnostic. +Training of statistical models may be done with any deep learning framework, such as +`TensorFlow `_\* \ or `PyTorch `_\*\, via a plugin mechanism. + + +|productName| is a community supported project, originally developed by Intel Labs and the Intel Internet of Things Group. The team would like to encourage any contributions, notes, or requests to improve the documentation. + +Looking for the Open Flash Library project also referred to as OpenFL? Find it `here `_! + +.. toctree:: + :hidden: + :caption: ABOUT + :maxdepth: 2 + + about/overview + about/features + about/releases + about/blogs_publications + about/license + about/notices_and_disclaimers + +.. 
toctree:: + :hidden: + :caption: GET STARTED + :maxdepth: 2 + + get_started/installation + get_started/quickstart + get_started/examples + +.. toctree:: + :hidden: + :caption: DEVELOPER GUIDE + :maxdepth: 2 + + developer_guide/manual + developer_guide/openfl_structure + +.. toctree:: + :hidden: + :caption: DEVELOPER REFERENCE + :maxdepth: 2 + + developer_ref/api_documentation + developer_ref/troubleshooting + +.. toctree:: + :hidden: + :caption: CONTRIBUTING GUIDELINES + :maxdepth: 2 + + contributing_guidelines/contributing + openfl_api + + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` + +.. * :ref:`search` \ No newline at end of file diff --git a/docs/manual.rst b/docs/manual.rst deleted file mode 100644 index a0a7254c76..0000000000 --- a/docs/manual.rst +++ /dev/null @@ -1,42 +0,0 @@ -.. # Copyright (C) 2020-2023 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - -****** -Manual -****** - -What is Open Federated Learning (|productName|): - -- :doc:`overview` - -Establish a federation with |productName|: - -- :doc:`install` -- :doc:`running_the_federation` - -Customize the federation: - -- :doc:`source/utilities/utilities` -- :doc:`advanced_topics` - -Get familiar with the APIs: - -- :doc:`source/workflow/running_the_federation.tutorial` - -Explore new and experimental features: - -- :doc:`experimental_features` - -.. 
toctree:: - :maxdepth: 2 - :hidden: - - overview - install - running_the_federation - running_the_federation_with_gandlf - federated_evaluation - source/utilities/utilities - advanced_topics - source/workflow/running_the_federation.tutorial - experimental_features diff --git a/docs/source/openfl/director_envoy.mmd b/docs/mermaid/director_envoy.mmd similarity index 100% rename from docs/source/openfl/director_envoy.mmd rename to docs/mermaid/director_envoy.mmd diff --git a/docs/source/openfl/envoy_representation_and_RPCs.mmd b/docs/mermaid/envoy_representation_and_RPCs.mmd similarity index 100% rename from docs/source/openfl/envoy_representation_and_RPCs.mmd rename to docs/mermaid/envoy_representation_and_RPCs.mmd diff --git a/docs/source/openfl/experiment_representation_and_RPCs.mmd b/docs/mermaid/experiment_representation_and_RPCs.mmd similarity index 100% rename from docs/source/openfl/experiment_representation_and_RPCs.mmd rename to docs/mermaid/experiment_representation_and_RPCs.mmd diff --git a/docs/requirements-docs.txt b/docs/requirements-docs.txt index 87d8bb612e..f9a47f5bb4 100644 --- a/docs/requirements-docs.txt +++ b/docs/requirements-docs.txt @@ -1,4 +1,4 @@ -# Copyright (C) 2020-2023 Intel Corporation +# Copyright (C) 2020-2024 Intel Corporation # SPDX-License-Identifier: Apache-2.0 sphinx-rtd-theme sphinx-prompt @@ -6,3 +6,4 @@ sphinx_substitution_extensions sphinxcontrib-mermaid pygments>=2.7.4 # not directly required, pinned by Snyk to avoid a vulnerability sphinx>=3.0.4 # not directly required, pinned by Snyk to avoid a vulnerability +recommonmark \ No newline at end of file diff --git a/docs/supported_aggregation_algorithms.rst b/docs/supported_aggregation_algorithms.rst deleted file mode 100644 index 975239b21e..0000000000 --- a/docs/supported_aggregation_algorithms.rst +++ /dev/null @@ -1,49 +0,0 @@ -.. # Copyright (C) 2020-2023 Intel Corporation -.. 
# SPDX-License-Identifier: Apache-2.0 - -********************************* -Supported aggregation algorithms -********************************* -=========== -FedAvg -=========== -Default aggregation algorithm in OpenFL. -Multiplies local model weights with relative data size and averages this multiplication result. - -========= -FedProx -========= -Paper: https://arxiv.org/abs/1812.06127 - -FedProx in OpenFL is implemented as a custom optimizer for PyTorch/TensorFlow. In order to use FedProx, do the following: - -1. PyTorch: - - - replace your optimizer with SGD-based :class:`openfl.utilities.optimizers.torch.FedProxOptimizer` - or Adam-based :class:`openfl.utilities.optimizers.torch.FedProxAdam`. - Also, you should save model weights for the next round via calling `.set_old_weights()` method of the optimizer - before the training epoch. - -2. TensorFlow: - - - replace your optimizer with SGD-based :py:class:`openfl.utilities.optimizers.keras.FedProxOptimizer`. - -For more details, see :code:`openfl-tutorials/Federated_FedProx_*_MNIST_Tutorial.ipynb` where * is the framework name. - -========= -FedOpt -========= -Paper: https://arxiv.org/abs/2003.00295 - -FedOpt in OpenFL: :ref:`adaptive_aggregation_functions` - -========== -FedCurv -========== -Paper: https://arxiv.org/abs/1910.07796 - -Requires PyTorch >= 1.9.0. Other frameworks are not supported yet. - -Use :py:class:`openfl.utilities.fedcurv.torch.FedCurv` to override train function using :code:`.get_penalty()`, :code:`.on_train_begin()`, and :code:`.on_train_end()` methods. -In addition, you should override default :code:`AggregationFunction` of the train task with :class:`openfl.interface.aggregation_functions.FedCurvWeightedAverage`. -See :code:`PyTorch_Histology_FedCurv` tutorial in :code:`openfl-tutorials/interactive_api` directory for more details. \ No newline at end of file
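Reviewer note: the FedAvg entry above describes the rule only in words (scale each collaborator's weights by its relative data size, then combine). As a minimal sketch of that arithmetic, assuming each local model is a plain dict mapping a layer name to a flat list of floats, the rule can be written as below. The ``fedavg`` helper and its arguments are hypothetical names for illustration and are not part of the OpenFL API.

```python
def fedavg(local_weights, num_samples):
    """Hypothetical helper: data-size-weighted average of local weights.

    local_weights: one dict per collaborator, {layer_name: [float, ...]}
    num_samples:   number of training samples held by each collaborator
    """
    if not local_weights or len(local_weights) != len(num_samples):
        raise ValueError("need exactly one sample count per collaborator")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged = {}
    for name in local_weights[0]:
        length = len(local_weights[0][name])
        # Each collaborator contributes in proportion to n_k / total.
        averaged[name] = [
            sum(w[name][i] * n / total
                for w, n in zip(local_weights, num_samples))
            for i in range(length)
        ]
    return averaged


# Two collaborators; the second holds three times as much data.
result = fedavg(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}],
    num_samples=[1, 3],
)
print(result)
```

With equal sample counts this reduces to a plain mean, which is what an unweighted average over model state dicts computes.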