diff --git a/.github/workflows/experimental_workflow_tests.yml b/.github/workflows/experimental_workflow_tests.yml index fcbe9116ea..ab217cc6a5 100644 --- a/.github/workflows/experimental_workflow_tests.yml +++ b/.github/workflows/experimental_workflow_tests.yml @@ -80,3 +80,6 @@ jobs: - name: Workflow - Collaborator Subset (Ray Backend) run: | python tests/github/experimental/testflow_subset_of_collaborators.py ray + - name: Test Experimental Aggregator Based Workflow API + run: | + python -m tests.github.experimental.workspace.test_experimental_agg_based_workflow --custom_template tests/github/experimental/workspace/testcase_datastore_cli --fed_workspace aggregator --col col1 --col col2 --rounds-to-train 1 diff --git a/README.md b/README.md index 8f26ba6e04..b4df98a3fe 100644 --- a/README.md +++ b/README.md @@ -126,4 +126,3 @@ This project is licensed under [Apache License Version 2.0](LICENSE). By contrib publisher={IOP Publishing} } ``` - diff --git a/docs/_static/css/accessibility_overrides.css b/docs/_static/css/accessibility_overrides.css index 745bf23f04..39d8c32da3 100644 --- a/docs/_static/css/accessibility_overrides.css +++ b/docs/_static/css/accessibility_overrides.css @@ -199,4 +199,4 @@ font-size:0.875rem; } -} \ No newline at end of file +} diff --git a/openfl-tutorials/experimental/Privacy_Meter/cifar10_PM.py b/openfl-tutorials/experimental/Privacy_Meter/cifar10_PM.py index 8a185065f2..33ba6c15ab 100644 --- a/openfl-tutorials/experimental/Privacy_Meter/cifar10_PM.py +++ b/openfl-tutorials/experimental/Privacy_Meter/cifar10_PM.py @@ -656,8 +656,9 @@ def end(self): args = argparser.parse_args() # Setup participants - # Set `num_gpus=0.0` to `num_gpus=0.3` to run on GPU - aggregator = Aggregator(num_gpus=0.0) + # If running with GPU and 1 GPU is available then + # Set `num_gpus=0.3` to run on GPU + aggregator = Aggregator() collaborator_names = ["Portland", "Seattle"] diff --git a/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_VFL_Two_Party.ipynb b/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_VFL_Two_Party.ipynb index 5ba5a6fbce..02440dc273 100644 --- a/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_VFL_Two_Party.ipynb +++ b/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_VFL_Two_Party.ipynb @@ -8,7 +8,9 @@ "outputs": [], "source": [ "!pip install git+https://github.com/intel/openfl.git\n", - "!pip install -r ../requirements_workflow_interface.txt" + "!pip install -r ../requirements_workflow_interface.txt\n", + "!pip install torch\n", + "!pip install torchvision" ] }, { @@ -266,7 +268,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.16" + "version": "3.8.17" } }, "nbformat": 4, diff --git a/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_VFL_Two_Party_Workspace_Creation_from_JupyterNotebook.ipynb b/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_VFL_Two_Party_Workspace_Creation_from_JupyterNotebook.ipynb new file mode 100644 index 0000000000..b9c6a59d67 --- /dev/null +++ b/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_VFL_Two_Party_Workspace_Creation_from_JupyterNotebook.ipynb @@ -0,0 +1,480 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0d7f65ec-d3d8-4c91-99a4-277c160cb33b", + "metadata": {}, + "source": [ + "# Workflow Interface VFL Two Party: Workspace Creation from Jupyter Notebook" + ] + }, + { + "cell_type": "markdown", + "id": "d83a6c7f-4816-472a-9a46-ed0a2ddec8ef", + "metadata": {}, + "source": [ + "This 
tutorial demonstrates the methodology to convert a Federated Learning experiment developed in a Jupyter Notebook into a workspace that can be deployed using the Aggregator-Based Workflow.\n",
+    "\n",
+    "The OpenFL experimental Workflow Interface enables the user to simulate a Federated Learning experiment using **LocalRuntime**. Once the simulation is ready, the methodology described in this tutorial enables the user to convert the experiment into an OpenFL workspace that can be deployed using the Aggregator-Based Workflow.\n",
+    "\n",
+    "##### High-Level Overview of the Methodology\n",
+    "1. The user annotates the relevant cells of the Jupyter notebook with the `#| export` directive\n",
+    "2. We then leverage `nbdev` functionality to export the annotated cells of the Jupyter notebook into a Python script\n",
+    "3. The OpenFL experimental module `WorkspaceExport` is used to convert the Python script into an OpenFL workspace\n",
+    "4. The user can then use the experimental `fx` commands to deploy and run the federation seamlessly\n",
+    "\n",
+    "\n",
+    "The methodology is described using an existing [OpenFL Two Party VFL Tutorial](https://github.com/securefederatedai/openfl/blob/develop/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_VFL_Two_Party.ipynb). Let's get started!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "054e58b4-1ecf-475e-ac5d-3b972ee25431",
+   "metadata": {},
+   "source": [
+    "## Getting Started"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3d918e19-90ac-4ab3-a678-0b2d94debaac",
+   "metadata": {},
+   "source": [
+    "Initially, we start by specifying the module to which cells marked with the `#| export` directive will be automatically exported.\n",
+    "\n",
+    "In the following cell, `#| default_exp experiment` indicates that the exported file will be named 'experiment'. 
This name can be modified based on user's requirement & preferences" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0e35ccea-bb36-4d73-8dcf-a34e4d84908b", + "metadata": {}, + "outputs": [], + "source": [ + "#| default_exp experiment" + ] + }, + { + "cell_type": "markdown", + "id": "d65f17c2-a772-4f62-848e-9ba6ad1ab128", + "metadata": {}, + "source": [ + "We start by installing OpenFL and dependencies of the workflow interface \n", + "> These dependencies are required to be exported and become the requirements for the Federated Learning Workspace " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6867c928-430d-4710-ac2a-4f4e2c86ab0f", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "!pip install git+https://github.com/intel/openfl.git\n", + "!pip install -r ../requirements_workflow_interface.txt\n", + "!pip install torch\n", + "!pip install torchvision" + ] + }, + { + "cell_type": "markdown", + "id": "7cbcb941-93b8-4427-85ae-0c17439a81d7", + "metadata": {}, + "source": [ + "We now define our dataloaders, model, optimizer, and some helper functions like we would for any other deep learning experiment \n", + "\n", + "> This cell and all the subsequent cells are important ingredients of the Federated Learning experiment and therefore annotated with the `#| export` directive" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "difficult-madrid", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "from copy import deepcopy\n", + "import numpy as np\n", + "import torch\n", + "import torchvision\n", + "from time import time\n", + "from torchvision import datasets, transforms\n", + "from torch import nn, optim\n", + "\n", + "from openfl.experimental.interface import FLSpec, Aggregator, Collaborator\n", + "from openfl.experimental.runtime import LocalRuntime\n", + "from openfl.experimental.placement import aggregator, collaborator\n", + "\n", + "# Data preprocessing\n", + "transform = transforms.Compose([transforms.ToTensor(),\n", + " transforms.Normalize((0.5,), (0.5,)),\n", + " ])\n", + "trainset = datasets.MNIST('mnist', download=True,\n", + " train=True, transform=transform)\n", + "trainloader = torch.utils.data.DataLoader(\n", + " trainset, batch_size=2048, shuffle=False)\n", + "\n", + "testset = datasets.MNIST('mnist', download=True,\n", + " train=False, transform=transform)\n", + "testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)\n", + "\n", + "torch.manual_seed(0) # Define our model segments\n", + "input_size = 784\n", + "hidden_sizes = [128, 640]\n", + "output_size = 10\n", + "\n", + "label_model = nn.Sequential(\n", + " nn.Linear(hidden_sizes[1], output_size),\n", + " nn.LogSoftmax(dim=1)\n", + ")\n", + "\n", + "label_model_optimizer = optim.SGD(label_model.parameters(), lr=0.03)\n", + "\n", + "data_model = nn.Sequential(\n", + " nn.Linear(input_size, hidden_sizes[0]),\n", + " nn.ReLU(),\n", + " nn.Linear(hidden_sizes[0], hidden_sizes[1]),\n", + " nn.ReLU(),\n", + ")\n", + "\n", + "data_model_optimizer = optim.SGD(data_model.parameters(), lr=0.03)" + ] + }, + { + "cell_type": "markdown", + "id": "b4cae5c4-6b9c-4dc1-bde0-29e4f90bf414", + "metadata": {}, + "source": [ + "Now we define the workflow for Vertical Federated Learning" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "forward-world", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "class VerticalTwoPartyFlow(FLSpec):\n", + "\n", + " 
def __init__(self, total_rounds, batch_num=0):\n", + " super().__init__()\n", + " self.batch_num = batch_num\n", + " self.total_rounds = total_rounds\n", + " self.round = 0\n", + " \n", + "\n", + " @aggregator\n", + " def start(self):\n", + " if self.batch_num == 0:\n", + " print(f'Starting round {self.round}')\n", + " self.data_remaining=True\n", + " self.collaborators = self.runtime.collaborators\n", + " else:\n", + " print(f'Batch_num = {self.batch_num}')\n", + " # 1) Zero the gradients\n", + " self.label_model_optimizer.zero_grad()\n", + " self.next(self.data_model_forward_pass, foreach='collaborators')\n", + "\n", + "\n", + " @collaborator\n", + " def data_model_forward_pass(self):\n", + " self.data_model_output_local = ''\n", + " for idx, (images, _) in enumerate(self.trainloader):\n", + " if idx < self.batch_num:\n", + " continue\n", + " self.data_model_optimizer.zero_grad()\n", + " images = images.view(images.shape[0], -1)\n", + " model_output = self.data_model(images)\n", + " self.data_model_output_local = model_output\n", + " self.data_model_output = model_output.detach().requires_grad_()\n", + " break\n", + " self.next(self.label_model_forward_pass)\n", + " #exclude=['data_model_output_local'])\n", + "\n", + " @aggregator\n", + " def label_model_forward_pass(self, inputs):\n", + " criterion = nn.NLLLoss()\n", + " self.grad_to_local = []\n", + " total_loss = 0\n", + " self.data_remaining = False\n", + " for idx, (_, labels) in enumerate(self.trainloader):\n", + " if idx < self.batch_num:\n", + " continue\n", + " self.data_remaining = True\n", + " pred = self.label_model(inputs[0].data_model_output)\n", + " loss = criterion(pred, labels)\n", + " loss.backward()\n", + " self.grad_to_local = inputs[0].data_model_output.grad.clone()\n", + " self.label_model_optimizer.step()\n", + " total_loss += loss\n", + " break\n", + " print(f'Total loss = {total_loss}') # / len(self.trainloader)}')\n", + " self.next(self.data_model_backprop, foreach='collaborators')\n", + "\n", + " @collaborator\n", + " def data_model_backprop(self):\n", + " if self.data_remaining:\n", + " self.data_model_optimizer = optim.SGD(self.data_model.parameters(), lr=0.03)\n", + " self.data_model_optimizer.zero_grad()\n", + " self.data_model_output_local.backward(self.grad_to_local)\n", + " self.data_model_optimizer.step()\n", + " self.next(self.join)\n", + "\n", + " @aggregator\n", + " def join(self, inputs):\n", + " print(f'Join batch_num = {self.batch_num}')\n", + " self.batch_num += 1\n", + " self.next(self.check_round_completion)\n", + "\n", + " @aggregator\n", + " def check_round_completion(self):\n", + " if self.round == self.total_rounds:\n", + " self.next(self.end)\n", + " else:\n", + " if self.data_remaining:\n", + " print(f'Continuing training loop: batch_num = {self.batch_num}')\n", + " self.next(self.start)\n", + " else:\n", + " print('Start next round')\n", + " self.round += 1\n", + " self.batch_num = 0\n", + " self.next(self.start)\n", + "\n", + " @aggregator\n", + " def end(self):\n", + " print(f'This is the end of the flow')\n" + ] + }, + { + "cell_type": "markdown", + "id": "5806d963-60a8-49be-bafe-0b8d2e027eb6", + "metadata": {}, + "source": [ + "We now initialize private attributes of the aggregator and collaborator, simulation parameters (seed, batch-sizes, optimizer parameters) and create the `LocalRuntime`\n", + "\n", + "> NOTE: The aggregator based workflow is case sensitive. Therefore, the collaborator names should be registered in lowercase only." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "59aff1fc", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "# Setup participants\n", + "aggregator = Aggregator()\n", + "\n", + "def callable_to_initialize_aggregator_private_attributes(train_loader,label_model,label_model_optimizer):\n", + " return {\"trainloader\": train_loader,\n", + " \"label_model\" : label_model,\n", + " \"label_model_optimizer\":label_model_optimizer\n", + " } \n", + "\n", + "# Setup aggregator private attributes via callable function\n", + "aggregator = Aggregator(\n", + " name=\"agg\",\n", + " private_attributes_callable=callable_to_initialize_aggregator_private_attributes,\n", + " train_loader = trainloader,\n", + " label_model=label_model,\n", + " label_model_optimizer=label_model_optimizer\n", + ")\n", + "\n", + "# Setup collaborators private attributes via callable function\n", + "collaborator_names = ['Portland']\n", + "\n", + "def callable_to_initialize_collaborator_private_attributes(index,data_model,data_model_optimizer,train_loader):\n", + " return {\n", + " \"data_model\": data_model,\n", + " \"data_model_optimizer\": data_model_optimizer,\n", + " \"trainloader\" : deepcopy(train_loader)\n", + " }\n", + "\n", + "collaborators = []\n", + "for idx, collaborator_name in enumerate(collaborator_names):\n", + " collaborators.append(\n", + " Collaborator(\n", + " name=collaborator_name,\n", + " private_attributes_callable=callable_to_initialize_collaborator_private_attributes,\n", + " index=idx,\n", + " data_model = data_model,\n", + " data_model_optimizer = data_model_optimizer,\n", + " train_loader = trainloader\n", + " )\n", + " )\n", + "\n", + "local_runtime = LocalRuntime(\n", + " aggregator=aggregator, collaborators=collaborators, backend='single_process')\n", + "print(f'Local runtime collaborators = {local_runtime.collaborators}')\n", + "\n", + "\n", + "total_rounds = 5\n", + "vflow = VerticalTwoPartyFlow(total_rounds=total_rounds)\n", + "vflow.runtime = local_runtime\n", + "# vflow.run()\n" + ] + }, + { + "cell_type": "markdown", + "id": "d5b37d8e-e271-4d72-b8eb-9c357927ebff", + "metadata": {}, + "source": [ + "## Workspace creation" + ] + }, + { + "cell_type": "markdown", + "id": "3777b993-3d8f-404e-aa92-e1ad6f497d41", + "metadata": {}, + "source": [ + "The following cells convert the Jupyter notebook into a Python script and create a Template Workspace that can be utilized by Aggregator based Workflow\n", + "> NOTE: Only Notebook cells that were marked with `#| export` directive shall be included in this Python script\n", + "\n", + "We first import `WorkspaceExport` module and execute `WorkspaceExport.export()` that converts the notebook and generates the template workspace. User is required to specify: \n", + "1. `notebook_path`: path of the Jupyter notebook that is required to be converted\n", + "2. 
`output_workspace`: path where the converted workspace is stored" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "statutory-prime", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from openfl.experimental.workspace_export import WorkspaceExport\n", + "\n", + "WorkspaceExport.export(\n", + " notebook_path='./Workflow_Interface_VFL_Two_Party_Workspace_Creation_from_JupyterNotebook.ipynb',\n", + " output_workspace=f\"/home/{os.environ['USER']}/generated-workspace\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "9b789fa6-4c77-4eb0-84c6-bc91a87d86a3", + "metadata": {}, + "source": [ + "## Workspace usage\n", + "\n", + "The workspace crated above can be used by the Aggregator based workflow by using the `fx` commands in the following manner" + ] + }, + { + "cell_type": "markdown", + "id": "92c96636-57f0-474d-aac8-158a80e08f0d", + "metadata": {}, + "source": [ + "**Workspace Activation and Creation**\n", + "1. Activate the experimental aggregator-based workflow:\n", + "\n", + " `fx experimental activate`\n", + "\n", + " This will create an 'experimental' directory under ~/.openfl/\n", + "3. Create a workspace using the custom template:\n", + "\n", + " `fx workspace create --prefix workspace_path --custom_template /home/$USER/generated-workspace`\n", + "4. Change to the workspace directory:\n", + "\n", + " `cd workspace_path`\n", + "\n", + "**Workspace Initialization and Certification**\n", + "1. Initialize the FL plan and auto-populate the fully qualified domain name (FQDN) of the aggregator node:\n", + "\n", + " `fx plan initialize`\n", + "2. Certify the workspace:\n", + "\n", + " `fx workspace certify`\n", + " \n", + "**Aggregator Setup and Workspace Export**\n", + "1. Run the aggregator certificate creation command:\n", + "\n", + " `fx aggregator generate-cert-request`\n", + "\n", + " `fx aggregator certify`\n", + "2. Export the workspace for collaboration:\n", + "\n", + " `fx workspace export`\n", + " \n", + "**Collaborator Node Setup**\n", + "\n", + "***On the Collaborator Node:***\n", + "\n", + "1. Copy the workspace archive from the aggregator node to the collaborator nodes. Import the workspace archive:\n", + "\n", + " `fx workspace import --archive WORKSPACE.zip`\n", + " \n", + " `cd workspace_path`\n", + "3. Generate a collaborator certificate request:\n", + "\n", + " `fx collaborator generate-cert-request -n {COL_LABEL}`\n", + "\n", + "***On the Aggregator Node (Certificate Authority):***\n", + "\n", + "3. Sign the Collaborator Certificate Signing Request (CSR) Package from collaborator nodes:\n", + "\n", + " `fx collaborator certify --request-pkg /PATH/TO/col_{COL_LABEL}_to_agg_cert_request.zip`\n", + "\n", + "***On the Collaborator Node:***\n", + "\n", + "4. Import the signed certificate and certificate chain into the workspace:\n", + "\n", + " `fx collaborator certify --import /PATH/TO/agg_to_col_{COL_LABEL}_signed_cert.zip`\n", + " \n", + "**Final Workspace Activation**\n", + "***On the Aggregator Node:***\n", + "\n", + "1. Start the Aggregator:\n", + "\n", + " `fx aggregator start`\n", + " \n", + " The Aggregator is now running and waiting for Collaborators to connect.\n", + "\n", + "***On the Collaborator Nodes:***\n", + "\n", + "2. Run the Collaborator:\n", + "\n", + " `fx collaborator start -n {COL_LABEL}`\n", + "\n", + "**Workspace Deactivation**\n", + "1. 
To deactivate the experimental aggregator-based workflow and switch back to the original aggregator-based workflow:\n",
+    "\n",
+    "   `fx experimental deactivate`\n",
+    "\n",
+    "   This will remove the 'experimental' directory under ~/.openfl/"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "v_o",
+   "language": "python",
+   "name": "v_o"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_Vertical_FL.ipynb b/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_Vertical_FL.ipynb
index 03bd458193..6fd0d09ee0 100644
--- a/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_Vertical_FL.ipynb
+++ b/openfl-tutorials/experimental/Vertical_FL/Workflow_Interface_Vertical_FL.ipynb
@@ -166,7 +166,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from metaflow import Metaflow, Flow, Task, Step"
+    "from metaflow import Metaflow, Flow, Step, Task"
    ]
   },
   {
diff --git a/openfl-tutorials/experimental/Workflow_Interface_1001_Workspace_Creation_from_JupyterNotebook.ipynb b/openfl-tutorials/experimental/Workflow_Interface_1001_Workspace_Creation_from_JupyterNotebook.ipynb
new file mode 100644
index 0000000000..dd3926f52e
--- /dev/null
+++ b/openfl-tutorials/experimental/Workflow_Interface_1001_Workspace_Creation_from_JupyterNotebook.ipynb
@@ -0,0 +1,1005 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "dc13070c",
+   "metadata": {},
+   "source": [
+    "# Workflow Interface 1001: Workspace Creation from Jupyter Notebook"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "8f28c451",
+   "metadata": {},
+   "source": [
+    "This tutorial demonstrates the methodology to convert a Federated Learning experiment developed in a Jupyter Notebook into a workspace that can be deployed using the Aggregator-Based Workflow.\n",
+    "\n",
+    "The OpenFL experimental Workflow Interface enables the user to simulate a Federated Learning experiment using **LocalRuntime**. Once the simulation is ready, the methodology described in this tutorial enables the user to convert the experiment into an OpenFL workspace that can be deployed using the Aggregator-Based Workflow.\n",
+    "\n",
+    "##### High-Level Overview of the Methodology\n",
+    "1. The user annotates the relevant cells of the Jupyter notebook with the `#| export` directive\n",
+    "2. We then leverage `nbdev` functionality to export the annotated cells of the Jupyter notebook into a Python script\n",
+    "3. The OpenFL experimental module `WorkspaceExport` is used to convert the Python script into an OpenFL workspace\n",
+    "4. The user can then use the experimental `fx` commands to deploy and run the federation seamlessly\n",
+    "\n",
+    "\n",
+    "The methodology is described using an existing [OpenFL Watermarking Tutorial](https://github.com/securefederatedai/openfl/blob/develop/openfl-tutorials/experimental/Workflow_Interface_301_MNIST_Watermarking.ipynb). 
Let's get started !\n", + "\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "a4394089", + "metadata": {}, + "source": [ + "# Getting Started" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "857f9995", + "metadata": {}, + "source": [ + "Initially, we start by specifying the module where cells marked with the `#| export` directive will be automatically exported. \n", + "\n", + "In the following cell, `#| default_exp experiment `indicates that the exported file will be named 'experiment'. This name can be modified based on user's requirement & preferences" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d79eacbd", + "metadata": {}, + "outputs": [], + "source": [ + "#| default_exp experiment" + ] + }, + { + "cell_type": "markdown", + "id": "62449b5f", + "metadata": {}, + "source": [ + "Once we have specified the name of the module, subsequent cells of the notebook need to be *appended* by the `#| export` directive as shown below. User should ensure that *all* the notebook functionality required in the Federated Learning experiment is included in this directive" + ] + }, + { + "cell_type": "markdown", + "id": "2e19dcf2", + "metadata": {}, + "source": [ + "We start by installing OpenFL and dependencies of the workflow interface \n", + "> These dependencies are required to be exported and become the requirements for the Federated Learning Workspace " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7475cba", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "!pip install git+https://github.com/intel/openfl.git\n", + "!pip install -r requirements_workflow_interface.txt\n", + "\n", + "!pip install matplotlib\n", + "!pip install torch\n", + "!pip install torchvision\n", + "!pip install git+https://github.com/pyviz-topics/imagen.git@master\n", + "!pip install holoviews==1.15.4\n", + "\n", + "\n", + "# Uncomment this if running in Google Colab\n", + "#!pip install -r https://raw.githubusercontent.com/intel/openfl/develop/openfl-tutorials/experimental/requirements_workflow_interface.txt\n", + "#import os\n", + "#os.environ[\"USERNAME\"] = \"colab\"" + ] + }, + { + "cell_type": "markdown", + "id": "9a6ae8e2", + "metadata": {}, + "source": [ + "We now define our dataloaders, model, optimizer, and some helper functions like we would for any other deep learning experiment \n", + "\n", + "> This cell and all the subsequent cells are important ingredients of the Federated Learning experiment and therefore annotated with the `#| export` directive" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9bd8ac2d", + "metadata": {}, + "outputs": [], + "source": [ + "# | export\n", + "\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "import torch\n", + "import torchvision\n", + "import numpy as np\n", + "import random\n", + "import pathlib\n", + "import os\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "import PIL.Image as Image\n", + "import imagen as ig\n", + "import numbergen as ng\n", + "import os\n", + "\n", + "random_seed = 1\n", + "torch.backends.cudnn.enabled = False\n", + "torch.manual_seed(random_seed)\n", + "\n", + "# MNIST Train and Test datasets\n", + "mnist_train = torchvision.datasets.MNIST(\n", + " \"./files/\",\n", + " train=True,\n", + " download=True,\n", + " transform=torchvision.transforms.Compose(\n", + " [\n", + " torchvision.transforms.ToTensor(),\n", + " 
torchvision.transforms.Normalize((0.1307,), (0.3081,)),\n", + " ]\n", + " ),\n", + ")\n", + "\n", + "mnist_test = torchvision.datasets.MNIST(\n", + " \"./files/\",\n", + " train=False,\n", + " download=True,\n", + " transform=torchvision.transforms.Compose(\n", + " [\n", + " torchvision.transforms.ToTensor(),\n", + " torchvision.transforms.Normalize((0.1307,), (0.3081,)),\n", + " ]\n", + " ),\n", + ")\n", + "\n", + "\n", + "class Net(nn.Module):\n", + " def __init__(self, dropout=0.0):\n", + " super(Net, self).__init__()\n", + " self.dropout = dropout\n", + " self.block = nn.Sequential(\n", + " nn.Conv2d(1, 32, 2),\n", + " nn.MaxPool2d(2),\n", + " nn.ReLU(),\n", + " nn.Conv2d(32, 64, 2),\n", + " nn.MaxPool2d(2),\n", + " nn.ReLU(),\n", + " nn.Conv2d(64, 128, 2),\n", + " nn.ReLU(),\n", + " )\n", + " self.fc1 = nn.Linear(128 * 5**2, 200)\n", + " self.fc2 = nn.Linear(200, 10)\n", + " self.relu = nn.ReLU()\n", + " self.dropout = nn.Dropout(p=dropout)\n", + "\n", + " def forward(self, x):\n", + " x = self.dropout(x)\n", + " out = self.block(x)\n", + " out = out.view(-1, 128 * 5**2)\n", + " out = self.dropout(out)\n", + " out = self.relu(self.fc1(out))\n", + " out = self.dropout(out)\n", + " out = self.fc2(out)\n", + " return F.log_softmax(out, 1)\n", + "\n", + "\n", + "def inference(network, test_loader):\n", + " network.eval()\n", + " correct = 0\n", + " with torch.no_grad():\n", + " for data, target in test_loader:\n", + " output = network(data)\n", + " pred = output.data.max(1, keepdim=True)[1]\n", + " correct += pred.eq(target.data.view_as(pred)).sum()\n", + " accuracy = float(correct / len(test_loader.dataset))\n", + " return accuracy\n", + "\n", + "\n", + "def train_model(model, optimizer, data_loader, entity, round_number, log=False):\n", + " # Helper function to train the model\n", + " train_loss = 0\n", + " log_interval = 20\n", + " model.train()\n", + " for batch_idx, (X, y) in enumerate(data_loader):\n", + " optimizer.zero_grad()\n", + "\n", + " output = model(X)\n", + " loss = F.nll_loss(output, y)\n", + " loss.backward()\n", + "\n", + " optimizer.step()\n", + "\n", + " train_loss += loss.item() * len(X)\n", + " if batch_idx % log_interval == 0 and log:\n", + " print(\"{:<20} Train Epoch: {:<3} [{:<3}/{:<4} ({:<.0f}%)] Loss: {:<.6f}\".format(\n", + " entity,\n", + " round_number,\n", + " batch_idx * len(X),\n", + " len(data_loader.dataset),\n", + " 100.0 * batch_idx / len(data_loader),\n", + " loss.item(),\n", + " )\n", + " )\n", + " train_loss /= len(data_loader.dataset)\n", + " return train_loss" + ] + }, + { + "cell_type": "markdown", + "id": "e4d907d9", + "metadata": {}, + "source": [ + "Next we define the dataset required for watermarking" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bcad2624", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "watermark_dir = \"./files/watermark-dataset/MWAFFLE/\"\n", + "\n", + "\n", + "def generate_watermark(\n", + " x_size=28, y_size=28, num_class=10, num_samples_per_class=10, img_dir=watermark_dir\n", + "):\n", + " \"\"\"\n", + " Generate Watermark by superimposing a pattern on noisy background.\n", + "\n", + " Parameters\n", + " ----------\n", + " x_size: x dimension of the image\n", + " y_size: y dimension of the image\n", + " num_class: number of classes in the original dataset\n", + " num_samples_per_class: number of samples to be generated per class\n", + " img_dir: directory for saving watermark dataset\n", + "\n", + " Reference\n", + " ---------\n", + " WAFFLE: Watermarking in 
Federated Learning (https://arxiv.org/abs/2008.07298)\n", + "\n", + " \"\"\"\n", + " x_pattern = int(x_size * 2 / 3.0 - 1)\n", + " y_pattern = int(y_size * 2 / 3.0 - 1)\n", + "\n", + " np.random.seed(0)\n", + " for cls in range(num_class):\n", + " patterns = []\n", + " random_seed = 10 + cls\n", + " patterns.append(\n", + " ig.Line(\n", + " xdensity=x_pattern,\n", + " ydensity=y_pattern,\n", + " thickness=0.001,\n", + " orientation=np.pi * ng.UniformRandom(seed=random_seed),\n", + " x=ng.UniformRandom(seed=random_seed) - 0.5,\n", + " y=ng.UniformRandom(seed=random_seed) - 0.5,\n", + " scale=0.8,\n", + " )\n", + " )\n", + " patterns.append(\n", + " ig.Arc(\n", + " xdensity=x_pattern,\n", + " ydensity=y_pattern,\n", + " thickness=0.001,\n", + " orientation=np.pi * ng.UniformRandom(seed=random_seed),\n", + " x=ng.UniformRandom(seed=random_seed) - 0.5,\n", + " y=ng.UniformRandom(seed=random_seed) - 0.5,\n", + " size=0.33,\n", + " )\n", + " )\n", + "\n", + " pat = np.zeros((x_pattern, y_pattern))\n", + " for i in range(6):\n", + " j = np.random.randint(len(patterns))\n", + " pat += patterns[j]()\n", + " res = pat > 0.5\n", + " pat = res.astype(int)\n", + "\n", + " x_offset = np.random.randint(x_size - x_pattern + 1)\n", + " y_offset = np.random.randint(y_size - y_pattern + 1)\n", + "\n", + " for i in range(num_samples_per_class):\n", + " base = np.random.rand(x_size, y_size)\n", + " # base = np.zeros((x_input, y_input))\n", + " base[\n", + " x_offset : x_offset + pat.shape[0],\n", + " y_offset : y_offset + pat.shape[1],\n", + " ] += pat\n", + " d = np.ones((x_size, x_size))\n", + " img = np.minimum(base, d)\n", + " if not os.path.exists(img_dir + str(cls) + \"/\"):\n", + " os.makedirs(img_dir + str(cls) + \"/\")\n", + " plt.imsave(\n", + " img_dir + str(cls) + \"/wm_\" + str(i + 1) + \".png\",\n", + " img,\n", + " cmap=matplotlib.cm.gray,\n", + " )\n", + "\n", + "\n", + "# If the Watermark dataset does not exist, generate and save the Watermark images\n", + "watermark_path = pathlib.Path(watermark_dir)\n", + "if watermark_path.exists() and watermark_path.is_dir():\n", + " print(\n", + " f\"Watermark dataset already exists at: {watermark_path}. Proceeding to next step ... \"\n", + " )\n", + " pass\n", + "else:\n", + " print(f\"Generating Watermark dataset... 
\")\n", + " generate_watermark()\n", + "\n", + "\n", + "class WatermarkDataset(torch.utils.data.Dataset):\n", + " def __init__(self, images_dir, label_dir=None, transforms=None):\n", + " self.images_dir = os.path.abspath(images_dir)\n", + " self.image_paths = [\n", + " os.path.join(self.images_dir, d) for d in os.listdir(self.images_dir)\n", + " ]\n", + " self.label_paths = label_dir\n", + " self.transform = transforms\n", + " temp = []\n", + "\n", + " # Recursively counting total number of images in the directory\n", + " for image_path in self.image_paths:\n", + " for path in os.walk(image_path):\n", + " if len(path) <= 1:\n", + " continue\n", + " path = path[2]\n", + " for im_n in [image_path + \"/\" + p for p in path]:\n", + " temp.append(im_n)\n", + " self.image_paths = temp\n", + "\n", + " if len(self.image_paths) == 0:\n", + " raise Exception(f\"No file(s) found under {images_dir}\")\n", + "\n", + " def __len__(self):\n", + " return len(self.image_paths)\n", + "\n", + " def __getitem__(self, idx):\n", + " image_filepath = self.image_paths[idx]\n", + " image = Image.open(image_filepath)\n", + " image = image.convert(\"RGB\")\n", + " image = self.transform(image)\n", + " label = int(image_filepath.split(\"/\")[-2])\n", + "\n", + " return image, label\n", + "\n", + "\n", + "def get_watermark_transforms():\n", + " return torchvision.transforms.Compose(\n", + " [\n", + " torchvision.transforms.Grayscale(),\n", + " torchvision.transforms.Resize(28),\n", + " torchvision.transforms.ToTensor(),\n", + " torchvision.transforms.Normalize(mean=(0.5,), std=(0.5,)), # Normalize\n", + " ]\n", + " )\n", + "\n", + "\n", + "watermark_data = WatermarkDataset(\n", + " images_dir=watermark_dir,\n", + " transforms=get_watermark_transforms(),\n", + ")\n", + "\n", + "# Set display_watermark to True to display the Watermark dataset\n", + "display_watermark = False\n", + "if display_watermark:\n", + " # Inspect and plot the Watermark Images\n", + " wm_images = np.empty((100, 28, 28))\n", + " wm_labels = np.empty([100, 1], dtype=int)\n", + "\n", + " for i in range(len(watermark_data)):\n", + " img, label = watermark_data[i]\n", + " wm_labels[label * 10 + i % 10] = label\n", + " wm_images[label * 10 + i % 10, :, :] = img.numpy()\n", + "\n", + " fig = plt.figure(figsize=(120, 120))\n", + " for i in range(100):\n", + " plt.subplot(10, 10, i + 1)\n", + " plt.imshow(wm_images[i], interpolation=\"none\")\n", + " plt.title(\"Label: {}\".format(wm_labels[i]), fontsize=80)" + ] + }, + { + "cell_type": "markdown", + "id": "d0849d57", + "metadata": {}, + "source": [ + "Next we import the `FLSpec`, `LocalRuntime`, placement decorators (`aggregator/collaborator`)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89cf4866", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "from copy import deepcopy\n", + "\n", + "from openfl.experimental.interface import FLSpec, Aggregator, Collaborator\n", + "from openfl.experimental.runtime import LocalRuntime\n", + "from openfl.experimental.placement import aggregator, collaborator\n", + "\n", + "def FedAvg(agg_model, models, weights=None):\n", + " state_dicts = [model.state_dict() for model in models]\n", + " state_dict = agg_model.state_dict()\n", + " for key in models[0].state_dict():\n", + " state_dict[key] = torch.from_numpy(np.average([state[key].numpy() for state in state_dicts],\n", + " axis=0, \n", + " weights=weights))\n", + " \n", + " agg_model.load_state_dict(state_dict)\n", + " return agg_model" + ] + }, + { + "cell_type": 
"markdown", + "id": "36ed5e31", + "metadata": {}, + "source": [ + "Let us now define the Workflow for Watermark embedding." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "52c4a752", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "class FederatedFlow_MNIST_Watermarking(FLSpec):\n", + " \"\"\"\n", + " This Flow demonstrates Watermarking on a Deep Learning Model in Federated Learning\n", + " Ref: WAFFLE: Watermarking in Federated Learning (https://arxiv.org/abs/2008.07298)\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " model=None,\n", + " optimizer=None,\n", + " watermark_pretrain_optimizer=None,\n", + " watermark_retrain_optimizer=None,\n", + " round_number=0,\n", + " **kwargs,\n", + " ):\n", + " super().__init__(**kwargs)\n", + "\n", + " if model is not None:\n", + " self.model = model\n", + " self.optimizer = optimizer\n", + " self.watermark_pretrain_optimizer = watermark_pretrain_optimizer\n", + " self.watermark_retrain_optimizer = watermark_retrain_optimizer\n", + " else:\n", + " self.model = Net()\n", + " self.optimizer = optim.SGD(\n", + " self.model.parameters(), lr=learning_rate, momentum=momentum\n", + " )\n", + " self.watermark_pretrain_optimizer = optim.SGD(\n", + " self.model.parameters(),\n", + " lr=watermark_pretrain_learning_rate,\n", + " momentum=watermark_pretrain_momentum,\n", + " weight_decay=watermark_pretrain_weight_decay,\n", + " )\n", + " self.watermark_retrain_optimizer = optim.SGD(\n", + " self.model.parameters(), lr=watermark_retrain_learning_rate\n", + " )\n", + " self.round_number = round_number\n", + " self.watermark_pretraining_completed = False\n", + "\n", + " @aggregator\n", + " def start(self):\n", + " \"\"\"\n", + " This is the start of the Flow.\n", + " \"\"\"\n", + "\n", + " print(f\": Start of flow ... \")\n", + " self.collaborators = self.runtime.collaborators\n", + "\n", + " # Randomly select a fraction of actual collaborator every round\n", + " fraction = 0.5\n", + " if int(fraction * len(self.collaborators)) < 1:\n", + " raise Exception(\n", + " f\"Cannot run training with {fraction*100}% selected collaborators out of {len(self.collaborators)} Collaborators. 
Atleast one collaborator is required to run the training\"\n", + " )\n", + " self.subset_collaborators = random.sample(\n", + " self.collaborators, int(fraction * (len(self.collaborators)))\n", + " )\n", + "\n", + " self.next(self.watermark_pretrain)\n", + "\n", + " @aggregator\n", + " def watermark_pretrain(self):\n", + " \"\"\"\n", + " Pre-Train the Model before starting Federated Learning.\n", + " \"\"\"\n", + " if not self.watermark_pretraining_completed:\n", + "\n", + " print(\": Performing Watermark Pre-training\")\n", + "\n", + " for i in range(self.pretrain_epochs):\n", + "\n", + " watermark_pretrain_loss = train_model(\n", + " self.model,\n", + " self.watermark_pretrain_optimizer,\n", + " self.watermark_data_loader,\n", + " \":\",\n", + " i,\n", + " log=False,\n", + " )\n", + " watermark_pretrain_validation_score = inference(\n", + " self.model, self.watermark_data_loader\n", + " )\n", + "\n", + " print(\n", + " \": Watermark Pretraining: Round: {:<3} Loss: {:<.6f} Acc: {:<.6f}\".format(\n", + " i,\n", + " watermark_pretrain_loss,\n", + " watermark_pretrain_validation_score,\n", + " )\n", + " )\n", + "\n", + " self.watermark_pretraining_completed = True\n", + "\n", + " self.next(\n", + " self.aggregated_model_validation,\n", + " foreach=\"subset_collaborators\",\n", + " exclude=[\"watermark_pretrain_optimizer\", \"watermark_retrain_optimizer\"],\n", + " )\n", + "\n", + " @collaborator\n", + " def aggregated_model_validation(self):\n", + " \"\"\"\n", + " Perform Aggregated Model validation on Collaborators.\n", + " \"\"\"\n", + " self.agg_validation_score = inference(self.model, self.test_loader)\n", + " print(\n", + " f\" Aggregated Model validation score = {self.agg_validation_score}\"\n", + " )\n", + "\n", + " self.next(self.train)\n", + "\n", + " @collaborator\n", + " def train(self):\n", + " \"\"\"\n", + " Train model on Local collab dataset.\n", + "\n", + " \"\"\"\n", + " print(\": Performing Model Training on Local dataset ... 
\")\n", + "\n", + " self.optimizer = optim.SGD(\n", + " self.model.parameters(), lr=learning_rate, momentum=momentum\n", + " )\n", + "\n", + " self.loss = train_model(\n", + " self.model,\n", + " self.optimizer,\n", + " self.train_loader,\n", + " \"\"),\n", + " self.round_number if self.round_number is not None else 0,\n", + " log=True,\n", + " )\n", + "\n", + " self.next(self.local_model_validation)\n", + "\n", + " @collaborator\n", + " def local_model_validation(self):\n", + " \"\"\"\n", + " Validate locally trained model.\n", + "\n", + " \"\"\"\n", + " self.local_validation_score = inference(self.model, self.test_loader)\n", + " print(\n", + " f\" Local model validation score = {self.local_validation_score}\"\n", + " )\n", + " self.next(self.join)\n", + "\n", + " @aggregator\n", + " def join(self, inputs):\n", + " \"\"\"\n", + " Model aggregation step.\n", + " \"\"\"\n", + "\n", + " self.average_loss = sum(input.loss for input in inputs) / len(inputs)\n", + " self.aggregated_model_accuracy = sum(\n", + " input.agg_validation_score for input in inputs\n", + " ) / len(inputs)\n", + " self.local_model_accuracy = sum(\n", + " input.local_validation_score for input in inputs\n", + " ) / len(inputs)\n", + "\n", + " print(f\": Joining models from collaborators...\")\n", + "\n", + " print(\n", + " f\" Aggregated model validation score = {self.aggregated_model_accuracy}\"\n", + " )\n", + " print(f\" Average training loss = {self.average_loss}\")\n", + " print(f\" Average local model validation values = {self.local_model_accuracy}\")\n", + "\n", + " self.model = FedAvg(self.model, [input.model for input in inputs])\n", + "\n", + " self.next(self.watermark_retrain)\n", + "\n", + " @aggregator\n", + " def watermark_retrain(self):\n", + " \"\"\"\n", + " Retrain the aggregated model.\n", + "\n", + " \"\"\"\n", + " print(\": Performing Watermark Retraining ... 
\")\n", + " self.watermark_retrain_optimizer = optim.SGD(\n", + " self.model.parameters(), lr=watermark_retrain_learning_rate\n", + " )\n", + "\n", + " retrain_round = 0\n", + "\n", + " # Perform re-training until (accuracy >= acc_threshold) or (retrain_round > number of retrain_epochs)\n", + " self.watermark_retrain_validation_score = inference(\n", + " self.model, self.watermark_data_loader\n", + " )\n", + " while (\n", + " self.watermark_retrain_validation_score < self.watermark_acc_threshold\n", + " ) and (retrain_round < self.retrain_epochs):\n", + " self.watermark_retrain_train_loss = train_model(\n", + " self.model,\n", + " self.watermark_retrain_optimizer,\n", + " self.watermark_data_loader,\n", + " \"\",\n", + " retrain_round,\n", + " log=False,\n", + " )\n", + " self.watermark_retrain_validation_score = inference(\n", + " self.model, self.watermark_data_loader\n", + " )\n", + "\n", + " print(\n", + " \": Watermark Retraining: Train Epoch: {:<3} Retrain Round: {:<3} Loss: {:<.6f}, Acc: {:<.6f}\".format(\n", + " self.round_number,\n", + " retrain_round,\n", + " self.watermark_retrain_train_loss,\n", + " self.watermark_retrain_validation_score,\n", + " )\n", + " )\n", + "\n", + " retrain_round += 1\n", + "\n", + " self.next(self.end)\n", + "\n", + " @aggregator\n", + " def end(self):\n", + " \"\"\"\n", + " This is the last step in the Flow.\n", + "\n", + " \"\"\"\n", + " print(f\"This is the end of the flow\")" + ] + }, + { + "cell_type": "markdown", + "id": "0d1bba1a", + "metadata": {}, + "source": [ + "We now initialize certain attributes of the Flow, simulation parameters (seed, batch-sizes, optimizer parameters) and create the `LocalRuntime`\n", + "\n", + "> NOTE: Aggregator based workflow requires a `FederatedRuntime`. In this methodology `FederatedRuntime` is created automatically and it's usage is transparent to the user" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bffcc141", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "# Set random seed\n", + "random_seed = 42\n", + "torch.manual_seed(random_seed)\n", + "np.random.seed(random_seed)\n", + "torch.backends.cudnn.enabled = False\n", + "\n", + "# Batch sizes\n", + "batch_size_train = 64\n", + "batch_size_test = 64\n", + "batch_size_watermark = 50\n", + "\n", + "# MNIST parameters\n", + "learning_rate = 5e-2\n", + "momentum = 5e-1\n", + "log_interval = 20\n", + "\n", + "# Watermarking parameters\n", + "watermark_pretrain_learning_rate = 1e-1\n", + "watermark_pretrain_momentum = 5e-1\n", + "watermark_pretrain_weight_decay = 5e-05\n", + "watermark_retrain_learning_rate = 5e-3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5f6e104", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "def callable_to_initialize_aggregator_private_attributes(watermark_data, batch_size):\n", + " return {\n", + " \"watermark_data_loader\": torch.utils.data.DataLoader(\n", + " watermark_data, batch_size=batch_size, shuffle=True\n", + " ),\n", + " \"pretrain_epochs\": 25,\n", + " \"retrain_epochs\": 25,\n", + " \"watermark_acc_threshold\": 0.98,\n", + " }\n", + "\n", + "# Setup Aggregator private attributes via callable function\n", + "aggregator = Aggregator(\n", + " name=\"agg\",\n", + " private_attributes_callable=callable_to_initialize_aggregator_private_attributes,\n", + " watermark_data=watermark_data,\n", + " batch_size=batch_size_watermark,\n", + " )\n", + "\n", + "collaborator_names = [\n", + " \"Portland\",\n", + " \"Seattle\",\n", + " 
\"Chandler\",\n", + " \"Bangalore\",\n", + " \"New Delhi\",\n", + "]\n", + "n_collaborators = len(collaborator_names)\n", + "\n", + "def callable_to_initialize_collaborator_private_attributes(index, n_collaborators, batch_size, train_dataset, test_dataset):\n", + " train = deepcopy(train_dataset)\n", + " test = deepcopy(test_dataset)\n", + " train.data = train_dataset.data[index::n_collaborators]\n", + " train.targets = train_dataset.targets[index::n_collaborators]\n", + " test.data = test_dataset.data[index::n_collaborators]\n", + " test.targets = test_dataset.targets[index::n_collaborators]\n", + "\n", + " return {\n", + " \"train_loader\": torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True),\n", + " \"test_loader\": torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=True),\n", + " }\n", + "\n", + "# Setup Collaborators private attributes via callable function\n", + "collaborators = []\n", + "for idx, collaborator_name in enumerate(collaborator_names):\n", + " collaborators.append(\n", + " Collaborator(\n", + " name=collaborator_name, num_cpus=0, num_gpus=0,\n", + " private_attributes_callable=callable_to_initialize_collaborator_private_attributes,\n", + " index=idx, n_collaborators=n_collaborators,\n", + " train_dataset=mnist_train, test_dataset=mnist_test, batch_size=64\n", + " )\n", + " )\n", + "\n", + "local_runtime = LocalRuntime(aggregator=aggregator, collaborators=collaborators, backend=\"ray\")\n", + "print(f\"Local runtime collaborators = {local_runtime.collaborators}\")" + ] + }, + { + "cell_type": "markdown", + "id": "c61813ab", + "metadata": {}, + "source": [ + "Now that we have our flow and runtime defined, let's run the experiment! " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6d19819", + "metadata": {}, + "outputs": [], + "source": [ + "#| export\n", + "\n", + "model = Net()\n", + "optimizer = optim.SGD(\n", + " model.parameters(), lr=learning_rate, momentum=momentum\n", + ")\n", + "watermark_pretrain_optimizer = optim.SGD(\n", + " model.parameters(),\n", + " lr=watermark_pretrain_learning_rate,\n", + " momentum=watermark_pretrain_momentum,\n", + " weight_decay=watermark_pretrain_weight_decay,\n", + ")\n", + "watermark_retrain_optimizer = optim.SGD(\n", + " model.parameters(), lr=watermark_retrain_learning_rate\n", + ")\n", + "best_model = None\n", + "round_number = 0\n", + "top_model_accuracy = 0\n", + "\n", + "flflow = FederatedFlow_MNIST_Watermarking(\n", + " model,\n", + " optimizer,\n", + " watermark_pretrain_optimizer,\n", + " watermark_retrain_optimizer,\n", + " round_number,\n", + " checkpoint=True,\n", + ")\n", + "flflow.runtime = local_runtime" + ] + }, + { + "cell_type": "markdown", + "id": "b5371b6d", + "metadata": {}, + "source": [ + "## Workspace creation" + ] + }, + { + "cell_type": "markdown", + "id": "41688326", + "metadata": {}, + "source": [ + "The following cells convert the Jupyter notebook into a Python script and create a Template Workspace that can be utilized by Aggregator based Workflow\n", + "> NOTE: Only Notebook cells that were marked with `#| export` directive shall be included in this Python script\n", + "\n", + "We first import `WorkspaceExport` module and execute `WorkspaceExport.export()` that converts the notebook and generates the template workspace. User is required to specify: \n", + "1. `notebook_path`: path of the Jupyter notebook that is required to be converted\n", + "2. 
`output_workspace`: path where the converted workspace is stored" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21c98aae", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from openfl.experimental.workspace_export import WorkspaceExport\n", + "\n", + "WorkspaceExport.export(\n", + " notebook_path='./Workflow_Interface_1001_Workspace_Creation_from_JupyterNotebook.ipynb',\n", + " output_workspace=f\"/home/{os.environ['USER']}/generated-workspace\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "f8639a64", + "metadata": {}, + "source": [ + "## Workspace Usage\n", + "\n", + "The workspace crated above can be used by the Aggregator based workflow by using the `fx` commands in the following manner" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "ff55808c-c340-476b-a543-58d43451c54e", + "metadata": {}, + "source": [ + "**Workspace Activation and Creation**\r\n", + "1. Activate the experimental aggregator-based workflow:\r\n", + "\r\n", + " `fx experimental activate`\r\n", + "\r\n", + " This will create an 'experimental' directory under ~/.openfl/\r\n", + "3. Create a workspace using the custom template:\r\n", + "\r\n", + " `fx workspace create --prefix workspace_path --custom_template /home/$USER/generated-workspace`\r\n", + "4. Change to the workspace directory:\r\n", + "\r\n", + " `cd workspace_path`\r\n", + "\r\n", + "**Workspace Initialization and Certification**\r\n", + "1. Initialize the FL plan and auto-populate the fully qualified domain name (FQDN) of the aggregator node:\r\n", + "\r\n", + " `fx plan initialize`\r\n", + "2. Certify the workspace:\r\n", + "\r\n", + " `fx workspace certify`\r\n", + " \r\n", + "**Aggregator Setup and Workspace Export**\r\n", + "1. Run the aggregator certificate creation command:\r\n", + "\r\n", + " `fx aggregator generate-cert-request`\r\n", + "\r\n", + " `fx aggregator certify`\r\n", + "2. Export the workspace for collaboration:\r\n", + "\r\n", + " `fx workspace export`\r\n", + " \r\n", + "**Collaborator Node Setup**\r\n", + "\r\n", + "***On the Collaborator Node:***\r\n", + "\r\n", + "1. Copy the workspace archive from the aggregator node to the collaborator nodes. Import the workspace archive:\r\n", + "\r\n", + " `fx workspace import --archive WORKSPACE.zip`\r\n", + " \r\n", + " `cd workspace_path`\r\n", + "3. Generate a collaborator certificate request:\r\n", + "\r\n", + " `fx collaborator generate-cert-request -n {COL_LABEL}`\r\n", + "\r\n", + "***On the Aggregator Node (Certificate Authority):***\r\n", + "\r\n", + "3. Sign the Collaborator Certificate Signing Request (CSR) Package from collaborator nodes:\r\n", + "\r\n", + " `fx collaborator certify --request-pkg /PATH/TO/col_{COL_LABEL}_to_agg_cert_request.zip`\r\n", + "\r\n", + "***On the Collaborator Node:***\r\n", + "\r\n", + "4. Import the signed certificate and certificate chain into the workspace:\r\n", + "\r\n", + " `fx collaborator certify --import /PATH/TO/agg_to_col_{COL_LABEL}_signed_cert.zip`\r\n", + " \r\n", + "**Final Workspace Activation**\r\n", + "***On the Aggregator Node:***\r\n", + "\r\n", + "1. Start the Aggregator:\r\n", + "\r\n", + " `fx aggregator start`\r\n", + " \r\n", + " The Aggregator is now running and waiting for Collaborators to connect.\r\n", + "\r\n", + "***On the Collaborator Nodes:***\r\n", + "\r\n", + "2. Run the Collaborator:\r\n", + "\r\n", + " `fx collaborator start -n {COL_LABEL}`\r\n", + "\r\n", + "**Workspace Deactivation**\r\n", + "1. 
To deactivate the experimental aggregator-based workflow and switch back to original aggregator-based workflow:\r\n", + "\r\n", + " `fx experimental deactivate`\r\n", + "\r\n", + " This will remove the 'experimental' directory under ~/.openfl/\r\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "v_o", + "language": "python", + "name": "v_o" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openfl-tutorials/experimental/Workflow_Interface_101_MNIST.ipynb b/openfl-tutorials/experimental/Workflow_Interface_101_MNIST.ipynb index 156fd56dde..05e5ec7a5d 100644 --- a/openfl-tutorials/experimental/Workflow_Interface_101_MNIST.ipynb +++ b/openfl-tutorials/experimental/Workflow_Interface_101_MNIST.ipynb @@ -153,7 +153,7 @@ " x = F.dropout(x, training=self.training)\n", " x = self.fc2(x)\n", " return F.log_softmax(x)\n", - " \n", + "\n", "def inference(network,test_loader):\n", " network.eval()\n", " test_loss = 0\n", @@ -322,11 +322,9 @@ "id": "2aabf61e", "metadata": {}, "source": [ - "You'll notice in the `FederatedFlow` definition above that there were certain attributes that the flow was not initialized with, namely the `train_loader` and `test_loader` for each of the collaborators. These are **private attributes** of the participant which are specified via a callback function while instantiating the participant. The callback function returns the private attributes in form of a dictionary where the key is the attribute name, and the value is the object that will be made accessible to that participant's task\n", - "\n", - "The callback function, `callable_to_initialize_collaborator_private_attributes`, segment shards of the MNIST dataset for four collaborators: `Portland`, `Seattle`, `Chandler`, and `Bangalore`. Each collaborator has their own slice of the dataset that is accessible through the `train_loader` and `test_loader` attributes. Parameters required by the callback function `index`, `n_collaborators`, `train_dataset`, `test_dataset` and `batch_size` are passed appropriate values with the same names in the Collaborator constructor\n", + "Note that the private attributes are flexible, and you can choose to pass in a completely different type of object to any of the collaborators or aggregator (with an arbitrary name). These private attributes will always be filtered out of the current state when transferring from collaborator to aggregator, or vice versa. \n", "\n", - "Note that the private attributes are flexible, and you can choose to pass in a completely different type of object to any of the collaborators or aggregator (with an arbitrary name). These private attributes will always be filtered out of the current state when transfering from collaborator to aggregator, or vice versa" + "Private attributes can be set using callback function while instantiating the participant. Parameters required by the callback function are specified as arguments while instantiating the participant. In this example callback function, `callable_to_initialize_collaborator_private_attributes`, returns the private attributes `train_loader` and `test_loader` of the collaborator. 
Parameters required by the callback function (`index`, `n_collaborators`, `batch_size`, `train_dataset`, `test_dataset`) are passed appropriate values with the same names in the Collaborator constructor."
   ]
  },
  {
@@ -461,16 +459,6 @@
    "run_id = flflow2._run_id"
   ]
  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "composed-burst",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import metaflow"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -694,9 +682,9 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "workflow-interface-py38",
+   "display_name": "env-workspace-builder-openfl",
    "language": "python",
-   "name": "workflow-interface-py38"
+   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
@@ -708,7 +696,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.10"
+   "version": "3.8.18"
  }
 },
 "nbformat": 4,
diff --git a/openfl-tutorials/experimental/Workflow_Interface_102_Aggregator_Validation.ipynb b/openfl-tutorials/experimental/Workflow_Interface_102_Aggregator_Validation.ipynb
index 79e9ec7ec0..5917a9bcff 100644
--- a/openfl-tutorials/experimental/Workflow_Interface_102_Aggregator_Validation.ipynb
+++ b/openfl-tutorials/experimental/Workflow_Interface_102_Aggregator_Validation.ipynb
@@ -295,7 +295,9 @@
    "source": [
     "You'll notice in the `FederatedFlow` definition above that there were certain attributes that the flow was not initialized with, namely the `train_loader` and `test_loader` for each of the collaborators. Each participant has it's own set of private attributes which can be set using callback function while instantiating the participant. The callback function returns the private attributes (`train_loader` & `test_loader`) in form of a dictionary where the key is the attribute name, and the value is the object that will be made accessible to that participant's task\n",
-    "Callback function, `callable_to_initialize_collaborator_private_attributes`, segment shards of the MNIST dataset for four collaborators: `Portland`, `Seattle`, `Chandler`, and `Bangalore`. Callback function, `callable_to_initialize_aggregator_private_attributes`, returns the private attribute `test_loader` of the Aggregator."
+    "Below, we segment shards of the MNIST dataset for **four collaborators**: `Portland`, `Seattle`, `Chandler`, and `Bangalore`. Each has its own slice of the dataset that is accessible through the `train_loader` and `test_loader` attributes, which are set using the `callable_to_initialize_collaborator_private_attributes` callable function. Note that the private attributes are flexible, and you can choose to pass in a completely different type of object to any of the collaborators or aggregator (with an arbitrary name). These private attributes will always be filtered out of the current state when transferring from collaborator to aggregator, or vice versa.\n",
+    "\n",
+    "Private attributes can be set using a callback function while instantiating the participant. Parameters required by the callback function are specified as arguments while instantiating the participant. In this example, the callback function `callable_to_initialize_collaborator_private_attributes` returns the private attributes `train_loader` and `test_loader` of the collaborator. The callback function `callable_to_initialize_aggregator_private_attributes` returns the private attribute `test_loader` of the Aggregator."
] }, { @@ -305,7 +307,7 @@ "metadata": {}, "outputs": [], "source": [ - "collaborator_names = ['Portland', 'Seattle', 'Chandler','Bangalore']\n", + "collaborator_names = ['Portland', 'Seattle', 'Chandler', 'Bangalore']\n", "\n", "def callable_to_initialize_aggregator_private_attributes(n_collaborators, test_dataset, batch_size_train):\n", " aggregator_test = deepcopy(test_dataset)\n", @@ -395,9 +397,9 @@ ], "metadata": { "kernelspec": { - "display_name": "workflow-interface-py38", + "display_name": "new_test_env", "language": "python", - "name": "workflow-interface-py38" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -409,7 +411,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.8.18" } }, "nbformat": 4, diff --git a/openfl-tutorials/experimental/Workflow_Interface_104_Keras_MNIST_with_GPU.ipynb b/openfl-tutorials/experimental/Workflow_Interface_104_Keras_MNIST_with_GPU.ipynb index 5046f373ca..0845647d67 100644 --- a/openfl-tutorials/experimental/Workflow_Interface_104_Keras_MNIST_with_GPU.ipynb +++ b/openfl-tutorials/experimental/Workflow_Interface_104_Keras_MNIST_with_GPU.ipynb @@ -83,7 +83,6 @@ "from tensorflow.keras.utils import to_categorical\n", "\n", "nb_classes = 10\n", - "batch_size=32\n", "(X_train, y_train), (X_test, y_test) = mnist.load_data()\n", "print(\"X_train original shape\", X_train.shape)\n", "print(\"y_train original shape\", y_train.shape)\n", @@ -98,8 +97,6 @@ "Y_train = to_categorical(y_train, nb_classes)\n", "Y_test = to_categorical(y_test, nb_classes)\n", "\n", - "train_dataset=(X_train, Y_train)\n", - "test_dataset=(X_test, Y_test)\n", "\n", "model = Sequential([\n", " Conv2D(filters=32, kernel_size=(3, 3), activation=\"relu\", input_shape=(28, 28, 1)),\n", @@ -146,7 +143,7 @@ "metadata": {}, "outputs": [], "source": [ - "from openfl.experimental.interface import FLSpec\n", + "from openfl.experimental.interface import FLSpec, Aggregator, Collaborator\n", "from openfl.experimental.runtime import LocalRuntime\n", "from openfl.experimental.placement import aggregator, collaborator\n", "import numpy as np\n", @@ -253,9 +250,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's define the Participants and runtime now ! Each participant has it's own set of private attributes which can be set using callback function while instantiating the participant. The callback function returns the private attributes in form of a dictionary where the key is the attribute name, and the value is the object that will be made accessible to that participant's task\n", + "Note that the private attributes are flexible, and you can choose to pass in a completely different type of object to any of the collaborators or aggregator (with an arbitrary name). These private attributes will always be filtered out of the current state when transferring from collaborator to aggregator, or vice versa. \n", "\n", - "Callback function, `callable_to_initialize_collaborator_private_attributes`, segment shards of the MNIST dataset for two collaborators: `Portland`, and `Seattle`and returns the private attribute `train_loader` and `test_loader`" + "Private attributes can be set using callback function while instantiating the participant. Parameters required by the callback function are specified as arguments while instantiating the participant. 
In this example callback function, `callable_to_initialize_collaborator_private_attributes`, returns the private attributes `train_loader`, `test_loader` and `batch_size` of the collaborator. Parameters required by the callback function `index`, `n_collaborators`, `batch_size`, `train_dataset`, `test_dataset` are passed appropriate values with the same names in the Collaborator constructor." ] }, { @@ -264,13 +261,10 @@ "metadata": {}, "outputs": [], "source": [ - "from openfl.experimental.interface import Aggregator, Collaborator\n", - "\n", - "# Aggregator\n", "agg = Aggregator()\n", "\n", - "# Setup collaborators with private attributes\n", "collaborator_names = [\"Portland\", \"Seattle\"]\n", + "\n", "def callable_to_initialize_collaborator_private_attributes(n_collaborators, index, train_dataset, test_dataset, batch_size):\n", " from openfl.utilities.data_splitters import EqualNumPyDataSplitter\n", " train_splitter = EqualNumPyDataSplitter()\n", @@ -290,6 +284,7 @@ " \"batch_size\": batch_size\n", " }\n", "\n", + "# Setup collaborators private attributes via callable function\n", "collaborators = []\n", "for idx, collaborator_name in enumerate(collaborator_names):\n", " collaborators.append(\n", @@ -354,7 +349,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.17" + "version": "3.8.18" }, "orig_nbformat": 4 }, diff --git a/openfl-tutorials/experimental/Workflow_Interface_201_Exclusive_GPUs_with_Ray.ipynb b/openfl-tutorials/experimental/Workflow_Interface_201_Exclusive_GPUs_with_Ray.ipynb index 34e7adcb4c..64be69ea15 100644 --- a/openfl-tutorials/experimental/Workflow_Interface_201_Exclusive_GPUs_with_Ray.ipynb +++ b/openfl-tutorials/experimental/Workflow_Interface_201_Exclusive_GPUs_with_Ray.ipynb @@ -181,8 +181,7 @@ " axis=0, \n", " weights=weights))\n", " new_model.load_state_dict(state_dict)\n", - " return new_model\n", - "\n" + " return new_model" ] }, { @@ -291,7 +290,7 @@ "source": [ "In this step we define entities necessary to run the flow and create a function which returns dataset as private attributes of collaborator. As described in [quickstart](https://github.com/securefederatedai/openfl/blob/develop/openfl-tutorials/experimental/Workflow_Interface_101_MNIST.ipynb) we define entities necessary for the flow.\n", "\n", - "To request GPU(s) with ray-backend, we specify `num_gpus=0.3` as the argument while instantiating Aggregator and Collaborator, this will reserve 0.3 GPU for each of the 2 collaborators and the aggregator and therefore require a dedicated GPU for the experiment. Tune this based on your use case, for example `num_gpus=0.4` for an experiment with 4 collaborators and the aggregator will require 2 dedicated GPUs. **NOTE:** Collaborator cannot span over multiple GPUs, for example `num_gpus=0.4` with 5 collaborators will require 3 dedicated GPUs. In this case collaborator 1 and 2 use GPU#1, collaborator 3 and 4 use GPU#2, and collaborator 5 uses GPU#3." + "To request GPU(s) with ray-backend, we specify `num_gpus=0.5` as the argument while instantiating Collaborator, this will reserve 0.5 GPU for each of the 2 collaborators and therefore require a dedicated GPU for the experiment. Tune this based on your use case, for example `num_gpus=0.5` for an experiment with 4 collaborators will require 2 dedicated GPUs. **NOTE:** Collaborator cannot span over multiple GPUs, for example `num_gpus=0.4` with 5 collaborators will require 3 dedicated GPUs. 
In this case collaborators 1 and 2 share GPU#1, collaborators 3 and 4 share GPU#2, and collaborator 5 uses GPU#3."
   ]
  },
  {
@@ -346,7 +345,7 @@
    "source": [
    "Now that we have our flow and runtime defined, let's run the experiment! \n",
    "\n",
-    "(If you run this example on Google Colab with the GPU Runtime, you should see two task executing at a time.)"
+    "(If you run this example on Google Colab with the GPU Runtime, you should see two tasks executing at a time.)"
   ]
  },
  {
@@ -640,7 +639,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.8.0"
+   "version": "3.8.18"
  }
 },
 "nbformat": 4,
diff --git a/openfl-tutorials/experimental/Workflow_Interface_301_MNIST_Watermarking.ipynb b/openfl-tutorials/experimental/Workflow_Interface_301_MNIST_Watermarking.ipynb
index b6c1706162..ac0751f2d1 100644
--- a/openfl-tutorials/experimental/Workflow_Interface_301_MNIST_Watermarking.ipynb
+++ b/openfl-tutorials/experimental/Workflow_Interface_301_MNIST_Watermarking.ipynb
@@ -59,6 +59,7 @@
    "!pip install torchvision\n",
    "!pip install matplotlib\n",
    "!pip install git+https://github.com/pyviz-topics/imagen.git@master\n",
+    "!pip install holoviews==1.15.4\n",
    "\n",
    "\n",
    "# Uncomment this if running in Google Colab\n",
@@ -97,7 +98,6 @@
    "import PIL.Image as Image\n",
    "import imagen as ig\n",
    "import numbergen as ng\n",
-    "import os\n",
    "\n",
    "random_seed = 1\n",
    "torch.backends.cudnn.enabled = False\n",
@@ -174,6 +174,7 @@
    "def train_model(model, optimizer, data_loader, entity, round_number, log=False):\n",
    "    # Helper function to train the model\n",
    "    train_loss = 0\n",
+    "    log_interval = 20\n",
    "    model.train()\n",
    "    for batch_idx, (X, y) in enumerate(data_loader):\n",
    "        optimizer.zero_grad()\n",
@@ -186,8 +187,7 @@
    "\n",
    "        train_loss += loss.item() * len(X)\n",
    "        if batch_idx % log_interval == 0 and log:\n",
-    "            print(\n",
-    "                \"{:<20} Train Epoch: {:<3} [{:<3}/{:<4} ({:<.0f}%)] Loss: {:<.6f}\".format(\n",
+    "            print(\"{:<20} Train Epoch: {:<3} [{:<3}/{:<4} ({:<.0f}%)] Loss: {:<.6f}\".format(\n",
    "                    entity,\n",
    "                    round_number,\n",
    "                    batch_idx * len(X),\n",
@@ -578,7 +578,7 @@
    "                self.optimizer,\n",
    "                self.train_loader,\n",
    "                \"\"),\n",
-    "            self.round_number,\n",
+    "            self.round_number if self.round_number is not None else 0,\n",
    "            log=True,\n",
    "        )\n",
    "\n",
@@ -727,7 +727,10 @@
   "id": "3d7ce52f",
   "metadata": {},
   "source": [
-    "Private attributes can be set using callback function while instantiating the participant\n",
+    "## Setup Federation\n",
+    "\n",
+    "Private attributes can be set using a callback function while instantiating the participant. Parameters required by the callback function are specified as arguments while instantiating the participant. In this example there are two callable functions, `callable_to_initialize_aggregator_private_attributes` and `callable_to_initialize_collaborator_private_attributes`, which return the private attributes of the aggregator and of the collaborators respectively.\n",
+    "\n",
    "\n",
    "Aggregator callable function `callable_to_initialize_aggregator_private_attributes` returns `watermark_data_loader`, `pretrain_epochs`, `retrain_epochs`, `watermark_acc_threshold`, and `watermark_pretraining_completed`. Collaborator callable function `callable_to_initialize_aggregator_private_attributes` returns `train_loader` and `test_loader` of the collaborator."
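
For orientation, a minimal sketch of the federation setup this cell introduces, assuming the same experimental `Aggregator`/`Collaborator`/`LocalRuntime` API used elsewhere in this patch. The two callables below are simplified stand-ins for the notebook's real ones (which build DataLoaders over the watermark set and the MNIST shards); the batch sizes mirror the workspace `data.yaml` added later, and everything else is illustrative.

```python
from openfl.experimental.interface import Aggregator, Collaborator
from openfl.experimental.runtime import LocalRuntime


# Stand-in callables, shown only to illustrate the wiring (placeholders, not
# the notebook's actual watermark/MNIST loaders).
def callable_to_initialize_aggregator_private_attributes(batch_size):
    return {
        "watermark_data_loader": [],  # placeholder
        "pretrain_epochs": 25,
        "retrain_epochs": 25,
        "watermark_acc_threshold": 0.98,
        "watermark_pretraining_completed": False,
    }


def callable_to_initialize_collaborator_private_attributes(index, n_collaborators, batch_size):
    return {"train_loader": [], "test_loader": []}  # placeholders


aggregator = Aggregator(
    private_attributes_callable=callable_to_initialize_aggregator_private_attributes,
    batch_size=50,  # forwarded to the aggregator callable by name
)

collaborators = [
    Collaborator(
        name=name,
        private_attributes_callable=callable_to_initialize_collaborator_private_attributes,
        index=idx, n_collaborators=2, batch_size=64,  # forwarded by name
    )
    for idx, name in enumerate(["Portland", "Seattle"])
]

local_runtime = LocalRuntime(aggregator=aggregator, collaborators=collaborators)
```

With the Ray backend discussed in the cell above, each participant would additionally be given `num_cpus` and a fractional `num_gpus` at construction time, and the runtime would be created with the Ray backend so that tasks are scheduled onto the reserved GPU slices.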
] @@ -811,11 +814,11 @@ "outputs": [], "source": [ "model = None\n", - "best_model = None\n", "optimizer = None\n", "watermark_pretrain_optimizer = None\n", "watermark_retrain_optimizer = None\n", - "\n", + "best_model = None\n", + "round_number = 0\n", "top_model_accuracy = 0\n", "\n", "flflow = FederatedFlow_MNIST_Watermarking(\n", @@ -823,24 +826,24 @@ " optimizer,\n", " watermark_pretrain_optimizer,\n", " watermark_retrain_optimizer,\n", - " 0,\n", + " round_number,\n", " checkpoint=True,\n", ")\n", "flflow.runtime = local_runtime\n", - "\n", "for i in range(5):\n", " print(f\"Starting round {i}...\")\n", " flflow.run()\n", " flflow.round_number += 1\n", - " aggregated_model_accuracy = flflow.aggregated_model_accuracy\n", - " if aggregated_model_accuracy > top_model_accuracy:\n", - " print(\n", - " f\"\\nAccuracy improved to {aggregated_model_accuracy} for round {i}, Watermark Acc: {flflow.watermark_retrain_validation_score}\\n\"\n", - " )\n", - " top_model_accuracy = aggregated_model_accuracy\n", - " best_model = flflow.model\n", + " if hasattr(flflow, \"aggregated_model_accuracy\"):\n", + " aggregated_model_accuracy = flflow.aggregated_model_accuracy\n", + " if aggregated_model_accuracy > top_model_accuracy:\n", + " print(\n", + " f\"\\nAccuracy improved to {aggregated_model_accuracy} for round {i}, Watermark Acc: {flflow.watermark_retrain_validation_score}\\n\"\n", + " )\n", + " top_model_accuracy = aggregated_model_accuracy\n", + " best_model = flflow.model\n", "\n", - "torch.save(best_model.state_dict(), \"watermarked_mnist_model.pth\")" + " torch.save(best_model.state_dict(), \"watermarked_mnist_model.pth\")" ] }, { @@ -867,9 +870,9 @@ ], "metadata": { "kernelspec": { - "display_name": "workflow-interface-py38", + "display_name": "env-workspace-builder-openfl", "language": "python", - "name": "workflow-interface-py38" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -881,12 +884,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" - }, - "vscode": { - "interpreter": { - "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" - } + "version": "3.8.18" } }, "nbformat": 4, diff --git a/openfl-tutorials/experimental/requirements_workflow_interface.txt b/openfl-tutorials/experimental/requirements_workflow_interface.txt index e487d1aeb6..988bf7886d 100644 --- a/openfl-tutorials/experimental/requirements_workflow_interface.txt +++ b/openfl-tutorials/experimental/requirements_workflow_interface.txt @@ -1,5 +1,9 @@ dill==0.3.6 +chardet +charset-normalizer metaflow==2.7.15 +nbdev==2.3.12 +astor==0.8.1 ray==2.9.2 torch torchvision diff --git a/openfl-workspace/experimental/101_torch_cnn_mnist/.workspace b/openfl-workspace/experimental/101_torch_cnn_mnist/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/openfl-workspace/experimental/101_torch_cnn_mnist/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/openfl-workspace/experimental/101_torch_cnn_mnist/plan/cols.yaml b/openfl-workspace/experimental/101_torch_cnn_mnist/plan/cols.yaml new file mode 100644 index 0000000000..95307de3bc --- /dev/null +++ b/openfl-workspace/experimental/101_torch_cnn_mnist/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. 
+ +collaborators: + \ No newline at end of file diff --git a/openfl-workspace/experimental/101_torch_cnn_mnist/plan/data.yaml b/openfl-workspace/experimental/101_torch_cnn_mnist/plan/data.yaml new file mode 100644 index 0000000000..0950198725 --- /dev/null +++ b/openfl-workspace/experimental/101_torch_cnn_mnist/plan/data.yaml @@ -0,0 +1,27 @@ +## Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. + +# collaborator_name ,data_directory_path +col1: + callable_func: + settings: + batch_size: 32 + index: 0 + n_collaborators: 2 + train_dataset: src.collaborator_private_attrs.train_dataset + test_dataset: src.collaborator_private_attrs.test_dataset + template: src.collaborator_private_attrs.collaborator_private_attrs + +col2: + callable_func: + settings: + batch_size: 32 + index: 1 + n_collaborators: 2 + train_dataset: src.collaborator_private_attrs.train_dataset + test_dataset: src.collaborator_private_attrs.test_dataset + template: src.collaborator_private_attrs.collaborator_private_attrs diff --git a/openfl-workspace/experimental/101_torch_cnn_mnist/plan/defaults b/openfl-workspace/experimental/101_torch_cnn_mnist/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/openfl-workspace/experimental/101_torch_cnn_mnist/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/openfl-workspace/experimental/101_torch_cnn_mnist/plan/plan.yaml b/openfl-workspace/experimental/101_torch_cnn_mnist/plan/plan.yaml new file mode 100644 index 0000000000..159d747cf3 --- /dev/null +++ b/openfl-workspace/experimental/101_torch_cnn_mnist/plan/plan.yaml @@ -0,0 +1,54 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. 
+ +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.Aggregator + settings : + rounds_to_train : 1 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.Collaborator + settings : {} + + +federated_flow: + template: src.flow.MNISTFlow + settings: + model: + template: src.flow.Net + settings: + convolutional_block: + template: src.flow.convolutional_block + settings: + block_sequential: + template: src.flow.sequential_block + settings: + conv2d1: + template: src.flow.conv2d1 + settings: + in_channels: 1 + out_channels: 10 + kernel_size: 5 + maxPool2d1: + template: src.flow.maxpool2d1 + settings: + kernel_size: 2 + relu: src.flow.relu + conv2d2: src.flow.conv2d2 + dropout2d: src.flow.dropout2d + maxPool2d2: src.flow.maxpool2d2 + relu: src.flow.relu + in_features: 50 + out_features: 10 + optimizer: null + rounds: 4 + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml diff --git a/openfl-workspace/experimental/101_torch_cnn_mnist/requirements.txt b/openfl-workspace/experimental/101_torch_cnn_mnist/requirements.txt new file mode 100644 index 0000000000..16b349007c --- /dev/null +++ b/openfl-workspace/experimental/101_torch_cnn_mnist/requirements.txt @@ -0,0 +1,4 @@ +torch==1.13.1 +torchvision==0.14.1 +tensorboard +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability diff --git a/openfl-workspace/experimental/101_torch_cnn_mnist/src/__init__.py b/openfl-workspace/experimental/101_torch_cnn_mnist/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/openfl-workspace/experimental/101_torch_cnn_mnist/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/openfl-workspace/experimental/101_torch_cnn_mnist/src/collaborator_private_attrs.py b/openfl-workspace/experimental/101_torch_cnn_mnist/src/collaborator_private_attrs.py new file mode 100644 index 0000000000..097c81634b --- /dev/null +++ b/openfl-workspace/experimental/101_torch_cnn_mnist/src/collaborator_private_attrs.py @@ -0,0 +1,55 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from copy import deepcopy + +import torch +import torchvision + + +mnist_train = torchvision.datasets.MNIST( + "./files/", + train=True, + download=True, + transform=torchvision.transforms.Compose( + [ + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize((0.1307,), (0.3081,)), + ] + ), +) + +mnist_test = torchvision.datasets.MNIST( + "./files/", + train=False, + download=True, + transform=torchvision.transforms.Compose( + [ + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize((0.1307,), (0.3081,)), + ] + ), +) + +train_dataset = mnist_train +test_dataset = mnist_test + + +def collaborator_private_attrs( + index, n_collaborators, batch_size, train_dataset, test_dataset +): + train = deepcopy(train_dataset) + test = deepcopy(test_dataset) + train.data = train_dataset.data[index::n_collaborators] + train.targets = train_dataset.targets[index::n_collaborators] + test.data = test_dataset.data[index::n_collaborators] + test.targets = test_dataset.targets[index::n_collaborators] + + return { + "train_loader": torch.utils.data.DataLoader( + train, batch_size=batch_size, shuffle=True + ), + "test_loader": torch.utils.data.DataLoader( + test, batch_size=batch_size, shuffle=True + ), + } diff 
--git a/openfl-workspace/experimental/101_torch_cnn_mnist/src/flow.py b/openfl-workspace/experimental/101_torch_cnn_mnist/src/flow.py new file mode 100644 index 0000000000..d7f5bb953c --- /dev/null +++ b/openfl-workspace/experimental/101_torch_cnn_mnist/src/flow.py @@ -0,0 +1,182 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim +import torch +import numpy as np + +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator + +learning_rate = 0.01 +momentum = 0.5 +log_interval = 10 + +random_seed = 1 +torch.backends.cudnn.enabled = False +torch.manual_seed(random_seed) + +convolutional_block = nn.Sequential +sequential_block = nn.Sequential +conv2d1 = nn.Conv2d +conv2d2 = nn.Conv2d(10, 20, 5) +maxpool2d1 = nn.MaxPool2d +maxpool2d2 = nn.MaxPool2d(2) +relu = nn.ReLU() +dropout2d = nn.Dropout2d() + + +class Net(nn.Module): + def __init__(self, convolutional_block, + in_features: int, out_features: int): + super(Net, self).__init__() + self.conv_block = convolutional_block + self.linear_block = nn.Sequential( + nn.Linear(320, in_features), + nn.ReLU(), + nn.Dropout(), + nn.Linear(in_features, out_features) + ) + + def forward(self, x): + x = self.conv_block(x) + x = x.view(-1, 320) + x = self.linear_block(x) + return F.log_softmax(x) + + +def inference(network, test_loader): + network.eval() + test_loss = 0 + correct = 0 + with torch.no_grad(): + for data, target in test_loader: + output = network(data) + test_loss += F.nll_loss(output, target, size_average=False).item() + pred = output.data.max(1, keepdim=True)[1] + correct += pred.eq(target.data.view_as(pred)).sum() + test_loss /= len(test_loader.dataset) + print( + f"\nTest set: Avg. 
loss: {test_loss:.4f}, Accuracy: " + + f"{correct}/{len(test_loader.dataset)} " + + f"({100.0 * correct / len(test_loader.dataset):.0f}%)\n" + ) + + accuracy = float(correct / len(test_loader.dataset)) + return accuracy + + +def fedavg(models): + new_model = models[0] + state_dicts = [model.state_dict() for model in models] + state_dict = new_model.state_dict() + for key in models[1].state_dict(): + state_dict[key] = np.sum( + np.array([state[key] for state in state_dicts], dtype=object), axis=0 + ) / len(models) + new_model.load_state_dict(state_dict) + return new_model + + +class MNISTFlow(FLSpec): + def __init__(self, model=None, optimizer=None, rounds=3, **kwargs): + super().__init__(**kwargs) + if model is not None: + self.model = model + self.optimizer = optimizer + else: + self.model = Net() + self.optimizer = optim.SGD( + self.model.parameters(), lr=learning_rate, momentum=momentum + ) + self.rounds = rounds + + @aggregator + def start(self): + print("Performing initialization for model") + self.collaborators = self.runtime.collaborators + self.private = 10 + self.current_round = 0 + self.next( + self.aggregated_model_validation, + foreach="collaborators", + exclude=["private"], + ) + + @collaborator + def aggregated_model_validation(self): + print(f"Performing aggregated model validation for collaborator {self.input}") + self.agg_validation_score = inference(self.model, self.test_loader) + print(f"{self.input} value of {self.agg_validation_score}") + self.next(self.train) + + @collaborator + def train(self): + self.model.train() + self.optimizer = optim.SGD( + self.model.parameters(), lr=learning_rate, momentum=momentum + ) + for batch_idx, (data, target) in enumerate(self.train_loader): + self.optimizer.zero_grad() + output = self.model(data) + loss = F.nll_loss(output, target) + loss.backward() + self.optimizer.step() + if batch_idx % log_interval == 0: + print( + f"Train Epoch: 1 [{batch_idx * len(data)}/" + + f"{len(self.train_loader.dataset)} (" + + f"{100.0 * batch_idx / len(self.train_loader):.0f}%)" + + f"]\tLoss: {loss.item():.6f}" + ) + + self.loss = loss.item() + torch.save(self.model.state_dict(), "model.pth") + torch.save(self.optimizer.state_dict(), "optimizer.pth") + self.training_completed = True + self.next(self.local_model_validation) + + @collaborator + def local_model_validation(self): + self.local_validation_score = inference(self.model, self.test_loader) + print( + "Doing local model validation for collaborator " + + f"{self.input}: {self.local_validation_score}" + ) + self.next(self.join, exclude=["training_completed"]) + + @aggregator + def join(self, inputs): + self.average_loss = sum(input.loss for input in inputs) / len(inputs) + self.aggregated_model_accuracy = sum( + input.agg_validation_score for input in inputs + ) / len(inputs) + self.local_model_accuracy = sum( + input.local_validation_score for input in inputs + ) / len(inputs) + print( + f"Average aggregated model validation values = {self.aggregated_model_accuracy}" + ) + print(f"Average training loss = {self.average_loss}") + print(f"Average local model validation values = {self.local_model_accuracy}") + self.model = fedavg([input.model for input in inputs]) + self.optimizer = [input.optimizer for input in inputs][0] + self.next(self.internal_loop) + + @aggregator + def internal_loop(self): + self.current_round += 1 + if self.current_round < self.rounds: + self.next( + self.aggregated_model_validation, + foreach="collaborators", + exclude=["private"], + ) + else: + self.next(self.end) + + 
@aggregator + def end(self): + print("This is the end of the flow") diff --git a/openfl-workspace/experimental/101_torch_cnn_mnist/src/utils.py b/openfl-workspace/experimental/101_torch_cnn_mnist/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/openfl-workspace/experimental/101_torch_cnn_mnist/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/openfl-workspace/experimental/102_aggregator_validation/.workspace b/openfl-workspace/experimental/102_aggregator_validation/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/openfl-workspace/experimental/102_aggregator_validation/plan/cols.yaml b/openfl-workspace/experimental/102_aggregator_validation/plan/cols.yaml new file mode 100644 index 0000000000..95307de3bc --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: + \ No newline at end of file diff --git a/openfl-workspace/experimental/102_aggregator_validation/plan/data.yaml b/openfl-workspace/experimental/102_aggregator_validation/plan/data.yaml new file mode 100644 index 0000000000..460c514ea7 --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/plan/data.yaml @@ -0,0 +1,55 @@ +## Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. 
+ +# collaborator_name ,data_directory_path +col1: + callable_func: + settings: + batch_size: 64 + index: 0 + n_collaborators: 4 + train_dataset: src.collaborator_private_attrs.train_dataset + test_dataset: src.collaborator_private_attrs.test_dataset + template: src.collaborator_private_attrs.callable_to_initialize_collaborator_private_attributes + +col2: + callable_func: + settings: + batch_size: 64 + index: 1 + n_collaborators: 4 + train_dataset: src.collaborator_private_attrs.train_dataset + test_dataset: src.collaborator_private_attrs.test_dataset + template: src.collaborator_private_attrs.callable_to_initialize_collaborator_private_attributes + +col3: + callable_func: + settings: + batch_size: 64 + index: 2 + n_collaborators: 4 + train_dataset: src.collaborator_private_attrs.train_dataset + test_dataset: src.collaborator_private_attrs.test_dataset + template: src.collaborator_private_attrs.callable_to_initialize_collaborator_private_attributes + +col4: + callable_func: + settings: + batch_size: 64 + index: 3 + n_collaborators: 4 + train_dataset: src.collaborator_private_attrs.train_dataset + test_dataset: src.collaborator_private_attrs.test_dataset + template: src.collaborator_private_attrs.callable_to_initialize_collaborator_private_attributes + +aggregator: + callable_func: + settings: + n_collaborators: 4 + batch_size: 64 + test_dataset: src.aggregator_private_attrs.test_dataset + template: src.aggregator_private_attrs.callable_to_initialize_aggregator_private_attributes diff --git a/openfl-workspace/experimental/102_aggregator_validation/plan/defaults b/openfl-workspace/experimental/102_aggregator_validation/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/openfl-workspace/experimental/102_aggregator_validation/plan/plan.yaml b/openfl-workspace/experimental/102_aggregator_validation/plan/plan.yaml new file mode 100644 index 0000000000..11d15ec013 --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/plan/plan.yaml @@ -0,0 +1,31 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. 
+ +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.Aggregator + settings : + rounds_to_train : 1 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.Collaborator + settings : {} + + +federated_flow: + template: src.flow.AggregatorValidationFlow + settings: + model: + template: src.flow.Net + settings: {} + optimizer: null + rounds: 3 + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml diff --git a/openfl-workspace/experimental/102_aggregator_validation/requirements.txt b/openfl-workspace/experimental/102_aggregator_validation/requirements.txt new file mode 100644 index 0000000000..16b349007c --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/requirements.txt @@ -0,0 +1,4 @@ +torch==1.13.1 +torchvision==0.14.1 +tensorboard +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability diff --git a/openfl-workspace/experimental/102_aggregator_validation/src/__init__.py b/openfl-workspace/experimental/102_aggregator_validation/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/openfl-workspace/experimental/102_aggregator_validation/src/aggregator_private_attrs.py b/openfl-workspace/experimental/102_aggregator_validation/src/aggregator_private_attrs.py new file mode 100644 index 0000000000..1c08515afd --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/src/aggregator_private_attrs.py @@ -0,0 +1,29 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from copy import deepcopy + +import torch +import torchvision + + +mnist_test = torchvision.datasets.MNIST('files/', train=False, download=True, + transform=torchvision.transforms.Compose([ + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize( + (0.1307,), (0.3081,)) + ])) + +test_dataset = mnist_test + + +def callable_to_initialize_aggregator_private_attributes(n_collaborators, + test_dataset, batch_size): + aggregator_test = deepcopy(test_dataset) + aggregator_test.targets = test_dataset.targets[n_collaborators::n_collaborators + 1] + aggregator_test.data = test_dataset.data[n_collaborators::n_collaborators + 1] + + return { + 'test_loader': torch.utils.data.DataLoader( + aggregator_test, batch_size=batch_size, shuffle=True) + } diff --git a/openfl-workspace/experimental/102_aggregator_validation/src/collaborator_private_attrs.py b/openfl-workspace/experimental/102_aggregator_validation/src/collaborator_private_attrs.py new file mode 100644 index 0000000000..54e7ad5f2d --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/src/collaborator_private_attrs.py @@ -0,0 +1,43 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from copy import deepcopy + +import torch +import torchvision + + +mnist_train = torchvision.datasets.MNIST('files/', train=True, download=True, + transform=torchvision.transforms.Compose([ + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize( + (0.1307,), (0.3081,)) + ])) + +mnist_test = torchvision.datasets.MNIST('files/', train=False, download=True, + transform=torchvision.transforms.Compose([ + torchvision.transforms.ToTensor(), 
+ torchvision.transforms.Normalize( + (0.1307,), (0.3081,)) + ])) + +train_dataset = mnist_train +test_dataset = mnist_test + + +# Setup collaborators private attributes via callable function +def callable_to_initialize_collaborator_private_attributes( + index, n_collaborators, train_dataset, test_dataset, batch_size): + local_train = deepcopy(train_dataset) + local_test = deepcopy(test_dataset) + local_train.data = train_dataset.data[index::n_collaborators] + local_train.targets = train_dataset.targets[index::n_collaborators] + local_test.data = test_dataset.data[index::n_collaborators] + local_test.targets = test_dataset.targets[index::n_collaborators] + + return { + 'train_loader': torch.utils.data.DataLoader( + local_train, batch_size=batch_size, shuffle=True), + 'test_loader': torch.utils.data.DataLoader( + local_test, batch_size=batch_size, shuffle=True) + } diff --git a/openfl-workspace/experimental/102_aggregator_validation/src/flow.py b/openfl-workspace/experimental/102_aggregator_validation/src/flow.py new file mode 100644 index 0000000000..451b39d131 --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/src/flow.py @@ -0,0 +1,162 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import torch +from torch import nn +from torch.nn import functional as F +from torch import optim +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator + +import numpy as np + + +learning_rate = 0.01 +momentum = 0.5 +log_interval = 10 + +random_seed = 1 +torch.backends.cudnn.enabled = False +torch.manual_seed(random_seed) + + +class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.conv1 = nn.Conv2d(1, 10, kernel_size=5) + self.conv2 = nn.Conv2d(10, 20, kernel_size=5) + self.conv2_drop = nn.Dropout2d() + self.fc1 = nn.Linear(320, 50) + self.fc2 = nn.Linear(50, 10) + + def forward(self, x): + x = F.relu(F.max_pool2d(self.conv1(x), 2)) + x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2)) + x = x.view(-1, 320) + x = F.relu(self.fc1(x)) + x = F.dropout(x, training=self.training) + x = self.fc2(x) + return F.log_softmax(x) + + +def fedavg(models, weights=None): + new_model = models[0] + state_dicts = [model.state_dict() for model in models] + state_dict = new_model.state_dict() + for key in models[1].state_dict(): + state_dict[key] = np.average( + np.array([state[key] for state in state_dicts], dtype=object), axis=0, weights=weights) + new_model.load_state_dict(state_dict) + return new_model + + +def inference(network, test_loader): + network.eval() + test_loss = 0 + correct = 0 + with torch.no_grad(): + for data, target in test_loader: + output = network(data) + test_loss += F.nll_loss(output, target, size_average=False).item() + pred = output.data.max(1, keepdim=True)[1] + correct += pred.eq(target.data.view_as(pred)).sum() + test_loss /= len(test_loader.dataset) + accuracy = 100. * correct / len(test_loader.dataset) + print(f'\nTest set: Avg. 
loss: {test_loss:.4f},' + + f' Accuracy: {correct}/{len(test_loader.dataset)} ({accuracy:.0f}%)\n') + return accuracy + + +class AggregatorValidationFlow(FLSpec): + + def __init__(self, model=None, optimizer=None, rounds=3, **kwargs): + super().__init__(**kwargs) + if model is not None: + self.model = model + self.optimizer = optimizer + else: + self.model = Net() + self.optimizer = optim.SGD(self.model.parameters(), lr=learning_rate, + momentum=momentum) + self.rounds = rounds + + @aggregator + def start(self): + print('Performing initialization for model') + self.collaborators = self.runtime.collaborators + self.private = 10 + self.current_round = 0 + self.next(self.aggregated_model_validation, foreach='collaborators', exclude=['private']) + + @collaborator + def aggregated_model_validation(self): + print(f'Performing aggregated model validation for collaborator {self.input}') + self.agg_validation_score = inference(self.model, self.test_loader) + print(f'{self.input} value of {self.agg_validation_score}') + self.next(self.train) + + @collaborator + def train(self): + self.model.train() + self.optimizer = optim.SGD(self.model.parameters(), lr=learning_rate, + momentum=momentum) + for batch_idx, (data, target) in enumerate(self.train_loader): + self.optimizer.zero_grad() + output = self.model(data) + loss = F.nll_loss(output, target) + loss.backward() + self.optimizer.step() + if batch_idx % log_interval == 0: + self.loss = loss.item() + accuracy = 100. * batch_idx / len(self.train_loader) + print(f'Train Epoch: 1 [{batch_idx * len(data)}/{len(self.train_loader.dataset)}' + + f' ({accuracy:.0f}%)]\tLoss: {self.loss:.6f}') + torch.save(self.model.state_dict(), 'model.pth') + torch.save(self.optimizer.state_dict(), 'optimizer.pth') + self.training_completed = True + self.next(self.local_model_validation) + + @collaborator + def local_model_validation(self): + self.local_validation_score = inference(self.model, self.test_loader) + print(f'Doing local model validation for collaborator {self.input}:' + + f' {self.local_validation_score}') + self.next(self.join, exclude=['training_completed']) + + @aggregator + def join(self, inputs): + self.average_loss = sum(input.loss for input in inputs) / len(inputs) + self.aggregated_model_accuracy = sum( + input.agg_validation_score for input in inputs) / len(inputs) + self.local_model_accuracy = sum( + input.local_validation_score for input in inputs) / len(inputs) + print(f'Average aggregated model validation values = {self.aggregated_model_accuracy}') + print(f'Average training loss = {self.average_loss}') + print(f'Average local model validation values = {self.local_model_accuracy}') + + highest_accuracy = 0 + highest_accuracy_model_idx = -1 + for idx, col in enumerate(inputs): + accuracy_for_held_out_agg_data = inference(col.model, self.test_loader) + if accuracy_for_held_out_agg_data > highest_accuracy: + highest_accuracy = accuracy_for_held_out_agg_data + highest_accuracy_model_idx = idx + + relative_model_weights = len(inputs) * [1] + # Give highest accuracy model (on held out aggregator data) 2x the importance + relative_model_weights[highest_accuracy_model_idx] = 2 + print(f'Aggregator validation score: {highest_accuracy}') + print(f'Highest accuracy model sent from {inputs[highest_accuracy_model_idx].input}.' 
+ + ' Receiving 2x weight in updated model') + self.model = fedavg([input.model for input in inputs], weights=relative_model_weights) + self.optimizer = [input.optimizer for input in inputs][0] + self.current_round += 1 + if self.current_round < self.rounds: + self.next(self.aggregated_model_validation, + foreach='collaborators', exclude=['private']) + else: + self.next(self.end) + + @aggregator + def end(self): + print('This is the end of the flow') diff --git a/openfl-workspace/experimental/102_aggregator_validation/src/utils.py b/openfl-workspace/experimental/102_aggregator_validation/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/openfl-workspace/experimental/102_aggregator_validation/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/openfl-workspace/experimental/104_keras_mnist/.workspace b/openfl-workspace/experimental/104_keras_mnist/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/openfl-workspace/experimental/104_keras_mnist/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/openfl-workspace/experimental/104_keras_mnist/plan/cols.yaml b/openfl-workspace/experimental/104_keras_mnist/plan/cols.yaml new file mode 100644 index 0000000000..95307de3bc --- /dev/null +++ b/openfl-workspace/experimental/104_keras_mnist/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: + \ No newline at end of file diff --git a/openfl-workspace/experimental/104_keras_mnist/plan/data.yaml b/openfl-workspace/experimental/104_keras_mnist/plan/data.yaml new file mode 100644 index 0000000000..1455f8c5b7 --- /dev/null +++ b/openfl-workspace/experimental/104_keras_mnist/plan/data.yaml @@ -0,0 +1,27 @@ +## Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. 
+ +# collaborator_name ,data_directory_path +col1: + callable_func: + settings: + batch_size: 32 + index: 0 + n_collaborators: 2 + test_dataset: src.collaborator_private_attrs.test_dataset + train_dataset: src.collaborator_private_attrs.train_dataset + template: src.collaborator_private_attrs.collaborator_private_attrs + +col2: + callable_func: + settings: + batch_size: 32 + index: 1 + n_collaborators: 2 + test_dataset: src.collaborator_private_attrs.test_dataset + train_dataset: src.collaborator_private_attrs.train_dataset + template: src.collaborator_private_attrs.collaborator_private_attrs diff --git a/openfl-workspace/experimental/104_keras_mnist/plan/defaults b/openfl-workspace/experimental/104_keras_mnist/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/openfl-workspace/experimental/104_keras_mnist/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/openfl-workspace/experimental/104_keras_mnist/plan/plan.yaml b/openfl-workspace/experimental/104_keras_mnist/plan/plan.yaml new file mode 100644 index 0000000000..5418331059 --- /dev/null +++ b/openfl-workspace/experimental/104_keras_mnist/plan/plan.yaml @@ -0,0 +1,28 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.Aggregator + settings : + rounds_to_train : 1 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.Collaborator + settings : {} + + +federated_flow: + template: src.flow.KerasMNISTFlow + settings: + model: src.flow.model + rounds: 4 + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml diff --git a/openfl-workspace/experimental/104_keras_mnist/requirements.txt b/openfl-workspace/experimental/104_keras_mnist/requirements.txt new file mode 100644 index 0000000000..2e72e61717 --- /dev/null +++ b/openfl-workspace/experimental/104_keras_mnist/requirements.txt @@ -0,0 +1 @@ +tensorflow==2.7.0 diff --git a/openfl-workspace/experimental/104_keras_mnist/src/__init__.py b/openfl-workspace/experimental/104_keras_mnist/src/__init__.py new file mode 100644 index 0000000000..49883934a8 --- /dev/null +++ b/openfl-workspace/experimental/104_keras_mnist/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/openfl-workspace/experimental/104_keras_mnist/src/collaborator_private_attrs.py b/openfl-workspace/experimental/104_keras_mnist/src/collaborator_private_attrs.py new file mode 100644 index 0000000000..33e8751259 --- /dev/null +++ b/openfl-workspace/experimental/104_keras_mnist/src/collaborator_private_attrs.py @@ -0,0 +1,43 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +from tensorflow.keras.datasets import mnist +from tensorflow.keras.utils import to_categorical + + +nb_classes = 10 +(X_train, y_train), (X_test, y_test) = mnist.load_data() +print("X_train original shape", X_train.shape) +print("y_train original shape", y_train.shape) + +X_train = X_train.astype("float32") +X_test = X_test.astype("float32") +X_train /= 255.0 +X_test /= 255.0 +print("Training matrix shape", X_train.shape) +print("Testing matrix shape", X_test.shape) + +Y_train = to_categorical(y_train, nb_classes) +Y_test = to_categorical(y_test, 
nb_classes) + +train_dataset = (X_train, Y_train) +test_dataset = (X_test, Y_test) + + +def collaborator_private_attrs(n_collaborators, index, train_dataset, test_dataset, batch_size): + from openfl.utilities.data_splitters import EqualNumPyDataSplitter + train_splitter = EqualNumPyDataSplitter() + test_splitter = EqualNumPyDataSplitter() + + X_train, y_train = train_dataset + X_test, y_test = test_dataset + + train_idx = train_splitter.split(y_train, n_collaborators) + valid_idx = test_splitter.split(y_test, n_collaborators) + + train_dataset = X_train[train_idx[index]], y_train[train_idx[index]] + test_dataset = X_test[valid_idx[index]], y_test[valid_idx[index]] + + return { + "train_loader": train_dataset, "test_loader": test_dataset, + "batch_size": batch_size + } diff --git a/openfl-workspace/experimental/104_keras_mnist/src/flow.py b/openfl-workspace/experimental/104_keras_mnist/src/flow.py new file mode 100644 index 0000000000..b5e183eb3a --- /dev/null +++ b/openfl-workspace/experimental/104_keras_mnist/src/flow.py @@ -0,0 +1,113 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator +from tensorflow.keras.layers import Flatten, Dense, Dropout, Conv2D, MaxPool2D +from tensorflow.keras.models import Sequential +import numpy as np + + +nb_classes = 10 +model = Sequential([ + Conv2D(filters=32, kernel_size=(3, 3), activation="relu", input_shape=(28, 28, 1)), + MaxPool2D(), + Flatten(), + Dense(512, activation="relu"), + Dropout(0.2), + Dense(512, activation="relu"), + Dropout(0.2), + Dense(nb_classes, activation="softmax"), +]) +model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"]) + + +def fedavg(models): + new_model = models[0] + state_dicts = [model.weights for model in models] + state_dict = new_model.weights + for idx, _ in enumerate(models[1].weights): + state_dict[idx] = np.sum(np.array([state[idx] + for state in state_dicts], + dtype=object), axis=0) / len(models) + new_model.set_weights(state_dict) + return new_model + + +def inference(model, test_loader, batch_size): + x_test, y_test = test_loader + loss, accuracy = model.evaluate( + x_test, + y_test, + batch_size=batch_size, + verbose=0 + ) + accuracy_percentage = accuracy * 100 + print(f"Test set: Avg. 
loss: {loss}, Accuracy: {accuracy_percentage:.2f}%") + return accuracy + + +class KerasMNISTFlow(FLSpec): + def __init__(self, model, rounds=3, **kwargs): + super().__init__(**kwargs) + self.model = model + self.n_rounds = rounds + self.current_round = 1 + + @aggregator + def start(self): + self.collaborators = self.runtime.collaborators + self.next(self.aggregated_model_validation, foreach='collaborators') + + @collaborator + def aggregated_model_validation(self): + print(f'Performing aggregated model validation for collaborator {self.input}') + self.agg_validation_score = inference(self.model, self.test_loader, self.batch_size) + print(f'{self.input} value of {self.agg_validation_score}') + self.next(self.train) + + @collaborator + def train(self): + x_train, y_train = self.train_loader + history = self.model.fit( + x_train, y_train, + batch_size=self.batch_size, + epochs=1, + verbose=1, + ) + self.loss = history.history["loss"][0] + self.next(self.local_model_validation) + + @collaborator + def local_model_validation(self): + self.local_validation_score = inference(self.model, self.test_loader, self.batch_size) + print(f'Doing local model validation for collaborator {self.input}:' + + f' {self.local_validation_score}') + self.next(self.join) + + @aggregator + def join(self, inputs): + self.average_loss = sum(input.loss for input in inputs) / len(inputs) + self.aggregated_model_accuracy = sum( + input.agg_validation_score for input in inputs) / len(inputs) + self.local_model_accuracy = sum( + input.local_validation_score for input in inputs) / len(inputs) + print(f'Average aggregated model validation values = {self.aggregated_model_accuracy}') + print(f'Average training loss = {self.average_loss}') + print(f'Average local model validation values = {self.local_model_accuracy}') + print("Taking FedAvg of models of all collaborators") + self.model = fedavg([input.model for input in inputs]) + + self.next(self.internal_loop) + + @aggregator + def internal_loop(self): + if self.current_round == self.n_rounds: + self.next(self.end) + else: + self.current_round += 1 + self.next(self.aggregated_model_validation, foreach='collaborators') + + @aggregator + def end(self): + print('This is the end of the flow') diff --git a/openfl-workspace/experimental/104_keras_mnist/src/utils.py b/openfl-workspace/experimental/104_keras_mnist/src/utils.py new file mode 100644 index 0000000000..96fe885713 --- /dev/null +++ b/openfl-workspace/experimental/104_keras_mnist/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from tensorflow.summary import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/.workspace b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/cols.yaml 
b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/cols.yaml new file mode 100644 index 0000000000..95307de3bc --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: + \ No newline at end of file diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/data.yaml b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/data.yaml new file mode 100644 index 0000000000..b4c59467e2 --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/data.yaml @@ -0,0 +1,33 @@ +## Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. + +col1: + callable_func: + settings: + batch_size: 64 + index: 1 + n_collaborators: 2 + test_dataset: src.collaborator_private_attrs.test_dataset + train_dataset: src.collaborator_private_attrs.train_dataset + template: src.collaborator_private_attrs.collaborator_private_attrs + +col2: + callable_func: + settings: + batch_size: 64 + index: 2 + n_collaborators: 2 + test_dataset: src.collaborator_private_attrs.test_dataset + train_dataset: src.collaborator_private_attrs.train_dataset + template: src.collaborator_private_attrs.collaborator_private_attrs + +aggregator: + callable_func: + settings: + batch_size: 50 + watermark_data: src.aggregator_private_attrs.watermark_data + template: src.aggregator_private_attrs.aggregator_private_attrs diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/defaults b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/plan.yaml b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/plan.yaml new file mode 100644 index 0000000000..79b13ee2c1 --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/plan/plan.yaml @@ -0,0 +1,29 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. 
+ +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.Aggregator + settings : + rounds_to_train : 1 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.Collaborator + settings : {} + + +federated_flow: + template: src.flow.FederatedFlow_MNIST_Watermarking + settings: + model: null + optimizer: null + n_rounds: 4 + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/requirements.txt b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/requirements.txt new file mode 100644 index 0000000000..874ba23c3e --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/requirements.txt @@ -0,0 +1,7 @@ +torch==1.13.1 +torchvision==0.14.1 +tensorboard +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability +matplotlib +imagen @ git+https://github.com/pyviz-topics/imagen.git@master +param==1.13.0 diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/__init__.py b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/__init__.py new file mode 100644 index 0000000000..49883934a8 --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/aggregator_private_attrs.py b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/aggregator_private_attrs.py new file mode 100644 index 0000000000..5d55454f7c --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/aggregator_private_attrs.py @@ -0,0 +1,169 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import torch +import torchvision +import numpy as np +import pathlib +import os +import matplotlib +import matplotlib.pyplot as plt +import PIL.Image as Image +import imagen as ig +import numbergen as ng + + +watermark_dir = "./files/watermark-dataset/MWAFFLE/" + + +def generate_watermark( + x_size=28, y_size=28, num_class=10, num_samples_per_class=10, img_dir=watermark_dir +): + """ + Generate Watermark by superimposing a pattern on noisy background. 
+ + Parameters + ---------- + x_size: x dimension of the image + y_size: y dimension of the image + num_class: number of classes in the original dataset + num_samples_per_class: number of samples to be generated per class + img_dir: directory for saving watermark dataset + + Reference + --------- + WAFFLE: Watermarking in Federated Learning (https://arxiv.org/abs/2008.07298) + + """ + x_pattern = int(x_size * 2 / 3.0 - 1) + y_pattern = int(y_size * 2 / 3.0 - 1) + + np.random.seed(0) + for cls in range(num_class): + patterns = [] + random_seed = 10 + cls + patterns.append( + ig.Line( + xdensity=x_pattern, + ydensity=y_pattern, + thickness=0.001, + orientation=np.pi * ng.UniformRandom(seed=random_seed), + x=ng.UniformRandom(seed=random_seed) - 0.5, + y=ng.UniformRandom(seed=random_seed) - 0.5, + scale=0.8, + ) + ) + patterns.append( + ig.Arc( + xdensity=x_pattern, + ydensity=y_pattern, + thickness=0.001, + orientation=np.pi * ng.UniformRandom(seed=random_seed), + x=ng.UniformRandom(seed=random_seed) - 0.5, + y=ng.UniformRandom(seed=random_seed) - 0.5, + size=0.33, + ) + ) + + pat = np.zeros((x_pattern, y_pattern)) + for i in range(6): + j = np.random.randint(len(patterns)) + pat += patterns[j]() + res = pat > 0.5 + pat = res.astype(int) + + x_offset = np.random.randint(x_size - x_pattern + 1) + y_offset = np.random.randint(y_size - y_pattern + 1) + + for i in range(num_samples_per_class): + base = np.random.rand(x_size, y_size) + base[ + x_offset: x_offset + pat.shape[0], + y_offset: y_offset + pat.shape[1], + ] += pat + d = np.ones((x_size, x_size)) + img = np.minimum(base, d) + if not os.path.exists(img_dir + str(cls) + "/"): + os.makedirs(img_dir + str(cls) + "/") + plt.imsave( + img_dir + str(cls) + "/wm_" + str(i + 1) + ".png", + img, + cmap=matplotlib.cm.gray, + ) + + +# If the Watermark dataset does not exist, generate and save the Watermark images +watermark_path = pathlib.Path(watermark_dir) +if watermark_path.exists() and watermark_path.is_dir(): + print( + f"Watermark dataset already exists at: {watermark_path}. Proceeding to next step ... " + ) + pass +else: + print("Generating Watermark dataset... 
") + generate_watermark() + + +class WatermarkDataset(torch.utils.data.Dataset): + def __init__(self, images_dir, label_dir=None, transforms=None): + self.images_dir = os.path.abspath(images_dir) + self.image_paths = [ + os.path.join(self.images_dir, d) for d in os.listdir(self.images_dir) + ] + self.label_paths = label_dir + self.transform = transforms + temp = [] + + # Recursively counting total number of images in the directory + for image_path in self.image_paths: + for path in os.walk(image_path): + if len(path) <= 1: + continue + path = path[2] + for im_n in [image_path + "/" + p for p in path]: + temp.append(im_n) + self.image_paths = temp + + if len(self.image_paths) == 0: + raise Exception(f"No file(s) found under {images_dir}") + + def __len__(self): + return len(self.image_paths) + + def __getitem__(self, idx): + image_filepath = self.image_paths[idx] + image = Image.open(image_filepath) + image = image.convert("RGB") + image = self.transform(image) + label = int(image_filepath.split("/")[-2]) + + return image, label + + +def get_watermark_transforms(): + return torchvision.transforms.Compose( + [ + torchvision.transforms.Grayscale(), + torchvision.transforms.Resize(28), + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize(mean=(0.5,), std=(0.5,)), # Normalize + ] + ) + + +watermark_data = WatermarkDataset( + images_dir=watermark_dir, + transforms=get_watermark_transforms(), +) + + +def aggregator_private_attrs(watermark_data, batch_size): + return { + "watermark_data_loader": torch.utils.data.DataLoader( + watermark_data, batch_size=batch_size, shuffle=True + ), + "pretrain_epochs": 25, + "retrain_epochs": 25, + "watermark_acc_threshold": 0.98, + "watermark_pretraining_completed": False, + } diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/collaborator_private_attrs.py b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/collaborator_private_attrs.py new file mode 100644 index 0000000000..f7f455f2aa --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/collaborator_private_attrs.py @@ -0,0 +1,45 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from copy import deepcopy + +import torch +import torchvision + +train_dataset = torchvision.datasets.MNIST( + "./files/", + train=True, + download=True, + transform=torchvision.transforms.Compose( + [ + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize((0.1307,), (0.3081,)), + ] + ), +) + +test_dataset = torchvision.datasets.MNIST( + "./files/", + train=False, + download=True, + transform=torchvision.transforms.Compose( + [ + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize((0.1307,), (0.3081,)), + ] + ), +) + + +def collaborator_private_attrs(index, n_collaborators, batch_size, train_dataset, test_dataset): + train = deepcopy(train_dataset) + test = deepcopy(test_dataset) + train.data = train_dataset.data[index::n_collaborators] + train.targets = train_dataset.targets[index::n_collaborators] + test.data = test_dataset.data[index::n_collaborators] + test.targets = test_dataset.targets[index::n_collaborators] + + return { + "train_loader": torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True), + "test_loader": torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=True), + } diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/flow.py 
b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/flow.py new file mode 100644 index 0000000000..429412c209 --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/flow.py @@ -0,0 +1,304 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator + +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim +import torch +import numpy as np + +# MNIST parameters +learning_rate = 5e-2 +momentum = 5e-1 +log_interval = 20 + +# Watermarking parameters +watermark_pretrain_learning_rate = 1e-1 +watermark_pretrain_momentum = 5e-1 +watermark_pretrain_weight_decay = 5e-05 +watermark_retrain_learning_rate = 5e-3 + + +def inference(network, test_loader): + network.eval() + correct = 0 + with torch.no_grad(): + for data, target in test_loader: + output = network(data) + pred = output.data.max(1, keepdim=True)[1] + correct += pred.eq(target.data.view_as(pred)).sum() + accuracy = float(correct / len(test_loader.dataset)) + return accuracy + + +def train_model(model, optimizer, data_loader, entity, round_number, log=False): + # Helper function to train the model + train_loss = 0 + model.train() + for batch_idx, (X, y) in enumerate(data_loader): + optimizer.zero_grad() + + output = model(X) + loss = F.nll_loss(output, y) + loss.backward() + + optimizer.step() + + train_loss += loss.item() * len(X) + if batch_idx % log_interval == 0 and log: + print(f"{entity:<20} Train Epoch: {round_number:<3}" + + f" [{batch_idx * len(X):<3}/{len(data_loader.dataset):<4}" + + f" ({100.0 * batch_idx / len(data_loader):<.0f}%)]" + + f" Loss: {loss.item():<.6f}") + train_loss /= len(data_loader.dataset) + return train_loss + + +def fedavg(agg_model, models): + state_dicts = [model.state_dict() for model in models] + state_dict = agg_model.state_dict() + for key in models[0].state_dict(): + state_dict[key] = np.sum( + np.array([state[key] for state in state_dicts], dtype=object), + axis=0) / len(models) + agg_model.load_state_dict(state_dict) + return agg_model + + +class Net(nn.Module): + def __init__(self, dropout=0.0): + super(Net, self).__init__() + self.dropout = dropout + self.block = nn.Sequential( + nn.Conv2d(1, 32, 2), + nn.MaxPool2d(2), + nn.ReLU(), + nn.Conv2d(32, 64, 2), + nn.MaxPool2d(2), + nn.ReLU(), + nn.Conv2d(64, 128, 2), + nn.ReLU(), + ) + self.fc1 = nn.Linear(128 * 5**2, 200) + self.fc2 = nn.Linear(200, 10) + self.relu = nn.ReLU() + self.dropout = nn.Dropout(p=dropout) + + def forward(self, x): + x = self.dropout(x) + out = self.block(x) + out = out.view(-1, 128 * 5**2) + out = self.dropout(out) + out = self.relu(self.fc1(out)) + out = self.dropout(out) + out = self.fc2(out) + return F.log_softmax(out, 1) + + +class FederatedFlow_MNIST_Watermarking(FLSpec): # NOQA N801 + """ + This Flow demonstrates Watermarking on a Deep Learning Model in Federated Learning + Ref: WAFFLE: Watermarking in Federated Learning (https://arxiv.org/abs/2008.07298) + """ + + def __init__( + self, + model=None, + optimizer=None, + watermark_pretrain_optimizer=None, + watermark_retrain_optimizer=None, + round_number=0, + n_rounds=4, + **kwargs, + ): + super().__init__(**kwargs) + + if model is not None: + self.model = model + self.optimizer = optimizer + self.watermark_pretrain_optimizer = watermark_pretrain_optimizer + self.watermark_retrain_optimizer = watermark_retrain_optimizer + else: + self.model = Net() + 
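+            # The plan supplies model: null / optimizer: null, so fall back to a fresh
+            # Net() and the module-level hyperparameters defined at the top of this file.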
self.optimizer = optim.SGD( + self.model.parameters(), lr=learning_rate, momentum=momentum + ) + self.watermark_pretrain_optimizer = optim.SGD( + self.model.parameters(), + lr=watermark_pretrain_learning_rate, + momentum=watermark_pretrain_momentum, + weight_decay=watermark_pretrain_weight_decay, + ) + self.watermark_retrain_optimizer = optim.SGD( + self.model.parameters(), lr=watermark_retrain_learning_rate + ) + self.round_number = round_number + self.n_rounds = n_rounds + + @aggregator + def start(self): + """ + This is the start of the Flow. + """ + print(": Start of flow ... ") + self.collaborators = self.runtime.collaborators + + self.next(self.watermark_pretrain) + + @aggregator + def watermark_pretrain(self): + """ + Pre-Train the Model before starting Federated Learning. + """ + if not self.watermark_pretraining_completed: + + print(": Performing Watermark Pre-training") + + for i in range(self.pretrain_epochs): + + watermark_pretrain_loss = train_model( + self.model, + self.watermark_pretrain_optimizer, + self.watermark_data_loader, + ":", + i, + log=False, + ) + watermark_pretrain_validation_score = inference( + self.model, self.watermark_data_loader + ) + + print(f": Watermark Pretraining: Round: {i:<3}" + + f" Loss: {watermark_pretrain_loss:<.6f}" + + f" Acc: {watermark_pretrain_validation_score:<.6f}") + + self.watermark_pretraining_completed = True + + self.next( + self.aggregated_model_validation, + foreach="collaborators", + ) + + @collaborator + def aggregated_model_validation(self): + """ + Perform Aggregated Model validation on Collaborators. + """ + self.agg_validation_score = inference(self.model, self.test_loader) + print(f"" + + f" Aggregated Model validation score = {self.agg_validation_score}" + ) + + self.next(self.train) + + @collaborator + def train(self): + """ + Train model on Local collab dataset. + """ + print(": Performing Model Training on Local dataset ... ") + + self.optimizer = optim.SGD( + self.model.parameters(), lr=learning_rate, momentum=momentum + ) + + self.loss = train_model( + self.model, + self.optimizer, + self.train_loader, + f"", + self.round_number, + log=True, + ) + + self.next(self.local_model_validation) + + @collaborator + def local_model_validation(self): + """ + Validate locally trained model. + """ + self.local_validation_score = inference(self.model, self.test_loader) + print( + f" Local model validation score = {self.local_validation_score}" + ) + self.next(self.join) + + @aggregator + def join(self, inputs): + """ + Model aggregation step. + """ + self.average_loss = sum(input.loss for input in inputs) / len(inputs) + self.aggregated_model_accuracy = sum( + input.agg_validation_score for input in inputs + ) / len(inputs) + self.local_model_accuracy = sum( + input.local_validation_score for input in inputs + ) / len(inputs) + + print(": Joining models from collaborators...") + + print( + f" Aggregated model validation score = {self.aggregated_model_accuracy}" + ) + print(f" Average training loss = {self.average_loss}") + print(f" Average local model validation values = {self.local_model_accuracy}") + + self.model = fedavg(self.model, [input.model for input in inputs]) + + self.next(self.watermark_retrain) + + @aggregator + def watermark_retrain(self): + """ + Retrain the aggregated model. + """ + print(": Performing Watermark Retraining ... 
") + self.watermark_retrain_optimizer = optim.SGD( + self.model.parameters(), lr=watermark_retrain_learning_rate + ) + + retrain_round = 0 + + # Perform re-training until (accuracy >= acc_threshold) or + # (retrain_round > number of retrain_epochs) + self.watermark_retrain_validation_score = inference( + self.model, self.watermark_data_loader + ) + while ( + self.watermark_retrain_validation_score < self.watermark_acc_threshold + ) and (retrain_round < self.retrain_epochs): + self.watermark_retrain_train_loss = train_model( + self.model, + self.watermark_retrain_optimizer, + self.watermark_data_loader, + "", + retrain_round, + log=False, + ) + self.watermark_retrain_validation_score = inference( + self.model, self.watermark_data_loader + ) + + print(f": Watermark Retraining: Train Epoch: {self.round_number:<3}" + + f" Retrain Round: {retrain_round:<3}" + + f" Loss: {self.watermark_retrain_train_loss:<.6f}," + + f" Acc: {self.watermark_retrain_validation_score:<.6f}") + retrain_round += 1 + + if self.round_number < self.n_rounds: + self.round_number += 1 + self.next(self.start) + else: + self.next(self.end) + + @aggregator + def end(self): + """ + This is the last step in the Flow. + """ + print("This is the end of the flow") diff --git a/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/utils.py b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/utils.py new file mode 100644 index 0000000000..a3db4c1ecf --- /dev/null +++ b/openfl-workspace/experimental/301_torch_cnn_mnist_watermarking/src/utils.py @@ -0,0 +1,22 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""You may copy this file as the starting point of your own model.""" + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/.workspace b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/cols.yaml b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/cols.yaml new file mode 100644 index 0000000000..95307de3bc --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. 
+ +collaborators: + \ No newline at end of file diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/data.yaml b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/data.yaml new file mode 100644 index 0000000000..483a77536e --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/data.yaml @@ -0,0 +1,32 @@ +## Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. + +# collaborator_name ,data_directory_path +col1: + callable_func: + settings: + batch_size: 64 + index: 0 + n_collaborators: 2 + train_dataset: src.collaborator_private_attrs.train_dataset + test_dataset: src.collaborator_private_attrs.test_dataset + template: src.collaborator_private_attrs.collaborator_private_attrs + +col2: + callable_func: + settings: + batch_size: 64 + index: 1 + n_collaborators: 2 + train_dataset: src.collaborator_private_attrs.train_dataset + test_dataset: src.collaborator_private_attrs.test_dataset + template: src.collaborator_private_attrs.collaborator_private_attrs + +aggregator: + callable_func: + settings: {} + template: src.aggregator_private_attrs.aggregator_private_attrs diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/defaults b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/plan.yaml b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/plan.yaml new file mode 100644 index 0000000000..9be6450562 --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/plan/plan.yaml @@ -0,0 +1,49 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. 
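+# Illustrative note: `federated_flow.settings` below composes the flow's model from
+# nested template/settings pairs -- a MobileNetV2 backbone built from src.flow with
+# its classifier block, wrapped in src.flow.Net with a 200-class output layer for
+# Tiny ImageNet. Each nested template/settings pair is resolved the same way as the
+# top-level one.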
+ +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.Aggregator + settings : + rounds_to_train : 1 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.Collaborator + settings : {} + + +federated_flow: + template: src.flow.TinyImageNetFlow + settings: + model: + template: src.flow.Net + settings: + mobilenetv2: + template: src.flow.MobileNetV2 + settings: + num_classes: 1000 + inverted_residual_setting: src.flow.inverted_residual_setting + classifier_block: + template: src.flow.classifier_block + settings: + dropout: + template: src.flow.dropout + settings: + p: 0.2 + linear_layer: + template: src.flow.linear_layer + settings: + in_features: src.flow.in_features + out_features: 1000 + in_features: 1000 + out_features: 200 + rounds: 4 + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/requirements.txt b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/requirements.txt new file mode 100644 index 0000000000..16b349007c --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/requirements.txt @@ -0,0 +1,4 @@ +torch==1.13.1 +torchvision==0.14.1 +tensorboard +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/__init__.py b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/aggregator_private_attrs.py b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/aggregator_private_attrs.py new file mode 100644 index 0000000000..2b866a8d90 --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/aggregator_private_attrs.py @@ -0,0 +1,17 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import torchvision + + +def aggregator_private_attrs(): + # Load the pre-trained model weights from a file. 
For example: + # we have used pre-trained weights from torchvision + model = torchvision.models.mobilenet_v2( + weights=torchvision.models.MobileNet_V2_Weights.DEFAULT, + progress=True + ) + + return { + 'pretrained_state_dict': model.state_dict() + } diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/collaborator_private_attrs.py b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/collaborator_private_attrs.py new file mode 100644 index 0000000000..e0898f344d --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/collaborator_private_attrs.py @@ -0,0 +1,144 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import os +import glob +import shutil +import torch +import torchvision.transforms as T + +from pathlib import Path +from copy import deepcopy +from PIL import Image +from torch.utils.data import Dataset, random_split + + +common_data_folder = os.path.join(os.getcwd(), 'data') +zip_file_path = os.path.join(common_data_folder, 'tiny-imagenet-200.zip') +os.makedirs(common_data_folder, exist_ok=True) +os.system(f'wget --no-clobber http://cs231n.stanford.edu/tiny-imagenet-200.zip' + f' -O {zip_file_path}') +print('Unpacking tiny-imagenet-200.zip') +shutil.unpack_archive(str(zip_file_path), str(common_data_folder)) + +normalize = T.Normalize( + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225] +) + +augmentation = T.RandomApply( + [T.RandomHorizontalFlip(), + T.RandomRotation(10), + T.RandomResizedCrop(64)], + p=.8 +) + +training_transform = T.Compose( + [T.Lambda(lambda x: x.convert('RGB')), + T.ToTensor(), + augmentation, + normalize] +) + +valid_transform = T.Compose( + [T.Lambda(lambda x: x.convert('RGB')), + T.ToTensor(), + normalize] +) + + +class TinyImageNetDataset(Dataset): + """TinyImageNet shard dataset class.""" + + NUM_IMAGES_PER_CLASS = 500 + + def __init__(self, data_folder: Path, data_type='train', transform=None): + """Initialize TinyImageNetDataset.""" + super(TinyImageNetDataset, self).__init__() + self.data_type = data_type + self._common_data_folder = data_folder + self._data_folder = os.path.join(data_folder, data_type) + self.labels = {} # fname - label number mapping + self.image_paths = sorted( + glob.iglob( + os.path.join(self._data_folder, '**', '*.JPEG'), + recursive=True + ) + ) + with open(os.path.join(self._common_data_folder, 'wnids.txt'), 'r') as fp: + self.label_texts = sorted( + [text.strip() for text in fp.readlines()] + ) + self.label_text_to_number = { + text: i for i, text in enumerate(self.label_texts) + } + self.fill_labels() + self.transform = transform + + def __len__(self) -> int: + """Return the len of the shard dataset.""" + return len(self.image_paths) + + def __getitem__(self, index: int): + """Return an item by the index.""" + file_path = self.image_paths[index] + sample = self.read_image(file_path) + if self.transform: + sample = self.transform(sample) + label = self.labels[os.path.basename(file_path)] + return sample, label + + def read_image(self, path: Path): + """Read the image.""" + img = Image.open(path) + return img + + def fill_labels(self) -> None: + """Fill labels.""" + if self.data_type == 'train': + for label_text, i in self.label_text_to_number.items(): + for cnt in range(self.NUM_IMAGES_PER_CLASS): + self.labels[f'{label_text}_{cnt}.JPEG'] = i + elif self.data_type == 'val': + with open(os.path.join(self._data_folder, 'val_annotations.txt'), 'r') as fp: + for line in 
fp.readlines(): + terms = line.split('\t') + file_name, label_text = terms[0], terms[1] + self.labels[file_name] = self.label_text_to_number[ + label_text + ] + + +train_dataset = TinyImageNetDataset( + os.path.join(common_data_folder, 'tiny-imagenet-200'), + transform=training_transform +) +test_dataset = TinyImageNetDataset( + os.path.join(common_data_folder, 'tiny-imagenet-200'), + data_type='val', + transform=valid_transform +) + + +def collaborator_private_attrs( + index, n_collaborators, batch_size, train_dataset, test_dataset +): + + train = deepcopy(train_dataset) + test = deepcopy(test_dataset) + + train = random_split( + train, [len(train) // n_collaborators] * n_collaborators + )[index] + test = random_split( + test, [len(test) // n_collaborators] * n_collaborators + )[index] + + return { + 'train_loader': torch.utils.data.DataLoader( + train, batch_size=batch_size, shuffle=True + ), + 'test_loader': torch.utils.data.DataLoader( + test, batch_size=batch_size, + ), + } diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/flow.py b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/flow.py new file mode 100644 index 0000000000..ef06c2cb84 --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/flow.py @@ -0,0 +1,449 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import collections +from itertools import repeat +from typing import Callable, List, Optional, Union, Tuple, Sequence, Any +from types import FunctionType + +import numpy as np +import tqdm +import warnings + +import torch +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim +from torch import Tensor + +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator + + +device = 'cuda' if torch.cuda.is_available() else 'cpu' + + +def _log_api_usage_once(obj: Any) -> None: + module = obj.__module__ + if not module.startswith('torchvision'): + module = f'torchvision.internal.{module}' + name = obj.__class__.__name__ + if isinstance(obj, FunctionType): + name = obj.__name__ + torch._C._log_api_usage_once(f'{module}.{name}') + + +def _make_ntuple(x: Any, n: int) -> Tuple[Any, ...]: + if isinstance(x, collections.abc.Iterable): + return tuple(x) + return tuple(repeat(x, n)) + + +class ConvNormActivation(torch.nn.Sequential): + def __init__( + self, + in_channels: int, + out_channels: int, + kernel_size: Union[int, Tuple[int, ...]] = 3, + stride: Union[int, Tuple[int, ...]] = 1, + padding: Optional[Union[int, Tuple[int, ...], str]] = None, + groups: int = 1, + norm_layer: Optional[Callable[..., torch.nn.Module]] = torch.nn.BatchNorm2d, + activation_layer: Optional[Callable[..., torch.nn.Module]] = torch.nn.ReLU, + dilation: Union[int, Tuple[int, ...]] = 1, + inplace: Optional[bool] = True, + bias: Optional[bool] = None, + conv_layer: Callable[..., torch.nn.Module] = torch.nn.Conv2d, + ) -> None: + + if padding is None: + if isinstance(kernel_size, int) and isinstance(dilation, int): + padding = (kernel_size - 1) // 2 * dilation + else: + _conv_dim = (len(kernel_size) + if isinstance(kernel_size, Sequence) + else len(dilation)) + kernel_size = _make_ntuple(kernel_size, _conv_dim) + dilation = _make_ntuple(dilation, _conv_dim) + padding = tuple((kernel_size[i] - 1) // 2 * dilation[i] for i in range(_conv_dim)) + if bias is None: + bias = norm_layer is None + + layers = [ + conv_layer( + in_channels, + 
out_channels, + kernel_size, + stride, + padding, + dilation=dilation, + groups=groups, + bias=bias, + ) + ] + + if norm_layer is not None: + layers.append(norm_layer(out_channels)) + + if activation_layer is not None: + params = {} if inplace is None else {"inplace": inplace} + layers.append(activation_layer(**params)) + super().__init__(*layers) + _log_api_usage_once(self) + self.out_channels = out_channels + + if self.__class__ == ConvNormActivation: + warnings.warn( + "Don't use ConvNormActivation directly, please use" + + " Conv2dNormActivation and Conv3dNormActivation instead." + ) + + +class Conv2dNormActivation(ConvNormActivation): + def __init__( + self, + in_channels: int, + out_channels: int, + kernel_size: Union[int, Tuple[int, int]] = 3, + stride: Union[int, Tuple[int, int]] = 1, + padding: Optional[Union[int, Tuple[int, int], str]] = None, + groups: int = 1, + norm_layer: Optional[Callable[..., torch.nn.Module]] = torch.nn.BatchNorm2d, + activation_layer: Optional[Callable[..., torch.nn.Module]] = torch.nn.ReLU, + dilation: Union[int, Tuple[int, int]] = 1, + inplace: Optional[bool] = True, + bias: Optional[bool] = None, + ) -> None: + + super().__init__( + in_channels, + out_channels, + kernel_size, + stride, + padding, + groups, + norm_layer, + activation_layer, + dilation, + inplace, + bias, + torch.nn.Conv2d, + ) + + +def _make_divisible(v: float, divisor: int, min_value: Optional[int] = None) -> int: + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + # Make sure that round down does not go down by more than 10%. + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +# necessary for backwards compatibility +class InvertedResidual(nn.Module): + def __init__( + self, inp: int, oup: int, stride: int, expand_ratio: int, + norm_layer: Optional[Callable[..., nn.Module]] = None + ) -> None: + super().__init__() + self.stride = stride + if stride not in [1, 2]: + raise ValueError(f"stride should be 1 or 2 insted of {stride}") + + if norm_layer is None: + norm_layer = nn.BatchNorm2d + + hidden_dim = int(round(inp * expand_ratio)) + self.use_res_connect = self.stride == 1 and inp == oup + + layers: List[nn.Module] = [] + if expand_ratio != 1: + # pw + layers.append( + Conv2dNormActivation(inp, hidden_dim, kernel_size=1, + norm_layer=norm_layer, + activation_layer=nn.ReLU6 + ) + ) + layers.extend( + [ + # dw + Conv2dNormActivation( + hidden_dim, + hidden_dim, + stride=stride, + groups=hidden_dim, + norm_layer=norm_layer, + activation_layer=nn.ReLU6, + ), + # pw-linear + nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), + norm_layer(oup), + ] + ) + self.conv = nn.Sequential(*layers) + self.out_channels = oup + self._is_cn = stride > 1 + + def forward(self, x: Tensor) -> Tensor: + if self.use_res_connect: + return x + self.conv(x) + else: + return self.conv(x) + + +class MobileNetV2(nn.Module): + def __init__( + self, + num_classes: int = 1000, + width_mult: float = 1.0, + inverted_residual_setting: Optional[List[List[int]]] = None, + round_nearest: int = 8, + block: Optional[Callable[..., nn.Module]] = None, + norm_layer: Optional[Callable[..., nn.Module]] = None, + dropout: float = 0.2, + classifier_block: Optional = None + ) -> None: + super().__init__() + + if block is None: + block = InvertedResidual + + if norm_layer is None: + norm_layer = nn.BatchNorm2d + + input_channel = 32 + last_channel = 1280 + + # only check the first element, assuming user knows t,c,n,s are required + if len(inverted_residual_setting) 
== 0 or len(inverted_residual_setting[0]) != 4: + raise ValueError( + "inverted_residual_setting should be non-empty " + + f"or a 4-element list, got {inverted_residual_setting}" + ) + + # building first layer + input_channel = _make_divisible( + input_channel * width_mult, round_nearest + ) + self.last_channel = _make_divisible( + last_channel * max(1.0, width_mult), round_nearest + ) + features: List[nn.Module] = [ + Conv2dNormActivation( + 3, input_channel, stride=2, norm_layer=norm_layer, + activation_layer=nn.ReLU6 + ) + ] + # building inverted residual blocks + for t, c, n, s in inverted_residual_setting: + output_channel = _make_divisible(c * width_mult, round_nearest) + for i in range(n): + stride = s if i == 0 else 1 + features.append( + block( + input_channel, output_channel, stride, + expand_ratio=t, norm_layer=norm_layer + ) + ) + input_channel = output_channel + # building last several layers + features.append( + Conv2dNormActivation( + input_channel, self.last_channel, kernel_size=1, + norm_layer=norm_layer, activation_layer=nn.ReLU6 + ) + ) + # make it nn.Sequential + self.features = nn.Sequential(*features) + + # building classifier + self.classifier = classifier_block + + # weight initialization + for m in self.modules(): + if isinstance(m, nn.Conv2d): + nn.init.kaiming_normal_(m.weight, mode='fan_out') + if m.bias is not None: + nn.init.zeros_(m.bias) + elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)): + nn.init.ones_(m.weight) + nn.init.zeros_(m.bias) + elif isinstance(m, nn.Linear): + nn.init.normal_(m.weight, 0, 0.01) + nn.init.zeros_(m.bias) + + def _forward_impl(self, x: Tensor) -> Tensor: + # This exists since TorchScript doesn't support inheritance, so the superclass method + # (this one) needs to have a name other than `forward` that can be accessed in a subclass + x = self.features(x) + # Cannot use "squeeze" as batch-size can be 1 + x = nn.functional.adaptive_avg_pool2d(x, (1, 1)) + x = torch.flatten(x, 1) + x = self.classifier(x) + return x + + def forward(self, x: Tensor) -> Tensor: + return self._forward_impl(x) + + +classifier_block = nn.Sequential +dropout = nn.Dropout +linear_layer = nn.Linear +in_features = _make_divisible(1280 * 1.0, 8) +inverted_residual_setting = [ + # t, c, n, s + [1, 16, 1, 1], + [6, 24, 2, 2], + [6, 32, 3, 2], + [6, 64, 4, 2], + [6, 96, 3, 1], + [6, 160, 3, 2], + [6, 320, 1, 1], +] + + +class Net(nn.Module): + def __init__(self, mobilenetv2, in_features, out_features): + super(Net, self).__init__() + self.base_model = mobilenetv2 + self.base_model.requires_grad_(False) + self.linear = torch.nn.Linear( + in_features=in_features, out_features=out_features, bias=True + ) + self.linear.requires_grad_(True) + + def forward(self, x): + x = self.base_model.forward(x) + x = self.linear(x) + return x + + +def inference(network, test_loader): + network = network.to(device).eval() + + data_loader = tqdm.tqdm(test_loader, desc="validate") + val_score = 0 + total_samples = 0 + + with torch.no_grad(): + for data, target in data_loader: + samples = target.shape[0] + total_samples += samples + data, target = torch.tensor(data).to(device), torch.tensor( + target).to(device, dtype=torch.int64) + output = network(data) + pred = output.argmax(dim=1, keepdim=True) + val_score += pred.eq(target).sum().cpu().numpy() + + accuracy = (val_score / total_samples) + print(f'Validation Accuracy: {100*accuracy:.2f}%') + return accuracy + + +def fedavg(models): + new_model = models[0] + state_dicts = [model.state_dict() for model in models] + state_dict = 
new_model.state_dict() + for key in models[1].state_dict(): + state_dict[key] = np.sum( + np.array([state[key] for state in state_dicts], dtype=object), axis=0 + ) / len(models) + new_model.load_state_dict(state_dict) + return new_model + + +class TinyImageNetFlow(FLSpec): + def __init__(self, model=None, rounds=3, **kwargs): + super().__init__(**kwargs) + self.model = model + self.rounds = rounds + + @aggregator + def start(self): + print('Performing initialization for model') + self.collaborators = self.runtime.collaborators + self.private = 10 + self.current_round = 0 + self.model.base_model.load_state_dict( + self.pretrained_state_dict + ) + self.next( + self.aggregated_model_validation, + foreach="collaborators", + exclude=["private"], + ) + + @collaborator + def aggregated_model_validation(self): + print(f"Performing aggregated model validation for collaborator {self.input}") + self.agg_validation_score = inference(self.model, self.test_loader) + print(f"{self.input} value of {self.agg_validation_score}") + self.next(self.train) + + @collaborator + def train(self): + data_loader = tqdm.tqdm(self.train_loader, desc="train") + self.model = self.model.to(device).train() + + losses = [] + self.optimizer = optim.Adam( + [x for x in self.model.parameters() if x.requires_grad], + lr=1e-4 + ) + for data, target in data_loader: + data, target = torch.tensor(data).to(device), torch.tensor( + target).to(device) + self.optimizer.zero_grad() + output = self.model(data) + loss = F.cross_entropy(output, target) + loss.backward() + self.optimizer.step() + losses.append(loss.detach().cpu().numpy()) + self.loss = loss.item() + + print(f'train_loss: {np.mean(losses)}') + self.training_completed = True + self.next(self.local_model_validation) + + @collaborator + def local_model_validation(self): + self.local_validation_score = inference(self.model, self.test_loader) + print( + "Doing local model validation" + + f"for collaborator {self.input}: {self.local_validation_score}" + ) + self.next(self.join, exclude=["training_completed"]) + + @aggregator + def join(self, inputs): + self.average_loss = sum(input.loss for input in inputs) / len(inputs) + self.aggregated_model_accuracy = sum( + input.agg_validation_score for input in inputs + ) / len(inputs) + self.local_model_accuracy = sum( + input.local_validation_score for input in inputs + ) / len(inputs) + print( + "Average aggregated model " + + f"validation values = {self.aggregated_model_accuracy}" + ) + print(f"Average training loss = {self.average_loss}") + print(f"Average local model validation values = {self.local_model_accuracy}") + self.model = fedavg([input.model.to("cpu") for input in inputs]) + self.next(self.internal_loop) + + @aggregator + def internal_loop(self): + self.current_round += 1 + if self.current_round < self.rounds: + self.next( + self.aggregated_model_validation, foreach="collaborators", exclude=["private"]) + else: + self.next(self.end) + + @aggregator + def end(self): + print("This is the end of the flow") diff --git a/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/utils.py b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/openfl-workspace/experimental/501_pytorch_tinyimagenet_transfer_learning/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def 
get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/openfl-workspace/experimental/template_workspace/.workspace b/openfl-workspace/experimental/template_workspace/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/openfl-workspace/experimental/template_workspace/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/openfl-workspace/experimental/template_workspace/plan/cols.yaml b/openfl-workspace/experimental/template_workspace/plan/cols.yaml new file mode 100644 index 0000000000..95307de3bc --- /dev/null +++ b/openfl-workspace/experimental/template_workspace/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: + \ No newline at end of file diff --git a/openfl-workspace/experimental/template_workspace/plan/data.yaml b/openfl-workspace/experimental/template_workspace/plan/data.yaml new file mode 100644 index 0000000000..7125ed836e --- /dev/null +++ b/openfl-workspace/experimental/template_workspace/plan/data.yaml @@ -0,0 +1,8 @@ +## Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. + +# collaborator_name ,data_directory_path \ No newline at end of file diff --git a/openfl-workspace/experimental/template_workspace/plan/defaults b/openfl-workspace/experimental/template_workspace/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/openfl-workspace/experimental/template_workspace/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/openfl-workspace/experimental/template_workspace/plan/plan.yaml b/openfl-workspace/experimental/template_workspace/plan/plan.yaml new file mode 100644 index 0000000000..f8892f66d8 --- /dev/null +++ b/openfl-workspace/experimental/template_workspace/plan/plan.yaml @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. 
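+# Minimal skeleton for new experimental workspaces: point `federated_flow.template`
+# at your own FLSpec subclass and mirror its __init__ keyword arguments under
+# `settings`. The src.flow.TinyImageNetFlow reference below is only a placeholder,
+# since this template ships no flow implementation.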
+ +aggregator: + defaults: plan/defaults/aggregator.yaml + template: openfl.experimental.component.Aggregator + settings: + rounds_to_train: 1 + +collaborator: + defaults: plan/defaults/collaborator.yaml + template: openfl.experimental.component.Collaborator + settings: {} + +federated_flow: + template: src.flow.TinyImageNetFlow + settings: {} + +network: + defaults: plan/defaults/network.yaml diff --git a/openfl-workspace/experimental/template_workspace/requirements.txt b/openfl-workspace/experimental/template_workspace/requirements.txt new file mode 100644 index 0000000000..32a96eaef3 --- /dev/null +++ b/openfl-workspace/experimental/template_workspace/requirements.txt @@ -0,0 +1 @@ +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability diff --git a/openfl-workspace/experimental/template_workspace/src/__init__.py b/openfl-workspace/experimental/template_workspace/src/__init__.py new file mode 100644 index 0000000000..49883934a8 --- /dev/null +++ b/openfl-workspace/experimental/template_workspace/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/openfl-workspace/experimental/vertical_fl/.workspace b/openfl-workspace/experimental/vertical_fl/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/openfl-workspace/experimental/vertical_fl/plan/cols.yaml b/openfl-workspace/experimental/vertical_fl/plan/cols.yaml new file mode 100644 index 0000000000..2ac4e56fa5 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: + \ No newline at end of file diff --git a/openfl-workspace/experimental/vertical_fl/plan/data.yaml b/openfl-workspace/experimental/vertical_fl/plan/data.yaml new file mode 100644 index 0000000000..d5017aaf7c --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl/plan/data.yaml @@ -0,0 +1,14 @@ +## Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. + +portland: + +seattle: + +chandler: + +bangalore: \ No newline at end of file diff --git a/openfl-workspace/experimental/vertical_fl/plan/defaults b/openfl-workspace/experimental/vertical_fl/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/openfl-workspace/experimental/vertical_fl/plan/plan.yaml b/openfl-workspace/experimental/vertical_fl/plan/plan.yaml new file mode 100644 index 0000000000..37b2c4be28 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl/plan/plan.yaml @@ -0,0 +1,26 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. 
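+# Illustrative note: VerticalFlow drives its own ten-round loop inside combine(),
+# so `rounds_to_train` stays at 1 here and the only flow-level setting is
+# `checkpoint`.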
+ +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.aggregator.Aggregator + settings : + rounds_to_train : 1 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.collaborator.Collaborator + settings : {} + + +federated_flow: + template: src.workflow_interface_vertical_fl.VerticalFlow + settings: + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml diff --git a/openfl-workspace/experimental/vertical_fl/requirements.txt b/openfl-workspace/experimental/vertical_fl/requirements.txt new file mode 100644 index 0000000000..a721bb7e28 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl/requirements.txt @@ -0,0 +1,3 @@ +dill==0.3.6 +metaflow==2.7.15 +ray==2.2.0 \ No newline at end of file diff --git a/openfl-workspace/experimental/vertical_fl/src/__init__.py b/openfl-workspace/experimental/vertical_fl/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/openfl-workspace/experimental/vertical_fl/src/utils.py b/openfl-workspace/experimental/vertical_fl/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/openfl-workspace/experimental/vertical_fl/src/workflow_interface_vertical_fl.py b/openfl-workspace/experimental/vertical_fl/src/workflow_interface_vertical_fl.py new file mode 100644 index 0000000000..9b915e2e80 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl/src/workflow_interface_vertical_fl.py @@ -0,0 +1,83 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator + + +class VerticalFlow(FLSpec): + + def __init__(self, checkpoint: bool = False): + super().__init__(checkpoint) + + @aggregator + def start(self): + self.collaborators = self.runtime.collaborators + self.round = 0 + self.next_collaborator = ['portland'] + self.next(self.custom_task_portland, foreach='next_collaborator') + + @collaborator + def custom_task_portland(self): + print(f'Collaborator {self.input}: performing custom task') + self.result = 0 + self.next(self.gather_portland_results) + + @aggregator + def gather_portland_results(self, inputs): + self.results = [] + self.results.append(inputs[0].result) + self.next_collaborator = ['seattle'] + self.next(self.custom_task_seattle, foreach='next_collaborator', exclude=['results']) + + @collaborator + def custom_task_seattle(self): + print(f'Collaborator {self.input}: performing custom task') + self.result = 1 + self.next(self.gather_seattle_results) + + @aggregator + def gather_seattle_results(self, inputs): + 
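+        # `results` was excluded from the transfer to collaborators, so the list built
+        # in gather_portland_results is still available here on the aggregator.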
self.results.append(inputs[0].result) + self.next_collaborator = ['chandler'] + self.next(self.custom_task_chandler, foreach='next_collaborator', exclude=['results']) + + @collaborator + def custom_task_chandler(self): + print(f'Collaborator {self.input}: performing custom task') + self.result = 2 + self.next(self.gather_chandler_results) + + @aggregator + def gather_chandler_results(self, inputs): + self.results.append(inputs[0].result) + self.next_collaborator = ['bangalore'] + self.next(self.custom_task_bangalore, foreach='next_collaborator', exclude=['results']) + + @collaborator + def custom_task_bangalore(self): + print(f'Collaborator {self.input}: performing custom task') + self.result = 3 + self.next(self.gather_bangalore_results) + + @aggregator + def gather_bangalore_results(self, inputs): + self.results.append(inputs[0].result) + self.next(self.combine) + + @aggregator + def combine(self): + print(f'The results from each of the collaborators are: {self.results}') + print(f'Their average = {sum(self.results) / len(self.results)}') + self.round += 1 + if self.round < 10: + print() + print(f'Starting round {self.round}...') + self.next_collaborator = ['portland'] + self.next(self.custom_task_portland, foreach='next_collaborator') + else: + self.next(self.end) + + @aggregator + def end(self): + print('This is the end of the flow') diff --git a/openfl-workspace/experimental/vertical_fl_two_party/.workspace b/openfl-workspace/experimental/vertical_fl_two_party/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/openfl-workspace/experimental/vertical_fl_two_party/plan/cols.yaml b/openfl-workspace/experimental/vertical_fl_two_party/plan/cols.yaml new file mode 100644 index 0000000000..2ac4e56fa5 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: + \ No newline at end of file diff --git a/openfl-workspace/experimental/vertical_fl_two_party/plan/data.yaml b/openfl-workspace/experimental/vertical_fl_two_party/plan/data.yaml new file mode 100644 index 0000000000..0ed304ef36 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/plan/data.yaml @@ -0,0 +1,30 @@ +## Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. 
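+# Illustrative note: each entry below maps a participant's private attributes to
+# module-level objects in src/ by dotted path; `template` names the callable
+# (collaborator_private_attrs / aggregator_private_attrs) that receives those
+# objects and returns the participant's private-attribute dictionary.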
+ +col1: + callable_func: + settings: + data_model: src.collaborator_private_attrs.data_model + data_model_optimizer: src.collaborator_private_attrs.data_model_optimizer + train_loader: src.collaborator_private_attrs.train_loader + template: src.collaborator_private_attrs.collaborator_private_attrs + +col2: + callable_func: + settings: + data_model: src.collaborator_private_attrs.data_model + data_model_optimizer: src.collaborator_private_attrs.data_model_optimizer + train_loader: src.collaborator_private_attrs.train_loader + template: src.collaborator_private_attrs.collaborator_private_attrs + +aggregator: + callable_func: + settings: + train_loader: src.aggregator_private_attrs.train_loader + label_model: src.aggregator_private_attrs.label_model + label_model_optimizer: src.aggregator_private_attrs.label_model_optimizer + template: src.aggregator_private_attrs.aggregator_private_attrs \ No newline at end of file diff --git a/openfl-workspace/experimental/vertical_fl_two_party/plan/defaults b/openfl-workspace/experimental/vertical_fl_two_party/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/openfl-workspace/experimental/vertical_fl_two_party/plan/plan.yaml b/openfl-workspace/experimental/vertical_fl_two_party/plan/plan.yaml new file mode 100644 index 0000000000..13a9d70088 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/plan/plan.yaml @@ -0,0 +1,27 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.aggregator.Aggregator + settings : + rounds_to_train : 10 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.collaborator.Collaborator + settings : {} + + +federated_flow: + template: src.workflow_interface_vertical_fl_two_party.VerticalTwoPartyFlow + settings: + batch_num: 0 + checkpoint: True + + +network : + defaults : plan/defaults/network.yaml diff --git a/openfl-workspace/experimental/vertical_fl_two_party/requirements.txt b/openfl-workspace/experimental/vertical_fl_two_party/requirements.txt new file mode 100644 index 0000000000..a721bb7e28 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/requirements.txt @@ -0,0 +1,3 @@ +dill==0.3.6 +metaflow==2.7.15 +ray==2.2.0 \ No newline at end of file diff --git a/openfl-workspace/experimental/vertical_fl_two_party/src/__init__.py b/openfl-workspace/experimental/vertical_fl_two_party/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/openfl-workspace/experimental/vertical_fl_two_party/src/aggregator_private_attrs.py b/openfl-workspace/experimental/vertical_fl_two_party/src/aggregator_private_attrs.py new file mode 100644 index 0000000000..ab9faefeba --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/src/aggregator_private_attrs.py @@ -0,0 +1,34 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import torch +from torchvision import 
datasets, transforms +from torch import nn, optim + + +hidden_sizes = [128, 640] +output_size = 10 +batch_size = 2048 + +transform = transforms.Compose([ + transforms.ToTensor(), + transforms.Normalize((0.5,), (0.5,)), +]) +trainset = datasets.MNIST('mnist', download=True, train=True, transform=transform) + +train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True) + +label_model = nn.Sequential( + nn.Linear(hidden_sizes[1], output_size), + nn.LogSoftmax(dim=1) +) + +label_model_optimizer = optim.SGD(label_model.parameters(), lr=0.03) + + +def aggregator_private_attrs(train_loader, label_model, label_model_optimizer): + return { + "trainloader": train_loader, + "label_model": label_model, + "label_model_optimizer": label_model_optimizer + } diff --git a/openfl-workspace/experimental/vertical_fl_two_party/src/collaborator_private_attrs.py b/openfl-workspace/experimental/vertical_fl_two_party/src/collaborator_private_attrs.py new file mode 100644 index 0000000000..c9755b3b48 --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/src/collaborator_private_attrs.py @@ -0,0 +1,36 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from copy import deepcopy +from torch import nn, optim +from torchvision import datasets, transforms +import torch + +input_size = 784 +hidden_sizes = [128, 640] +batch_size = 2048 + +data_model = nn.Sequential( + nn.Linear(input_size, hidden_sizes[0]), + nn.ReLU(), + nn.Linear(hidden_sizes[0], hidden_sizes[1]), + nn.ReLU(), +) + +data_model_optimizer = optim.SGD(data_model.parameters(), lr=0.03) + +transform = transforms.Compose([ + transforms.ToTensor(), + transforms.Normalize((0.5,), (0.5,)), +]) +trainset = datasets.MNIST('mnist', download=True, train=True, transform=transform) + +train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True) + + +def collaborator_private_attrs(data_model, data_model_optimizer, train_loader): + return { + "data_model": data_model, + "data_model_optimizer": data_model_optimizer, + "trainloader": deepcopy(train_loader) + } diff --git a/openfl-workspace/experimental/vertical_fl_two_party/src/utils.py b/openfl-workspace/experimental/vertical_fl_two_party/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/openfl-workspace/experimental/vertical_fl_two_party/src/workflow_interface_vertical_fl_two_party.py b/openfl-workspace/experimental/vertical_fl_two_party/src/workflow_interface_vertical_fl_two_party.py new file mode 100644 index 0000000000..5066fbf92e --- /dev/null +++ b/openfl-workspace/experimental/vertical_fl_two_party/src/workflow_interface_vertical_fl_two_party.py @@ -0,0 +1,74 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator +from 
torch import nn, optim + + +class VerticalTwoPartyFlow(FLSpec): + + def __init__(self, batch_num=0, checkpoint: bool = False): + super().__init__(checkpoint) + self.batch_num = batch_num + + @aggregator + def start(self): + self.collaborators = self.runtime.collaborators + print(f'Batch_num = {self.batch_num}') + # 1) Zero the gradients + self.label_model_optimizer.zero_grad() + self.next(self.data_model_forward_pass, foreach='collaborators') + + @collaborator + def data_model_forward_pass(self): + self.data_model_output_local = '' + for idx, (images, _) in enumerate(self.trainloader): + if idx < self.batch_num: + continue + self.data_model_optimizer.zero_grad() + images = images.view(images.shape[0], -1) + model_output = self.data_model(images) + self.data_model_output_local = model_output + self.data_model_output = model_output.detach().requires_grad_() + break + self.next(self.label_model_forward_pass) + + @aggregator + def label_model_forward_pass(self, inputs): + criterion = nn.NLLLoss() + self.grad_to_local = [] + total_loss = 0 + self.data_remaining = False + for idx, (_, labels) in enumerate(self.trainloader): + if idx < self.batch_num: + continue + self.data_remaining = True + pred = self.label_model(inputs[0].data_model_output) + loss = criterion(pred, labels) + loss.backward() + self.grad_to_local = inputs[0].data_model_output.grad.clone() + self.label_model_optimizer.step() + total_loss += loss + break + print(f'Total loss = {total_loss}') + self.next(self.data_model_backprop, foreach='collaborators') + + @collaborator + def data_model_backprop(self): + if self.data_remaining: + self.data_model_optimizer = optim.SGD(self.data_model.parameters(), lr=0.03) + self.data_model_optimizer.zero_grad() + self.data_model_output_local.backward(self.grad_to_local) + self.data_model_optimizer.step() + self.next(self.join) + + @aggregator + def join(self, inputs): + print(f'Join batch_num = {self.batch_num}') + self.batch_num += 1 + self.next(self.end) + + @aggregator + def end(self): + print('This is the end of the flow') diff --git a/openfl-workspace/experimental/workspace/.workspace b/openfl-workspace/experimental/workspace/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/openfl-workspace/experimental/workspace/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/openfl-workspace/experimental/workspace/__init__.py b/openfl-workspace/experimental/workspace/__init__.py new file mode 100644 index 0000000000..f1410b1298 --- /dev/null +++ b/openfl-workspace/experimental/workspace/__init__.py @@ -0,0 +1,3 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""You may copy this file as the starting point of your own model.""" diff --git a/openfl-workspace/experimental/workspace/plan/defaults/aggregator.yaml b/openfl-workspace/experimental/workspace/plan/defaults/aggregator.yaml new file mode 100644 index 0000000000..78f0242dc6 --- /dev/null +++ b/openfl-workspace/experimental/workspace/plan/defaults/aggregator.yaml @@ -0,0 +1 @@ +template : openfl.experimental.component.Aggregator \ No newline at end of file diff --git a/openfl-workspace/experimental/workspace/plan/defaults/collaborator.yaml b/openfl-workspace/experimental/workspace/plan/defaults/collaborator.yaml new file mode 100644 index 0000000000..1c561cf5f5 --- /dev/null +++ b/openfl-workspace/experimental/workspace/plan/defaults/collaborator.yaml @@ -0,0 +1 @@ +template : openfl.experimental.component.Collaborator \ No newline at end of file diff --git 
a/openfl-workspace/experimental/workspace/plan/defaults/network.yaml b/openfl-workspace/experimental/workspace/plan/defaults/network.yaml new file mode 100644 index 0000000000..07d2e3aeec --- /dev/null +++ b/openfl-workspace/experimental/workspace/plan/defaults/network.yaml @@ -0,0 +1,9 @@ +template: openfl.federation.Network +settings: + agg_addr : auto + agg_port : auto + hash_salt : auto + tls : True + client_reconnect_interval : 5 + disable_client_auth : False + cert_folder : cert diff --git a/openfl-workspace/experimental/workspace/plan/plans/default/base_plan_interactive_api.yaml b/openfl-workspace/experimental/workspace/plan/plans/default/base_plan_interactive_api.yaml new file mode 100644 index 0000000000..06370bd272 --- /dev/null +++ b/openfl-workspace/experimental/workspace/plan/plans/default/base_plan_interactive_api.yaml @@ -0,0 +1,36 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.component.Aggregator + settings : + init_state_path : save/init.pbuf + best_state_path : save/best.pbuf + last_state_path : save/last.pbuf + rounds_to_train : 10 + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.component.Collaborator + settings : + delta_updates : false + opt_treatment : RESET + +data_loader : + defaults : plan/defaults/data_loader.yaml + +task_runner : + template : openfl.federated.task.task_runner.CoreTaskRunner + +network : + defaults : plan/defaults/network.yaml + +assigner : + defaults : plan/defaults/assigner.yaml + +tasks : + defaults : null + +compression_pipeline : + defaults : plan/defaults/compression_pipeline.yaml \ No newline at end of file diff --git a/openfl-workspace/experimental/workspace/plan/plans/default/plan.yaml b/openfl-workspace/experimental/workspace/plan/plans/default/plan.yaml new file mode 100644 index 0000000000..af976f3f43 --- /dev/null +++ b/openfl-workspace/experimental/workspace/plan/plans/default/plan.yaml @@ -0,0 +1,39 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. 
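+# Each section below may carry a 'defaults' entry pointing at a file under plan/defaults/; Plan.parse merges those defaults with the settings declared here.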
+ +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.component.Aggregator + settings : + init_state_path : save/init.pbuf + best_state_path : save/best.pbuf + last_state_path : save/last.pbuf + rounds_to_train : 10 + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.component.Collaborator + settings : + delta_updates : false + opt_treatment : RESET + +data_loader : + defaults : plan/defaults/data_loader.yaml + template : src.tfmnist_inmemory.TensorFlowMNISTInMemory + settings : + collaborator_count : 2 + data_group_name : mnist + batch_size : 256 + +task_runner : + defaults : plan/defaults/task_runner.yaml + template : src.keras_cnn.KerasCNN + +network : + defaults : plan/defaults/network.yaml + +assigner : + defaults : plan/defaults/assigner.yaml + +tasks : + defaults : plan/defaults/tasks_keras.yaml diff --git a/openfl/experimental/component/__init__.py b/openfl/experimental/component/__init__.py new file mode 100644 index 0000000000..6b815db0c7 --- /dev/null +++ b/openfl/experimental/component/__init__.py @@ -0,0 +1,9 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""openfl.experimental.component package.""" + +from .aggregator import Aggregator +from .collaborator import Collaborator + +__all__ = ["Aggregator", "Collaborator"] diff --git a/openfl/experimental/component/aggregator/__init__.py b/openfl/experimental/component/aggregator/__init__.py new file mode 100644 index 0000000000..34e42f18f2 --- /dev/null +++ b/openfl/experimental/component/aggregator/__init__.py @@ -0,0 +1,8 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""openfl.experimental.component.aggregator package.""" + +from .aggregator import Aggregator + +__all__ = ["Aggregator",] diff --git a/openfl/experimental/component/aggregator/aggregator.py b/openfl/experimental/component/aggregator/aggregator.py new file mode 100644 index 0000000000..977753a26b --- /dev/null +++ b/openfl/experimental/component/aggregator/aggregator.py @@ -0,0 +1,519 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""Experimental Aggregator module.""" +import time +import queue +import pickle +import inspect +from threading import Event +from logging import getLogger +from typing import Any, Callable +from typing import Dict, List, Tuple + +from openfl.experimental.utilities import aggregator_to_collaborator +from openfl.experimental.runtime import FederatedRuntime +from openfl.experimental.utilities import checkpoint +from openfl.experimental.utilities.metaflow_utils import MetaflowInterface + + +class Aggregator: + r"""An Aggregator is the central node in federated learning. + + Args: + aggregator_uuid (str): Aggregation ID. + federation_uuid (str): Federation ID. + authorized_cols (list of str): The list of IDs of enrolled collaborators. + + flow (Any): Flow class. + rounds_to_train (int): External loop rounds. + checkpoint (bool): Whether to save checkpoint or not (default=False). + private_attributes_callable (Callable): Function for Aggregator private attributes + (default=None). + private_attributes_kwargs (Dict): Arguments to call private_attributes_callable (default={}).
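+ single_col_cert_common_name (str): Common name for single collaborator certificate mode, intended for development settings only (default=None). + log_metric_callback (Callable): Optional callable used to log metrics (default=None).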
+ + Returns: + None + """ + + def __init__( + self, + aggregator_uuid: str, + federation_uuid: str, + authorized_cols: List, + + flow: Any, + rounds_to_train: int = 1, + checkpoint: bool = False, + private_attributes_callable: Callable = None, + private_attributes_kwargs: Dict = {}, + + single_col_cert_common_name: str = None, + + log_metric_callback: Callable = None, + **kwargs) -> None: + + self.logger = getLogger(__name__) + + self.single_col_cert_common_name = single_col_cert_common_name + if self.single_col_cert_common_name is not None: + self._log_big_warning() + else: + # FIXME: "" instead of None is just for protobuf compatibility. + # Cleaner solution? + self.single_col_cert_common_name = "" + + self.log_metric_callback = log_metric_callback + if log_metric_callback is not None: + self.log_metric = log_metric_callback + self.logger.info(f"Using custom log metric: {self.log_metric}") + + self.uuid = aggregator_uuid + self.federation_uuid = federation_uuid + self.authorized_cols = authorized_cols + + self.rounds_to_train = rounds_to_train + self.current_round = 1 + self.collaborators_counter = 0 + self.quit_job_sent_to = [] + self.time_to_quit = False + + # Event to inform aggregator that collaborators have sent the results + self.collaborator_task_results = Event() + # A queue for each task + self.__collaborator_tasks_queue = {collab: queue.Queue() for collab + in self.authorized_cols} + + self.flow = flow + self.checkpoint = checkpoint + self.flow._foreach_methods = [] + self.logger.info("MetaflowInterface creation.") + self.flow._metaflow_interface = MetaflowInterface( + self.flow.__class__, "single_process" + ) + self.flow._run_id = self.flow._metaflow_interface.create_run() + self.flow.runtime = FederatedRuntime() + self.flow.runtime.aggregator = "aggregator" + self.flow.runtime.collaborators = self.authorized_cols + + self.__private_attrs_callable = private_attributes_callable + self.__private_attrs = {} + self.connected_collaborators = [] + self.tasks_sent_to_collaborators = 0 + self.collaborator_results_received = [] + + if self.__private_attrs_callable is not None: + self.logger.info("Initializing aggregator private attributes...") + self.__initialize_private_attributes(private_attributes_kwargs) + + def __initialize_private_attributes(self, kwargs: Dict) -> None: + """ + Call private_attrs_callable function set + attributes to self.__private_attrs. + """ + self.__private_attrs = self.__private_attrs_callable( + **kwargs + ) + + def __set_attributes_to_clone(self, clone: Any) -> None: + """ + Set private_attrs to clone as attributes. + """ + if len(self.__private_attrs) > 0: + for name, attr in self.__private_attrs.items(): + setattr(clone, name, attr) + + def __delete_agg_attrs_from_clone(self, clone: Any, replace_str: str = None) -> None: + """ + Remove aggregator private attributes from FLSpec clone before + transition from Aggregator step to collaborator steps. + """ + # Update aggregator private attributes by taking latest + # parameters from clone, then delete attributes from clone. + if len(self.__private_attrs) > 0: + for attr_name in self.__private_attrs: + if hasattr(clone, attr_name): + self.__private_attrs.update({attr_name: getattr(clone, attr_name)}) + if replace_str: + setattr(clone, attr_name, replace_str) + else: + delattr(clone, attr_name) + + def _log_big_warning(self) -> None: + """Warn user about single collaborator cert mode.""" + self.logger.warning( + f"\n{the_dragon}\nYOU ARE RUNNING IN SINGLE COLLABORATOR CERT MODE! 
THIS IS" + f" NOT PROPER PKI AND " + f"SHOULD ONLY BE USED IN DEVELOPMENT SETTINGS!!!! YE HAVE BEEN" + f" WARNED!!!" + ) + + @staticmethod + def _get_sleep_time() -> int: + """ + Sleep 10 seconds. + + Returns: + sleep_time: int + """ + return 10 + + def run_flow(self) -> None: + """ + Start the execution and run flow until transition. + """ + # Start function will be the first step if any flow + f_name = "start" + + self.logger.info(f"Starting round {self.current_round}...") + while True: + next_step = self.do_task(f_name) + + if self.time_to_quit: + self.logger.info("Experiment Completed.") + self.quit_job_sent_to = self.authorized_cols + break + + # Prepare queue for collaborator task, with clones + for k, v in self.__collaborator_tasks_queue.items(): + if k in self.selected_collaborators: + v.put((next_step, self.clones_dict[k])) + else: + self.logger.info(f"Tasks will not be sent to {k}") + + while not self.collaborator_task_results.is_set(): + len_sel_collabs = len(self.selected_collaborators) + len_connected_collabs = len(self.connected_collaborators) + if len_connected_collabs < len_sel_collabs: + # Waiting for collaborators to connect. + self.logger.info("Waiting for " + + f"{len_connected_collabs}/{len_sel_collabs}" + + " collaborators to connect...") + elif self.tasks_sent_to_collaborators != len_sel_collabs: + self.logger.info("Waiting for " + + f"{self.tasks_sent_to_collaborators}/{len_sel_collabs}" + + " to make requests for tasks...") + else: + # Waiting for selected collaborators to send the results. + self.logger.info("Waiting for " + + f"{self.collaborators_counter}/{len_sel_collabs}" + + " collaborators to send results...") + time.sleep(Aggregator._get_sleep_time()) + + self.collaborator_task_results.clear() + f_name = self.next_step + if hasattr(self, "instance_snapshot"): + self.flow.restore_instance_snapshot(self.flow, list(self.instance_snapshot)) + delattr(self, "instance_snapshot") + + def call_checkpoint(self, ctx: Any, f: Callable, stream_buffer: bytes = None) -> None: + """ + Perform checkpoint task. + + Args: + ctx (FLSpec / bytes): Collaborator FLSpec object for which checkpoint is to be + performed. + f (Callable / bytes): Collaborator Step (Function) which is to be checkpointed. + stream_buffer (bytes): Captured object for output and error (default=None). + reserved_attributes (List[str]): List of attribute names which is to be excluded + from checkpoint (default=[]). + + Returns: + None + """ + if self.checkpoint: + from openfl.experimental.interface import ( + FLSpec, + ) + + # Check if arguments are pickled, if yes then unpickle + if not isinstance(ctx, FLSpec): + ctx = pickle.loads(ctx) + # Updating metaflow interface object + ctx._metaflow_interface = self.flow._metaflow_interface + if not isinstance(f, Callable): + f = pickle.loads(f) + if isinstance(stream_buffer, bytes): + # Set stream buffer as function parameter + setattr(f.__func__, "_stream_buffer", pickle.loads(stream_buffer)) + + checkpoint(ctx, f) + + def get_tasks(self, collaborator_name: str) -> Tuple: + """ + RPC called by a collaborator to determine which tasks to perform. + Tasks will only be sent to selected collaborators. + + Args: + collaborator_name (str): Collaborator name which requested tasks. 
+ + Returns: + next_step (str): Next function to be executed by collaborator + clone_bytes (bytes): Function execution context for collaborator + """ + # If requesting collaborator is not registered as connected collaborator, + # then register it + if collaborator_name not in self.connected_collaborators: + self.logger.info(f"Collaborator {collaborator_name} is connected.") + self.connected_collaborators.append(collaborator_name) + + self.logger.debug( + f"Aggregator GetTasks function reached from collaborator {collaborator_name}..." + ) + + # If queue of requesting collaborator is empty + while self.__collaborator_tasks_queue[collaborator_name].qsize() == 0: + # If it is time to quit, then inform the collaborator + if self.time_to_quit: + self.logger.info( + f"Sending signal to collaborator {collaborator_name} to shutdown...") + # FIXME: 0, and "" instead of None is just for protobuf compatibility. + # Cleaner solution? + return 0, "", None, Aggregator._get_sleep_time(), self.time_to_quit + + # If not time to quit then sleep for 10 seconds + time.sleep(Aggregator._get_sleep_time()) + + # Get collaborator step, and clone for requesting collaborator + next_step, clone = self.__collaborator_tasks_queue[ + collaborator_name].get() + + self.tasks_sent_to_collaborators += 1 + self.logger.info("Sending tasks to collaborator" + + f" {collaborator_name} for round {self.current_round}...") + return self.current_round, next_step, pickle.dumps(clone), 0, self.time_to_quit + + def do_task(self, f_name: str) -> Any: + """ + Execute aggregator steps until transition. + + Args: + f_name (str): Aggregator step + + Returns: + str / None: Next collaborator function, or None at the end of the flow. + """ + # Set aggregator private attributes to flow object + self.__set_attributes_to_clone(self.flow) + + not_at_transition_point = True + # Run a loop to execute flow steps until not_at_transition_point is False + while not_at_transition_point: + f = getattr(self.flow, f_name) + # Get the list of parameters of function f + args = inspect.signature(f)._parameters + + if f.__name__ == "end": + f() + # Take the checkpoint of "end" step + self.__delete_agg_attrs_from_clone(self.flow, "Private attributes: Not Available.") + self.call_checkpoint(self.flow, f) + self.__set_attributes_to_clone(self.flow) + # Check if all rounds of the external loop are executed + if self.current_round == self.rounds_to_train: + # All rounds executed, it is time to quit + self.time_to_quit = True + # It is time to quit - Break the loop + not_at_transition_point = False + # Start next round of execution + else: + self.current_round += 1 + self.logger.info(f"Starting round {self.current_round}...") + f_name = "start" + continue + + selected_clones = () + # If function requires arguments then it is join step of the flow + if len(args) > 0: + # Check if total number of collaborators and number of selected collaborators + # are the same + if len(self.selected_collaborators) != len(self.clones_dict): + # Create list of selected collaborator clones + selected_clones = ([],) + for name, clone in self.clones_dict.items(): + # Check if collaborator is in the list of selected collaborators + if name in self.selected_collaborators: + selected_clones[0].append(clone) + else: + # Number of selected collaborators, and number of total collaborators + # are same + selected_clones = (list(self.clones_dict.values()),) + # Call the join function with the selected collaborators' + # clones as arguments + f(*selected_clones) + + self.__delete_agg_attrs_from_clone(self.flow,
"Private attributes: Not Available.") + # Take the checkpoint of executed step + self.call_checkpoint(self.flow, f) + self.__set_attributes_to_clone(self.flow) + + # Next function in the flow + _, f, parent_func = self.flow.execute_task_args[:3] + f_name = f.__name__ + + self.flow._display_transition_logs(f, parent_func) + + # Transition check + if aggregator_to_collaborator(f, parent_func): + # Extract clones, instance snapshot and kwargs when reached + # foreach loop first time + if len(self.flow.execute_task_args) > 4: + temp = self.flow.execute_task_args[3:] + self.clones_dict, self.instance_snapshot, self.kwargs = temp + + self.selected_collaborators = getattr(self.flow, self.kwargs["foreach"]) + else: + self.kwargs = self.flow.execute_task_args[3] + + # Transition encountered, break the loop + not_at_transition_point = False + + # Delete private attributes from flow object + self.__delete_agg_attrs_from_clone(self.flow) + + return f_name if f_name != "end" else None + + def send_task_results(self, collab_name: str, round_number: int, next_step: str, + clone_bytes: bytes) -> None: + """ + After collaborator execution, collaborator will call this function via gRPc + to send next function. + + Args: + collab_name (str): Collaborator name which is sending results + round_number (int): Round number for which collaborator is sending results + next_step (str): Next aggregator step in the flow + clone_bytes (bytes): Collaborator FLSpec object + + Returns: + None + """ + # Log a warning if collaborator is sending results for old round + if round_number is not self.current_round: + self.logger.warning( + f"Collaborator {collab_name} is reporting results" + f" for the wrong round: {round_number}. Ignoring..." + ) + else: + self.logger.info( + f"Collaborator {collab_name} sent task results" + f" for round {round_number}." + ) + # Unpickle the clone (FLSpec object) + clone = pickle.loads(clone_bytes) + # Update the clone in clones_dict dictionary + self.clones_dict[clone.input] = clone + self.next_step = next_step[0] + + self.collaborators_counter += 1 + # If selected collaborator have sent the results + if self.collaborators_counter is len(self.selected_collaborators): + self.collaborators_counter = 0 + # Set the event to inform aggregator to resume the flow execution + self.collaborator_task_results.set() + # Empty tasks_sent_to_collaborators list for next time. + if self.tasks_sent_to_collaborators == len(self.selected_collaborators): + self.tasks_sent_to_collaborators = 0 + + def valid_collaborator_cn_and_id(self, cert_common_name: str, + collaborator_common_name: str) -> bool: + """ + Determine if the collaborator certificate and ID are valid for this federation. + + Args: + cert_common_name: Common name for security certificate + collaborator_common_name: Common name for collaborator + + Returns: + bool: True means the collaborator common name matches the name in + the security certificate. + """ + # if self.test_mode_whitelist is None, then the common_name must + # match collaborator_common_name and be in authorized_cols + # FIXME: "" instead of None is just for protobuf compatibility. + # Cleaner solution? 
+ if self.single_col_cert_common_name == "": + return (cert_common_name == collaborator_common_name + and collaborator_common_name in self.authorized_cols) + # otherwise, common_name must be in whitelist and + # collaborator_common_name must be in authorized_cols + else: + return (cert_common_name == self.single_col_cert_common_name + and collaborator_common_name in self.authorized_cols) + + def all_quit_jobs_sent(self) -> bool: + """Assert all quit jobs are sent to collaborators.""" + return set(self.quit_job_sent_to) == set(self.authorized_cols) + + +the_dragon = """ + + ,@@.@@+@@##@,@@@@.`@@#@+ *@@@@ #@##@ `@@#@# @@@@@ @@ @@@@` #@@@ :@@ `@#`@@@#.@ + @@ #@ ,@ +. @@.@* #@ :` @+*@ .@`+. @@ *@::@`@@ @@# @@ #`;@`.@@ @@@`@`#@* +:@` + @@@@@ ,@@@ @@@@ +@@+ @@@@ .@@@ @@ .@+:@@@: .;+@` @@ ,;,#@` @@ @@@@@ ,@@@* @ + @@ #@ ,@`*. @@.@@ #@ ,; `@+,@#.@.*` @@ ,@::@`@@` @@@@# @@`:@;*@+ @@ @`:@@`@ *@@ ` + .@@`@@,+@+;@.@@ @@`@@;*@ ;@@#@:*@+;@ `@@;@@ #@**@+;@ `@@:`@@@@ @@@@.`@+ .@ +@+@*,@ + `` `` ` `` . ` ` ` ` ` .` ` `` `` `` ` . ` + + + + .** + ;` `****: + @**`******* + *** +***********; + ,@***;` .*:,;************ + ;***********@@*********** + ;************************, + `************************* + ************************* + ,************************ + **#********************* + *@****` :**********; + +**; .********. + ;*; `*******#: `,: + ****@@@++:: ,,;***. + *@@@**;#;: +: **++*, + @***#@@@: +*; ,**** + @*@+**** ***` ****, + ,@#******. , **** **;,**. + * ******** :, ;*:*+ ** :,** + # ********:: *,.*:**` * ,*; + . *********: .+,*:;*: : `:** + ; :********: ***::** ` ` ** + + :****::*** , *;;::**` :* + `` .****::;**::: *;::::*; ;* + * *****::***:. **::::** ;: + # *****;:**** ;*::;*** ,*` + ; ************` ,**:****; ::* + : *************;:;*;*++: *. + : *****************;* `* + `. `*****************; : *. + .` .*+************+****;: :* + `. :;+***********+******;` : .,* + ; ::*+*******************. `:: .`:. + + :::**********************;;:` * + + ,::;*************;:::*******. * + # `:::+*************:::;******** :, * + @ :::***************;:;*********;:, * + @ ::::******:*********************: ,:* + @ .:::******:;*********************, :* + # :::******::******###@*******;;**** *, + # .::;*****::*****#****@*****;:::***; `` ** + * ::;***********+*****+#******::*****,,,,** + : :;***********#******#****************** + .` `;***********#******+****+************ + `, ***#**@**+***+*****+**************;` + ; *++**#******#+****+` `.,.. + + `@***#*******#****# + + +***@********+**+: + * .+**+;**;;;**;#**# + ,` ****@ +*+: + # +**+ :+** + @ ;**+, ,***+ + # #@+**** *#****+ + `; @+***+@ `#**+#++ + # #*#@##, .++:.,# + `* @# +. 
+ @@@ + # `@ + , """ diff --git a/openfl/experimental/component/collaborator/__init__.py b/openfl/experimental/component/collaborator/__init__.py new file mode 100644 index 0000000000..29ce6da9d3 --- /dev/null +++ b/openfl/experimental/component/collaborator/__init__.py @@ -0,0 +1,8 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""openfl.experimental.component.collaborator package.""" + +from .collaborator import Collaborator + +__all__ = ["Collaborator",] diff --git a/openfl/experimental/component/collaborator/collaborator.py b/openfl/experimental/component/collaborator/collaborator.py new file mode 100644 index 0000000000..65e6210ca4 --- /dev/null +++ b/openfl/experimental/component/collaborator/collaborator.py @@ -0,0 +1,224 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""Experimental Collaborator module.""" +import time +import pickle + +from typing import Any, Callable +from typing import Dict, Tuple +from logging import getLogger + + +class Collaborator: + r"""The Collaborator object class. + + Args: + collaborator_name (str): The common name for the collaborator. + aggregator_uuid (str): The unique id for the aggregator. + federation_uuid (str): The unique id for the federation. + + client (AggregatorGRPCClient): GRPC Client to connect to + Aggregator Server. + + private_attributes_callable (Callable): Function for Collaborator + private attributes. + private_attributes_kwargs (Dict): Arguments to call private_attributes_callable. + + Note: + \* - Plan setting. + """ + def __init__(self, + collaborator_name: str, + aggregator_uuid: str, + federation_uuid: str, + client: Any, + private_attributes_callable: Any = None, + private_attributes_kwargs: Dict = {}, + **kwargs) -> None: + + self.name = collaborator_name + self.aggregator_uuid = aggregator_uuid + self.federation_uuid = federation_uuid + + self.client = client + + self.logger = getLogger(__name__) + + self.__private_attrs_callable = private_attributes_callable + + self.__private_attrs = {} + if self.__private_attrs_callable is not None: + self.logger.info("Initializing collaborator.") + self.__initialize_private_attributes(private_attributes_kwargs) + + def __initialize_private_attributes(self, kwargs: Dict) -> None: + """ + Call the private_attrs_callable function to set + attributes on self.__private_attrs + + Args: + kwargs (Dict): Private attributes callable function arguments + + Returns: + None + """ + self.__private_attrs = self.__private_attrs_callable( + **kwargs + ) + + def __set_attributes_to_clone(self, clone: Any) -> None: + """ + Set private_attrs to clone as attributes. + + Args: + clone (FLSpec): Clone to which private attributes are to be + set + + Returns: + None + """ + if len(self.__private_attrs) > 0: + for name, attr in self.__private_attrs.items(): + setattr(clone, name, attr) + + def __delete_agg_attrs_from_clone(self, clone: Any, replace_str: str = None) -> None: + """ + Remove collaborator private attributes from FLSpec clone before + transition from collaborator steps to the aggregator step + + Args: + clone (FLSpec): Clone from which private attributes are to be + removed + + Returns: + None + """ + # Update collaborator private attributes by taking latest + # parameters from clone, then delete attributes from clone.
+ if len(self.__private_attrs) > 0: + for attr_name in self.__private_attrs: + if hasattr(clone, attr_name): + self.__private_attrs.update({attr_name: getattr(clone, attr_name)}) + if replace_str: + setattr(clone, attr_name, replace_str) + else: + delattr(clone, attr_name) + + def call_checkpoint(self, ctx: Any, f: Callable, stream_buffer: Any) -> None: + """ + Call checkpoint gRPC. + + Args: + ctx (FLSpec): FLSpec object. + f (Callable): Flow step which is to be checkpointed. + stream_buffer (Any): Captured object for output and error. + + Returns: + None + """ + self.client.call_checkpoint( + self.name, + pickle.dumps(ctx), pickle.dumps(f), pickle.dumps(stream_buffer) + ) + + def run(self) -> None: + """ + Run the collaborator. + + Args: + None + + Returns: + None + """ + while True: + next_step, clone, sleep_time, time_to_quit = self.get_tasks() + if time_to_quit: + break + elif sleep_time > 0: + time.sleep(sleep_time) + else: + self.logger.info(f"Received the following tasks: {next_step}.") + f_name, ctx = self.do_task(next_step, clone) + self.send_task_results(f_name, ctx) + + self.logger.info("End of Federation reached. Exiting...") + + def send_task_results(self, next_step: str, clone: Any) -> None: + """ + After collaborator steps are executed, send the next aggregator + step to the Aggregator to continue execution. + + Args: + next_step (str): Next aggregator function to be sent to the aggregator + clone (FLSpec): Updated clone object (private attributes are not included) + + Returns: + None + """ + self.logger.info(f"Round {self.round_number}," + f" collaborator {self.name} is sending results...") + self.client.send_task_results( + self.name, self.round_number, + next_step, pickle.dumps(clone) + ) + + def get_tasks(self) -> Tuple: + """ + Get tasks from the aggregator. + + Args: + None + + Returns: + next_step (str): Next collaborator function to start execution from + ctx (FLSpec): Function context + sleep_time (int): Sleep for given seconds if not ready yet + time_to_quit (bool): True if end of federation is reached + """ + self.logger.info("Waiting for tasks...") + temp = self.client.get_tasks(self.name) + self.round_number, next_step, clone_bytes, sleep_time, time_to_quit = temp + + return next_step, pickle.loads(clone_bytes), sleep_time, time_to_quit + + def do_task(self, f_name: str, ctx: Any) -> Tuple: + """ + Run collaborator steps until transition. + + Args: + f_name (str): Function name which is to be executed. + ctx (FLSpec): Function context. + + Returns: + Tuple(str, FLSpec): Next aggregator function, and updated context.
+ """ + # Set private attributes to context + self.__set_attributes_to_clone(ctx) + + # Loop control variable + not_at_transition_point = True + while not_at_transition_point: + f = getattr(ctx, f_name) + f() + # Checkpoint the function + self.__delete_agg_attrs_from_clone(ctx, "Private attributes: Not Available.") + self.call_checkpoint(ctx, f, f._stream_buffer) + self.__set_attributes_to_clone(ctx) + + _, f, parent_func = ctx.execute_task_args[:3] + # Display transition logs if transition + ctx._display_transition_logs(f, parent_func) + + # If transition break the loop + if ctx._is_at_transition_point(f, parent_func): + not_at_transition_point = False + + # Update the function name + f_name = f.__name__ + + # Reomve private attributes from context + self.__delete_agg_attrs_from_clone(ctx) + + return f_name, ctx diff --git a/openfl/experimental/federated/__init__.py b/openfl/experimental/federated/__init__.py new file mode 100644 index 0000000000..77a79d67f1 --- /dev/null +++ b/openfl/experimental/federated/__init__.py @@ -0,0 +1,8 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""openfl.experimental.federated package.""" + +from .plan import Plan # NOQA + +__all__ = ["Plan"] diff --git a/openfl/experimental/federated/plan/__init__.py b/openfl/experimental/federated/plan/__init__.py new file mode 100644 index 0000000000..eb1f085d43 --- /dev/null +++ b/openfl/experimental/federated/plan/__init__.py @@ -0,0 +1,8 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""Experimental Plan package.""" + +from .plan import Plan + +__all__ = ['Plan',] diff --git a/openfl/experimental/federated/plan/plan.py b/openfl/experimental/federated/plan/plan.py new file mode 100644 index 0000000000..4fcda43703 --- /dev/null +++ b/openfl/experimental/federated/plan/plan.py @@ -0,0 +1,468 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""Plan module.""" +import inspect +from hashlib import sha384 +from importlib import import_module +from logging import getLogger +from os.path import splitext +from pathlib import Path + +from yaml import dump +from yaml import safe_load +from yaml import SafeDumper + +from openfl.experimental.interface.cli.cli_helper import WORKSPACE +from openfl.experimental.transport import AggregatorGRPCClient +from openfl.experimental.transport import AggregatorGRPCServer +from openfl.utilities.utils import getfqdn_env + +SETTINGS = "settings" +TEMPLATE = "template" +DEFAULTS = "defaults" +AUTO = "auto" + + +class Plan: + """Federated Learning plan.""" + + logger = getLogger(__name__) + + @staticmethod + def load(yaml_path: Path, default: dict = None): + """Load the plan from YAML file.""" + if default is None: + default = {} + if yaml_path and yaml_path.exists(): + return safe_load(yaml_path.read_text()) + return default + + @staticmethod + def dump(yaml_path, config, freeze=False): + """Dump the plan config to YAML file.""" + + class NoAliasDumper(SafeDumper): + def ignore_aliases(self, data): + return True + + if freeze: + plan = Plan() + plan.config = config + frozen_yaml_path = Path( + f"{yaml_path.parent}/{yaml_path.stem}_{plan.hash[:8]}.yaml" + ) + if frozen_yaml_path.exists(): + Plan.logger.info(f"{yaml_path.name} is already frozen") + return + frozen_yaml_path.write_text(dump(config)) + frozen_yaml_path.chmod(0o400) + Plan.logger.info(f"{yaml_path.name} frozen successfully") + else: + yaml_path.write_text(dump(config)) + + @staticmethod + def parse( + 
plan_config_path: Path, + cols_config_path: Path = None, + data_config_path: Path = None, + resolve=True, + ): + """ + Parse the Federated Learning plan. + + Args: + plan_config_path (string): The filepath to the federated learning + plan + cols_config_path (string): The filepath to the federation + collaborator list [optional] + data_config_path (string): The filepath to the federation + collaborator data configuration + [optional] + Returns: + A federated learning plan object + """ + try: + plan = Plan() + plan.config = Plan.load(plan_config_path) # load plan configuration + plan.name = plan_config_path.name + plan.files = [plan_config_path] # collect all the plan files + + # ensure 'settings' appears in each top-level section + for section in plan.config.keys(): + if plan.config[section].get(SETTINGS) is None: + plan.config[section][SETTINGS] = {} + + # walk the top level keys and load 'defaults' in sorted order + for section in sorted(plan.config.keys()): + defaults = plan.config[section].pop(DEFAULTS, None) + + if defaults is not None: + defaults = WORKSPACE / "workspace" / defaults + + plan.files.append(defaults) + + if resolve: + Plan.logger.info( + f"Loading DEFAULTS for section [red]{section}[/] " + f"from file [red]{defaults}[/].", + extra={"markup": True}, + ) + + defaults = Plan.load(Path(defaults)) + + if SETTINGS in defaults: + # override defaults with section settings + defaults[SETTINGS].update(plan.config[section][SETTINGS]) + plan.config[section][SETTINGS] = defaults[SETTINGS] + + defaults.update(plan.config[section]) + + plan.config[section] = defaults + + plan.authorized_cols = Plan.load(cols_config_path).get("collaborators", []) + + if resolve: + plan.resolve() + + Plan.logger.info( + f"Parsing Federated Learning Plan : [green]SUCCESS[/] : " + f"[blue]{plan_config_path}[/].", + extra={"markup": True}, + ) + Plan.logger.info(dump(plan.config)) + + return plan + + except Exception: + Plan.logger.exception( + f"Parsing Federated Learning Plan : " + f"[red]FAILURE[/] : [blue]{plan_config_path}[/].", + extra={"markup": True}, + ) + raise + + @staticmethod + def accept_args(cls): + """ + Determines whether a class's constructor (__init__ method) accepts + variable positional arguments (*args). + + Returns: + Boolean: True or False + """ + init_signature = inspect.signature(cls.__init__) + for param in init_signature.parameters.values(): + if param.kind == param.VAR_POSITIONAL: + return True + return False + + @staticmethod + def build(template, settings, **override): + """ + Create an instance of a openfl Component or Federated DataLoader/TaskRunner. 
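+ + The template string is split into a module path and a class name, the module is imported dynamically, and the class is instantiated with the merged settings, for example Plan.build('openfl.experimental.component.Aggregator', settings).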
+ + Args: + template: Fully qualified class template path + settings: Keyword arguments to class constructor + + Returns: + A Python object + """ + class_name = splitext(template)[1].strip(".") + module_path = splitext(template)[0] + + Plan.logger.info( + f"Building [red]🡆[/] Object [red]{class_name}[/] " + f"from [red]{module_path}[/] Module.", + extra={"markup": True}, + ) + Plan.logger.debug(f"Settings [red]🡆[/] {settings}", extra={"markup": True}) + Plan.logger.debug(f"Override [red]🡆[/] {override}", extra={"markup": True}) + + settings.update(**override) + module = import_module(module_path) + + if Plan.accept_args(getattr(module, class_name)): + args = list(settings.values()) + instance = getattr(module, class_name)(*args) + else: + instance = getattr(module, class_name)(**settings) + + return instance + + @staticmethod + def import_(template): + """ + Import an instance of a openfl Component or Federated DataLoader/TaskRunner. + + Args: + template: Fully qualified object path + + Returns: + A Python object + """ + class_name = splitext(template)[1].strip(".") + module_path = splitext(template)[0] + Plan.logger.info( + f"Importing [red]🡆[/] Object [red]{class_name}[/] " + f"from [red]{module_path}[/] Module.", + extra={"markup": True}, + ) + module = import_module(module_path) + instance = getattr(module, class_name) + + return instance + + def __init__(self): + """Initialize.""" + self.config = {} # dictionary containing patched plan definition + self.authorized_cols = [] # authorized collaborator list + self.cols_data_paths = {} # collaborator data paths dict + + self.collaborator_ = None # collaborator object + self.aggregator_ = None # aggregator object + + self.server_ = None # gRPC server object + self.client_ = None # gRPC client object + + self.hash_ = None + + @property + def hash(self): # NOQA + """Generate hash for this instance.""" + self.hash_ = sha384(dump(self.config).encode("utf-8")) + Plan.logger.info( + f"FL-Plan hash is [blue]{self.hash_.hexdigest()}[/]", extra={"markup": True} + ) + + return self.hash_.hexdigest() + + def resolve(self): + """Resolve the federation settings.""" + self.federation_uuid = f"{self.name}_{self.hash[:8]}" + self.aggregator_uuid = f"aggregator_{self.federation_uuid}" + + self.rounds_to_train = self.config["aggregator"][SETTINGS]["rounds_to_train"] + + if self.config["network"][SETTINGS]["agg_addr"] == AUTO: + self.config["network"][SETTINGS]["agg_addr"] = getfqdn_env() + + if self.config["network"][SETTINGS]["agg_port"] == AUTO: + self.config["network"][SETTINGS]["agg_port"] = ( + int(self.hash[:8], 16) % (60999 - 49152) + 49152 + ) + + def get_aggregator(self): + """Get federation aggregator.""" + defaults = self.config.get( + "aggregator", { + TEMPLATE: "openfl.experimental.Aggregator", + SETTINGS: {} + } + ) + + defaults[SETTINGS]["aggregator_uuid"] = self.aggregator_uuid + defaults[SETTINGS]["federation_uuid"] = self.federation_uuid + defaults[SETTINGS]["authorized_cols"] = self.authorized_cols + + private_attrs_callable, private_attrs_kwargs = self.get_private_attr("aggregator") + defaults[SETTINGS]["private_attributes_callable"] = private_attrs_callable + defaults[SETTINGS]["private_attributes_kwargs"] = private_attrs_kwargs + + defaults[SETTINGS]["flow"] = self.get_flow() + checkpoint = self.config.get("federated_flow", False) + if not checkpoint: + checkpoint = checkpoint["settings"]["checkpoint"] + defaults[SETTINGS]["checkpoint"] = checkpoint + + log_metric_callback = defaults[SETTINGS].get("log_metric_callback") + if 
log_metric_callback: + if isinstance(log_metric_callback, dict): + log_metric_callback = Plan.import_(**log_metric_callback) + elif not callable(log_metric_callback): + raise TypeError( + f"log_metric_callback should be callable object " + f"or be import from code part, get {log_metric_callback}" + ) + defaults[SETTINGS]["log_metric_callback"] = log_metric_callback + + if self.aggregator_ is None: + self.aggregator_ = Plan.build(**defaults) + + return self.aggregator_ + + def get_collaborator( + self, + collaborator_name, + root_certificate=None, + private_key=None, + certificate=None, + client=None, + ): + """Get collaborator.""" + defaults = self.config.get( + "collaborator", { + TEMPLATE: "openfl.experimental.Collaborator", + SETTINGS: {} + } + ) + + defaults[SETTINGS]["collaborator_name"] = collaborator_name + defaults[SETTINGS]["aggregator_uuid"] = self.aggregator_uuid + defaults[SETTINGS]["federation_uuid"] = self.federation_uuid + + private_attrs_callable, private_attrs_kwargs = self.get_private_attr(collaborator_name) + + defaults[SETTINGS]["private_attributes_callable"] = private_attrs_callable + defaults[SETTINGS]["private_attributes_kwargs"] = private_attrs_kwargs + + if client is not None: + defaults[SETTINGS]["client"] = client + else: + defaults[SETTINGS]["client"] = self.get_client( + collaborator_name, + self.aggregator_uuid, + self.federation_uuid, + root_certificate, + private_key, + certificate, + ) + + if self.collaborator_ is None: + self.collaborator_ = Plan.build(**defaults) + + return self.collaborator_ + + def get_client( + self, + collaborator_name, + aggregator_uuid, + federation_uuid, + root_certificate=None, + private_key=None, + certificate=None, + ): + """Get gRPC client for the specified collaborator.""" + common_name = collaborator_name + if not root_certificate or not private_key or not certificate: + root_certificate = "cert/cert_chain.crt" + certificate = f"cert/client/col_{common_name}.crt" + private_key = f"cert/client/col_{common_name}.key" + + client_args = self.config["network"][SETTINGS] + + # patch certificates + + client_args["root_certificate"] = root_certificate + client_args["certificate"] = certificate + client_args["private_key"] = private_key + + client_args["aggregator_uuid"] = aggregator_uuid + client_args["federation_uuid"] = federation_uuid + + if self.client_ is None: + self.client_ = AggregatorGRPCClient(**client_args) + + return self.client_ + + def get_server( + self, root_certificate=None, private_key=None, certificate=None, **kwargs + ): + """Get gRPC server of the aggregator instance.""" + common_name = self.config["network"][SETTINGS]["agg_addr"].lower() + + if not root_certificate or not private_key or not certificate: + root_certificate = "cert/cert_chain.crt" + certificate = f"cert/server/agg_{common_name}.crt" + private_key = f"cert/server/agg_{common_name}.key" + + server_args = self.config["network"][SETTINGS] + + # patch certificates + + server_args.update(kwargs) + server_args["root_certificate"] = root_certificate + server_args["certificate"] = certificate + server_args["private_key"] = private_key + + server_args["aggregator"] = self.get_aggregator() + + if self.server_ is None: + self.server_ = AggregatorGRPCServer(**server_args) + + return self.server_ + + def get_flow(self): + """instantiates federated flow object""" + defaults = self.config.get( + "federated_flow", { + TEMPLATE: self.config["federated_flow"]["template"], + SETTINGS: {} + }, + ) + defaults = self.import_kwargs_modules(defaults) + + self.flow_ = 
Plan.build(**defaults) + return self.flow_ + + def import_kwargs_modules(self, defaults): + def import_nested_settings(settings): + for key, value in settings.items(): + if isinstance(value, dict): + settings[key] = import_nested_settings(value) + elif isinstance(value, str): + class_name = splitext(value)[1].strip(".") + if class_name: + module_path = splitext(value)[0] + try: + if import_module(module_path): + module = import_module(module_path) + value_defaults_data = { + 'template': value, + 'settings': settings.get('settings', {}), + } + attr = getattr(module, class_name) + + if not inspect.isclass(attr): + settings[key] = attr + else: + settings = Plan.build(**value_defaults_data) + except ImportError: + raise ImportError(f"Cannot import {value}.") + return settings + + defaults[SETTINGS] = import_nested_settings(defaults[SETTINGS]) + return defaults + + def get_private_attr(self, private_attr_name=None): + private_attrs_callable = None + private_attrs_kwargs = {} + + import os + from openfl.experimental.federated.plan import Plan + from pathlib import Path + + data_yaml = "plan/data.yaml" + + if os.path.exists(data_yaml) and os.path.isfile(data_yaml): + d = Plan.load(Path(data_yaml).absolute()) + + if d.get(private_attr_name, None): + private_attrs_callable = { + "template": d.get(private_attr_name)["callable_func"]["template"] + } + + private_attrs_kwargs = self.import_kwargs_modules( + d.get(private_attr_name)["callable_func"] + )["settings"] + + if isinstance(private_attrs_callable, dict): + private_attrs_callable = Plan.import_(**private_attrs_callable) + elif not callable(private_attrs_callable): + raise TypeError( + f"private_attrs_callable should be callable object " + f"or be import from code part, get {private_attrs_callable}" + ) + return private_attrs_callable, private_attrs_kwargs + return None, None diff --git a/openfl/experimental/interface/cli/__init__.py b/openfl/experimental/interface/cli/__init__.py new file mode 100644 index 0000000000..6a71732c32 --- /dev/null +++ b/openfl/experimental/interface/cli/__init__.py @@ -0,0 +1,3 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""openfl.experimental.interface.cli package.""" diff --git a/openfl/experimental/interface/cli/aggregator.py b/openfl/experimental/interface/cli/aggregator.py new file mode 100644 index 0000000000..c189cda215 --- /dev/null +++ b/openfl/experimental/interface/cli/aggregator.py @@ -0,0 +1,210 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""Aggregator module.""" + +import sys +import threading +from logging import getLogger + +from click import echo +from click import group +from click import option +from click import pass_context +from click import Path as ClickPath +from click import style + +from openfl.utilities import click_types +from openfl.utilities.path_check import is_directory_traversal +from openfl.utilities.utils import getfqdn_env + +logger = getLogger(__name__) + + +@group() +@pass_context +def aggregator(context): + """Manage Federated Learning Aggregator.""" + context.obj['group'] = 'aggregator' + + +@aggregator.command(name='start') +@option('-p', '--plan', required=False, + help='Federated learning plan [plan/plan.yaml]', + default='plan/plan.yaml', + type=ClickPath(exists=True)) +@option('-c', '--authorized_cols', required=False, + help='Authorized collaborator list [plan/cols.yaml]', + default='plan/cols.yaml', type=ClickPath(exists=True)) +@option('-s', '--secure', required=False, + help='Enable 
Intel SGX Enclave', is_flag=True, default=False) +def start_(plan, authorized_cols, secure): + """Start the aggregator service.""" + import os + from pathlib import Path + + from openfl.experimental.federated.plan import Plan + + if is_directory_traversal(plan): + echo('Federated learning plan path is out of the openfl workspace scope.') + sys.exit(1) + if is_directory_traversal(authorized_cols): + echo('Authorized collaborator list file path is out of the openfl workspace scope.') + sys.exit(1) + + plan = Plan.parse(plan_config_path=Path(plan).absolute(), + cols_config_path=Path(authorized_cols).absolute()) + + if not os.path.exists('plan/data.yaml'): + logger.warning( + 'Aggregator private attributes are set to None as plan/data.yaml not found' + + ' in workspace.') + else: + import yaml + from yaml.loader import SafeLoader + with open('plan/data.yaml', 'r') as f: + data = yaml.load(f, Loader=SafeLoader) + if data.get("aggregator", None) is None: + logger.warning( + 'Aggregator private attributes are set to None as no aggregator' + + ' attributes found in plan/data.yaml.') + + logger.info('🧿 Starting the Aggregator Service.') + + agg_server = plan.get_server() + agg_server.is_server_started = False + agg_grpc_server = threading.Thread(target=agg_server.serve) + agg_grpc_server.start() + + while True: + if agg_server.is_server_started: + plan.aggregator_.run_flow() + break + + +@aggregator.command(name='generate-cert-request') +@option('--fqdn', required=False, type=click_types.FQDN, + help=f'The fully qualified domain name of' + f' aggregator node [{getfqdn_env()}]', + default=getfqdn_env()) +def _generate_cert_request(fqdn): + generate_cert_request(fqdn) + + +def generate_cert_request(fqdn): + """Create aggregator certificate key pair.""" + from openfl.cryptography.participant import generate_csr + from openfl.cryptography.io import write_crt + from openfl.cryptography.io import write_key + from openfl.cryptography.io import get_csr_hash + from openfl.experimental.interface.cli.cli_helper import CERT_DIR + + if fqdn is None: + fqdn = getfqdn_env() + + common_name = f'{fqdn}'.lower() + subject_alternative_name = f'DNS:{common_name}' + file_name = f'agg_{common_name}' + + echo(f'Creating AGGREGATOR certificate key pair with following settings: ' + f'CN={style(common_name, fg="red")},' + f' SAN={style(subject_alternative_name, fg="red")}') + + server_private_key, server_csr = generate_csr(common_name, server=True) + + (CERT_DIR / 'server').mkdir(parents=True, exist_ok=True) + + echo(' Writing AGGREGATOR certificate key pair to: ' + style( + f'{CERT_DIR}/server', fg='green')) + + # Print csr hash before writing csr to disk + csr_hash = get_csr_hash(server_csr) + echo('The CSR Hash ' + style(f'{csr_hash}', fg='red')) + + # Write aggregator csr and key to disk + write_crt(server_csr, CERT_DIR / 'server' / f'{file_name}.csr') + write_key(server_private_key, CERT_DIR / 'server' / f'{file_name}.key') + + +@aggregator.command(name='certify') +@option('-n', '--fqdn', type=click_types.FQDN, + help=f'The fully qualified domain name of aggregator node [{getfqdn_env()}]', + default=getfqdn_env()) +@option('-s', '--silent', help='Do not prompt', is_flag=True) +def _certify(fqdn, silent): + certify(fqdn, silent) + + +def certify(fqdn, silent): + """Sign/certify the aggregator certificate key pair.""" + from pathlib import Path + + from click import confirm + + from openfl.cryptography.ca import sign_certificate + from openfl.cryptography.io import read_crt + from openfl.cryptography.io import read_csr + 
from openfl.cryptography.io import read_key + from openfl.cryptography.io import write_crt + from openfl.experimental.interface.cli.cli_helper import CERT_DIR + + if fqdn is None: + fqdn = getfqdn_env() + + common_name = f'{fqdn}'.lower() + file_name = f'agg_{common_name}' + cert_name = f'server/{file_name}' + signing_key_path = 'ca/signing-ca/private/signing-ca.key' + signing_crt_path = 'ca/signing-ca.crt' + + # Load CSR + csr_path_absolute_path = Path(CERT_DIR / f'{cert_name}.csr').absolute() + if not csr_path_absolute_path.exists(): + echo(style('Aggregator certificate signing request not found.', fg='red') + + ' Please run `fx aggregator generate-cert-request`' + ' to generate the certificate request.') + + csr, csr_hash = read_csr(csr_path_absolute_path) + + # Load private signing key + private_sign_key_absolute_path = Path(CERT_DIR / signing_key_path).absolute() + if not private_sign_key_absolute_path.exists(): + echo(style('Signing key not found.', fg='red') + + ' Please run `fx workspace certify`' + ' to initialize the local certificate authority.') + + signing_key = read_key(private_sign_key_absolute_path) + + # Load signing cert + signing_crt_absolute_path = Path(CERT_DIR / signing_crt_path).absolute() + if not signing_crt_absolute_path.exists(): + echo(style('Signing certificate not found.', fg='red') + + ' Please run `fx workspace certify`' + ' to initialize the local certificate authority.') + + signing_crt = read_crt(signing_crt_absolute_path) + + echo('The CSR Hash for file ' + + style(f'{cert_name}.csr', fg='green') + + ' = ' + + style(f'{csr_hash}', fg='red')) + + crt_path_absolute_path = Path(CERT_DIR / f'{cert_name}.crt').absolute() + + if silent: + echo(' Warning: manual check of certificate hashes is bypassed in silent mode.') + echo(' Signing AGGREGATOR certificate') + signed_agg_cert = sign_certificate(csr, signing_key, signing_crt.subject) + write_crt(signed_agg_cert, crt_path_absolute_path) + + else: + echo('Make sure the two hashes above are the same.') + if confirm('Do you want to sign this certificate?'): + + echo(' Signing AGGREGATOR certificate') + signed_agg_cert = sign_certificate(csr, signing_key, signing_crt.subject) + write_crt(signed_agg_cert, crt_path_absolute_path) + + else: + echo(style('Not signing certificate.', fg='red') + + ' Please check with this AGGREGATOR to get the correct' + ' certificate for this federation.') diff --git a/openfl/experimental/interface/cli/cli_helper.py b/openfl/experimental/interface/cli/cli_helper.py new file mode 100644 index 0000000000..e552b17209 --- /dev/null +++ b/openfl/experimental/interface/cli/cli_helper.py @@ -0,0 +1,225 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""Module with auxiliary CLI helper functions.""" + +from itertools import islice +from os import environ +from os import stat +from pathlib import Path +from sys import argv + +from click import echo +from click import style +from yaml import FullLoader +from yaml import load + +FX = argv[0] + +SITEPACKS = Path(__file__).parent.parent.parent.parent.parent +WORKSPACE = SITEPACKS / 'openfl-workspace' / 'experimental' +TUTORIALS = SITEPACKS / 'openfl-tutorials' +OPENFL_USERDIR = Path.home() / '.openfl' +CERT_DIR = Path('cert').absolute() + + +def pretty(o): + """Pretty-print the dictionary given.""" + m = max(map(len, o.keys())) + + for k, v in o.items(): + echo(style(f'{k:<{m}} : ', fg='blue') + style(f'{v}', fg='cyan')) + + +def tree(path): + """Print current directory file tree.""" + echo(f'+ {path}') + + 
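+ # Recursively walk everything under `path`, printing files as 'f <name>' and directories as 'd <name>'.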
for path in sorted(path.rglob('*')): + + depth = len(path.relative_to(path).parts) + space = ' ' * depth + + if path.is_file(): + echo(f'{space}f {path.name}') + else: + echo(f'{space}d {path.name}') + + +def print_tree(dir_path: Path, level: int = -1, + limit_to_directories: bool = False, + length_limit: int = 1000): + """Given a directory Path object print a visual tree structure.""" + space = ' ' + branch = '│ ' + tee = '├── ' + last = '└── ' + + echo('\nNew experimental workspace directory structure:') + + dir_path = Path(dir_path) # accept string coerceable to Path + files = 0 + directories = 0 + + def inner(dir_path: Path, prefix: str = '', level=-1): + nonlocal files, directories + if not level: + return # 0, stop iterating + if limit_to_directories: + contents = [d for d in dir_path.iterdir() if d.is_dir()] + else: + contents = list(dir_path.iterdir()) + pointers = [tee] * (len(contents) - 1) + [last] + for pointer, path in zip(pointers, contents): + if path.is_dir(): + yield prefix + pointer + path.name + directories += 1 + extension = branch if pointer == tee else space + yield from inner(path, prefix=prefix + extension, + level=level - 1) + elif not limit_to_directories: + yield prefix + pointer + path.name + files += 1 + + echo(dir_path.name) + iterator = inner(dir_path, level=level) + for line in islice(iterator, length_limit): + echo(line) + if next(iterator, None): + echo(f'... length_limit, {length_limit}, reached, counted:') + echo(f'\n{directories} directories' + (f', {files} files' if files else '')) + + +def copytree(src, dst, symlinks=False, ignore=None, + ignore_dangling_symlinks=False, dirs_exist_ok=False): + """From Python 3.8 'shutil' which include 'dirs_exist_ok' option.""" + import os + import shutil + + with os.scandir(src) as itr: + entries = list(itr) + + copy_function = shutil.copy2 + + def _copytree(): + + if ignore is not None: + ignored_names = ignore(os.fspath(src), [x.name for x in entries]) + else: + ignored_names = set() + + os.makedirs(dst, exist_ok=dirs_exist_ok) + errors = [] + use_srcentry = copy_function is shutil.copy2 or copy_function is shutil.copy + + for srcentry in entries: + if srcentry.name in ignored_names: + continue + srcname = os.path.join(src, srcentry.name) + dstname = os.path.join(dst, srcentry.name) + srcobj = srcentry if use_srcentry else srcname + try: + is_symlink = srcentry.is_symlink() + if is_symlink and os.name == 'nt': + lstat = srcentry.stat(follow_symlinks=False) + if lstat.st_reparse_tag == stat.IO_REPARSE_TAG_MOUNT_POINT: + is_symlink = False + if is_symlink: + linkto = os.readlink(srcname) + if symlinks: + os.symlink(linkto, dstname) + shutil.copystat(srcobj, dstname, + follow_symlinks=not symlinks) + else: + if (not os.path.exists(linkto) + and ignore_dangling_symlinks): + continue + if srcentry.is_dir(): + copytree(srcobj, dstname, symlinks, ignore, + dirs_exist_ok=dirs_exist_ok) + else: + copy_function(srcobj, dstname) + elif srcentry.is_dir(): + copytree(srcobj, dstname, symlinks, ignore, + dirs_exist_ok=dirs_exist_ok) + else: + copy_function(srcobj, dstname) + except OSError as why: + errors.append((srcname, dstname, str(why))) + except Exception as err: + errors.extend(err.args[0]) + try: + shutil.copystat(src, dst) + except OSError as why: + if getattr(why, 'winerror', None) is None: + errors.append((src, dst, str(why))) + if errors: + raise Exception(errors) + return dst + + return _copytree() + + +def get_workspace_parameter(name): + """Get a parameter from the workspace config file (.workspace).""" + # Update 
the .workspace file to show the current workspace plan + workspace_file = '.workspace' + + with open(workspace_file, 'r', encoding='utf-8') as f: + doc = load(f, Loader=FullLoader) + + if not doc: # YAML is not correctly formatted + doc = {} # Create empty dictionary + + if name not in doc.keys() or not doc[name]: # List doesn't exist + return '' + else: + return doc[name] + + +def check_varenv(env: str = '', args: dict = None): + """Update "args" (dictionary) with if env has a defined value in the host.""" + if args is None: + args = {} + env_val = environ.get(env) + if env and (env_val is not None): + args[env] = env_val + + return args + + +def get_fx_path(curr_path=''): + """Return the absolute path to fx binary.""" + import re + import os + + match = re.search('lib', curr_path) + idx = match.end() + path_prefix = curr_path[0:idx] + bin_path = re.sub('lib', 'bin', path_prefix) + fx_path = os.path.join(bin_path, 'fx') + + return fx_path + + +def remove_line_from_file(pkg, filename): + """Remove line that contains `pkg` from the `filename` file.""" + with open(filename, 'r+', encoding='utf-8') as f: + d = f.readlines() + f.seek(0) + for i in d: + if pkg not in i: + f.write(i) + f.truncate() + + +def replace_line_in_file(line, line_num_to_replace, filename): + """Replace line at `line_num_to_replace` with `line`.""" + with open(filename, 'r+', encoding='utf-8') as f: + d = f.readlines() + f.seek(0) + for idx, i in enumerate(d): + if idx == line_num_to_replace: + f.write(line) + else: + f.write(i) + f.truncate() diff --git a/openfl/experimental/interface/cli/collaborator.py b/openfl/experimental/interface/cli/collaborator.py new file mode 100644 index 0000000000..e31de19a88 --- /dev/null +++ b/openfl/experimental/interface/cli/collaborator.py @@ -0,0 +1,416 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""Collaborator module.""" + +import sys +import os +from logging import getLogger + +from click import echo +from click import group +from click import option +from click import pass_context +from click import Path as ClickPath +from click import style + +from openfl.utilities.path_check import is_directory_traversal + + +logger = getLogger(__name__) + + +@group() +@pass_context +def collaborator(context): + """Manage Federated Learning Collaborators.""" + context.obj["group"] = "service" + + +@collaborator.command(name="start") +@option( + "-p", + "--plan", + required=False, + help="Federated learning plan [plan/plan.yaml]", + default="plan/plan.yaml", + type=ClickPath(exists=True), +) +@option( + "-n", + "--collaborator_name", + required=True, + help="The certified common name of the collaborator", +) +@option( + '-s', + '--secure', + required=False, + help='Enable Intel SGX Enclave', + is_flag=True, + default=False +) +def start_(plan, collaborator_name, secure, data_config="plan/data.yaml"): + """Start a collaborator service.""" + from pathlib import Path + + from openfl.experimental.federated import Plan + + if plan and is_directory_traversal(plan): + echo("Federated learning plan path is out of the openfl workspace scope.") + sys.exit(1) + if data_config and is_directory_traversal(data_config): + echo( + "The data set/shard configuration file path is out of the openfl workspace scope." 
+ ) + sys.exit(1) + + plan = Plan.parse( + plan_config_path=Path(plan).absolute(), + data_config_path=Path(data_config).absolute(), + ) + + if not os.path.exists(data_config): + logger.warning('Collaborator private attributes are set to None as' + f' {data_config} not found in workspace.') + else: + import yaml + from yaml.loader import SafeLoader + collaborator_name = collaborator_name.lower() + with open(data_config, 'r') as f: + data = yaml.load(f, Loader=SafeLoader) + if data.get(collaborator_name, None) is None: + logger.warning( + f'Collaborator private attributes are set to None as no attributes' + f' for {collaborator_name} found in {data_config}.') + + logger.info('🧿 Starting the Collaborator Service.') + + plan.get_collaborator(collaborator_name).run() + + +@collaborator.command(name="generate-cert-request") +@option( + "-n", + "--collaborator_name", + required=True, + help="The certified common name of the collaborator", +) +@option("-s", "--silent", help="Do not prompt", is_flag=True) +@option( + "-x", + "--skip-package", + help="Do not package the certificate signing request for export", + is_flag=True, +) +def generate_cert_request_(collaborator_name, silent, skip_package): + """Generate certificate request for the collaborator.""" + generate_cert_request(collaborator_name, silent, skip_package) + + +def generate_cert_request(collaborator_name, silent, skip_package): + """ + Create collaborator certificate key pair. + + Then create a package with the CSR to send for signing. + """ + from openfl.cryptography.participant import generate_csr + from openfl.cryptography.io import write_crt + from openfl.cryptography.io import write_key + from openfl.cryptography.io import get_csr_hash + from openfl.experimental.interface.cli.cli_helper import CERT_DIR + + common_name = f"{collaborator_name}".lower() + subject_alternative_name = f"DNS:{common_name}" + file_name = f"col_{common_name}" + + echo( + f"Creating COLLABORATOR certificate key pair with following settings: " + f'CN={style(common_name, fg="red")},' + f' SAN={style(subject_alternative_name, fg="red")}' + ) + + client_private_key, client_csr = generate_csr(common_name, server=False) + + (CERT_DIR / "client").mkdir(parents=True, exist_ok=True) + + echo( + " Moving COLLABORATOR certificate to: " + + style(f"{CERT_DIR}/{file_name}", fg="green") + ) + + # Print csr hash before writing csr to disk + csr_hash = get_csr_hash(client_csr) + echo("The CSR Hash " + style(f"{csr_hash}", fg="red")) + + # Write collaborator csr and key to disk + write_crt(client_csr, CERT_DIR / "client" / f"{file_name}.csr") + write_key(client_private_key, CERT_DIR / "client" / f"{file_name}.key") + + if not skip_package: + from shutil import copytree + from shutil import ignore_patterns + from shutil import make_archive + from tempfile import mkdtemp + from os.path import basename + from os.path import join + from os import remove + from glob import glob + + from openfl.utilities.utils import rmtree + + archive_type = "zip" + archive_name = f"col_{common_name}_to_agg_cert_request" + archive_file_name = archive_name + "." 
+ archive_type + + # Collaborator certificate signing request + tmp_dir = join(mkdtemp(), "openfl", archive_name) + + ignore = ignore_patterns("__pycache__", "*.key", "*.srl", "*.pem") + # Copy the current directory into the temporary directory + copytree(f"{CERT_DIR}/client", tmp_dir, ignore=ignore) + + for f in glob(f"{tmp_dir}/*"): + if common_name not in basename(f): + remove(f) + + # Create Zip archive of directory + make_archive(archive_name, archive_type, tmp_dir) + rmtree(tmp_dir) + + echo( + f"Archive {archive_file_name} with certificate signing" f" request created" + ) + echo( + "This file should be sent to the certificate authority" + " (typically hosted by the aggregator) for signing" + ) + + +def find_certificate_name(file_name): + """Parse the collaborator name.""" + col_name = str(file_name).split(os.sep)[-1].split(".")[0][4:] + return col_name + + +def register_collaborator(file_name): + """Register the collaborator name in the cols.yaml list. + + Args: + file_name (str): The name of the collaborator in this federation + + """ + from os.path import isfile + from yaml import dump + from yaml import FullLoader + from yaml import load + from pathlib import Path + + col_name = find_certificate_name(file_name) + + cols_file = Path("plan/cols.yaml").absolute() + + if not isfile(cols_file): + cols_file.touch() + with open(cols_file, "r", encoding="utf-8") as f: + doc = load(f, Loader=FullLoader) + + if not doc: # YAML is not correctly formatted + doc = {} # Create empty dictionary + + # List doesn't exist + if "collaborators" not in doc.keys() or not doc["collaborators"]: + doc["collaborators"] = [] # Create empty list + + if col_name in doc["collaborators"]: + echo( + "\nCollaborator " + + style(f"{col_name}", fg="green") + + " is already in the " + + style(f"{cols_file}", fg="green") + ) + + else: + doc["collaborators"].append(col_name) + with open(cols_file, "w", encoding="utf-8") as f: + dump(doc, f) + + echo( + "\nRegistering " + + style(f"{col_name}", fg="green") + + " in " + + style(f"{cols_file}", fg="green") + ) + + +@collaborator.command(name="certify") +@option( + "-n", + "--collaborator_name", + help="The certified common name of the collaborator. 
This is only" + " needed for single node expiriments", +) +@option("-s", "--silent", help="Do not prompt", is_flag=True) +@option( + "-r", + "--request-pkg", + type=ClickPath(exists=True), + help="The archive containing the certificate signing" + " request (*.zip) for a collaborator", +) +@option( + "-i", + "--import", + "import_", + type=ClickPath(exists=True), + help="Import the archive containing the collaborator's" + " certificate (signed by the CA)", +) +def certify_(collaborator_name, silent, request_pkg, import_): + """Certify the collaborator.""" + certify(collaborator_name, silent, request_pkg, import_) + + +def certify(collaborator_name, silent, request_pkg=None, import_=False): + """Sign/certify collaborator certificate key pair.""" + from click import confirm + from pathlib import Path + from shutil import copy + from shutil import make_archive + from shutil import unpack_archive + from glob import glob + from os.path import basename + from os.path import join + from os.path import splitext + from os import remove + from tempfile import mkdtemp + from openfl.cryptography.ca import sign_certificate + from openfl.cryptography.io import read_crt + from openfl.cryptography.io import read_csr + from openfl.cryptography.io import read_key + from openfl.cryptography.io import write_crt + from openfl.experimental.interface.cli.cli_helper import CERT_DIR + from openfl.utilities.utils import rmtree + + common_name = f"{collaborator_name}".lower() + + if not import_: + if request_pkg: + Path(f"{CERT_DIR}/client").mkdir(parents=True, exist_ok=True) + unpack_archive(request_pkg, extract_dir=f"{CERT_DIR}/client") + csr = glob(f"{CERT_DIR}/client/*.csr")[0] + else: + if collaborator_name is None: + echo( + "collaborator_name can only be omitted if signing\n" + "a zipped request package.\n" + "\n" + "Example: fx collaborator certify --request-pkg " + "col_one_to_agg_cert_request.zip" + ) + return + csr = glob(f"{CERT_DIR}/client/col_{common_name}.csr")[0] + copy(csr, CERT_DIR) + cert_name = splitext(csr)[0] + file_name = basename(cert_name) + signing_key_path = "ca/signing-ca/private/signing-ca.key" + signing_crt_path = "ca/signing-ca.crt" + + # Load CSR + if not Path(f"{cert_name}.csr").exists(): + echo( + style("Collaborator certificate signing request not found.", fg="red") + + " Please run `fx collaborator generate-cert-request`" + " to generate the certificate request." + ) + + csr, csr_hash = read_csr(f"{cert_name}.csr") + + # Load private signing key + if not Path(CERT_DIR / signing_key_path).exists(): + echo( + style("Signing key not found.", fg="red") + + " Please run `fx workspace certify`" + " to initialize the local certificate authority." + ) + + signing_key = read_key(CERT_DIR / signing_key_path) + + # Load signing cert + if not Path(CERT_DIR / signing_crt_path).exists(): + echo( + style("Signing certificate not found.", fg="red") + + " Please run `fx workspace certify`" + " to initialize the local certificate authority." + ) + + signing_crt = read_crt(CERT_DIR / signing_crt_path) + + echo( + "The CSR Hash for file " + + style(f"{file_name}.csr", fg="green") + + " = " + + style(f"{csr_hash}", fg="red") + ) + + if silent: + echo(" Signing COLLABORATOR certificate") + echo( + " Warning: manual check of certificate hashes is bypassed in silent mode." 
+ ) + signed_col_cert = sign_certificate(csr, signing_key, signing_crt.subject) + write_crt(signed_col_cert, f"{cert_name}.crt") + register_collaborator(CERT_DIR / "client" / f"{file_name}.crt") + + else: + echo("Make sure the two hashes above are the same.") + if confirm("Do you want to sign this certificate?"): + echo(" Signing COLLABORATOR certificate") + signed_col_cert = sign_certificate( + csr, signing_key, signing_crt.subject + ) + write_crt(signed_col_cert, f"{cert_name}.crt") + register_collaborator(CERT_DIR / "client" / f"{file_name}.crt") + + else: + echo( + style("Not signing certificate.", fg="red") + + " Please check with this collaborator to get the" + " correct certificate for this federation." + ) + return + + if len(common_name) == 0: + # If the collaborator name is provided, the collaborator and + # certificate does not need to be exported + return + + # Remove unneeded CSR + remove(f"{cert_name}.csr") + + archive_type = "zip" + archive_name = f"agg_to_{file_name}_signed_cert" + + # Collaborator certificate signing request + tmp_dir = join(mkdtemp(), "openfl", archive_name) + + Path(f"{tmp_dir}/client").mkdir(parents=True, exist_ok=True) + # Copy the signed cert to the temporary directory + copy(f"{CERT_DIR}/client/{file_name}.crt", f"{tmp_dir}/client/") + # Copy the CA certificate chain to the temporary directory + copy(f"{CERT_DIR}/cert_chain.crt", tmp_dir) + + # Create Zip archive of directory + make_archive(archive_name, archive_type, tmp_dir) + rmtree(tmp_dir) + + else: + # Copy the signed certificate and cert chain into PKI_DIR + previous_crts = glob(f"{CERT_DIR}/client/*.crt") + unpack_archive(import_, extract_dir=CERT_DIR) + updated_crts = glob(f"{CERT_DIR}/client/*.crt") + cert_difference = list(set(updated_crts) - set(previous_crts)) + if len(cert_difference) != 0: + crt = basename(cert_difference[0]) + echo(f"Certificate {crt} installed to PKI directory") + else: + echo("Certificate updated in the PKI directory") diff --git a/openfl/experimental/interface/cli/experimental.py b/openfl/experimental/interface/cli/experimental.py new file mode 100644 index 0000000000..e97dbabeb9 --- /dev/null +++ b/openfl/experimental/interface/cli/experimental.py @@ -0,0 +1,25 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""Experimental CLI.""" + +import os +from pathlib import Path + +from click import group +from click import pass_context + + +@group() +@pass_context +def experimental(context): + """Manage Experimental Environment.""" + context.obj["group"] = "experimental" + + +@experimental.command(name="deactivate") +def deactivate(): + """Deactivate experimental environment.""" + settings = Path("~").expanduser().joinpath( + ".openfl", "experimental").resolve() + + os.remove(settings) diff --git a/openfl/experimental/interface/cli/plan.py b/openfl/experimental/interface/cli/plan.py new file mode 100644 index 0000000000..3026b182c9 --- /dev/null +++ b/openfl/experimental/interface/cli/plan.py @@ -0,0 +1,94 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""Plan module.""" + +import sys +from logging import getLogger + +from click import echo +from click import group +from click import option +from click import pass_context +from click import Path as ClickPath + +from openfl.utilities.path_check import is_directory_traversal + +logger = getLogger(__name__) + + +@group() +@pass_context +def plan(context): + """Manage Federated Learning Plans.""" + context.obj['group'] = 'plan' + + 
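
Taken together, the new `fx collaborator` commands define a three-step certificate exchange between a collaborator and the aggregator-hosted CA. The sketch below shows one plausible sequence for a single collaborator named `col1`; the collaborator name and archive filenames are illustrative only, following the `col_<name>_to_agg_cert_request.zip` and `agg_to_col_<name>_signed_cert.zip` patterns produced by the commands above.

```
# On the collaborator node: create a key pair and package the CSR
fx collaborator generate-cert-request -n col1

# On the aggregator node (which hosts the signing CA): sign the request
fx collaborator certify --request-pkg col_col1_to_agg_cert_request.zip

# Back on the collaborator node: import the signed certificate
fx collaborator certify --import agg_to_col_col1_signed_cert.zip
```
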
+@plan.command() +@pass_context +@option('-p', '--plan_config', required=False, + help='Federated learning plan [plan/plan.yaml]', + default='plan/plan.yaml', type=ClickPath(exists=True)) +@option('-c', '--cols_config', required=False, + help='Authorized collaborator list [plan/cols.yaml]', + default='plan/cols.yaml', type=ClickPath(exists=True)) +@option('-d', '--data_config', required=False, + help='The data set/shard configuration file [plan/data.yaml]', + default='plan/data.yaml') +@option('-a', '--aggregator_address', required=False, + help='The FQDN of the federation agregator') +def initialize(context, plan_config, cols_config, data_config, + aggregator_address): + """ + Initialize Data Science plan. + + Create a protocol buffer file of the initial model weights for + the federation. + """ + from pathlib import Path + + from openfl.experimental.federated import Plan + from openfl.utilities.utils import getfqdn_env + + for p in [plan_config, cols_config, data_config]: + if is_directory_traversal(p): + echo(f'{p} is out of the openfl workspace scope.') + sys.exit(1) + + plan_config = Path(plan_config).absolute() + cols_config = Path(cols_config).absolute() + data_config = Path(data_config).absolute() + + plan = Plan.parse(plan_config_path=plan_config, + cols_config_path=cols_config, + data_config_path=data_config) + + plan_origin = Plan.parse(plan_config, resolve=False).config + + if (plan_origin['network']['settings']['agg_addr'] == 'auto' + or aggregator_address): + plan_origin['network']['settings']['agg_addr'] = aggregator_address or getfqdn_env() + + logger.warn(f'Patching Aggregator Addr in Plan' + f" 🠆 {plan_origin['network']['settings']['agg_addr']}") + + Plan.dump(plan_config, plan_origin) + + plan.config = plan_origin + + # Record that plan with this hash has been initialized + if 'plans' not in context.obj: + context.obj['plans'] = [] + context.obj['plans'].append(f'{plan_config.stem}_{plan.hash[:8]}') + logger.info(f"{context.obj['plans']}") + + +def freeze_plan(plan_config): + """Dump the plan to YAML file.""" + from pathlib import Path + + from openfl.experimental.federated import Plan + + plan = Plan() + plan.config = Plan.parse(Path(plan_config), resolve=False).config + + Plan.dump(Path(plan_config), plan.config, freeze=True) diff --git a/openfl/experimental/interface/cli/workspace.py b/openfl/experimental/interface/cli/workspace.py new file mode 100644 index 0000000000..f76391e0f8 --- /dev/null +++ b/openfl/experimental/interface/cli/workspace.py @@ -0,0 +1,437 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""Workspace module.""" + +import sys +import os +from pathlib import Path +from typing import Tuple +from logging import getLogger + +from click import Choice +from click import confirm +from click import echo +from click import style +from click import group +from click import option +from click import pass_context +from click import Path as ClickPath + +from openfl.utilities.path_check import is_directory_traversal +from openfl.utilities.workspace import dump_requirements_file + +logger = getLogger(__name__) + + +@group() +@pass_context +def workspace(context): + """Manage Experimental Federated Learning Workspaces.""" + context.obj['group'] = 'workspace' + + +def create_dirs(prefix): + """Create workspace directories.""" + from shutil import copyfile + + from openfl.experimental.interface.cli.cli_helper import WORKSPACE + + echo('Creating Workspace Directories') + + (prefix / 'cert').mkdir(parents=True, exist_ok=True) # 
certifications + (prefix / 'data').mkdir(parents=True, exist_ok=True) # training data + (prefix / 'logs').mkdir(parents=True, exist_ok=True) # training logs + (prefix / 'save').mkdir(parents=True, exist_ok=True) # model weight saves / initialization + (prefix / 'src').mkdir(parents=True, exist_ok=True) # model code + + copyfile(WORKSPACE / 'workspace' / '.workspace', prefix / '.workspace') + + +def create_temp(prefix, template): + """Create workspace templates.""" + from shutil import ignore_patterns + + from openfl.experimental.interface.cli.cli_helper import copytree + from openfl.experimental.interface.cli.cli_helper import WORKSPACE + + echo('Creating Workspace Templates') + # Use the specified template if it's a Path, otherwise use WORKSPACE/template + source = template if isinstance(template, Path) else WORKSPACE / template + + copytree(src=source, dst=prefix, dirs_exist_ok=True, + ignore=ignore_patterns('__pycache__')) # from template workspace + apply_template_plan(prefix, template) + + +def get_templates(): + """Grab the default templates from the distribution.""" + from openfl.experimental.interface.cli.cli_helper import WORKSPACE + + return [d.name for d in WORKSPACE.glob('*') if d.is_dir() + and d.name not in ['__pycache__', 'workspace']] + + +@workspace.command(name='create') +@option('--prefix', required=True, + help='Workspace name or path', type=ClickPath()) +@option('--custom_template', required=False, + help='Path to custom template', type=ClickPath(exists=True)) +@option('--notebook', required=False, + help='Path to jupyter notebook', type=ClickPath(exists=True)) +@option('--template_output_dir', required=False, + help='Destination directory to save your Jupyter Notebook workspace.', + type=ClickPath(exists=False, file_okay=False, dir_okay=True)) +@option('--template', required=False, type=Choice(get_templates())) +def create_(prefix, custom_template, template, notebook, template_output_dir): + """Create the experimental workspace.""" + if is_directory_traversal(prefix): + echo('Workspace name or path is out of the openfl workspace scope.') + sys.exit(1) + + if custom_template and template and notebook: + raise ValueError( + 'Please provide either `template`, `custom_template` or ' + + '`notebook`. Not all are necessary' + ) + elif ( + (custom_template and template) + or (template and notebook) + or (custom_template and notebook)): + raise ValueError( + 'Please provide only one of the following options: ' + + '`template`, `custom_template`, or `notebook`.' + ) + + if not (custom_template or template or notebook): + raise ValueError( + 'Please provide one of the following options: ' + + '`template`, `custom_template`, or `notebook`.' + ) + + if notebook: + if not template_output_dir: + raise ValueError( + 'Please provide output_workspace which is Destination directory to ' + + 'save your Jupyter Notebook workspace.' 
+ ) + + from openfl.experimental.workspace_export import WorkspaceExport + + WorkspaceExport.export( + notebook_path=notebook, output_workspace=template_output_dir, + ) + + create(prefix, template_output_dir) + + logger.warning( + 'The user should review the generated workspace for completeness ' + + 'before proceeding') + else: + template = ( + Path(custom_template).resolve() + if custom_template + else template + ) + create(prefix, template) + + +def create(prefix, template): + """Create federated learning workspace.""" + from os.path import isfile + from subprocess import check_call + from sys import executable + + from openfl.experimental.interface.cli.cli_helper import ( + OPENFL_USERDIR, + print_tree + ) + + if not OPENFL_USERDIR.exists(): + OPENFL_USERDIR.mkdir() + + prefix = Path(prefix).absolute() + + create_dirs(prefix) + create_temp(prefix, template) + + requirements_filename = 'requirements.txt' + + if not os.path.exists(f'{str(prefix)}/plan/data.yaml'): + echo(style('Participant private attributes shall be set to None as plan/data.yaml' + + ' was not found in the workspace.', fg='yellow')) + + if isfile(f'{str(prefix)}/{requirements_filename}'): + check_call([ + executable, '-m', 'pip', 'install', '-r', + f'{prefix}/requirements.txt'], shell=False) + echo(f'Successfully installed packages from {prefix}/requirements.txt.') + else: + echo('No additional requirements for workspace defined. Skipping...') + prefix_hash = _get_dir_hash(str(prefix.absolute())) + with open(OPENFL_USERDIR / f'requirements.{prefix_hash}.txt', 'w', encoding='utf-8') as f: + check_call([executable, '-m', 'pip', 'freeze'], shell=False, stdout=f) + + print_tree(prefix, level=3) + + +@workspace.command(name='export') +@option('-o', '--pip-install-options', required=False, + type=str, multiple=True, default=tuple, + help='Options for remote pip install. ' + 'You may pass several options in quotation marks alongside with arguments, ' + 'e.g. -o "--find-links source.site"') +def export_(pip_install_options: Tuple[str]): + """Export federated learning workspace.""" + from os import getcwd + from os import makedirs + from os.path import basename + from os.path import join + from shutil import copy2 + from shutil import copytree + from shutil import ignore_patterns + from shutil import make_archive + from tempfile import mkdtemp + + from plan import freeze_plan + from openfl.experimental.interface.cli.cli_helper import WORKSPACE + from openfl.utilities.utils import rmtree + + echo(style('This command will archive the contents of \'plan\' and \'src\' directory, user' + + ' should review that these does not contain any information which is private and' + + ' not to be shared.', fg='yellow')) + + plan_file = Path('plan/plan.yaml').absolute() + try: + freeze_plan(plan_file) + except FileNotFoundError: + echo(f'Plan file "{plan_file}" not found. No freeze performed.') + + # Dump requirements.txt + dump_requirements_file(prefixes=pip_install_options, keep_original_prefixes=True) + + archive_type = 'zip' + archive_name = basename(getcwd()) + archive_file_name = archive_name + '.' 
+ archive_type + + # Aggregator workspace + tmp_dir = join(mkdtemp(), 'openfl', archive_name) + + ignore = ignore_patterns( + '__pycache__', '*.crt', '*.key', '*.csr', '*.srl', '*.pem', '*.pbuf') + + # We only export the minimum required files to set up a collaborator + makedirs(f'{tmp_dir}/save', exist_ok=True) + makedirs(f'{tmp_dir}/logs', exist_ok=True) + makedirs(f'{tmp_dir}/data', exist_ok=True) + copytree('./src', f'{tmp_dir}/src', ignore=ignore) # code + copytree('./plan', f'{tmp_dir}/plan', ignore=ignore) # plan + copy2('./requirements.txt', f'{tmp_dir}/requirements.txt') # requirements + + try: + copy2('.workspace', tmp_dir) # .workspace + except FileNotFoundError: + echo('\'.workspace\' file not found.') + if confirm('Create a default \'.workspace\' file?'): + copy2(WORKSPACE / 'workspace' / '.workspace', tmp_dir) + else: + echo('To proceed, you must have a \'.workspace\' ' + 'file in the current directory.') + raise + + # Create Zip archive of directory + echo('\n 🗜️ Preparing workspace distribution zip file') + make_archive(archive_name, archive_type, tmp_dir) + rmtree(tmp_dir) + echo(f'\n ✔️ Workspace exported to archive: {archive_file_name}') + + +@workspace.command(name='import') +@option('--archive', required=True, + help='Zip file containing workspace to import', + type=ClickPath(exists=True)) +def import_(archive): + """Import federated learning workspace.""" + from os import chdir + from os.path import basename + from os.path import isfile + from shutil import unpack_archive + from subprocess import check_call + from sys import executable + + archive = Path(archive).absolute() + + dir_path = basename(archive).split('.')[0] + unpack_archive(archive, extract_dir=dir_path) + chdir(dir_path) + + requirements_filename = 'requirements.txt' + + if isfile(requirements_filename): + check_call([ + executable, '-m', 'pip', 'install', '--upgrade', 'pip'], + shell=False) + check_call([ + executable, '-m', 'pip', 'install', '-r', requirements_filename], + shell=False) + else: + echo('No ' + requirements_filename + ' file found.') + + echo(f'Workspace {archive} has been imported.') + echo('You may need to copy your PKI certificates to join the federation.') + + +@workspace.command(name='certify') +def certify_(): + """Create certificate authority for federation.""" + certify() + + +def certify(): + """Create certificate authority for federation.""" + from cryptography.hazmat.primitives import serialization + + from openfl.cryptography.ca import generate_root_cert + from openfl.cryptography.ca import generate_signing_csr + from openfl.cryptography.ca import sign_certificate + from openfl.experimental.interface.cli.cli_helper import CERT_DIR + + echo('Setting Up Certificate Authority...\n') + + echo('1. 
Create Root CA') + echo('1.1 Create Directories') + + (CERT_DIR / 'ca/root-ca/private').mkdir( + parents=True, exist_ok=True, mode=0o700) + (CERT_DIR / 'ca/root-ca/db').mkdir(parents=True, exist_ok=True) + + echo('1.2 Create Database') + + with open(CERT_DIR / 'ca/root-ca/db/root-ca.db', 'w', encoding='utf-8') as f: + pass # write empty file + with open(CERT_DIR / 'ca/root-ca/db/root-ca.db.attr', 'w', encoding='utf-8') as f: + pass # write empty file + + with open(CERT_DIR / 'ca/root-ca/db/root-ca.crt.srl', 'w', encoding='utf-8') as f: + f.write('01') # write file with '01' + with open(CERT_DIR / 'ca/root-ca/db/root-ca.crl.srl', 'w', encoding='utf-8') as f: + f.write('01') # write file with '01' + + echo('1.3 Create CA Request and Certificate') + + root_crt_path = 'ca/root-ca.crt' + root_key_path = 'ca/root-ca/private/root-ca.key' + + root_private_key, root_cert = generate_root_cert() + + # Write root CA certificate to disk + with open(CERT_DIR / root_crt_path, 'wb') as f: + f.write(root_cert.public_bytes( + encoding=serialization.Encoding.PEM, + )) + + with open(CERT_DIR / root_key_path, 'wb') as f: + f.write(root_private_key.private_bytes( + encoding=serialization.Encoding.PEM, + format=serialization.PrivateFormat.TraditionalOpenSSL, + encryption_algorithm=serialization.NoEncryption() + )) + + echo('2. Create Signing Certificate') + echo('2.1 Create Directories') + + (CERT_DIR / 'ca/signing-ca/private').mkdir( + parents=True, exist_ok=True, mode=0o700) + (CERT_DIR / 'ca/signing-ca/db').mkdir(parents=True, exist_ok=True) + + echo('2.2 Create Database') + + with open(CERT_DIR / 'ca/signing-ca/db/signing-ca.db', 'w', encoding='utf-8') as f: + pass # write empty file + with open(CERT_DIR / 'ca/signing-ca/db/signing-ca.db.attr', 'w', encoding='utf-8') as f: + pass # write empty file + + with open(CERT_DIR / 'ca/signing-ca/db/signing-ca.crt.srl', 'w', encoding='utf-8') as f: + f.write('01') # write file with '01' + with open(CERT_DIR / 'ca/signing-ca/db/signing-ca.crl.srl', 'w', encoding='utf-8') as f: + f.write('01') # write file with '01' + + echo('2.3 Create Signing Certificate CSR') + + signing_csr_path = 'ca/signing-ca.csr' + signing_crt_path = 'ca/signing-ca.crt' + signing_key_path = 'ca/signing-ca/private/signing-ca.key' + + signing_private_key, signing_csr = generate_signing_csr() + + # Write Signing CA CSR to disk + with open(CERT_DIR / signing_csr_path, 'wb') as f: + f.write(signing_csr.public_bytes( + encoding=serialization.Encoding.PEM, + )) + + with open(CERT_DIR / signing_key_path, 'wb') as f: + f.write(signing_private_key.private_bytes( + encoding=serialization.Encoding.PEM, + format=serialization.PrivateFormat.TraditionalOpenSSL, + encryption_algorithm=serialization.NoEncryption() + )) + + echo('2.4 Sign Signing Certificate CSR') + + signing_cert = sign_certificate(signing_csr, root_private_key, root_cert.subject, ca=True) + + with open(CERT_DIR / signing_crt_path, 'wb') as f: + f.write(signing_cert.public_bytes( + encoding=serialization.Encoding.PEM, + )) + + echo('3 Create Certificate Chain') + + # create certificate chain file by combining root-ca and signing-ca + with open(CERT_DIR / 'cert_chain.crt', 'w', encoding='utf-8') as d: + with open(CERT_DIR / 'ca/root-ca.crt', encoding='utf-8') as s: + d.write(s.read()) + with open(CERT_DIR / 'ca/signing-ca.crt') as s: + d.write(s.read()) + + echo('\nDone.') + +# FIXME: Function is not in use + + +def _get_requirements_dict(txtfile): + with open(txtfile, 'r', encoding='utf-8') as snapshot: + snapshot_dict = {} + for line in 
snapshot: + try: + # 'pip freeze' generates requirements with exact versions + k, v = line.split('==') + snapshot_dict[k] = v + except ValueError: + snapshot_dict[line] = None + return snapshot_dict + + +def _get_dir_hash(path): + from hashlib import sha256 + hash_ = sha256() + hash_.update(path.encode('utf-8')) + hash_ = hash_.hexdigest() + return hash_ + + +def apply_template_plan(prefix, template): + """Copy plan file from template folder. + + This function unfolds default values from template plan configuration + and writes the configuration to the current workspace. + """ + from openfl.experimental.federated.plan import Plan + from openfl.experimental.interface.cli.cli_helper import WORKSPACE + + # Use the specified template if it's a Path, otherwise use WORKSPACE/template + source = template if isinstance(template, Path) else WORKSPACE / template + + template_plan = Plan.parse(source / 'plan' / 'plan.yaml') + + Plan.dump(prefix / 'plan' / 'plan.yaml', template_plan.config) diff --git a/openfl/experimental/interface/fl_spec.py b/openfl/experimental/interface/fl_spec.py index 170e9cd9ae..771e471d97 100644 --- a/openfl/experimental/interface/fl_spec.py +++ b/openfl/experimental/interface/fl_spec.py @@ -11,11 +11,12 @@ from openfl.experimental.utilities import ( MetaflowInterface, SerializationError, + generate_artifacts, aggregator_to_collaborator, collaborator_to_aggregator, should_transfer, filter_attributes, - checkpoint, + checkpoint ) from openfl.experimental.runtime import Runtime @@ -46,11 +47,11 @@ def save_initial_state(cls, instance: Type[FLSpec]) -> None: def run(self) -> None: """Starts the execution of the flow""" # Submit flow to Runtime - self._metaflow_interface = MetaflowInterface( - self.__class__, self.runtime.backend - ) - self._run_id = self._metaflow_interface.create_run() if str(self._runtime) == "LocalRuntime": + self._metaflow_interface = MetaflowInterface( + self.__class__, self.runtime.backend + ) + self._run_id = self._metaflow_interface.create_run() # Initialize aggregator private attributes self.runtime.initialize_aggregator() self._foreach_methods = [] @@ -86,7 +87,7 @@ def run(self) -> None: for name, attr in final_attributes: setattr(self, name, attr) elif str(self._runtime) == "FederatedRuntime": - raise Exception("Submission to remote runtime not available yet") + pass else: raise Exception("Runtime not implemented") @@ -145,30 +146,92 @@ def _display_transition_logs(self, f: Callable, parent_func: Callable) -> None: elif collaborator_to_aggregator(f, parent_func): print("Sending state from collaborator to aggregator") - def next(self, f: Callable, **kwargs) -> None: + def filter_exclude_include(self, f, **kwargs): """ - Next task in the flow to execute + This function filters exclude/include attributes Args: - f: The next task that will be executed in the flow + flspec_obj : Reference to the FLSpec (flow) object + f : The task to be executed within the flow + """ + selected_collaborators = getattr(self, kwargs["foreach"]) + + for col in selected_collaborators: + clone = FLSpec._clones[col] + clone.input = col + if ("exclude" in kwargs and hasattr(clone, kwargs["exclude"][0])) or ( + "include" in kwargs and hasattr(clone, kwargs["include"][0]) + ): + filter_attributes(clone, f, **kwargs) + artifacts_iter, _ = generate_artifacts(ctx=self) + for name, attr in artifacts_iter(): + setattr(clone, name, deepcopy(attr)) + clone._foreach_methods = self._foreach_methods + + def restore_instance_snapshot( + self, ctx: FLSpec, instance_snapshot: List[FLSpec] + ): 
+ """Restores attributes from backup (in instance snapshot) to ctx""" + for backup in instance_snapshot: + artifacts_iter, _ = generate_artifacts(ctx=backup) + for name, attr in artifacts_iter(): + if not hasattr(ctx, name): + setattr(ctx, name, attr) + + def get_clones(self, kwargs): + """ + Create, and prepare clones + """ + FLSpec._reset_clones() + FLSpec._create_clones(self, self.runtime.collaborators) + selected_collaborators = self.__getattribute__(kwargs['foreach']) + + for col in selected_collaborators: + clone = FLSpec._clones[col] + clone.input = col + artifacts_iter, _ = generate_artifacts(ctx=clone) + attributes = artifacts_iter() + for name, attr in attributes: + setattr(clone, name, deepcopy(attr)) + clone._foreach_methods = self._foreach_methods + clone._metaflow_interface = self._metaflow_interface + + def next(self, f, **kwargs): + """ + Next task in the flow to execute """ - # Get the name and reference to the calling function parent = inspect.stack()[1][3] parent_func = getattr(self, parent) - # Checkpoint current attributes (if checkpoint==True) - checkpoint(self, parent_func) + if str(self._runtime) == "LocalRuntime": + # Checkpoint current attributes (if checkpoint==True) + checkpoint(self, parent_func) # Take back-up of current state of self - agg_to_collab_ss = [] + agg_to_collab_ss = None if aggregator_to_collaborator(f, parent_func): agg_to_collab_ss = self._capture_instance_snapshot(kwargs=kwargs) + if str(self._runtime) == "FederatedRuntime": + if len(FLSpec._clones) == 0: + self.get_clones(kwargs) + # Remove included / excluded attributes from next task filter_attributes(self, f, **kwargs) - self._display_transition_logs(f, parent_func) - - # update parameters required to execute execute_task function - self.execute_task_args = [f, parent_func, agg_to_collab_ss, kwargs] + if str(self._runtime) == "FederatedRuntime": + if f.collaborator_step and not f.aggregator_step: + self._foreach_methods.append(f.__name__) + + if "foreach" in kwargs: + self.filter_exclude_include(f, **kwargs) + # if "foreach" in kwargs: + self.execute_task_args = (self, f, parent_func, FLSpec._clones, + agg_to_collab_ss, kwargs) + else: + self.execute_task_args = (self, f, parent_func, kwargs) + + elif str(self._runtime) == "LocalRuntime": + # update parameters required to execute execute_task function + self.execute_task_args = [f, parent_func, agg_to_collab_ss, kwargs] diff --git a/openfl/experimental/interface/participants.py b/openfl/experimental/interface/participants.py index 8ff54523b2..84847fb6fd 100644 --- a/openfl/experimental/interface/participants.py +++ b/openfl/experimental/interface/participants.py @@ -1,228 +1,235 @@ -# Copyright (C) 2020-2023 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -"""openfl.experimental.interface.participants module.""" - -from typing import Dict, Any -from typing import Callable, Optional - - -class Participant: - def __init__(self, name: str = ""): - self.private_attributes = {} - self._name = name - - @property - def name(self): - return self._name - - @name.setter - def name(self, name: str): - self._name = name - - def private_attributes(self, attrs: Dict[str, Any]) -> None: - """ - Set the private attributes of the participant. These attributes will - only be available within the tasks performed by the participants and - will be filtered out prior to the task's state being transfered. - - Args: - attrs: dictionary of ATTRIBUTE_NAME (str) -> object that will be accessible - within the participant's task. 
- - Example: - {'train_loader' : torch.utils.data.DataLoader(...)} - - In any task performed by this participant performed within the flow, - this attribute could be referenced with self.train_loader - """ - self.private_attributes = attrs - - -class Collaborator(Participant): - """ - Defines a collaborator participant - """ - def __init__(self, name: str = "", private_attributes_callable: Callable = None, - num_cpus: int = 0, num_gpus: int = 0.0, **kwargs): - """ - Create collaborator object with custom resources and a callable - function to assign private attributes - - Parameters: - name (str): Name of the collaborator. default="" - - private_attributes_callable (Callable): A function which returns collaborator - private attributes for each collaborator. In case private_attributes are not - required this can be omitted. default=None - - num_cpus (int): Specifies how many cores to use for the collaborator step exection. - This will only be used if backend is set to ray. default=0 - - num_gpus (float): Specifies how many GPUs to use to accerlerate the collaborator - step exection. This will only be used if backend is set to ray. default=0 - - kwargs (dict): Parameters required to call private_attributes_callable function. - The key of the dictionary must match the arguments to the private_attributes_callable. - default={} - """ - super().__init__(name=name) - self.num_cpus = num_cpus - self.num_gpus = num_gpus - self.kwargs = kwargs - - if private_attributes_callable is None: - self.private_attributes_callable = private_attributes_callable - else: - if not callable(private_attributes_callable): - raise Exception("private_attributes_callable parameter must be a callable") - else: - self.private_attributes_callable = private_attributes_callable - - def get_name(self) -> str: - """Get collaborator name""" - return self._name - - def initialize_private_attributes(self) -> None: - """ - initialize private attributes of Collaborator object by invoking - the callable specified by user - """ - if self.private_attributes_callable is not None: - self.private_attributes = self.private_attributes_callable(**self.kwargs) - - def __set_collaborator_attrs_to_clone(self, clone: Any) -> None: - """ - Set collaborator private attributes to FLSpec clone before transitioning - from Aggregator step to collaborator steps - """ - # set collaborator private attributes as - # clone attributes - for name, attr in self.private_attributes.items(): - setattr(clone, name, attr) - - def __delete_collab_attrs_from_clone(self, clone: Any) -> None: - """ - Remove collaborator private attributes from FLSpec clone before - transitioning from Collaborator step to Aggregator step - """ - # Update collaborator private attributes by taking latest - # parameters from clone, then delete attributes from clone. 
- for attr_name in self.private_attributes: - if hasattr(clone, attr_name): - self.private_attributes.update( - {attr_name: getattr(clone, attr_name)} - ) - delattr(clone, attr_name) - - def execute_func(self, ctx: Any, f_name: str, callback: Callable) -> Any: - """ - Execute remote function f - """ - self.__set_collaborator_attrs_to_clone(ctx) - - callback(ctx, f_name) - - self.__delete_collab_attrs_from_clone(ctx) - - return ctx - - -class Aggregator(Participant): - """ - Defines an aggregator participant - """ - - def __init__( - self, - name: str = "", - private_attributes_callable: Callable = None, - num_cpus: int = 0, - num_gpus: int = 0.0, - **kwargs - ): - """ - Create aggregator object with custom resources and a callable - function to assign private attributes - - Parameters: - name (str): Name of the aggregator. default="" - - private_attributes_callable (Callable): A function which returns aggregator - private attributes. In case private_attributes are not required this can be omitted. - default=None - - num_cpus (int): Specifies how many cores to use for the aggregator step exection. - This will only be used if backend is set to ray. default=0 - - num_gpus (float): Specifies how many GPUs to use to accerlerate the aggregator - step exection. This will only be used if backend is set to ray. default=0 - - kwargs (dict): Parameters required to call private_attributes_callable function. - The key of the dictionary must match the arguments to the private_attributes_callable. - default={} - """ - super().__init__(name=name) - self.num_cpus = num_cpus - self.num_gpus = num_gpus - self.kwargs = kwargs - - if private_attributes_callable is None: - self.private_attributes_callable = private_attributes_callable - else: - if not callable(private_attributes_callable): - raise Exception( - "private_attributes_callable parameter must be a callable" - ) - else: - self.private_attributes_callable = private_attributes_callable - - def get_name(self) -> str: - """Get aggregator name""" - return self.name - - def initialize_private_attributes(self) -> None: - """ - initialize private attributes of Aggregator object by invoking - the callable specified by user - """ - if self.private_attributes_callable is not None: - self.private_attributes = self.private_attributes_callable(**self.kwargs) - - def __set_agg_attrs_to_clone(self, clone: Any) -> None: - """ - Set aggregator private attributes to FLSpec clone before transition - from Aggregator step to collaborator steps - """ - # set aggregator private attributes as - # clone attributes - for name, attr in self.private_attributes.items(): - setattr(clone, name, attr) - - def __delete_agg_attrs_from_clone(self, clone: Any) -> None: - """ - Remove aggregator private attributes from FLSpec clone before - transition from Aggregator step to collaborator steps - """ - # Update aggregator private attributes by taking latest - # parameters from clone, then delete attributes from clone. 
- for attr_name in self.private_attributes: - if hasattr(clone, attr_name): - self.private_attributes.update({attr_name: getattr(clone, attr_name)}) - delattr(clone, attr_name) - - def execute_func(self, ctx: Any, f_name: str, callback: Callable, - clones: Optional[Any] = None) -> Any: - """ - Execute remote function f - """ - self.__set_agg_attrs_to_clone(ctx) - - if clones is not None: - callback(ctx, f_name, clones) - else: - callback(ctx, f_name) - - self.__delete_agg_attrs_from_clone(ctx) - - return ctx +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""openfl.experimental.interface.participants module.""" + +from typing import Dict, Any +from typing import Callable, Optional + + +class Participant: + def __init__(self, name: str = ""): + self.private_attributes = {} + self._name = name.lower() + + @property + def name(self): + return self._name + + @name.setter + def name(self, name: str): + self._name = name.lower() + + def private_attributes(self, attrs: Dict[str, Any]) -> None: + """ + Set the private attributes of the participant. These attributes will + only be available within the tasks performed by the participants and + will be filtered out prior to the task's state being transfered. + + Args: + attrs: dictionary of ATTRIBUTE_NAME (str) -> object that will be accessible + within the participant's task. + + Example: + {'train_loader' : torch.utils.data.DataLoader(...)} + + In any task performed by this participant performed within the flow, + this attribute could be referenced with self.train_loader + """ + self.private_attributes = attrs + + +class Collaborator(Participant): + """ + Defines a collaborator participant + """ + + def __init__( + self, + name: str = "", + private_attributes_callable: Callable = None, + num_cpus: int = 0, + num_gpus: int = 0.0, + **kwargs + ): + """ + Create collaborator object with custom resources and a callable + function to assign private attributes + + Parameters: + name (str): Name of the collaborator. default="" + + private_attributes_callable (Callable): A function which returns collaborator + private attributes for each collaborator. In case private_attributes are not + required this can be omitted. default=None + + num_cpus (int): Specifies how many cores to use for the collaborator step exection. + This will only be used if backend is set to ray. default=0 + + num_gpus (float): Specifies how many GPUs to use to accerlerate the collaborator + step exection. This will only be used if backend is set to ray. default=0 + + kwargs (dict): Parameters required to call private_attributes_callable function. + The key of the dictionary must match the arguments to the private_attributes_callable. 
+ default={} + """ + super().__init__(name=name) + self.num_cpus = num_cpus + self.num_gpus = num_gpus + self.kwargs = kwargs + + if private_attributes_callable is None: + self.private_attributes_callable = private_attributes_callable + else: + if not callable(private_attributes_callable): + raise Exception( + "private_attributes_callable parameter must be a callable" + ) + else: + self.private_attributes_callable = private_attributes_callable + + def get_name(self) -> str: + """Get collaborator name""" + return self._name + + def initialize_private_attributes(self) -> None: + """ + initialize private attributes of Collaborator object by invoking + the callable specified by user + """ + if self.private_attributes_callable is not None: + self.private_attributes = self.private_attributes_callable(**self.kwargs) + + def __set_collaborator_attrs_to_clone(self, clone: Any) -> None: + """ + Set collaborator private attributes to FLSpec clone before transitioning + from Aggregator step to collaborator steps + """ + # set collaborator private attributes as + # clone attributes + for name, attr in self.private_attributes.items(): + setattr(clone, name, attr) + + def __delete_collab_attrs_from_clone(self, clone: Any) -> None: + """ + Remove collaborator private attributes from FLSpec clone before + transitioning from Collaborator step to Aggregator step + """ + # Update collaborator private attributes by taking latest + # parameters from clone, then delete attributes from clone. + for attr_name in self.private_attributes: + if hasattr(clone, attr_name): + self.private_attributes.update({attr_name: getattr(clone, attr_name)}) + delattr(clone, attr_name) + + def execute_func(self, ctx: Any, f_name: str, callback: Callable) -> Any: + """ + Execute remote function f + """ + self.__set_collaborator_attrs_to_clone(ctx) + + callback(ctx, f_name) + + self.__delete_collab_attrs_from_clone(ctx) + + return ctx + + +class Aggregator(Participant): + """ + Defines an aggregator participant + """ + + def __init__( + self, + name: str = "", + private_attributes_callable: Callable = None, + num_cpus: int = 0, + num_gpus: int = 0.0, + **kwargs + ): + """ + Create aggregator object with custom resources and a callable + function to assign private attributes + + Parameters: + name (str): Name of the aggregator. default="" + + private_attributes_callable (Callable): A function which returns aggregator + private attributes. In case private_attributes are not required this can be omitted. + default=None + + num_cpus (int): Specifies how many cores to use for the aggregator step exection. + This will only be used if backend is set to ray. default=0 + + num_gpus (float): Specifies how many GPUs to use to accerlerate the aggregator + step exection. This will only be used if backend is set to ray. default=0 + + kwargs (dict): Parameters required to call private_attributes_callable function. + The key of the dictionary must match the arguments to the private_attributes_callable. 
+ default={} + """ + super().__init__(name=name) + self.num_cpus = num_cpus + self.num_gpus = num_gpus + self.kwargs = kwargs + + if private_attributes_callable is None: + self.private_attributes_callable = private_attributes_callable + else: + if not callable(private_attributes_callable): + raise Exception( + "private_attributes_callable parameter must be a callable" + ) + else: + self.private_attributes_callable = private_attributes_callable + + def get_name(self) -> str: + """Get aggregator name""" + return self.name + + def initialize_private_attributes(self) -> None: + """ + initialize private attributes of Aggregator object by invoking + the callable specified by user + """ + if self.private_attributes_callable is not None: + self.private_attributes = self.private_attributes_callable(**self.kwargs) + + def __set_agg_attrs_to_clone(self, clone: Any) -> None: + """ + Set aggregator private attributes to FLSpec clone before transition + from Aggregator step to collaborator steps + """ + # set aggregator private attributes as + # clone attributes + for name, attr in self.private_attributes.items(): + setattr(clone, name, attr) + + def __delete_agg_attrs_from_clone(self, clone: Any) -> None: + """ + Remove aggregator private attributes from FLSpec clone before + transition from Aggregator step to collaborator steps + """ + # Update aggregator private attributes by taking latest + # parameters from clone, then delete attributes from clone. + for attr_name in self.private_attributes: + if hasattr(clone, attr_name): + self.private_attributes.update({attr_name: getattr(clone, attr_name)}) + delattr(clone, attr_name) + + def execute_func(self, ctx: Any, f_name: str, callback: Callable, + clones: Optional[Any] = None) -> Any: + """ + Execute remote function f + """ + self.__set_agg_attrs_to_clone(ctx) + + if clones is not None: + callback(ctx, f_name, clones) + else: + callback(ctx, f_name) + + self.__delete_agg_attrs_from_clone(ctx) + + return ctx diff --git a/openfl/experimental/placement/placement.py b/openfl/experimental/placement/placement.py index 810994043a..a66b47f72c 100644 --- a/openfl/experimental/placement/placement.py +++ b/openfl/experimental/placement/placement.py @@ -20,7 +20,6 @@ def agg_task(self): ... """ - print(f'Aggregator step "{f.__name__}" registered') f.is_step = True f.decorators = [] diff --git a/openfl/experimental/protocols/README.md b/openfl/experimental/protocols/README.md new file mode 100644 index 0000000000..eb7fe906a0 --- /dev/null +++ b/openfl/experimental/protocols/README.md @@ -0,0 +1,4 @@ +# OpenFL Experimental gRPC protocols + +All `*_pb2*` files are generated automatically during the installation via `pip`. +You can always build these files manually by running `python setup.py build_grpc` command from the root repository directory. 
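
The `python setup.py build_grpc` hook remains the supported way to regenerate these stubs. As a rough equivalent, the same `*_pb2*` files can be produced directly with `grpcio-tools`; the invocation below is a sketch that assumes `grpcio-tools` is installed and is run from the repository root so that the `openfl/protocols/base.proto` import resolves.

```
pip install grpcio-tools
python -m grpc_tools.protoc -I. \
    --python_out=. --grpc_python_out=. \
    openfl/experimental/protocols/aggregator.proto
```
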
diff --git a/openfl/experimental/protocols/__init__.py b/openfl/experimental/protocols/__init__.py new file mode 100644 index 0000000000..e9215e2668 --- /dev/null +++ b/openfl/experimental/protocols/__init__.py @@ -0,0 +1,3 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""openfl.experimental.protocols module.""" diff --git a/openfl/experimental/protocols/aggregator.proto b/openfl/experimental/protocols/aggregator.proto new file mode 100644 index 0000000000..fe77c086ad --- /dev/null +++ b/openfl/experimental/protocols/aggregator.proto @@ -0,0 +1,58 @@ +// Copyright (C) 2020-2023 Intel Corporation +// Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +syntax = "proto3"; + +package openfl.experiment.aggregator; + +import "openfl/protocols/base.proto"; + + +service Aggregator { + rpc SendTaskResults(TaskResultsRequest) returns (TaskResultsResponse) {} + rpc GetTasks(GetTasksRequest) returns (GetTasksResponse) {} + rpc CallCheckpoint(CheckpointRequest) returns (CheckpointResponse) {} +} + +message MessageHeader { + string sender = 1; + string receiver = 2; + string federation_uuid = 3; + string single_col_cert_common_name = 4; +} + +message TaskResultsRequest { + MessageHeader header = 1; + string collab_name = 2; + int32 round_number = 3; + string next_step = 4; + bytes execution_environment = 5; +} + +message TaskResultsResponse { + MessageHeader header = 1; +} + +message GetTasksRequest { + MessageHeader header = 1; +} + +message GetTasksResponse { + MessageHeader header = 1; + int32 round_number = 2; + string function_name = 3; + bytes execution_environment = 4; + int32 sleep_time = 5; + bool quit = 6; +} + +message CheckpointRequest { + MessageHeader header = 1; + bytes execution_environment = 2; + bytes function = 3; + bytes stream_buffer = 4; +} + +message CheckpointResponse { + MessageHeader header = 1; +} diff --git a/openfl/experimental/protocols/interceptors.py b/openfl/experimental/protocols/interceptors.py new file mode 100644 index 0000000000..a54ff76d82 --- /dev/null +++ b/openfl/experimental/protocols/interceptors.py @@ -0,0 +1,78 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""gRPC interceptors module.""" +import collections + +import grpc + + +class _GenericClientInterceptor(grpc.UnaryUnaryClientInterceptor, + grpc.UnaryStreamClientInterceptor, + grpc.StreamUnaryClientInterceptor, + grpc.StreamStreamClientInterceptor): + + def __init__(self, interceptor_function): + self._fn = interceptor_function + + def intercept_unary_unary(self, continuation, client_call_details, request): + new_details, new_request_iterator, postprocess = self._fn( + client_call_details, iter((request,)), False, False) + response = continuation(new_details, next(new_request_iterator)) + return postprocess(response) if postprocess else response + + def intercept_unary_stream(self, continuation, client_call_details, + request): + new_details, new_request_iterator, postprocess = self._fn( + client_call_details, iter((request,)), False, True) + response_it = continuation(new_details, next(new_request_iterator)) + return postprocess(response_it) if postprocess else response_it + + def intercept_stream_unary(self, continuation, client_call_details, + request_iterator): + new_details, new_request_iterator, postprocess = self._fn( + client_call_details, request_iterator, True, False) + response = continuation(new_details, new_request_iterator) + return 
postprocess(response) if postprocess else response + + def intercept_stream_stream(self, continuation, client_call_details, + request_iterator): + new_details, new_request_iterator, postprocess = self._fn( + client_call_details, request_iterator, True, True) + response_it = continuation(new_details, new_request_iterator) + return postprocess(response_it) if postprocess else response_it + + +def _create_generic_interceptor(intercept_call): + return _GenericClientInterceptor(intercept_call) + + +class _ClientCallDetails( + collections.namedtuple( + '_ClientCallDetails', + ('method', 'timeout', 'metadata', 'credentials') + ), + grpc.ClientCallDetails +): + pass + + +def headers_adder(headers): + """Create interceptor with added headers.""" + + def intercept_call(client_call_details, request_iterator, request_streaming, + response_streaming): + metadata = [] + if client_call_details.metadata is not None: + metadata = list(client_call_details.metadata) + for header, value in headers.items(): + metadata.append(( + header, + value, + )) + client_call_details = _ClientCallDetails( + client_call_details.method, client_call_details.timeout, metadata, + client_call_details.credentials) + return client_call_details, request_iterator, None + + return _create_generic_interceptor(intercept_call) diff --git a/openfl/experimental/protocols/utils.py b/openfl/experimental/protocols/utils.py new file mode 100644 index 0000000000..fc6edc7bae --- /dev/null +++ b/openfl/experimental/protocols/utils.py @@ -0,0 +1,262 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""Proto utils.""" + +from openfl.protocols import base_pb2 +from openfl.utilities import TensorKey + + +def model_proto_to_bytes_and_metadata(model_proto): + """Convert the model protobuf to bytes and metadata. 
+ + Args: + model_proto: Protobuf of the model + + Returns: + bytes_dict: Dictionary of the bytes contained in the model protobuf + metadata_dict: Dictionary of the meta data in the model protobuf + """ + bytes_dict = {} + metadata_dict = {} + round_number = None + for tensor_proto in model_proto.tensors: + bytes_dict[tensor_proto.name] = tensor_proto.data_bytes + metadata_dict[tensor_proto.name] = [{ + 'int_to_float': proto.int_to_float, + 'int_list': proto.int_list, + 'bool_list': proto.bool_list + } + for proto in tensor_proto.transformer_metadata + ] + if round_number is None: + round_number = tensor_proto.round_number + else: + assert round_number == tensor_proto.round_number, ( + f'Round numbers in model are inconsistent: {round_number} ' + f'and {tensor_proto.round_number}' + ) + return bytes_dict, metadata_dict, round_number + + +def bytes_and_metadata_to_model_proto(bytes_dict, model_id, model_version, + is_delta, metadata_dict): + """Convert bytes and metadata to model protobuf.""" + model_header = ModelHeader(id=model_id, version=model_version, is_delta=is_delta) # NOQA:F821 + + tensor_protos = [] + for key, data_bytes in bytes_dict.items(): + transformer_metadata = metadata_dict[key] + metadata_protos = [] + for metadata in transformer_metadata: + if metadata.get('int_to_float') is not None: + int_to_float = metadata.get('int_to_float') + else: + int_to_float = {} + + if metadata.get('int_list') is not None: + int_list = metadata.get('int_list') + else: + int_list = [] + + if metadata.get('bool_list') is not None: + bool_list = metadata.get('bool_list') + else: + bool_list = [] + metadata_protos.append(base_pb2.MetadataProto( + int_to_float=int_to_float, + int_list=int_list, + bool_list=bool_list, + )) + tensor_protos.append(TensorProto(name=key, # NOQA:F821 + data_bytes=data_bytes, + transformer_metadata=metadata_protos)) + return base_pb2.ModelProto(header=model_header, tensors=tensor_protos) + + +def construct_named_tensor(tensor_key, nparray, transformer_metadata, lossless): + """Construct named tensor.""" + metadata_protos = [] + for metadata in transformer_metadata: + if metadata.get('int_to_float') is not None: + int_to_float = metadata.get('int_to_float') + else: + int_to_float = {} + + if metadata.get('int_list') is not None: + int_list = metadata.get('int_list') + else: + int_list = [] + + if metadata.get('bool_list') is not None: + bool_list = metadata.get('bool_list') + else: + bool_list = [] + metadata_protos.append(base_pb2.MetadataProto( + int_to_float=int_to_float, + int_list=int_list, + bool_list=bool_list, + )) + + tensor_name, origin, round_number, report, tags = tensor_key + + return base_pb2.NamedTensor( + name=tensor_name, + round_number=round_number, + lossless=lossless, + report=report, + tags=tags, + transformer_metadata=metadata_protos, + data_bytes=nparray, + ) + + +def construct_proto(tensor_dict, model_id, model_version, is_delta, compression_pipeline): + """Construct proto.""" + # compress the arrays in the tensor_dict, and form the model proto + # TODO: Hold-out tensors from the compression pipeline. 
+ bytes_dict = {} + metadata_dict = {} + for key, array in tensor_dict.items(): + bytes_dict[key], metadata_dict[key] = compression_pipeline.forward(data=array) + + # convert the compressed_tensor_dict and metadata to protobuf, and make the new model proto + model_proto = bytes_and_metadata_to_model_proto(bytes_dict=bytes_dict, + model_id=model_id, + model_version=model_version, + is_delta=is_delta, + metadata_dict=metadata_dict) + return model_proto + + +def construct_model_proto(tensor_dict, round_number, tensor_pipe): + """Construct model proto from tensor dict.""" + # compress the arrays in the tensor_dict, and form the model proto + # TODO: Hold-out tensors from the tensor compression pipeline. + named_tensors = [] + for key, nparray in tensor_dict.items(): + bytes_data, transformer_metadata = tensor_pipe.forward(data=nparray) + tensor_key = TensorKey(key, 'agg', round_number, False, ('model',)) + named_tensors.append(construct_named_tensor( + tensor_key, + bytes_data, + transformer_metadata, + lossless=True, + )) + + return base_pb2.ModelProto(tensors=named_tensors) + + +def deconstruct_model_proto(model_proto, compression_pipeline): + """Deconstruct model proto.""" + # extract the tensor_dict and metadata + bytes_dict, metadata_dict, round_number = model_proto_to_bytes_and_metadata(model_proto) + + # decompress the tensors + # TODO: Handle tensors meant to be held-out from the compression pipeline + # (currently none are held out). + tensor_dict = {} + for key in bytes_dict: + tensor_dict[key] = compression_pipeline.backward(data=bytes_dict[key], + transformer_metadata=metadata_dict[key]) + return tensor_dict, round_number + + +def deconstruct_proto(model_proto, compression_pipeline): + """Deconstruct the protobuf. + + Args: + model_proto: The protobuf of the model + compression_pipeline: The compression pipeline object + + Returns: + protobuf: A protobuf of the model + """ + # extract the tensor_dict and metadata + bytes_dict, metadata_dict = model_proto_to_bytes_and_metadata(model_proto) + + # decompress the tensors + # TODO: Handle tensors meant to be held-out from the compression pipeline + # (currently none are held out). + tensor_dict = {} + for key in bytes_dict: + tensor_dict[key] = compression_pipeline.backward(data=bytes_dict[key], + transformer_metadata=metadata_dict[key]) + return tensor_dict + + +def load_proto(fpath): + """Load the protobuf. + + Args: + fpath: The filepath for the protobuf + + Returns: + protobuf: A protobuf of the model + """ + with open(fpath, 'rb') as f: + loaded = f.read() + model = base_pb2.ModelProto().FromString(loaded) + return model + + +def dump_proto(model_proto, fpath): + """Dump the protobuf to a file. + + Args: + model_proto: The protobuf of the model + fpath: The filename to save the model protobuf + + """ + s = model_proto.SerializeToString() + with open(fpath, 'wb') as f: + f.write(s) + + +def datastream_to_proto(proto, stream, logger=None): + """Convert the datastream to the protobuf. 
+
+    Args:
+        proto: The protobuf of the model
+        stream: The data stream from the remote connection
+        logger: (Optional) The log object
+
+    Returns:
+        protobuf: A protobuf of the model
+    """
+    npbytes = b''
+    for chunk in stream:
+        npbytes += chunk.npbytes
+
+    if len(npbytes) > 0:
+        proto.ParseFromString(npbytes)
+        if logger is not None:
+            logger.debug(f'datastream_to_proto parsed a {type(proto)}.')
+        return proto
+    else:
+        raise RuntimeError(f'Received empty stream message of type {type(proto)}')
+
+
+def proto_to_datastream(proto, logger, max_buffer_size=(2 * 1024 * 1024)):
+    """Convert the protobuf to the datastream for the remote connection.
+
+    Args:
+        proto: The protobuf of the model
+        logger: The log object
+        max_buffer_size: The buffer size (Default= 2*1024*1024)
+    Returns:
+        reply: The message for the remote connection.
+    """
+    npbytes = proto.SerializeToString()
+    data_size = len(npbytes)
+    buffer_size = data_size if max_buffer_size > data_size else max_buffer_size
+    logger.debug(f'Setting stream chunks with size {buffer_size} for proto of type {type(proto)}')
+
+    for i in range(0, data_size, buffer_size):
+        chunk = npbytes[i: i + buffer_size]
+        reply = base_pb2.DataStream(npbytes=chunk, size=len(chunk))
+        yield reply
+
+
+def get_headers(context) -> dict:
+    """Get headers from context."""
+    return {header[0]: header[1] for header in context.invocation_metadata()}
diff --git a/openfl/experimental/runtime/federated_runtime.py b/openfl/experimental/runtime/federated_runtime.py
index 954161e676..da7ef3efb2 100644
--- a/openfl/experimental/runtime/federated_runtime.py
+++ b/openfl/experimental/runtime/federated_runtime.py
@@ -1,20 +1,65 @@
 # Copyright (C) 2020-2023 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0

-""" openfl.experimental.runtime module FederatedRuntime class."""
+""" openfl.experimental.runtime package FederatedRuntime class."""

 from __future__ import annotations
 from openfl.experimental.runtime import Runtime
-from typing import TYPE_CHECKING, Type, List
+from typing import TYPE_CHECKING
+
 if TYPE_CHECKING:
-    from openfl.experimental.interface import Aggregator, Collaborator
+    from openfl.experimental.interface import Aggregator
+    from openfl.experimental.interface import Collaborator
+
+from typing import List
+from typing import Type


 class FederatedRuntime(Runtime):
     def __init__(
-        self,
-        aggregator: Type[Aggregator],
-        collaborators: List[Type[Collaborator]] = None
+        self,
+        aggregator: str = None,
+        collaborators: List[str] = None,
+        **kwargs,
     ) -> None:
-        """Use remote federated infrastructure to run the flow"""
-        raise NotImplementedError("FederatedRuntime will be implemented in the future")
+        """
+        Use remote federated infrastructure to run the flow
+
+        Args:
+            aggregator: Name of the aggregator.
+            collaborators: List of collaborator names.
+
+        Returns:
+            None
+        """
+        super().__init__()
+        if aggregator is not None:
+            self.aggregator = aggregator
+
+        if collaborators is not None:
+            self.collaborators = collaborators
+
+    @property
+    def aggregator(self) -> str:
+        """Returns name of _aggregator"""
+        return self._aggregator
+
+    @aggregator.setter
+    def aggregator(self, aggregator_name: Type[Aggregator]):
+        """Set FederatedRuntime _aggregator"""
+        self._aggregator = aggregator_name
+
+    @property
+    def collaborators(self) -> List[str]:
+        """
+        Return names of collaborators.
Don't give direct access to private attributes + """ + return self.__collaborators + + @collaborators.setter + def collaborators(self, collaborators: List[Type[Collaborator]]): + """Set LocalRuntime collaborators""" + self.__collaborators = collaborators + + def __repr__(self): + return "FederatedRuntime" diff --git a/openfl/experimental/runtime/local_runtime.py b/openfl/experimental/runtime/local_runtime.py index 2788cf7a6a..4bf46bc141 100644 --- a/openfl/experimental/runtime/local_runtime.py +++ b/openfl/experimental/runtime/local_runtime.py @@ -1,686 +1,702 @@ -# Copyright (C) 2020-2023 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -""" openfl.experimental.runtime package LocalRuntime class.""" - -from __future__ import annotations -from copy import deepcopy -import importlib -import ray -import os -import gc -from openfl.experimental.runtime import Runtime -from typing import TYPE_CHECKING, Optional -import math - -if TYPE_CHECKING: - from openfl.experimental.interface import Aggregator, Collaborator, FLSpec - -from openfl.experimental.utilities import ( - ResourcesNotAvailableError, - aggregator_to_collaborator, - generate_artifacts, - filter_attributes, - checkpoint, - get_number_of_gpus, - check_resource_allocation, -) -from typing import List, Any -from typing import Dict, Type, Callable - - -class RayExecutor: - def __init__(self): - """Create RayExecutor object""" - self.__remote_contexts = [] - - def ray_call_put( - self, - participant: Any, - ctx: Any, - f_name: str, - callback: Callable, - clones: Optional[Any] = None, - ) -> None: - """ - Execute f_name from inside participant (Aggregator or Collaborator) class with the context - of clone (ctx) - """ - if clones is not None: - self.__remote_contexts.append( - participant.execute_func.remote(ctx, f_name, callback, clones) - ) - else: - self.__remote_contexts.append( - participant.execute_func.remote(ctx, f_name, callback) - ) - - def ray_call_get(self) -> List[Any]: - """ - Get remote clones and delete ray references of clone (ctx) and, - reclaim memory - """ - clones = ray.get(self.__remote_contexts) - del self.__remote_contexts - self.__remote_contexts = [] - - return clones - - -def ray_group_assign(collaborators, num_actors=1): - """ - Assigns collaborators to resource groups which share a CUDA context. - - Args: - collaborators (list): The list of collaborators. - num_actors (int, optional): Number of actors to distribute collaborators to. - Defaults to 3. - - Returns: - list: A list of GroupMember instances. - """ - - class GroupMember: - """ - A utility class that manages the collaborator and its group. - - This class maintains compatibility with runtime execution by assigning attributes for each - function in the Collaborator interface in conjunction with RemoteHelper. - """ - - def __init__(self, collaborator_actor, collaborator): - """ - Initializes a new instance of the GroupMember class. - - Args: - collaborator_actor: The collaborator actor. - collaborator: The collaborator. 
- """ - from openfl.experimental.interface import Collaborator - - all_methods = [ - method - for method in dir(Collaborator) - if callable(getattr(Collaborator, method)) - ] - external_methods = [method for method in all_methods if (method[0] != "_")] - self.collaborator_actor = collaborator_actor - self.collaborator = collaborator - for method in external_methods: - setattr( - self, - method, - RemoteHelper(self.collaborator_actor, self.collaborator, method), - ) - - class RemoteHelper: - """ - A utility class to maintain compatibility with RayExecutor. - - This class returns a lambda function that uses collaborator_actor.execute_from_col to run - a given function from the given collaborator. - """ - - # once ray_grouped replaces the current ray runtime this class can be replaced with a - # funtion that returns the lambda funtion, using a funtion is necesary because this is used - # in setting multiple funtions in a loop and lambda takes the reference to self.f_name and - # not the value so we need to change scope to avoid self.f_name from changing as the loop - # progresses - def __init__(self, collaborator_actor, collaborator, f_name) -> None: - """ - Initializes a new instance of the RemoteHelper class. - - Args: - collaborator_actor: The collaborator actor. - collaborator: The collaborator. - f_name (str): The name of the function. - """ - self.f_name = f_name - self.collaborator_actor = collaborator_actor - self.collaborator = collaborator - self.f = ( - lambda *args, **kwargs: self.collaborator_actor.execute_from_col.remote( - self.collaborator, self.f_name, *args, **kwargs - ) - ) - - def remote(self, *args, **kwargs): - """ - Executes the function with the given arguments and keyword arguments. - - Args: - *args: The arguments to pass to the function. - **kwargs: The keyword arguments to pass to the function. - - Returns: - The result of the function execution. 
- """ - return self.f(*args, *kwargs) - - collaborator_ray_refs = [] - collaborators_per_group = math.ceil(len(collaborators) / num_actors) - times_called = 0 - # logic to sort collaborators by gpus, if collaborators have the same number of gpu then they - # are sorted by cpu - cpu_magnitude = len(str(abs(max([i.num_cpus for i in collaborators])))) - min_gpu = min([i.num_gpus for i in collaborators]) - min_gpu = max(min_gpu, 0.0001) - collaborators_sorted_by_gpucpu = sorted( - collaborators, - key=lambda x: x.num_gpus / min_gpu * 10**cpu_magnitude + x.num_cpus, - ) - initializations = [] - - for collaborator in collaborators_sorted_by_gpucpu: - # initialize actor group - if times_called % collaborators_per_group == 0: - max_num_cpus = max( - [ - i.num_cpus - for i in collaborators_sorted_by_gpucpu[ - times_called: times_called + collaborators_per_group - ] - ] - ) - max_num_gpus = max( - [ - i.num_gpus - for i in collaborators_sorted_by_gpucpu[ - times_called: times_called + collaborators_per_group - ] - ] - ) - print(f"creating actor with {max_num_cpus}, {max_num_gpus}") - collaborator_actor = ( - ray.remote(RayGroup) - .options( - num_cpus=max_num_cpus, num_gpus=max_num_gpus - ) # max_concurrency=max_concurrency) - .remote() - ) - # add collaborator to actor group - initializations.append( - collaborator_actor.append.remote( - collaborator.get_name(), - private_attributes_callable=collaborator.private_attributes_callable, - **collaborator.kwargs, - ) - ) - - times_called += 1 - - # append GroupMember to output list - collaborator_ray_refs.append( - GroupMember(collaborator_actor, collaborator.get_name()) - ) - # Wait for all collaborators to be created on actors - ray.get(initializations) - - return collaborator_ray_refs - - -class RayGroup: - """ - A Ray actor that manages a group of collaborators. - - This class allows for the execution of functions from a specified collaborator - using the execute_from_col method. The collaborators are stored in a dictionary - where the key is the collaborator's name. - """ - - def __init__(self): - """ - Initializes a new instance of the RayGroup class. - """ - self.collaborators = {} - - def append( - self, - name: str = "", - private_attributes_callable: Callable = None, - **kwargs, - ): - """ - Appends a new collaborator to the group. - - Args: - name (str): The name of the collaborator. - private_attributes_callable (Callable): A callable that sets the private attributes of - the collaborator. - **kwargs: Additional keyword arguments. - """ - from openfl.experimental.interface import Collaborator - - self.collaborators[name] = Collaborator( - name=name, - private_attributes_callable=private_attributes_callable, - **kwargs, - ) - - def execute_from_col(self, name, internal_f_name, *args, **kwargs): - """ - Executes a function from a specified collaborator. - - Args: - name (str): The name of the collaborator. - internal_f_name (str): The name of the function to execute. - *args: Additional arguments to pass to the function. - **kwargs: Additional keyword arguments to pass to the function. - - Returns: - The result of the function execution. - """ - f = getattr(self.collaborators[name], internal_f_name) - return f(*args, **kwargs) - - def get_collaborator(self, name): - """ - Retrieves a collaborator from the group by name. - - Args: - name (str): The name of the collaborator. - - Returns: - The collaborator instance. 
- """ - return self.collaborators[name] - - -class LocalRuntime(Runtime): - def __init__( - self, - aggregator: Dict = None, - collaborators: Dict = None, - backend: str = "single_process", - **kwargs, - ) -> None: - """ - Use single node to run the flow - - Args: - aggregator: The aggregator instance that holds private attributes - collaborators: A list of collaborators; each with their own private attributes - backend: The backend that will execute the tasks. Available options are: - - 'single_process': (default) Executes every task within the same process - - 'ray': Executes tasks using the Ray library. We use ray - actors called RayGroups to runs tasks in their own - isolated process. Each participant is distributed - into a ray group. The RayGroups run concurrently - while participants in the group run serially. - The default is 1 RayGroup and can be changed by using - the num_actors=1 kwarg. By using more RayGroups more - concurency is allowed with the trade off being that - each RayGroup has extra memory overhead in the form - of extra CUDA CONTEXTS. - - Also the ray runtime supports GPU isolation using - Ray's 'num_gpus' argument, which can be passed in - through the collaborator placement decorator. - - Example: - @collaborator(num_gpus=1) - def some_collaborator_task(self): - ... - - - By selecting num_gpus=1, the task is guaranteed - exclusive GPU access. If the system has one GPU, - collaborator tasks will run sequentially. - - """ - super().__init__() - if backend not in ["ray", "single_process"]: - raise ValueError( - f"Invalid 'backend' value '{backend}', accepted values are " - + "'ray', or 'single_process'" - ) - if backend == "ray": - if not ray.is_initialized(): - dh = kwargs.get("dashboard_host", "127.0.0.1") - dp = kwargs.get("dashboard_port", 5252) - ray.init(dashboard_host=dh, dashboard_port=dp) - - self.num_actors = kwargs.get("num_actors", 1) - self.backend = backend - if aggregator is not None: - self.aggregator = self.__get_aggregator_object(aggregator) - - if collaborators is not None: - self.collaborators = self.__get_collaborator_object(collaborators) - - def __get_aggregator_object(self, aggregator: Type[Aggregator]) -> Any: - """Get aggregator object based on localruntime backend""" - - if self.backend == "single_process": - return aggregator - - total_available_cpus = os.cpu_count() - total_available_gpus = get_number_of_gpus() - - agg_cpus = aggregator.num_cpus - agg_gpus = aggregator.num_gpus - - if agg_gpus > 0: - check_resource_allocation( - total_available_gpus, - {aggregator.get_name(): agg_gpus}, - ) - - if total_available_gpus < agg_gpus: - raise ResourcesNotAvailableError( - f"cannot assign more than available GPUs \ - ({agg_gpus} < {total_available_gpus})." - ) - if total_available_cpus < agg_cpus: - raise ResourcesNotAvailableError( - f"cannot assign more than available CPUs \ - ({agg_cpus} < {total_available_cpus})." 
- ) - - interface_module = importlib.import_module("openfl.experimental.interface") - aggregator_class = getattr(interface_module, "Aggregator") - - aggregator_actor = ray.remote(aggregator_class).options( - num_cpus=agg_cpus, num_gpus=agg_gpus - ) - aggregator_actor_ref = aggregator_actor.remote( - name=aggregator.get_name(), - private_attributes_callable=aggregator.private_attributes_callable, - **aggregator.kwargs, - ) - - return aggregator_actor_ref - - def __get_collaborator_object(self, collaborators: List) -> Any: - """Get collaborator object based on localruntime backend""" - - if self.backend == "single_process": - return collaborators - - total_available_cpus = os.cpu_count() - total_required_cpus = sum( - [collaborator.num_cpus for collaborator in collaborators] - ) - if total_available_cpus < total_required_cpus: - raise ResourcesNotAvailableError( - f"cannot assign more than available CPUs \ - ({total_required_cpus} < {total_available_cpus})." - ) - - if self.backend == "ray": - collaborator_ray_refs = ray_group_assign( - collaborators, num_actors=self.num_actors - ) - return collaborator_ray_refs - - @property - def aggregator(self) -> str: - """Returns name of _aggregator""" - return self._aggregator.name - - @aggregator.setter - def aggregator(self, aggregator: Type[Aggregator]): - """Set LocalRuntime _aggregator""" - self._aggregator = aggregator - - @property - def collaborators(self) -> List[str]: - """ - Return names of collaborators. Don't give direct access to private attributes - """ - return list(self.__collaborators.keys()) - - @collaborators.setter - def collaborators(self, collaborators: List[Type[Collaborator]]): - """Set LocalRuntime collaborators""" - if self.backend == "single_process": - - def get_collab_name(collab): - return collab.get_name() - - else: - - def get_collab_name(collab): - return ray.get(collab.get_name.remote()) - - self.__collaborators = { - get_collab_name(collaborator): collaborator - for collaborator in collaborators - } - - def initialize_aggregator(self): - """initialize aggregator private attributes""" - if self.backend == "single_process": - self._aggregator.initialize_private_attributes() - else: - ray.get(self._aggregator.initialize_private_attributes.remote()) - - def initialize_collaborators(self): - """initialize collaborator private attributes""" - if self.backend == "single_process": - - def init_private_attrs(collab): - return collab.initialize_private_attributes() - - else: - - def init_private_attrs(collab): - return ray.get(collab.initialize_private_attributes.remote()) - - for collaborator in self.__collaborators.values(): - init_private_attrs(collaborator) - - def restore_instance_snapshot( - self, ctx: Type[FLSpec], instance_snapshot: List[Type[FLSpec]] - ): - """Restores attributes from backup (in instance snapshot) to ctx""" - for backup in instance_snapshot: - artifacts_iter, _ = generate_artifacts(ctx=backup) - for name, attr in artifacts_iter(): - if not hasattr(ctx, name): - setattr(ctx, name, attr) - - def execute_agg_steps(self, ctx: Any, f_name: str, clones: Optional[Any] = None): - """ - Execute aggregator steps until at transition point - """ - if clones is not None: - f = getattr(ctx, f_name) - f(clones) - else: - not_at_transition_point = True - while not_at_transition_point: - f = getattr(ctx, f_name) - f() - - f, parent_func = ctx.execute_task_args[:2] - if aggregator_to_collaborator(f, parent_func) or f.__name__ == "end": - not_at_transition_point = False - - f_name = f.__name__ - - def 
execute_collab_steps(self, ctx: Any, f_name: str): - """ - Execute collaborator steps until at transition point - """ - not_at_transition_point = True - while not_at_transition_point: - f = getattr(ctx, f_name) - f() - - f, parent_func = ctx.execute_task_args[:2] - if ctx._is_at_transition_point(f, parent_func): - not_at_transition_point = False - - f_name = f.__name__ - - def execute_task(self, flspec_obj: Type[FLSpec], f: Callable, **kwargs): - """ - Defines which function to be executed based on name and kwargs - Updates the arguments and executes until end is not reached - - Args: - flspec_obj: Reference to the FLSpec (flow) object. Contains information - about task sequence, flow attributes. - f: The next task to be executed within the flow - - Returns: - artifacts_iter: Iterator with updated sequence of values - """ - parent_func = None - instance_snapshot = None - self.join_step = False - - while f.__name__ != "end": - if "foreach" in kwargs: - flspec_obj = self.execute_collab_task( - flspec_obj, f, parent_func, instance_snapshot, **kwargs - ) - else: - flspec_obj = self.execute_agg_task(flspec_obj, f) - f, parent_func, instance_snapshot, kwargs = flspec_obj.execute_task_args - else: - flspec_obj = self.execute_agg_task(flspec_obj, f) - f = flspec_obj.execute_task_args[0] - - checkpoint(flspec_obj, f) - artifacts_iter, _ = generate_artifacts(ctx=flspec_obj) - return artifacts_iter() - - def execute_agg_task(self, flspec_obj, f): - """ - Performs execution of aggregator task - Args: - flspec_obj : Reference to the FLSpec (flow) object - f : The task to be executed within the flow - - Returns: - flspec_obj: updated FLSpec (flow) object - """ - from openfl.experimental.interface import FLSpec - - aggregator = self._aggregator - clones = None - - if self.join_step: - clones = [FLSpec._clones[col] for col in self.selected_collaborators] - self.join_step = False - - if self.backend == "ray": - ray_executor = RayExecutor() - ray_executor.ray_call_put( - aggregator, flspec_obj, f.__name__, self.execute_agg_steps, clones - ) - flspec_obj = ray_executor.ray_call_get()[0] - del ray_executor - else: - aggregator.execute_func( - flspec_obj, f.__name__, self.execute_agg_steps, clones - ) - - gc.collect() - return flspec_obj - - def execute_collab_task( - self, flspec_obj, f, parent_func, instance_snapshot, **kwargs - ): - """ - Performs - 1. Filter include/exclude - 2. Set runtime, collab private attributes , metaflow_interface - 3. Execution of all collaborator for each task - 4. Remove collaborator private attributes - 5. 
Execute the next function after transition - - Args: - flspec_obj : Reference to the FLSpec (flow) object - f : The task to be executed within the flow - parent_func : The prior task executed in the flow - instance_snapshot : A prior FLSpec state that needs to be restored - - Returns: - flspec_obj: updated FLSpec (flow) object - """ - - from openfl.experimental.interface import ( - FLSpec, - ) - - flspec_obj._foreach_methods.append(f.__name__) - selected_collaborators = getattr(flspec_obj, kwargs["foreach"]) - self.selected_collaborators = selected_collaborators - - # filter exclude/include attributes for clone - self.filter_exclude_include(flspec_obj, f, selected_collaborators, **kwargs) - - if self.backend == "ray": - ray_executor = RayExecutor() - # set runtime,collab private attributes and metaflowinterface - for col in selected_collaborators: - clone = FLSpec._clones[col] - # Set new LocalRuntime for clone as it is required - # new runtime object will not contain private attributes of - # aggregator or other collaborators - clone.runtime = LocalRuntime(backend="single_process") - - # write the clone to the object store - # ensure clone is getting latest _metaflow_interface - clone._metaflow_interface = flspec_obj._metaflow_interface - - for collab_name in selected_collaborators: - clone = FLSpec._clones[collab_name] - collaborator = self.__collaborators[collab_name] - - if self.backend == "ray": - ray_executor.ray_call_put( - collaborator, clone, f.__name__, self.execute_collab_steps - ) - else: - collaborator.execute_func(clone, f.__name__, self.execute_collab_steps) - - if self.backend == "ray": - clones = ray_executor.ray_call_get() - FLSpec._clones.update(zip(selected_collaborators, clones)) - clone = clones[0] - del clones - - flspec_obj.execute_task_args = clone.execute_task_args - - # Restore the flspec_obj state if back-up is taken - self.restore_instance_snapshot(flspec_obj, instance_snapshot) - del instance_snapshot - - gc.collect() - # Setting the join_step to indicate to aggregator to collect clones - self.join_step = True - return flspec_obj - - def filter_exclude_include(self, flspec_obj, f, selected_collaborators, **kwargs): - """ - This function filters exclude/include attributes - Args: - flspec_obj : Reference to the FLSpec (flow) object - f : The task to be executed within the flow - selected_collaborators : all collaborators - """ - - from openfl.experimental.interface import ( - FLSpec, - ) - - for col in selected_collaborators: - clone = FLSpec._clones[col] - clone.input = col - if ("exclude" in kwargs and hasattr(clone, kwargs["exclude"][0])) or ( - "include" in kwargs and hasattr(clone, kwargs["include"][0]) - ): - filter_attributes(clone, f, **kwargs) - artifacts_iter, _ = generate_artifacts(ctx=flspec_obj) - for name, attr in artifacts_iter(): - setattr(clone, name, deepcopy(attr)) - clone._foreach_methods = flspec_obj._foreach_methods - - def __repr__(self): - return "LocalRuntime" +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +""" openfl.experimental.runtime package LocalRuntime class.""" + +from __future__ import annotations +from copy import deepcopy +import importlib +import ray +import os +import gc +from openfl.experimental.runtime import Runtime +from typing import TYPE_CHECKING, Optional +import math + +if TYPE_CHECKING: + from openfl.experimental.interface import Aggregator, Collaborator, FLSpec + +from openfl.experimental.utilities import ( + ResourcesNotAvailableError, + aggregator_to_collaborator, + 
generate_artifacts, + filter_attributes, + checkpoint, + get_number_of_gpus, + check_resource_allocation, +) +from typing import List, Any +from typing import Dict, Type, Callable + + +class RayExecutor: + def __init__(self): + """Create RayExecutor object""" + self.__remote_contexts = [] + + def ray_call_put( + self, + participant: Any, + ctx: Any, + f_name: str, + callback: Callable, + clones: Optional[Any] = None, + ) -> None: + """ + Execute f_name from inside participant (Aggregator or Collaborator) class with the context + of clone (ctx) + """ + if clones is not None: + self.__remote_contexts.append( + participant.execute_func.remote(ctx, f_name, callback, clones) + ) + else: + self.__remote_contexts.append( + participant.execute_func.remote(ctx, f_name, callback) + ) + + def ray_call_get(self) -> List[Any]: + """ + Get remote clones and delete ray references of clone (ctx) and, + reclaim memory + """ + clones = ray.get(self.__remote_contexts) + del self.__remote_contexts + self.__remote_contexts = [] + + return clones + + +def ray_group_assign(collaborators, num_actors=1): + """ + Assigns collaborators to resource groups which share a CUDA context. + + Args: + collaborators (list): The list of collaborators. + num_actors (int, optional): Number of actors to distribute collaborators to. + Defaults to 3. + + Returns: + list: A list of GroupMember instances. + """ + + class GroupMember: + """ + A utility class that manages the collaborator and its group. + + This class maintains compatibility with runtime execution by assigning attributes for each + function in the Collaborator interface in conjunction with RemoteHelper. + """ + + def __init__(self, collaborator_actor, collaborator): + """ + Initializes a new instance of the GroupMember class. + + Args: + collaborator_actor: The collaborator actor. + collaborator: The collaborator. + """ + from openfl.experimental.interface import Collaborator + + all_methods = [ + method + for method in dir(Collaborator) + if callable(getattr(Collaborator, method)) + ] + external_methods = [method for method in all_methods if (method[0] != "_")] + self.collaborator_actor = collaborator_actor + self.collaborator = collaborator + for method in external_methods: + setattr( + self, + method, + RemoteHelper(self.collaborator_actor, self.collaborator, method), + ) + + class RemoteHelper: + """ + A utility class to maintain compatibility with RayExecutor. + + This class returns a lambda function that uses collaborator_actor.execute_from_col to run + a given function from the given collaborator. + """ + + # once ray_grouped replaces the current ray runtime this class can be replaced with a + # funtion that returns the lambda funtion, using a funtion is necesary because this is used + # in setting multiple funtions in a loop and lambda takes the reference to self.f_name and + # not the value so we need to change scope to avoid self.f_name from changing as the loop + # progresses + def __init__(self, collaborator_actor, collaborator, f_name) -> None: + """ + Initializes a new instance of the RemoteHelper class. + + Args: + collaborator_actor: The collaborator actor. + collaborator: The collaborator. + f_name (str): The name of the function. 
+ """ + self.f_name = f_name + self.collaborator_actor = collaborator_actor + self.collaborator = collaborator + self.f = ( + lambda *args, **kwargs: self.collaborator_actor.execute_from_col.remote( + self.collaborator, self.f_name, *args, **kwargs + ) + ) + + def remote(self, *args, **kwargs): + """ + Executes the function with the given arguments and keyword arguments. + + Args: + *args: The arguments to pass to the function. + **kwargs: The keyword arguments to pass to the function. + + Returns: + The result of the function execution. + """ + return self.f(*args, *kwargs) + + collaborator_ray_refs = [] + collaborators_per_group = math.ceil(len(collaborators) / num_actors) + times_called = 0 + # logic to sort collaborators by gpus, if collaborators have the same number of gpu then they + # are sorted by cpu + cpu_magnitude = len(str(abs(max([i.num_cpus for i in collaborators])))) + min_gpu = min([i.num_gpus for i in collaborators]) + min_gpu = max(min_gpu, 0.0001) + collaborators_sorted_by_gpucpu = sorted( + collaborators, + key=lambda x: x.num_gpus / min_gpu * 10**cpu_magnitude + x.num_cpus, + ) + initializations = [] + + for collaborator in collaborators_sorted_by_gpucpu: + # initialize actor group + if times_called % collaborators_per_group == 0: + max_num_cpus = max( + [ + i.num_cpus + for i in collaborators_sorted_by_gpucpu[ + times_called: times_called + collaborators_per_group + ] + ] + ) + max_num_gpus = max( + [ + i.num_gpus + for i in collaborators_sorted_by_gpucpu[ + times_called: times_called + collaborators_per_group + ] + ] + ) + print(f"creating actor with {max_num_cpus}, {max_num_gpus}") + collaborator_actor = ( + ray.remote(RayGroup) + .options( + num_cpus=max_num_cpus, num_gpus=max_num_gpus + ) # max_concurrency=max_concurrency) + .remote() + ) + # add collaborator to actor group + initializations.append( + collaborator_actor.append.remote( + collaborator.get_name(), + private_attributes_callable=collaborator.private_attributes_callable, + **collaborator.kwargs, + ) + ) + + times_called += 1 + + # append GroupMember to output list + collaborator_ray_refs.append( + GroupMember(collaborator_actor, collaborator.get_name()) + ) + # Wait for all collaborators to be created on actors + ray.get(initializations) + + return collaborator_ray_refs + + +class RayGroup: + """ + A Ray actor that manages a group of collaborators. + + This class allows for the execution of functions from a specified collaborator + using the execute_from_col method. The collaborators are stored in a dictionary + where the key is the collaborator's name. + """ + + def __init__(self): + """ + Initializes a new instance of the RayGroup class. + """ + self.collaborators = {} + + def append( + self, + name: str = "", + private_attributes_callable: Callable = None, + **kwargs, + ): + """ + Appends a new collaborator to the group. + + Args: + name (str): The name of the collaborator. + private_attributes_callable (Callable): A callable that sets the private attributes of + the collaborator. + **kwargs: Additional keyword arguments. + """ + from openfl.experimental.interface import Collaborator + + self.collaborators[name] = Collaborator( + name=name, + private_attributes_callable=private_attributes_callable, + **kwargs, + ) + + def execute_from_col(self, name, internal_f_name, *args, **kwargs): + """ + Executes a function from a specified collaborator. + + Args: + name (str): The name of the collaborator. + internal_f_name (str): The name of the function to execute. 
+ *args: Additional arguments to pass to the function. + **kwargs: Additional keyword arguments to pass to the function. + + Returns: + The result of the function execution. + """ + f = getattr(self.collaborators[name], internal_f_name) + return f(*args, **kwargs) + + def get_collaborator(self, name): + """ + Retrieves a collaborator from the group by name. + + Args: + name (str): The name of the collaborator. + + Returns: + The collaborator instance. + """ + return self.collaborators[name] + + +class LocalRuntime(Runtime): + def __init__( + self, + aggregator: Dict = None, + collaborators: Dict = None, + backend: str = "single_process", + **kwargs, + ) -> None: + """ + Use single node to run the flow + + Args: + aggregator: The aggregator instance that holds private attributes + collaborators: A list of collaborators; each with their own private attributes + backend: The backend that will execute the tasks. Available options are: + + 'single_process': (default) Executes every task within the same process + + 'ray': Executes tasks using the Ray library. We use ray + actors called RayGroups to runs tasks in their own + isolated process. Each participant is distributed + into a ray group. The RayGroups run concurrently + while participants in the group run serially. + The default is 1 RayGroup and can be changed by using + the num_actors=1 kwarg. By using more RayGroups more + concurency is allowed with the trade off being that + each RayGroup has extra memory overhead in the form + of extra CUDA CONTEXTS. + + Also the ray runtime supports GPU isolation using + Ray's 'num_gpus' argument, which can be passed in + through the collaborator placement decorator. + + Example: + @collaborator(num_gpus=1) + def some_collaborator_task(self): + ... + + + By selecting num_gpus=1, the task is guaranteed + exclusive GPU access. If the system has one GPU, + collaborator tasks will run sequentially. + """ + super().__init__() + if backend not in ["ray", "single_process"]: + raise ValueError( + f"Invalid 'backend' value '{backend}', accepted values are " + + "'ray', or 'single_process'" + ) + if backend == "ray": + if not ray.is_initialized(): + dh = kwargs.get("dashboard_host", "127.0.0.1") + dp = kwargs.get("dashboard_port", 5252) + ray.init(dashboard_host=dh, dashboard_port=dp) + + self.num_actors = kwargs.get("num_actors", 1) + self.backend = backend + if aggregator is not None: + self.aggregator = self.__get_aggregator_object(aggregator) + + if collaborators is not None: + self.collaborators = self.__get_collaborator_object(collaborators) + + def __get_aggregator_object(self, aggregator: Type[Aggregator]) -> Any: + """Get aggregator object based on localruntime backend""" + + if self.backend == "single_process": + return aggregator + + total_available_cpus = os.cpu_count() + total_available_gpus = get_number_of_gpus() + + agg_cpus = aggregator.num_cpus + agg_gpus = aggregator.num_gpus + + if agg_gpus > 0: + check_resource_allocation( + total_available_gpus, + {aggregator.get_name(): agg_gpus}, + ) + + if total_available_gpus < agg_gpus: + raise ResourcesNotAvailableError( + f"cannot assign more than available GPUs \ + ({agg_gpus} < {total_available_gpus})." + ) + if total_available_cpus < agg_cpus: + raise ResourcesNotAvailableError( + f"cannot assign more than available CPUs \ + ({agg_cpus} < {total_available_cpus})." 
+ ) + + interface_module = importlib.import_module("openfl.experimental.interface") + aggregator_class = getattr(interface_module, "Aggregator") + + aggregator_actor = ray.remote(aggregator_class).options( + num_cpus=agg_cpus, num_gpus=agg_gpus + ) + aggregator_actor_ref = aggregator_actor.remote( + name=aggregator.get_name(), + private_attributes_callable=aggregator.private_attributes_callable, + **aggregator.kwargs, + ) + + return aggregator_actor_ref + + def __get_collaborator_object(self, collaborators: List) -> Any: + """Get collaborator object based on localruntime backend""" + + if self.backend == "single_process": + return collaborators + + total_available_cpus = os.cpu_count() + total_required_cpus = sum( + [collaborator.num_cpus for collaborator in collaborators] + ) + if total_available_cpus < total_required_cpus: + raise ResourcesNotAvailableError( + f"cannot assign more than available CPUs \ + ({total_required_cpus} < {total_available_cpus})." + ) + + if self.backend == "ray": + collaborator_ray_refs = ray_group_assign( + collaborators, num_actors=self.num_actors + ) + return collaborator_ray_refs + + @property + def aggregator(self) -> str: + """Returns name of _aggregator""" + return self._aggregator.name + + @aggregator.setter + def aggregator(self, aggregator: Type[Aggregator]): + """Set LocalRuntime _aggregator""" + self._aggregator = aggregator + + @property + def collaborators(self) -> List[str]: + """ + Return names of collaborators. Don't give direct access to private attributes + """ + return list(self.__collaborators.keys()) + + @collaborators.setter + def collaborators(self, collaborators: List[Type[Collaborator]]): + """Set LocalRuntime collaborators""" + if self.backend == "single_process": + def get_collab_name(collab): + return collab.get_name() + + else: + def get_collab_name(collab): + return ray.get(collab.get_name.remote()) + + self.__collaborators = { + get_collab_name(collaborator): collaborator + for collaborator in collaborators + } + + def get_collaborator_kwargs(self, collaborator_name: str): + """ + Returns kwargs of collaborator + + Args: + collaborator_name: Collaborator name for which kwargs is to be returned + + Returns: + kwargs: Collaborator private_attributes_callable function name, and + arguments required to call it. 
+ """ + collab = self.__collaborators[collaborator_name] + kwargs = {} + if hasattr(collab, "private_attributes_callable"): + if collab.private_attributes_callable is not None: + kwargs.update(collab.kwargs) + kwargs["private_attributes_callable"] = collab.private_attributes_callable.__name__ + + return kwargs + + def initialize_aggregator(self): + """initialize aggregator private attributes""" + if self.backend == "single_process": + self._aggregator.initialize_private_attributes() + else: + ray.get(self._aggregator.initialize_private_attributes.remote()) + + def initialize_collaborators(self): + """initialize collaborator private attributes""" + if self.backend == "single_process": + + def init_private_attrs(collab): + return collab.initialize_private_attributes() + + else: + + def init_private_attrs(collab): + return ray.get(collab.initialize_private_attributes.remote()) + + for collaborator in self.__collaborators.values(): + init_private_attrs(collaborator) + + def restore_instance_snapshot( + self, ctx: Type[FLSpec], instance_snapshot: List[Type[FLSpec]] + ): + """Restores attributes from backup (in instance snapshot) to ctx""" + for backup in instance_snapshot: + artifacts_iter, _ = generate_artifacts(ctx=backup) + for name, attr in artifacts_iter(): + if not hasattr(ctx, name): + setattr(ctx, name, attr) + + def execute_agg_steps(self, ctx: Any, f_name: str, clones: Optional[Any] = None): + """ + Execute aggregator steps until at transition point + """ + if clones is not None: + f = getattr(ctx, f_name) + f(clones) + else: + not_at_transition_point = True + while not_at_transition_point: + f = getattr(ctx, f_name) + f() + + f, parent_func = ctx.execute_task_args[:2] + if aggregator_to_collaborator(f, parent_func) or f.__name__ == "end": + not_at_transition_point = False + + f_name = f.__name__ + + def execute_collab_steps(self, ctx: Any, f_name: str): + """ + Execute collaborator steps until at transition point + """ + not_at_transition_point = True + while not_at_transition_point: + f = getattr(ctx, f_name) + f() + + f, parent_func = ctx.execute_task_args[:2] + if ctx._is_at_transition_point(f, parent_func): + not_at_transition_point = False + + f_name = f.__name__ + + def execute_task(self, flspec_obj: Type[FLSpec], f: Callable, **kwargs): + """ + Defines which function to be executed based on name and kwargs + Updates the arguments and executes until end is not reached + + Args: + flspec_obj: Reference to the FLSpec (flow) object. Contains information + about task sequence, flow attributes. 
+ f: The next task to be executed within the flow + + Returns: + artifacts_iter: Iterator with updated sequence of values + """ + parent_func = None + instance_snapshot = None + self.join_step = False + + while f.__name__ != "end": + if "foreach" in kwargs: + flspec_obj = self.execute_collab_task( + flspec_obj, f, parent_func, instance_snapshot, **kwargs + ) + else: + flspec_obj = self.execute_agg_task(flspec_obj, f) + f, parent_func, instance_snapshot, kwargs = flspec_obj.execute_task_args + else: + flspec_obj = self.execute_agg_task(flspec_obj, f) + f = flspec_obj.execute_task_args[0] + + checkpoint(flspec_obj, f) + artifacts_iter, _ = generate_artifacts(ctx=flspec_obj) + return artifacts_iter() + + def execute_agg_task(self, flspec_obj, f): + """ + Performs execution of aggregator task + Args: + flspec_obj : Reference to the FLSpec (flow) object + f : The task to be executed within the flow + + Returns: + flspec_obj: updated FLSpec (flow) object + """ + from openfl.experimental.interface import FLSpec + aggregator = self._aggregator + clones = None + + if self.join_step: + clones = [FLSpec._clones[col] for col in self.selected_collaborators] + self.join_step = False + + if self.backend == "ray": + ray_executor = RayExecutor() + ray_executor.ray_call_put( + aggregator, flspec_obj, f.__name__, self.execute_agg_steps, clones + ) + flspec_obj = ray_executor.ray_call_get()[0] + del ray_executor + else: + aggregator.execute_func( + flspec_obj, f.__name__, self.execute_agg_steps, clones + ) + + gc.collect() + return flspec_obj + + def execute_collab_task( + self, flspec_obj, f, parent_func, instance_snapshot, **kwargs + ): + """ + Performs + 1. Filter include/exclude + 2. Set runtime, collab private attributes , metaflow_interface + 3. Execution of all collaborator for each task + 4. Remove collaborator private attributes + 5. 
Execute the next function after transition + + Args: + flspec_obj : Reference to the FLSpec (flow) object + f : The task to be executed within the flow + parent_func : The prior task executed in the flow + instance_snapshot : A prior FLSpec state that needs to be restored + + Returns: + flspec_obj: updated FLSpec (flow) object + """ + + from openfl.experimental.interface import ( + FLSpec, + ) + + flspec_obj._foreach_methods.append(f.__name__) + selected_collaborators = getattr(flspec_obj, kwargs["foreach"]) + self.selected_collaborators = selected_collaborators + + # filter exclude/include attributes for clone + self.filter_exclude_include(flspec_obj, f, selected_collaborators, **kwargs) + + if self.backend == "ray": + ray_executor = RayExecutor() + # set runtime,collab private attributes and metaflowinterface + for col in selected_collaborators: + clone = FLSpec._clones[col] + # Set new LocalRuntime for clone as it is required + # new runtime object will not contain private attributes of + # aggregator or other collaborators + clone.runtime = LocalRuntime(backend="single_process") + + # write the clone to the object store + # ensure clone is getting latest _metaflow_interface + clone._metaflow_interface = flspec_obj._metaflow_interface + + for collab_name in selected_collaborators: + clone = FLSpec._clones[collab_name] + collaborator = self.__collaborators[collab_name] + + if self.backend == "ray": + ray_executor.ray_call_put( + collaborator, clone, f.__name__, self.execute_collab_steps + ) + else: + collaborator.execute_func(clone, f.__name__, self.execute_collab_steps) + + if self.backend == "ray": + clones = ray_executor.ray_call_get() + FLSpec._clones.update(zip(selected_collaborators, clones)) + clone = clones[0] + del clones + + flspec_obj.execute_task_args = clone.execute_task_args + + # Restore the flspec_obj state if back-up is taken + self.restore_instance_snapshot(flspec_obj, instance_snapshot) + del instance_snapshot + + gc.collect() + # Setting the join_step to indicate to aggregator to collect clones + self.join_step = True + return flspec_obj + + def filter_exclude_include(self, flspec_obj, f, selected_collaborators, **kwargs): + """ + This function filters exclude/include attributes + Args: + flspec_obj : Reference to the FLSpec (flow) object + f : The task to be executed within the flow + selected_collaborators : all collaborators + """ + + from openfl.experimental.interface import ( + FLSpec, + ) + + for col in selected_collaborators: + clone = FLSpec._clones[col] + clone.input = col + if ("exclude" in kwargs and hasattr(clone, kwargs["exclude"][0])) or ( + "include" in kwargs and hasattr(clone, kwargs["include"][0]) + ): + filter_attributes(clone, f, **kwargs) + artifacts_iter, _ = generate_artifacts(ctx=flspec_obj) + for name, attr in artifacts_iter(): + setattr(clone, name, deepcopy(attr)) + clone._foreach_methods = flspec_obj._foreach_methods + + def __repr__(self): + return "LocalRuntime" diff --git a/openfl/experimental/transport/__init__.py b/openfl/experimental/transport/__init__.py new file mode 100644 index 0000000000..5b20dba61b --- /dev/null +++ b/openfl/experimental/transport/__init__.py @@ -0,0 +1,12 @@ +# Copyright (C) 2020-2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""openfl.experimental.transport package.""" +from .grpc import AggregatorGRPCClient +from .grpc import AggregatorGRPCServer + + +__all__ = [ + 'AggregatorGRPCServer', + 'AggregatorGRPCClient', +] diff --git a/openfl/experimental/transport/grpc/__init__.py 
b/openfl/experimental/transport/grpc/__init__.py new file mode 100644 index 0000000000..270fc493c7 --- /dev/null +++ b/openfl/experimental/transport/grpc/__init__.py @@ -0,0 +1,18 @@ +# Copyright (C) 2020-2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""openfl.experimental.transport.grpc package.""" + +from .aggregator_client import AggregatorGRPCClient +from .aggregator_server import AggregatorGRPCServer + + +class ShardNotFoundError(Exception): + """Indicates that director has no information about that shard.""" + + +__all__ = [ + 'AggregatorGRPCServer', + 'AggregatorGRPCClient', + 'ShardNotFoundError', +] diff --git a/openfl/experimental/transport/grpc/aggregator_client.py b/openfl/experimental/transport/grpc/aggregator_client.py new file mode 100644 index 0000000000..3982a7031a --- /dev/null +++ b/openfl/experimental/transport/grpc/aggregator_client.py @@ -0,0 +1,321 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""AggregatorGRPCClient module.""" + +import time +from logging import getLogger +from typing import Optional +from typing import Tuple + +import grpc + +from openfl.experimental.protocols import aggregator_pb2 +from openfl.experimental.protocols import aggregator_pb2_grpc +from openfl.utilities import check_equal + +from .grpc_channel_options import channel_options + + +class ConstantBackoff: + """Constant Backoff policy.""" + + def __init__(self, reconnect_interval, logger, uri): + """Initialize Constant Backoff.""" + self.reconnect_interval = reconnect_interval + self.logger = logger + self.uri = uri + + def sleep(self): + """Sleep for specified interval.""" + self.logger.info(f'Attempting to connect to aggregator at {self.uri}') + time.sleep(self.reconnect_interval) + + +class RetryOnRpcErrorClientInterceptor( + grpc.UnaryUnaryClientInterceptor, grpc.StreamUnaryClientInterceptor +): + """Retry gRPC connection on failure.""" + + def __init__( + self, + sleeping_policy, + status_for_retry: Optional[Tuple[grpc.StatusCode]] = None, + ): + """Initialize function for gRPC retry.""" + self.sleeping_policy = sleeping_policy + self.status_for_retry = status_for_retry + + def _intercept_call(self, continuation, client_call_details, request_or_iterator): + """Intercept the call to the gRPC server.""" + while True: + response = continuation(client_call_details, request_or_iterator) + + if isinstance(response, grpc.RpcError): + + # If status code is not in retryable status codes + self.sleeping_policy.logger.info(f'Response code: {response.code()}') + if ( + self.status_for_retry + and response.code() not in self.status_for_retry + ): + return response + + self.sleeping_policy.sleep() + else: + return response + + def intercept_unary_unary(self, continuation, client_call_details, request): + """Wrap intercept call for unary->unary RPC.""" + return self._intercept_call(continuation, client_call_details, request) + + def intercept_stream_unary( + self, continuation, client_call_details, request_iterator + ): + """Wrap intercept call for stream->unary RPC.""" + return self._intercept_call(continuation, client_call_details, request_iterator) + + +def _atomic_connection(func): + def wrapper(self, *args, **kwargs): + self.reconnect() + response = func(self, *args, **kwargs) + self.disconnect() + return response + + return wrapper + + +def _resend_data_on_reconnection(func): + def wrapper(self, *args, **kwargs): + while True: + try: + response = func(self, *args, **kwargs) + except grpc.RpcError as e: + if e.code() == 
grpc.StatusCode.UNKNOWN:
+                    self.logger.info(
+                        f'Attempting to resend data request to aggregator at {self.uri}'
+                    )
+                elif e.code() == grpc.StatusCode.UNAUTHENTICATED:
+                    raise
+                continue
+            break
+        return response
+
+    return wrapper
+
+
+class AggregatorGRPCClient:
+    """Client to the aggregator over gRPC-TLS."""
+
+    def __init__(self,
+                 agg_addr,
+                 agg_port,
+                 tls,
+                 disable_client_auth,
+                 root_certificate,
+                 certificate,
+                 private_key,
+                 aggregator_uuid=None,
+                 federation_uuid=None,
+                 single_col_cert_common_name=None,
+                 **kwargs):
+        """Initialize."""
+        self.uri = f'{agg_addr}:{agg_port}'
+        self.tls = tls
+        self.disable_client_auth = disable_client_auth
+        self.root_certificate = root_certificate
+        self.certificate = certificate
+        self.private_key = private_key
+
+        self.logger = getLogger(__name__)
+
+        if not self.tls:
+            self.logger.warning(
+                'gRPC is running on insecure channel with TLS disabled.')
+            self.channel = self.create_insecure_channel(self.uri)
+        else:
+            self.channel = self.create_tls_channel(
+                self.uri,
+                self.root_certificate,
+                self.disable_client_auth,
+                self.certificate,
+                self.private_key
+            )
+
+        self.header = None
+        self.aggregator_uuid = aggregator_uuid
+        self.federation_uuid = federation_uuid
+        self.single_col_cert_common_name = single_col_cert_common_name
+
+        # Adding an interceptor for RPC Errors
+        self.interceptors = (
+            RetryOnRpcErrorClientInterceptor(
+                sleeping_policy=ConstantBackoff(
+                    logger=self.logger,
+                    reconnect_interval=int(kwargs.get('client_reconnect_interval', 1)),
+                    uri=self.uri),
+                status_for_retry=(grpc.StatusCode.UNAVAILABLE,),
+            ),
+        )
+        self.stub = aggregator_pb2_grpc.AggregatorStub(
+            grpc.intercept_channel(self.channel, *self.interceptors)
+        )
+
+    def create_insecure_channel(self, uri):
+        """
+        Set an insecure gRPC channel (i.e. no TLS) if desired.
+
+        Warns user that this is not recommended.
+
+        Args:
+            uri: The uniform resource identifier for the insecure channel
+
+        Returns:
+            An insecure gRPC channel object
+
+        """
+        return grpc.insecure_channel(uri, options=channel_options)
+
+    def create_tls_channel(self, uri, root_certificate, disable_client_auth,
+                           certificate, private_key):
+        """
+        Set a secure gRPC channel (i.e. TLS).
+
+        Args:
+            uri: The uniform resource identifier for the secure channel
+            root_certificate: The Certificate Authority filename
+            disable_client_auth (boolean): True disables client-side
+                authentication (not recommended, throws warning to user)
+            certificate: The client certificate filename from the collaborator
+                (signed by the certificate authority)
+
+        Returns:
+            A secure gRPC channel object
+        """
+        with open(root_certificate, 'rb') as f:
+            root_certificate_b = f.read()
+
+        if disable_client_auth:
+            self.logger.warning('Client-side authentication is disabled.')
+            private_key_b = None
+            certificate_b = None
+        else:
+            with open(private_key, 'rb') as f:
+                private_key_b = f.read()
+            with open(certificate, 'rb') as f:
+                certificate_b = f.read()
+
+        credentials = grpc.ssl_channel_credentials(
+            root_certificates=root_certificate_b,
+            private_key=private_key_b,
+            certificate_chain=certificate_b,
+        )
+
+        return grpc.secure_channel(
+            uri, credentials, options=channel_options)
+
+    def _set_header(self, collaborator_name):
+        self.header = aggregator_pb2.MessageHeader(
+            sender=collaborator_name,
+            receiver=self.aggregator_uuid,
+            federation_uuid=self.federation_uuid,
+            single_col_cert_common_name=self.single_col_cert_common_name or ''
+        )
+
+    def validate_response(self, reply, collaborator_name):
+        """Validate the aggregator response."""
+        # check that the message was intended to go to this collaborator
+        check_equal(reply.header.receiver, collaborator_name, self.logger)
+        check_equal(reply.header.sender, self.aggregator_uuid, self.logger)
+
+        # check that federation id matches
+        check_equal(
+            reply.header.federation_uuid,
+            self.federation_uuid,
+            self.logger
+        )
+
+        # check that there is agreement on the single_col_cert_common_name
+        check_equal(
+            reply.header.single_col_cert_common_name,
+            self.single_col_cert_common_name or '',
+            self.logger
+        )
+
+    def disconnect(self):
+        """Close the gRPC channel."""
+        self.logger.debug(f'Disconnecting from gRPC server at {self.uri}')
+        self.channel.close()
+
+    def reconnect(self):
+        """Create a new channel with the gRPC server."""
+        # channel.close() is idempotent.
Call again here in case it wasn't issued previously + self.disconnect() + + if not self.tls: + self.channel = self.create_insecure_channel(self.uri) + else: + self.channel = self.create_tls_channel( + self.uri, + self.root_certificate, + self.disable_client_auth, + self.certificate, + self.private_key + ) + + self.logger.debug(f'Connecting to gRPC at {self.uri}') + + self.stub = aggregator_pb2_grpc.AggregatorStub( + grpc.intercept_channel(self.channel, *self.interceptors) + ) + + @_atomic_connection + @_resend_data_on_reconnection + def send_task_results(self, collaborator_name, round_number, next_step, + clone_bytes): + """Send next function name to aggregator.""" + self._set_header(collaborator_name) + request = aggregator_pb2.TaskResultsRequest( + header=self.header, + collab_name=collaborator_name, + round_number=round_number, + next_step=next_step, + execution_environment=clone_bytes + ) + + response = self.stub.SendTaskResults(request) + self.validate_response(response, collaborator_name) + + return response.header + + @_atomic_connection + @_resend_data_on_reconnection + def get_tasks(self, collaborator_name): + """Get tasks from the aggregator.""" + self._set_header(collaborator_name) + request = aggregator_pb2.GetTasksRequest(header=self.header) + + response = self.stub.GetTasks(request) + self.validate_response(response, collaborator_name) + + return (response.round_number, response.function_name, + response.execution_environment, response.sleep_time, response.quit) + + @_atomic_connection + @_resend_data_on_reconnection + def call_checkpoint(self, collaborator_name, clone_bytes, function, stream_buffer): + """Perform checkpoint for collaborator task.""" + self._set_header(collaborator_name) + + request = aggregator_pb2.CheckpointRequest( + header=self.header, + execution_environment=clone_bytes, + function=function, + stream_buffer=stream_buffer, + ) + + response = self.stub.CallCheckpoint(request) + self.validate_response(response, collaborator_name) + + return response.header diff --git a/openfl/experimental/transport/grpc/aggregator_server.py b/openfl/experimental/transport/grpc/aggregator_server.py new file mode 100644 index 0000000000..5675036e43 --- /dev/null +++ b/openfl/experimental/transport/grpc/aggregator_server.py @@ -0,0 +1,253 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""AggregatorGRPCServer module.""" + +import logging +from concurrent.futures import ThreadPoolExecutor +from random import random +from multiprocessing import cpu_count +from time import sleep + +from grpc import server +from grpc import ssl_server_credentials +from grpc import StatusCode + +from openfl.experimental.protocols import aggregator_pb2 +from openfl.experimental.protocols import aggregator_pb2_grpc +from openfl.utilities import check_equal +from openfl.utilities import check_is_in + +from .grpc_channel_options import channel_options + +logger = logging.getLogger(__name__) + + +class AggregatorGRPCServer(aggregator_pb2_grpc.AggregatorServicer): + """gRPC server class for the Aggregator.""" + + def __init__(self, + aggregator, + agg_port, + tls=True, + disable_client_auth=False, + root_certificate=None, + certificate=None, + private_key=None, + **kwargs): + """ + Class initializer. + + Args: + aggregator: The aggregator + Args: + fltask (FLtask): The gRPC service task. + tls (bool): To disable the TLS. (Default: True) + disable_client_auth (bool): To disable the client side + authentication. 
(Default: False) + root_certificate (str): File path to the CA certificate. + certificate (str): File path to the server certificate. + private_key (str): File path to the private key. + kwargs (dict): Additional arguments to pass into function + """ + self.aggregator = aggregator + self.uri = f'[::]:{agg_port}' + self.tls = tls + self.disable_client_auth = disable_client_auth + self.root_certificate = root_certificate + self.certificate = certificate + self.private_key = private_key + self.server = None + self.server_credentials = None + + self.logger = logging.getLogger(__name__) + + def validate_collaborator(self, request, context): + """ + Validate the collaborator. + + Args: + request: The gRPC message request + context: The gRPC context + + Raises: + ValueError: If the collaborator or collaborator certificate is not + valid then raises error. + + """ + if self.tls: + common_name = context.auth_context()[ + 'x509_common_name'][0].decode('utf-8') + collaborator_common_name = request.header.sender + if not self.aggregator.valid_collaborator_cn_and_id( + common_name, collaborator_common_name): + # Random delay in authentication failures + sleep(5 * random()) + context.abort( + StatusCode.UNAUTHENTICATED, + f'Invalid collaborator. CN: |{common_name}| ' + f'collaborator_common_name: |{collaborator_common_name}|') + + def get_header(self, collaborator_name): + """ + Compose and return MessageHeader. + + Args: + collaborator_name : str + The collaborator the message is intended for + """ + return aggregator_pb2.MessageHeader( + sender=self.aggregator.uuid, + receiver=collaborator_name, + federation_uuid=self.aggregator.federation_uuid, + single_col_cert_common_name=self.aggregator.single_col_cert_common_name + ) + + def check_request(self, request): + """ + Validate request header matches expected values. + + Args: + request : protobuf + Request sent from a collaborator that requires validation + """ + # TODO improve this check. the sender name could be spoofed + check_is_in(request.header.sender, self.aggregator.authorized_cols, self.logger) + + # check that the message is for me + check_equal(request.header.receiver, self.aggregator.uuid, self.logger) + + # check that the message is for my federation + check_equal( + request.header.federation_uuid, self.aggregator.federation_uuid, self.logger) + + # check that we agree on the single cert common name + check_equal( + request.header.single_col_cert_common_name, + self.aggregator.single_col_cert_common_name, + self.logger + ) + + def SendTaskResults(self, request, context): # NOQA:N802 + """ + . + + Args: + request: The gRPC message request + context: The gRPC context + + """ + self.validate_collaborator(request, context) + self.check_request(request) + collaborator_name = request.header.sender + round_number = request.round_number, + next_step = request.next_step, + execution_environment = request.execution_environment + + _ = self.aggregator.send_task_results( + collaborator_name, round_number[0], next_step, execution_environment + ) + + return aggregator_pb2.TaskResultsResponse( + header=self.get_header(collaborator_name) + ) + + def GetTasks(self, request, context): # NOQA:N802 + """ + Request a job from aggregator. 
+ + Args: + request: The gRPC message request + context: The gRPC context + """ + self.validate_collaborator(request, context) + self.check_request(request) + collaborator_name = request.header.sender + + rn, f, ee, st, q = self.aggregator.get_tasks( + request.header.sender) + + return aggregator_pb2.GetTasksResponse( + header=self.get_header(collaborator_name), + round_number=rn, + function_name=f, + execution_environment=ee, + sleep_time=st, + quit=q + ) + + def CallCheckpoint(self, request, context): # NOQA:N802 + """ + Request aggregator to perform a checkpoint + for a given function. + + Args: + request: The gRPC message request + context: The gRPC context + """ + self.validate_collaborator(request, context) + self.check_request(request) + collaborator_name = request.header.sender + execution_environment = request.execution_environment + function = request.function + stream_buffer = request.stream_buffer + + self.aggregator.call_checkpoint( + execution_environment, function, stream_buffer + ) + + return aggregator_pb2.CheckpointResponse( + header=self.get_header(collaborator_name) + ) + + def get_server(self): + """Return gRPC server.""" + self.server = server(ThreadPoolExecutor(max_workers=cpu_count()), + options=channel_options) + + aggregator_pb2_grpc.add_AggregatorServicer_to_server(self, self.server) + + if not self.tls: + + self.logger.warn( + 'gRPC is running on insecure channel with TLS disabled.') + port = self.server.add_insecure_port(self.uri) + self.logger.info(f'Insecure port: {port}') + + else: + + with open(self.private_key, 'rb') as f: + private_key_b = f.read() + with open(self.certificate, 'rb') as f: + certificate_b = f.read() + with open(self.root_certificate, 'rb') as f: + root_certificate_b = f.read() + + if self.disable_client_auth: + self.logger.warn('Client-side authentication is disabled.') + + self.server_credentials = ssl_server_credentials( + ((private_key_b, certificate_b),), + root_certificates=root_certificate_b, + require_client_auth=not self.disable_client_auth + ) + + self.server.add_secure_port(self.uri, self.server_credentials) + + return self.server + + def serve(self): + """Start an aggregator gRPC service.""" + self.get_server() + + self.logger.info('Starting Aggregator gRPC Server') + self.server.start() + self.is_server_started = True + try: + while not self.aggregator.all_quit_jobs_sent(): + sleep(5) + except KeyboardInterrupt: + pass + + def stop_server(self): + self.server.stop(0) diff --git a/openfl/experimental/transport/grpc/exceptions.py b/openfl/experimental/transport/grpc/exceptions.py new file mode 100644 index 0000000000..5bd19315c0 --- /dev/null +++ b/openfl/experimental/transport/grpc/exceptions.py @@ -0,0 +1,8 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +"""Exceptions that occur during service interaction.""" + + +class ShardNotFoundError(Exception): + """Indicates that director has no information about that shard.""" diff --git a/openfl/experimental/transport/grpc/grpc_channel_options.py b/openfl/experimental/transport/grpc/grpc_channel_options.py new file mode 100644 index 0000000000..229dd45e51 --- /dev/null +++ b/openfl/experimental/transport/grpc/grpc_channel_options.py @@ -0,0 +1,11 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +max_metadata_size = 32 * 2 ** 20 +max_message_length = 2 ** 30 + +channel_options = [ + ('grpc.max_metadata_size', max_metadata_size), + ('grpc.max_send_message_length', max_message_length), + 
('grpc.max_receive_message_length', max_message_length) +] diff --git a/openfl/experimental/utilities/metaflow_utils.py b/openfl/experimental/utilities/metaflow_utils.py index 77112df15c..0d08f5265c 100644 --- a/openfl/experimental/utilities/metaflow_utils.py +++ b/openfl/experimental/utilities/metaflow_utils.py @@ -421,19 +421,20 @@ def create_task(self, task_name: str) -> int: Returns: task_id [int] """ - # May need a lock here - if self.backend == "ray": - with SystemMutex("critical_section"): + with SystemMutex("critical_section"): + if self.backend == "ray": task_id = ray.get(self.counter.get_counter.remote()) self.local_metadata._task_id_seq = task_id self.local_metadata.new_task_id(self.run_id, task_name) return ray.get(self.counter.increment.remote()) - else: - task_id = self.counter - self.local_metadata._task_id_seq = task_id - self.local_metadata.new_task_id(self.run_id, task_name) - self.counter += 1 - return self.counter + else: + # Keeping single_process in critical_section + # because gRPC calls may cause problems. + task_id = self.counter + self.local_metadata._task_id_seq = task_id + self.local_metadata.new_task_id(self.run_id, task_name) + self.counter += 1 + return self.counter def save_artifacts( self, diff --git a/openfl/experimental/utilities/resources.py b/openfl/experimental/utilities/resources.py index 08df76c941..24689bb82e 100644 --- a/openfl/experimental/utilities/resources.py +++ b/openfl/experimental/utilities/resources.py @@ -1,6 +1,5 @@ # Copyright (C) 2020-2023 Intel Corporation # SPDX-License-Identifier: Apache-2.0 - """openfl.experimental.utilities.resources module.""" from logging import getLogger diff --git a/openfl/experimental/utilities/runtime_utils.py b/openfl/experimental/utilities/runtime_utils.py index e122f75438..e39ed9f36d 100644 --- a/openfl/experimental/utilities/runtime_utils.py +++ b/openfl/experimental/utilities/runtime_utils.py @@ -52,7 +52,7 @@ def filter_attributes(ctx, f, **kwargs): if "include" in kwargs and "exclude" in kwargs: raise RuntimeError("'include' and 'exclude' should not both be present") elif "include" in kwargs: - assert type(kwargs["include"]) is list + assert isinstance(kwargs["include"], list) for in_attr in kwargs["include"]: if in_attr not in cls_attrs: raise RuntimeError( @@ -62,7 +62,7 @@ def filter_attributes(ctx, f, **kwargs): if attr not in kwargs["include"]: delattr(ctx, attr) elif "exclude" in kwargs: - assert type(kwargs["exclude"]) is list + assert isinstance(kwargs["exclude"], list) for in_attr in kwargs["exclude"]: if in_attr not in cls_attrs: raise RuntimeError( diff --git a/openfl/experimental/utilities/stream_redirect.py b/openfl/experimental/utilities/stream_redirect.py index 0a3d8b9942..daa2b7bc78 100644 --- a/openfl/experimental/utilities/stream_redirect.py +++ b/openfl/experimental/utilities/stream_redirect.py @@ -21,7 +21,6 @@ def get_stdstream(self): """ Return the contents of stdout and stderr buffers """ - self._stdoutbuff.seek(0) self._stderrbuff.seek(0) @@ -45,6 +44,7 @@ def __init__(self, buffer, destination): self.__stdBuffer = buffer def write(self, message): + message = f"\33[94m{message}\33[0m" self.__stdDestination.write(message) self.__stdBuffer.write(message) @@ -56,7 +56,6 @@ class RedirectStdStreamContext: """ Context Manager that enables redirection of stdout & stderr """ - def __init__(self): self.stdstreambuffer = RedirectStdStreamBuffer() @@ -68,6 +67,7 @@ def __enter__(self): self.__old_stderr = sys.stderr sys.stdout = RedirectStdStream(self.stdstreambuffer._stdoutbuff, 
sys.stdout) sys.stderr = RedirectStdStream(self.stdstreambuffer._stderrbuff, sys.stderr) + return self.stdstreambuffer + def __exit__(self, et, ev, tb): diff --git a/openfl/experimental/workspace_export/__init__.py b/openfl/experimental/workspace_export/__init__.py new file mode 100644 index 0000000000..ba88041c78 --- /dev/null +++ b/openfl/experimental/workspace_export/__init__.py @@ -0,0 +1,6 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from .export import WorkspaceExport + +__all__ = ["WorkspaceExport"] diff --git a/openfl/experimental/workspace_export/export.py b/openfl/experimental/workspace_export/export.py new file mode 100644 index 0000000000..ad338a5906 --- /dev/null +++ b/openfl/experimental/workspace_export/export.py @@ -0,0 +1,401 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""Workspace Builder module.""" + +import re +import yaml +import ast +import astor +import inspect +import importlib +import nbformat + +from shutil import copytree +from logging import getLogger +from pathlib import Path + +from nbdev.export import nb_export +from openfl.experimental.interface.cli.cli_helper import print_tree + + +class WorkspaceExport: + """ + Convert a LocalRuntime Jupyter Notebook to an Aggregator-based FederatedRuntime Workflow. + + Args: + notebook_path: Absolute path of the Jupyter notebook. + output_workspace: Output directory for the newly generated workspace. + + Returns: + None + """ + def __init__(self, + notebook_path: str, + output_workspace: str) -> None: + self.logger = getLogger(__name__) + + self.notebook_path = Path(notebook_path).resolve() + self.output_workspace_path = Path(output_workspace).resolve() + self.output_workspace_path.parent.mkdir(parents=True, exist_ok=True) + + self.template_workspace_path = Path(f"{__file__}").parent.parent.parent.parent.joinpath( + "openfl-workspace", "experimental", "template_workspace" + ).resolve(strict=True) + + # Copy template workspace to output directory + self.created_workspace_path = Path(copytree( + self.template_workspace_path, self.output_workspace_path)) + self.logger.info(f"Copied template workspace to {self.created_workspace_path}") + + self.logger.info("Converting Jupyter notebook to Python script...") + export_filename = self.__get_exp_name() + self.script_path = Path(self.__convert_to_python( + self.notebook_path, self.created_workspace_path.joinpath("src"), + f"{export_filename}.py")).resolve() + print_tree(self.created_workspace_path, level=2) + + # Generated python script name without .py extension + self.script_name = self.script_path.name.split(".")[0].strip() + # Comment out flow.run() so the flow does not start executing when the script is imported + self.__comment_flow_execution() + # This is required as Ray creates too many actors when backend="ray" + self.__change_runtime() + + def __get_exp_name(self): + """Fetch the experiment name from the Jupyter notebook.""" + with open(str(self.notebook_path), "r") as f: + notebook_content = nbformat.read(f, as_version=nbformat.NO_CONVERT) + + for cell in notebook_content.cells: + if cell.cell_type == "code": + code = cell.source + match = re.search(r"#\s*\|\s*default_exp\s+(\w+)", code) + if match: + self.logger.info(f"Retrieved {match.group(1)} from default_exp") + return match.group(1) + return None + + def __convert_to_python(self, notebook_path: Path, output_path: Path, export_filename): + nb_export(notebook_path,
output_path) + + return Path(output_path).joinpath(export_filename).resolve() + + def __comment_flow_execution(self): + """ + In the python script search for ".run()" and comment it + """ + with open(self.script_path, "r") as f: + data = f.readlines() + for idx, line in enumerate(data): + if ".run()" in line: + data[idx] = f"# {line}" + with open(self.script_path, "w") as f: + f.writelines(data) + + def __change_runtime(self): + """ + Change the LocalRuntime backend from ray to single_process + """ + with open(self.script_path, "r") as f: + data = f.read() + + if data.find("backend='ray'") != -1: + data = data.replace("backend='ray'", "backend='single_process'") + elif data.find('backend="ray"') != -1: + data = data.replace('backend="ray"', 'backend="single_process"') + + with open(self.script_path, "w") as f: + f.write(data) + + def __get_class_arguments(self, class_name): + """ + Given the class name returns expected class arguments + """ + # Import python script if not already + if not hasattr(self, "exported_script_module"): + self.__import_exported_script() + + # Find class from imported python script module + for idx, attr in enumerate(self.available_modules_in_exported_script): + if attr == class_name: + cls = getattr(self.exported_script_module, + self.available_modules_in_exported_script[idx]) + + # If class not found + if "cls" not in locals(): + raise Exception(f"{class_name} not found.") + + if inspect.isclass(cls): + # Check if the class has an __init__ method + if "__init__" in cls.__dict__: + init_signature = inspect.signature(cls.__init__) + # Extract the parameter names (excluding 'self', 'args', and 'kwargs') + arg_names = [param for param in init_signature.parameters if param not in ( + "self", "args", "kwargs")] + return arg_names + return [] + self.logger.error(f"{cls} is not a class") + + def __get_class_name_and_sourcecode_from_parent_class(self, parent_class): + """ + Provided the parent_class name returns derived class source code and name. 
+ """ + # Import python script if not already + if not hasattr(self, "exported_script_module"): + self.__import_exported_script() + + # Going though all attributes in imported python script + for attr in self.available_modules_in_exported_script: + t = getattr(self.exported_script_module, attr) + if inspect.isclass(t) and t != parent_class and issubclass(t, parent_class): + return inspect.getsource(t), attr + + return None, None + + def __extract_class_initializing_args(self, class_name): + """ + Provided name of the class returns expected arguments and it's values in form of dictionary + """ + instantiation_args = { + "args": {}, "kwargs": {} + } + + with open(self.script_path, "r") as s: + tree = ast.parse(s.read()) + + for node in ast.walk(tree): + if isinstance(node, ast.Call) and isinstance(node.func, ast.Name): + if node.func.id == class_name: + # We found an instantiation of the class + for arg in node.args: + # Iterate through positional arguments + if isinstance(arg, ast.Name): + # Use the variable name as the argument value + instantiation_args["args"][arg.id] = arg.id + elif isinstance(arg, ast.Constant): + instantiation_args["args"][arg.s] = astor.to_source(arg) + else: + instantiation_args["args"][arg.arg] = astor.to_source(arg).strip() + + for kwarg in node.keywords: + # Iterate through keyword arguments + value = astor.to_source(kwarg.value).strip() + + # If paranthese or brackets around the value is found + # and it's not tuple or list remove paranthese or brackets + if value.startswith("(") and "," not in value: + value = value.lstrip("(").rstrip(")") + if value.startswith("[") and "," not in value: + value = value.lstrip("[").rstrip("]") + try: + value = ast.literal_eval(value) + except Exception: + pass + instantiation_args["kwargs"][kwarg.arg] = value + + return instantiation_args + + def __import_exported_script(self): + """ + Imports generated python script with help of importlib + """ + import sys + import importlib + + sys.path.append(str(self.script_path.parent)) + self.exported_script_module = importlib.import_module(self.script_name) + self.available_modules_in_exported_script = dir(self.exported_script_module) + + def __read_yaml(self, path): + with open(path, "r") as y: + return yaml.safe_load(y) + + def __write_yaml(self, path, data): + with open(path, "w") as y: + yaml.safe_dump(data, y) + + @classmethod + def export(cls, notebook_path: str, output_workspace: str) -> None: + """ + Exports workspace to `output_dir`. + + Args: + notebook_path: Jupyter notebook path. + output_dir: Path for generated workspace directory. + template_workspace_path: Path to template workspace provided with OpenFL + (default="/tmp"). 
+ + Returns: + None + """ + instance = cls(notebook_path, output_workspace) + instance.generate_requirements() + instance.generate_plan_yaml() + instance.generate_data_yaml() + + # Have to do generate_requirements before anything else + # because these !pip commands needs to be removed from python script + def generate_requirements(self): + """ + Finds pip libraries mentioned in exported python script and append in + workspace/requirements.txt + """ + data = None + with open(self.script_path, "r") as f: + requirements = [] + line_nos = [] + data = f.readlines() + for i, line in enumerate(data): + line = line.strip() + if "pip install" in line: + line_nos.append(i) + # Avoid commented lines, libraries from *.txt file, or openfl.git + # installation + if not line.startswith("#") and "-r" not in line and "openfl.git" not in line: + requirements.append(f"{line.split(' ')[-1].strip()}\n") + + requirements_filepath = str( + self.created_workspace_path.joinpath("requirements.txt").resolve()) + + # Write libraries found in requirements.txt + with open(requirements_filepath, "a") as f: + f.writelines(requirements) + + # Delete pip requirements from python script + # if not we won't be able to import python script. + with open(self.script_path, "w") as f: + for i, line in enumerate(data): + if i not in line_nos: + f.write(line) + + def generate_plan_yaml(self): + """ + Generates plan.yaml + """ + flspec = getattr( + importlib.import_module("openfl.experimental.interface"), "FLSpec" + ) + # Get flow classname + _, self.flow_class_name = self.__get_class_name_and_sourcecode_from_parent_class(flspec) + # Get expected arguments of flow class + self.flow_class_expected_arguments = self.__get_class_arguments(self.flow_class_name) + # Get provided arguments to flow class + self.arguments_passed_to_initialize = self.__extract_class_initializing_args( + self.flow_class_name) + + plan = self.created_workspace_path.joinpath("plan", "plan.yaml").resolve() + data = self.__read_yaml(plan) + if data is None: + data["federated_flow"] = { + "settings": {}, + "template": "" + } + + data["federated_flow"]["template"] = f"src.{self.script_name}.{self.flow_class_name}" + + def update_dictionary(args: dict, data: dict, dtype: str = "args"): + for idx, (k, v) in enumerate(args.items()): + if dtype == "args": + v = getattr(self.exported_script_module, str(k), None) + if v is not None and type(v) not in (int, str, bool): + v = f"src.{self.script_name}.{k}" + k = self.flow_class_expected_arguments[idx] + elif dtype == "kwargs": + if v is not None and type(v) not in (int, str, bool): + v = f"src.{self.script_name}.{k}" + data["federated_flow"]["settings"].update({ + k: v + }) + + # Find positional arguments of flow class and it's values + pos_args = self.arguments_passed_to_initialize["args"] + update_dictionary(pos_args, data, dtype="args") + # Find kwargs of flow class and it's values + kw_args = self.arguments_passed_to_initialize["kwargs"] + update_dictionary(kw_args, data, dtype="kwargs") + + self.__write_yaml(plan, data) + + def generate_data_yaml(self): + """ + Generates data.yaml + """ + # Import python script if not already + if not hasattr(self, "exported_script_module"): + self.__import_exported_script() + + # If flow classname is not yet found + if not hasattr(self, "flow_class_name"): + flspec = getattr( + importlib.import_module("openfl.experimental.interface"), "FLSpec" + ) + _, self.flow_class_name = self.__get_class_name_and_sourcecode_from_parent_class( + flspec) + + # Import flow class + 
federated_flow_class = getattr(self.exported_script_module, self.flow_class_name) + # Find federated_flow._runtime and federated_flow._runtime.collaborators + for t in self.available_modules_in_exported_script: + t = getattr(self.exported_script_module, t) + if isinstance(t, federated_flow_class): + if not hasattr(t, "_runtime"): + raise Exception("Unable to locate LocalRuntime instantiation") + runtime = t._runtime + if not hasattr(runtime, "collaborators"): + raise Exception("LocalRuntime instance does not have collaborators") + collaborators_names = runtime.collaborators + break + + data_yaml = self.created_workspace_path.joinpath("plan", "data.yaml").resolve() + data = self.__read_yaml(data_yaml) + if data is None: + data = {} + + # Find aggregator details + aggregator = runtime._aggregator + private_attrs_callable = aggregator.private_attributes_callable + if private_attrs_callable is not None: + data["aggregator"] = { + "callable_func": { + "settings": {}, + "template": f"src.{self.script_name}.{private_attrs_callable.__name__}" + } + } + # Find arguments expected by Aggregator + arguments_passed_to_initialize = self.__extract_class_initializing_args("Aggregator")[ + "kwargs"] + agg_kwargs = aggregator.kwargs + for key, value in agg_kwargs.items(): + if isinstance(value, (int, str, bool)): + data["aggregator"]["callable_func"]["settings"][key] = value + else: + arg = arguments_passed_to_initialize[key] + value = f"src.{self.script_name}.{arg}" + data["aggregator"]["callable_func"]["settings"][key] = value + + # Find arguments expected by Collaborator + arguments_passed_to_initialize = self.__extract_class_initializing_args("Collaborator")[ + "kwargs"] + for collab_name in collaborators_names: + if collab_name not in data: + data[collab_name] = { + "callable_func": { + "settings": {}, + "template": None + } + } + # Find collaborator details + kw_args = runtime.get_collaborator_kwargs(collab_name) + for key, value in kw_args.items(): + if key == "private_attributes_callable": + value = f"src.{self.script_name}.{value}" + data[collab_name]["callable_func"]["template"] = value + elif isinstance(value, (int, str, bool)): + data[collab_name]["callable_func"]["settings"][key] = value + else: + arg = arguments_passed_to_initialize[key] + value = f"src.{self.script_name}.{arg}" + data[collab_name]["callable_func"]["settings"][key] = value + + self.__write_yaml(data_yaml, data) diff --git a/openfl/interface/cli.py b/openfl/interface/cli.py index 4bf91684fe..3e4d0876d2 100755 --- a/openfl/interface/cli.py +++ b/openfl/interface/cli.py @@ -3,6 +3,8 @@ # SPDX-License-Identifier: Apache-2.0 """CLI module.""" +import os + from click import argument from click import command from click import confirm @@ -207,7 +209,16 @@ def review_plan_callback(file_name, file_path): def show_header(): """Show header.""" + from pathlib import Path + banner = 'OpenFL - Open Federated Learning' + + experimental = Path(os.path.expanduser("~")).resolve().joinpath( + ".openfl", "experimental").resolve() + + if os.path.exists(experimental): + banner = 'OpenFL - Open Federated Learning (Experimental)' + echo(style(f'{banner:<80}', bold=True, bg='bright_blue')) echo() @@ -218,8 +229,14 @@ def entry(): from pathlib import Path from sys import path - file = Path(__file__).resolve() - root = file.parent.resolve() # interface root, containing command modules + experimental = Path(os.path.expanduser("~")).resolve().joinpath( + ".openfl", "experimental").resolve() + + root = Path(__file__).parent.resolve() + + if 
experimental.exists(): + root = root.parent.joinpath("experimental", "interface", "cli").resolve() + work = Path.cwd().resolve() path.append(str(root)) path.insert(0, str(work)) diff --git a/openfl/interface/experimental.py b/openfl/interface/experimental.py new file mode 100644 index 0000000000..d7622ea25f --- /dev/null +++ b/openfl/interface/experimental.py @@ -0,0 +1,44 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +"""Experimental CLI.""" + +from pathlib import Path +from logging import getLogger +from click import group +from click import pass_context + +logger = getLogger(__name__) + + +@group() +@pass_context +def experimental(context): + """Manage Experimental Environment.""" + context.obj["group"] = "experimental" + + +@experimental.command(name="activate") +def activate(): + """Activate experimental environment.""" + settings = Path("~").expanduser().joinpath( + ".openfl").resolve() + settings.mkdir(parents=False, exist_ok=True) + settings = settings.joinpath("experimental").resolve() + + from subprocess import check_call + from sys import executable + import openfl + + rf = Path(openfl.__file__).parent.parent.resolve().joinpath( + "openfl-tutorials", "experimental", "requirements_workflow_interface.txt").resolve() + + if rf.is_file(): + check_call( + [executable, '-m', 'pip', 'install', '-r', rf], + shell=False + ) + else: + logger.warning(f"Requirements file {rf} not found.") + + with open(settings, "w") as f: + f.write("experimental") diff --git a/openfl/interface/workspace.py b/openfl/interface/workspace.py index 1ee747ca7b..31b5ff647d 100644 --- a/openfl/interface/workspace.py +++ b/openfl/interface/workspace.py @@ -68,7 +68,7 @@ def get_templates(): from openfl.interface.cli_helper import WORKSPACE return [d.name for d in WORKSPACE.glob('*') if d.is_dir() - and d.name not in ['__pycache__', 'workspace']] + and d.name not in ['__pycache__', 'workspace', 'experimental']] @workspace.command(name='create') diff --git a/setup.py b/setup.py index 1b3b14ac74..3bf05e5bed 100644 --- a/setup.py +++ b/setup.py @@ -102,6 +102,13 @@ def run(self): 'openfl.databases', 'openfl.databases.utilities', 'openfl.experimental', + 'openfl.experimental.workspace_export', + 'openfl.experimental.federated', + 'openfl.experimental.federated.plan', + 'openfl.experimental.component', + 'openfl.experimental.component.aggregator', + 'openfl.experimental.component.collaborator', + 'openfl.experimental.interface.cli', 'openfl.experimental.interface', 'openfl.experimental.interface.keras', 'openfl.experimental.interface.keras.aggregation_functions', @@ -109,6 +116,9 @@ def run(self): 'openfl.experimental.interface.torch.aggregation_functions', 'openfl.experimental.placement', 'openfl.experimental.runtime', + 'openfl.experimental.protocols', + 'openfl.experimental.transport', + 'openfl.experimental.transport.grpc', 'openfl.experimental.utilities', 'openfl.federated', 'openfl.federated.data', diff --git a/tests/github/experimental/testflow_datastore_cli.py b/tests/github/experimental/testflow_datastore_cli.py index 9b40f765cf..6a51d1364b 100644 --- a/tests/github/experimental/testflow_datastore_cli.py +++ b/tests/github/experimental/testflow_datastore_cli.py @@ -175,12 +175,18 @@ def validate_datastore_cli(flow_obj, expected_flow_steps, num_rounds): validate_flow_error = [] verify_stdout = { - "start": "Testing FederatedFlow - Starting Test for Dataflow and CLI Functionality\n", - "aggregated_model_validation": "Performing aggregated model validation for 
collaborator\n", - "train": "Train the model\n", - "local_model_validation": "Doing local model validation for collaborator\n", - "join": "Executing join\n", - "end": "This is the end of the flow\n", + "start": + "\x1b[94mTesting FederatedFlow - Starting Test for Dataflow" + + " and CLI Functionality\x1b[0m\x1b[94m\n\x1b[0m\n", + "aggregated_model_validation": + "\x1b[94mPerforming aggregated model validation for" + + " collaborator\x1b[0m\x1b[94m\n\x1b[0m\n", + "train": "\x1b[94mTrain the model\x1b[0m\x1b[94m\n\x1b[0m\n", + "local_model_validation": + "\x1b[94mDoing local model validation for collaborator" + + "\x1b[0m\x1b[94m\n\x1b[0m\n", + "join": "\x1b[94mExecuting join\x1b[0m\x1b[94m\n\x1b[0m\n", + "end": "\x1b[94mThis is the end of the flow\x1b[0m\x1b[94m\n\x1b[0m\n", } # fetch data from metaflow @@ -286,7 +292,6 @@ def display_validate_errors(validate_flow_error): # Setup participants aggregator_ = Aggregator() - # Setup collaborators with private attributes collaborator_names = ["Portland", "Seattle", "Chandler", "Bangalore"] def callable_to_initialize_collaborator_private_attributes( diff --git a/tests/github/experimental/testflow_subset_of_collaborators.py b/tests/github/experimental/testflow_subset_of_collaborators.py index 12fea10a92..8adf5e6858 100644 --- a/tests/github/experimental/testflow_subset_of_collaborators.py +++ b/tests/github/experimental/testflow_subset_of_collaborators.py @@ -73,7 +73,7 @@ def join(self, inputs): """ print("inside join") - self.collaborators_ran = [input.collaborator_ran for input in inputs] + self.collaborators_ran = [i.collaborator_ran for i in inputs] self.next(self.end) @aggregator @@ -140,6 +140,8 @@ def callable_to_initialize_collaborator_private_attributes(collab_name): subset_collaborators = testflow_subset_collaborators.subset_collabrators collaborators_ran = testflow_subset_collaborators.collaborators_ran + # We now convert names to lowercase + collaborators_ran = list(map(str.lower, collaborators_ran)) random_ints = testflow_subset_collaborators.random_ints random_ints.remove(len(subset_collaborators)) @@ -161,6 +163,8 @@ def callable_to_initialize_collaborator_private_attributes(collab_name): + f"Testcase Passed.{bcolors.ENDC}" ) passed = True + print(f'subset_collaborators = {subset_collaborators}') + print(f'collaborators_ran = {collaborators_ran}') for collaborator_name in subset_collaborators: if collaborator_name not in collaborators_ran: passed = False diff --git a/tests/github/experimental/workspace/test_experimental_agg_based_workflow.py b/tests/github/experimental/workspace/test_experimental_agg_based_workflow.py new file mode 100644 index 0000000000..9b569a6dc1 --- /dev/null +++ b/tests/github/experimental/workspace/test_experimental_agg_based_workflow.py @@ -0,0 +1,79 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import os +import time +import socket +import argparse +from pathlib import Path +from subprocess import check_call +from concurrent.futures import ProcessPoolExecutor +from openfl.utilities.utils import rmtree +from tests.github.experimental.workspace.utils import create_collaborator +from tests.github.experimental.workspace.utils import create_certified_workspace +from tests.github.experimental.workspace.utils import certify_aggregator + + +if __name__ == '__main__': + # Test the pipeline + parser = argparse.ArgumentParser() + workspace_choice = [] + with os.scandir('tests/github/experimental/workspace') as iterator: + for entry in iterator: + if entry.name not in 
['__init__.py', 'workspace', 'default']: + workspace_choice.append(entry.name) + parser.add_argument('--custom_template') + parser.add_argument('--template') + parser.add_argument('--fed_workspace', default='fed_work12345alpha81671') + parser.add_argument('--col', action='append', default=[]) + parser.add_argument('--rounds-to-train') + + origin_dir = Path.cwd().resolve() + args = parser.parse_args() + fed_workspace = args.fed_workspace + archive_name = f'{fed_workspace}.zip' + fqdn = socket.getfqdn() + template = args.template + custom_template = args.custom_template + rounds_to_train = args.rounds_to_train + collaborators = args.col + # START + # ===== + # Make sure you are in a Python virtual environment with the FL package installed. + + # Activate experimental + check_call(['fx', 'experimental', 'activate']) + + create_certified_workspace( + fed_workspace, custom_template, template, fqdn, rounds_to_train + ) + certify_aggregator(fqdn) + + # Get the absolute directory path for the workspace + workspace_root = Path().resolve() + + # Create Collaborators + for collab in collaborators: + create_collaborator( + collab, workspace_root, archive_name, fed_workspace + ) + + # Run the federation + with ProcessPoolExecutor(max_workers=len(collaborators) + 1) as executor: + executor.submit( + check_call, ['fx', 'aggregator', 'start'], cwd=workspace_root + ) + time.sleep(5) + + for collab in collaborators: + col_dir = workspace_root / collab / fed_workspace + executor.submit( + check_call, ['fx', 'collaborator', 'start', '-n', collab], + cwd=col_dir + ) + + os.chdir(origin_dir) + rmtree(workspace_root) + + # Deactivate experimental + check_call(['fx', 'experimental', 'deactivate']) diff --git a/tests/github/experimental/workspace/testcase_datastore_cli/.workspace b/tests/github/experimental/workspace/testcase_datastore_cli/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_datastore_cli/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/tests/github/experimental/workspace/testcase_datastore_cli/plan/cols.yaml b/tests/github/experimental/workspace/testcase_datastore_cli/plan/cols.yaml new file mode 100644 index 0000000000..2ac4e56fa5 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_datastore_cli/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: + \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_datastore_cli/plan/data.yaml b/tests/github/experimental/workspace/testcase_datastore_cli/plan/data.yaml new file mode 100644 index 0000000000..5538b80f12 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_datastore_cli/plan/data.yaml @@ -0,0 +1,26 @@ +## Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. 
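The col1 and col2 entries that follow wire each collaborator to a private-attributes callable: callable_func.template names a function exported into the workspace src package, and callable_func.settings supplies its keyword arguments, where dotted values such as src.collaborator_private_attrs.train_dataset refer to objects defined in that module while plain values (index, n_collaborators, batch_size_train) are passed through as-is. As a rough sketch only, the actual resolution is performed by the experimental Plan and Collaborator components; resolve_private_attrs and load below are illustrative names, not OpenFL APIs.

# Illustrative only: a hypothetical helper showing how one data.yaml entry
# could be resolved into collaborator private attributes.
import importlib

def resolve_private_attrs(entry):
    def load(dotted):
        # "src.collaborator_private_attrs.train_dataset" -> module attribute
        module_path, name = dotted.rsplit(".", 1)
        return getattr(importlib.import_module(module_path), name)

    func = load(entry["callable_func"]["template"])
    kwargs = {
        key: load(value) if isinstance(value, str) and value.startswith("src.") else value
        for key, value in entry["callable_func"]["settings"].items()
    }
    # For col1 this amounts to calling
    # collaborator_private_attrs(index=1, n_collaborators=2, batch_size_train=64, ...)
    return func(**kwargs)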
+ +col1: + callable_func: + settings: + batch_size_train: 64 + index: 1 + n_collaborators: 2 + test_dataset: src.collaborator_private_attrs.test_dataset + train_dataset: src.collaborator_private_attrs.train_dataset + template: src.collaborator_private_attrs.collaborator_private_attrs + +col2: + callable_func: + settings: + batch_size_train: 64 + index: 2 + n_collaborators: 2 + test_dataset: src.collaborator_private_attrs.test_dataset + train_dataset: src.collaborator_private_attrs.train_dataset + template: src.collaborator_private_attrs.collaborator_private_attrs \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_datastore_cli/plan/defaults b/tests/github/experimental/workspace/testcase_datastore_cli/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_datastore_cli/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/tests/github/experimental/workspace/testcase_datastore_cli/plan/plan.yaml b/tests/github/experimental/workspace/testcase_datastore_cli/plan/plan.yaml new file mode 100644 index 0000000000..b73d17e6d8 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_datastore_cli/plan/plan.yaml @@ -0,0 +1,27 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.Aggregator + settings : + rounds_to_train : 1 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.Collaborator + settings : {} + + +federated_flow: + template: src.testflow_datastore_cli.TestFlowDatastoreAndCli + settings: + rounds: 1 + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml diff --git a/tests/github/experimental/workspace/testcase_datastore_cli/requirements.txt b/tests/github/experimental/workspace/testcase_datastore_cli/requirements.txt new file mode 100644 index 0000000000..046073d366 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_datastore_cli/requirements.txt @@ -0,0 +1,2 @@ +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability +torchvision diff --git a/tests/github/experimental/workspace/testcase_datastore_cli/src/__init__.py b/tests/github/experimental/workspace/testcase_datastore_cli/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_datastore_cli/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/tests/github/experimental/workspace/testcase_datastore_cli/src/collaborator_private_attrs.py b/tests/github/experimental/workspace/testcase_datastore_cli/src/collaborator_private_attrs.py new file mode 100644 index 0000000000..b035e58807 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_datastore_cli/src/collaborator_private_attrs.py @@ -0,0 +1,47 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +import torch +from copy import deepcopy +import torchvision + +train_dataset = torchvision.datasets.MNIST( + "files/", + train=True, + download=True, + transform=torchvision.transforms.Compose( + [ + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize((0.1307,), 
(0.3081,)), + ] + ), +) + +test_dataset = torchvision.datasets.MNIST( + "files/", + train=False, + download=True, + transform=torchvision.transforms.Compose( + [ + torchvision.transforms.ToTensor(), + torchvision.transforms.Normalize((0.1307,), (0.3081,)), + ] + ), +) + + +def collaborator_private_attrs(index, n_collaborators, train_dataset, + test_dataset, batch_size_train): + local_train = deepcopy(train_dataset) + local_test = deepcopy(test_dataset) + local_train.data = train_dataset.data[index:: n_collaborators] + local_train.targets = train_dataset.targets[index:: n_collaborators] + local_test.data = test_dataset.data[index:: n_collaborators] + local_test.targets = test_dataset.targets[index:: n_collaborators] + return { + "train_loader": torch.utils.data.DataLoader( + local_train, batch_size=batch_size_train, shuffle=True + ), + "test_loader": torch.utils.data.DataLoader( + local_test, batch_size=batch_size_train, shuffle=True + ), + } diff --git a/tests/github/experimental/workspace/testcase_datastore_cli/src/testflow_datastore_cli.py b/tests/github/experimental/workspace/testcase_datastore_cli/src/testflow_datastore_cli.py new file mode 100644 index 0000000000..4517ef0e87 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_datastore_cli/src/testflow_datastore_cli.py @@ -0,0 +1,271 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim +import torch + +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator + +batch_size_train = 64 +learning_rate = 0.01 +momentum = 0.5 +log_interval = 10 + +random_seed = 1 +torch.backends.cudnn.enabled = False +torch.manual_seed(random_seed) + + +class Bcolors: + HEADER = "\033[95m" + OKBLUE = "\033[94m" + OKCYAN = "\033[96m" + OKGREEN = "\033[92m" + WARNING = "\033[93m" + FAIL = "\033[91m" + ENDC = "\033[0m" + BOLD = "\033[1m" + UNDERLINE = "\033[4m" + + +class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.conv1 = nn.Conv2d(1, 10, kernel_size=5) + self.fc1 = nn.Linear(1440, 10) + + def forward(self, x): + x = F.relu(F.max_pool2d(self.conv1(x), 2)) + x = x.view(-1, 1440) + x = F.relu(self.fc1(x)) + return F.log_softmax(x) + + +def inference(network, test_loader): + network.eval() + test_loss = 0 + correct = 0 + with torch.no_grad(): + for data, target in test_loader: + output = network(data) + test_loss += F.nll_loss(output, target, size_average=False).item() + pred = output.data.max(1, keepdim=True)[1] + correct += pred.eq(target.data.view_as(pred)).sum() + test_loss /= len(test_loader.dataset) + accuracy = float(correct / len(test_loader.dataset)) + return accuracy + + +class TestFlowDatastoreAndCli(FLSpec): + """ + Testflow for Dataflow and CLI Functionality + """ + def __init__(self, model=None, optimizer=None, rounds=3, **kwargs): + super().__init__(**kwargs) + if model is not None: + self.model = model + self.optimizer = optimizer + else: + self.model = Net() + self.optimizer = optim.SGD( + self.model.parameters(), lr=learning_rate, momentum=momentum + ) + self.num_rounds = rounds + self.current_round = 0 + + @aggregator + def start(self): + print( + "Testing FederatedFlow - Starting Test for Dataflow and CLI Functionality" + ) + self.collaborators = self.runtime.collaborators + self.private = 10 + self.next( + self.aggregated_model_validation, + foreach="collaborators", + exclude=["private"], + ) + + @collaborator + def 
aggregated_model_validation(self): + print("Performing aggregated model validation for collaborator") + self.agg_validation_score = inference(self.model, self.test_loader) + self.next(self.train) + + @collaborator + def train(self): + print("Train the model") + self.model.train() + self.optimizer = optim.SGD( + self.model.parameters(), lr=learning_rate, momentum=momentum + ) + for batch_idx, (data, target) in enumerate(self.train_loader): + self.optimizer.zero_grad() + output = self.model(data) + loss = F.nll_loss(output, target) + loss.backward() + self.optimizer.step() + if batch_idx % log_interval == 0: + self.loss = loss.item() + torch.save(self.model.state_dict(), "model.pth") + torch.save(self.optimizer.state_dict(), "optimizer.pth") + self.training_completed = True + self.next(self.local_model_validation) + + @collaborator + def local_model_validation(self): + self.local_validation_score = inference(self.model, self.test_loader) + print("Doing local model validation for collaborator") + self.next(self.join, exclude=["training_completed"]) + + @aggregator + def join(self, inputs): + print("Executing join") + self.current_round += 1 + if self.current_round < self.num_rounds: + self.next(self.start) + else: + self.next(self.end) + + @aggregator + def end(self): + print("This is the end of the flow") + + expected_flow_steps = [ + "start", + "aggregated_model_validation", + "train", + "local_model_validation", + "join", + ] # List to verify expected steps + validate_datastore_cli( + self, expected_flow_steps, self.num_rounds + ) # Function to validate datastore and cli + + +def validate_datastore_cli(flow_obj, expected_flow_steps, num_rounds): + """ + This function test the flow as below + 1. Verify datastore steps and expected steps are matching + 2. Verify task stdout and task stderr verified through \ + cli is as expected + 3. Verify no of tasks executed is aligned with the total \ + number of rounds and total number of collaborators + """ + validate_flow_error = [] + + verify_stdout = { + "start": + "\x1b[94mTesting FederatedFlow - Starting Test for Dataflow" + + " and CLI Functionality\x1b[0m\x1b[94m\n\x1b[0m\n", + "aggregated_model_validation": + "\x1b[94mPerforming aggregated model validation for" + + " collaborator\x1b[0m\x1b[94m\n\x1b[0m\n", + "train": "\x1b[94mTrain the model\x1b[0m\x1b[94m\n\x1b[0m\n", + "local_model_validation": + "\x1b[94mDoing local model validation for collaborator" + + "\x1b[0m\x1b[94m\n\x1b[0m\n", + "join": "\x1b[94mExecuting join\x1b[0m\x1b[94m\n\x1b[0m\n", + "end": "\x1b[94mThis is the end of the flow\x1b[0m\x1b[94m\n\x1b[0m\n", + } + + # fetch data from metaflow + from metaflow import Flow + + cli_flow_obj = Flow("TestFlowDatastoreAndCli") + cli_flow_steps = list(list(cli_flow_obj)[0]) + cli_step_names = [step.id for step in cli_flow_steps] + + steps_present_in_cli = [ + step for step in expected_flow_steps if step in cli_step_names + ] + missing_steps_in_cli = [ + step for step in expected_flow_steps if step not in cli_step_names + ] + extra_steps_in_cli = [ + step for step in cli_step_names if step not in expected_flow_steps + ] + + if len(steps_present_in_cli) != len(expected_flow_steps): + validate_flow_error.append( + f"{Bcolors.FAIL}... Error : Number of steps fetched from \ + Datastore through CLI do not match the Expected steps provided {Bcolors.ENDC} \n" + ) + + if len(missing_steps_in_cli) != 0: + validate_flow_error.append( + f"{Bcolors.FAIL}... 
Error : Following steps missing from Datastore: \ + {missing_steps_in_cli} {Bcolors.ENDC} \n" + ) + + if len(extra_steps_in_cli) != 0: + validate_flow_error.append( + f"{Bcolors.FAIL}... Error : Following steps are extra in Datastore: \ + {extra_steps_in_cli} {Bcolors.ENDC} \n" + ) + + for step in cli_flow_steps: + task_count = 0 + func = getattr(flow_obj, step.id) + for task in list(step): + task_count = task_count + 1 + if verify_stdout.get(step.id) != task.stdout: + validate_flow_error.append( + f"{Bcolors.FAIL}... Error : task stdout detected issues : \ + {step} {task} {Bcolors.ENDC} \n" + ) + + if ( + (func.aggregator_step) + and (task_count != num_rounds) + and (func.__name__ != "end") + ): + validate_flow_error.append( + f"{Bcolors.FAIL}... Error : More than one execution detected \ + for Aggregator Step: {step} {Bcolors.ENDC} \n" + ) + + if ( + (func.aggregator_step) + and (task_count != 1) + and (func.__name__ == "end") + ): + validate_flow_error.append( + f"{Bcolors.FAIL}... Error : More than one execution detected \ + for Aggregator Step: {step} {Bcolors.ENDC} \n" + ) + + if (func.collaborator_step) and ( + task_count != len(flow_obj.collaborators) * num_rounds + ): + validate_flow_error.append( + f"{Bcolors.FAIL}... Error : Incorrect number of execution \ + detected for Collaborator Step: {step}. \ + Expected: {num_rounds*len(flow_obj.collaborators)} \ + Actual: {task_count}{Bcolors.ENDC} \n" + ) + + if validate_flow_error: + display_validate_errors(validate_flow_error) + else: + print(f"""{Bcolors.OKGREEN}\n**** Summary of internal flow testing **** + No issues found and below are the tests that ran successfully + 1. Datastore steps and expected steps are matching + 2. Task stdout and task stderr verified through metaflow cli is as expected + 3. Number of tasks are aligned with number of rounds and number """ + f"""of collaborators {Bcolors.ENDC}""") + + +def display_validate_errors(validate_flow_error): + """ + Function to display error that is captured during datastore and cli test + """ + print( + f"{Bcolors.OKBLUE}Testing FederatedFlow - Ending test for validatng \ + the Datastore and Cli Testing {Bcolors.ENDC}" + ) + print("".join(validate_flow_error)) + print(f"{Bcolors.FAIL}\n ... Test case failed ... 
{Bcolors.ENDC}") diff --git a/tests/github/experimental/workspace/testcase_datastore_cli/src/utils.py b/tests/github/experimental/workspace/testcase_datastore_cli/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/tests/github/experimental/workspace/testcase_datastore_cli/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/tests/github/experimental/workspace/testcase_include_exclude/.workspace b/tests/github/experimental/workspace/testcase_include_exclude/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_include_exclude/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/tests/github/experimental/workspace/testcase_include_exclude/plan/cols.yaml b/tests/github/experimental/workspace/testcase_include_exclude/plan/cols.yaml new file mode 100644 index 0000000000..95307de3bc --- /dev/null +++ b/tests/github/experimental/workspace/testcase_include_exclude/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: + \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_include_exclude/plan/data.yaml b/tests/github/experimental/workspace/testcase_include_exclude/plan/data.yaml new file mode 100644 index 0000000000..f187079f4b --- /dev/null +++ b/tests/github/experimental/workspace/testcase_include_exclude/plan/data.yaml @@ -0,0 +1,10 @@ +## Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. + +col1: + +col2: \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_include_exclude/plan/defaults b/tests/github/experimental/workspace/testcase_include_exclude/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_include_exclude/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/tests/github/experimental/workspace/testcase_include_exclude/plan/plan.yaml b/tests/github/experimental/workspace/testcase_include_exclude/plan/plan.yaml new file mode 100644 index 0000000000..b0ccdd65c7 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_include_exclude/plan/plan.yaml @@ -0,0 +1,26 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. 
+ +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.aggregator.Aggregator + settings : + rounds_to_train : 10 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.collaborator.Collaborator + settings : {} + + +federated_flow: + template: src.testflow_include_exclude.TestFlowIncludeExclude + settings: + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml diff --git a/tests/github/experimental/workspace/testcase_include_exclude/requirements.txt b/tests/github/experimental/workspace/testcase_include_exclude/requirements.txt new file mode 100644 index 0000000000..32a96eaef3 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_include_exclude/requirements.txt @@ -0,0 +1 @@ +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability diff --git a/tests/github/experimental/workspace/testcase_include_exclude/src/__init__.py b/tests/github/experimental/workspace/testcase_include_exclude/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_include_exclude/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/tests/github/experimental/workspace/testcase_include_exclude/src/testflow_include_exclude.py b/tests/github/experimental/workspace/testcase_include_exclude/src/testflow_include_exclude.py new file mode 100644 index 0000000000..77eb7b9274 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_include_exclude/src/testflow_include_exclude.py @@ -0,0 +1,198 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator + + +class bcolors: # NOQA: N801 + HEADER = "\033[95m" + OKBLUE = "\033[94m" + OKCYAN = "\033[96m" + OKGREEN = "\033[92m" + WARNING = "\033[93m" + FAIL = "\033[91m" + ENDC = "\033[0m" + BOLD = "\033[1m" + UNDERLINE = "\033[4m" + + +class TestFlowIncludeExclude(FLSpec): + """ + Testflow to validate include and exclude functionality in Federated Flow. + """ + + include_exclude_error_list = [] + + def __init__(self, checkpoint: bool = False): + super().__init__(checkpoint) + + @aggregator + def start(self): + """ + Flow start. + """ + print( + f"{bcolors.OKBLUE}Testing FederatedFlow - Starting Test for Include and Exclude " + + f"Attributes {bcolors.ENDC}" + ) + self.collaborators = self.runtime.collaborators + + self.exclude_agg_to_agg = 10 + self.include_agg_to_agg = 100 + self.next(self.test_include_exclude_agg_to_agg, exclude=["exclude_agg_to_agg"]) + + @aggregator + def test_include_exclude_agg_to_agg(self): + """ + Testing whether attributes are excluded from agg to agg + """ + if ( + hasattr(self, "include_agg_to_agg") is True + and hasattr(self, "exclude_agg_to_agg") is False + ): + print( + f"{bcolors.OKGREEN} ... Exclude test passed in test_include_exclude_agg_to_agg " + + f"{bcolors.ENDC}" + ) + else: + TestFlowIncludeExclude.include_exclude_error_list.append( + "test_include_exclude_agg_to_agg" + ) + print( + f"{bcolors.FAIL} ... 
Exclude test failed in test_include_exclude_agg_to_agg " + f"{bcolors.ENDC}" + ) + + self.include_agg_to_collab = 100 + self.exclude_agg_to_collab = 78 + self.next( + self.test_include_exclude_agg_to_collab, + foreach="collaborators", + include=["include_agg_to_collab", "collaborators"], + ) + + @collaborator + def test_include_exclude_agg_to_collab(self): + """ + Testing whether attributes are included from agg to collab + """ + if ( + hasattr(self, "include_agg_to_agg") is False + and hasattr(self, "exclude_agg_to_agg") is False + and hasattr(self, "exclude_agg_to_collab") is False + and hasattr(self, "include_agg_to_collab") is True + ): + print( + f"{bcolors.OKGREEN} ... Include test passed in test_include_exclude_agg_to_collab " + + f"{bcolors.ENDC}" + ) + else: + TestFlowIncludeExclude.include_exclude_error_list.append( + "test_include_exclude_agg_to_collab" + ) + print( + f"{bcolors.FAIL} ... Include test failed in test_include_exclude_agg_to_collab " + + f"{bcolors.ENDC}" + ) + self.exclude_collab_to_collab = 10 + self.include_collab_to_collab = 44 + self.next( + self.test_include_exclude_collab_to_collab, + exclude=["exclude_collab_to_collab"], + ) + + @collaborator + def test_include_exclude_collab_to_collab(self): + """ + Testing whether attributes are excluded from collab to collab + """ + if ( + hasattr(self, "include_agg_to_agg") is False + and hasattr(self, "include_agg_to_collab") is True + and hasattr(self, "include_collab_to_collab") is True + and hasattr(self, "exclude_agg_to_agg") is False + and hasattr(self, "exclude_agg_to_collab") is False + and hasattr(self, "exclude_collab_to_collab") is False + ): + print( + f"{bcolors.OKGREEN} ... Exclude test passed in " + + f"test_include_exclude_collab_to_collab {bcolors.ENDC}" + ) + else: + TestFlowIncludeExclude.include_exclude_error_list.append( + "test_include_exclude_collab_to_collab" + ) + print( + f"{bcolors.FAIL} ... Exclude test failed in test_include_exclude_collab_to_collab " + + f"{bcolors.ENDC}" + ) + + self.exclude_collab_to_agg = 20 + self.include_collab_to_agg = 56 + self.next(self.join, include=["include_collab_to_agg"]) + + @aggregator + def join(self, inputs): + """ + Testing whether attributes are included from collab to agg + """ + # Aggregator attribute check + validate = ( + hasattr(self, "include_agg_to_agg") is True + and hasattr(self, "include_agg_to_collab") is True + and hasattr(self, "exclude_agg_to_collab") is True + and hasattr(self, "exclude_agg_to_agg") is False + ) + + # Collaborator attribute check + for input in inputs: + validation = validate and ( + hasattr(input, "include_collab_to_collab") is False + and hasattr(input, "exclude_collab_to_collab") is False + and hasattr(input, "exclude_collab_to_agg") is False + and hasattr(input, "include_collab_to_agg") is True + ) + + if validation: + print( + f"{bcolors.OKGREEN} ... Include and Exclude tests passed in join {bcolors.ENDC}" + ) + else: + TestFlowIncludeExclude.include_exclude_error_list.append("join") + print( + f"{bcolors.FAIL} ... 
Include and Exclude tests failed in join {bcolors.ENDC}" + ) + + print( + f"\n{bcolors.UNDERLINE} Include and exclude attributes test summary: {bcolors.ENDC}\n" + ) + + if TestFlowIncludeExclude.include_exclude_error_list: + validated_include_exclude_variables = ",".join( + TestFlowIncludeExclude.include_exclude_error_list + ) + print( + f"{bcolors.FAIL} ...Test case failed for {validated_include_exclude_variables} " + + f"{bcolors.ENDC}" + ) + + self.next(self.end) + + @aggregator + def end(self): + """ + This is the 'end' step. All flows must have an 'end' step, which is the + last step in the flow. + """ + print( + f"{bcolors.OKBLUE}Testing FederatedFlow - Ending Test for Include and Exclude " + + f"Attributes {bcolors.ENDC}" + ) + if TestFlowIncludeExclude.include_exclude_error_list: + raise ( + AssertionError( + f"{bcolors.FAIL}\n ...Test case failed ... {bcolors.ENDC}" + ) + ) + print(f"{bcolors.OKBLUE}End of Testing FederatedFlow {bcolors.ENDC}") diff --git a/tests/github/experimental/workspace/testcase_include_exclude/src/utils.py b/tests/github/experimental/workspace/testcase_include_exclude/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/tests/github/experimental/workspace/testcase_include_exclude/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/tests/github/experimental/workspace/testcase_internalloop/.workspace b/tests/github/experimental/workspace/testcase_internalloop/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_internalloop/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/tests/github/experimental/workspace/testcase_internalloop/plan/cols.yaml b/tests/github/experimental/workspace/testcase_internalloop/plan/cols.yaml new file mode 100644 index 0000000000..59d4f60bce --- /dev/null +++ b/tests/github/experimental/workspace/testcase_internalloop/plan/cols.yaml @@ -0,0 +1,4 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_internalloop/plan/data.yaml b/tests/github/experimental/workspace/testcase_internalloop/plan/data.yaml new file mode 100644 index 0000000000..840b029d23 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_internalloop/plan/data.yaml @@ -0,0 +1,10 @@ +## Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. 
+ +col1: + +col2: \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_internalloop/plan/defaults b/tests/github/experimental/workspace/testcase_internalloop/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_internalloop/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/tests/github/experimental/workspace/testcase_internalloop/plan/plan.yaml b/tests/github/experimental/workspace/testcase_internalloop/plan/plan.yaml new file mode 100644 index 0000000000..24327dce52 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_internalloop/plan/plan.yaml @@ -0,0 +1,27 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.aggregator.Aggregator + settings : + rounds_to_train : 1 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.collaborator.Collaborator + settings : {} + + +federated_flow: + template: src.testflow_internalloop.TestFlowInternalLoop + settings: + rounds: 5 + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_internalloop/requirements.txt b/tests/github/experimental/workspace/testcase_internalloop/requirements.txt new file mode 100644 index 0000000000..16b349007c --- /dev/null +++ b/tests/github/experimental/workspace/testcase_internalloop/requirements.txt @@ -0,0 +1,4 @@ +torch==1.13.1 +torchvision==0.14.1 +tensorboard +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability diff --git a/tests/github/experimental/workspace/testcase_internalloop/src/__init__.py b/tests/github/experimental/workspace/testcase_internalloop/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_internalloop/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/tests/github/experimental/workspace/testcase_internalloop/src/testflow_internalloop.py b/tests/github/experimental/workspace/testcase_internalloop/src/testflow_internalloop.py new file mode 100644 index 0000000000..3661b1c4d0 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_internalloop/src/testflow_internalloop.py @@ -0,0 +1,219 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from openfl.experimental.interface.fl_spec import FLSpec +from openfl.experimental.placement.placement import aggregator, collaborator +import numpy as np + + +class bcolors: # NOQA: N801 + HEADER = "\033[95m" + OKBLUE = "\033[94m" + OKCYAN = "\033[96m" + OKGREEN = "\033[92m" + WARNING = "\033[93m" + FAIL = "\033[91m" + ENDC = "\033[0m" + BOLD = "\033[1m" + UNDERLINE = "\033[4m" + + +class TestFlowInternalLoop(FLSpec): + def __init__(self, model=None, optimizer=None, rounds=None, **kwargs): + super().__init__(**kwargs) + self.training_rounds = rounds + self.train_count = 0 + self.end_count = 0 + + @aggregator + def start(self): + """ + Flow start. 
+ """ + print( + f"{bcolors.OKBLUE}Testing FederatedFlow - " + + f"Test for Internal Loops - Round: {self.train_count}" + + f" of Training Rounds: {self.training_rounds}{bcolors.ENDC}" + ) + self.model = np.zeros((10, 10, 10)) # Test model + self.collaborators = self.runtime.collaborators + self.next(self.agg_model_mean, foreach="collaborators") + + @collaborator + def agg_model_mean(self): + """ + Calculating the mean of the model created in start. + """ + self.agg_mean_value = np.mean(self.model) + print(f": {self.input} Mean of Agg model: {self.agg_mean_value} ") + self.next(self.collab_model_update) + + @collaborator + def collab_model_update(self): + """ + Initializing the model with random numbers. + """ + print(f": {self.input} Initializing the model randomly ") + self.model = np.random.randint(1, len(self.input), (10, 10, 10)) + self.next(self.local_model_mean) + + @collaborator + def local_model_mean(self): + """ + Calculating the mean of the model created in train. + """ + self.local_mean_value = np.mean(self.model) + print(f": {self.input} Local mean: {self.local_mean_value} ") + self.next(self.join) + + @aggregator + def join(self, inputs): + """ + Joining inputs from collaborators + """ + self.agg_mean = sum(input.local_mean_value for input in inputs) / len(inputs) + print(f"Aggregated mean : {self.agg_mean}") + self.next(self.internal_loop) + + @aggregator + def internal_loop(self): + """ + Internally Loop for training rounds + """ + self.train_count = self.train_count + 1 + if self.training_rounds == self.train_count: + self.next(self.end) + else: + self.next(self.start) + + @aggregator + def end(self): + """ + This is the 'end' step. All flows must have an 'end' step, which is the + last step in the flow. + """ + self.end_count += 1 + print("This is the end of the flow") + + flflow = self + # Flow Test Begins + expected_flow_steps = [ + "join", + "internal_loop", + "agg_model_mean", + "collab_model_update", + "local_model_mean", + "start", + ] # List to verify expected steps + try: + validate_flow( + flflow, expected_flow_steps + ) # Function to validate the internal flow + except Exception as e: + raise e + # Flow Test Ends + + +def validate_flow(flow_obj, expected_flow_steps): + """ + Validate: + 1. If the given training round were completed + 2. If all the steps were executed + 3. If each collaborator step was executed + 4. If end was executed once + """ + validate_flow_error = [] # List to capture any errors in the flow + + from metaflow import Flow + + cli_flow_obj = Flow("TestFlowInternalLoop") # Flow object from CLI + cli_flow_steps = list(cli_flow_obj.latest_run) # Steps from CLI + cli_step_names = [step.id for step in cli_flow_steps] + + # 1. If the given training round were completed + if not flow_obj.training_rounds == flow_obj.train_count: + validate_flow_error.append( + f"{bcolors.FAIL}... Error : Number of training completed is not equal" + + f" to training rounds {bcolors.ENDC} \n" + ) + + for step in cli_flow_steps: + task_count = 0 + func = getattr(flow_obj, step.id) + for task in list(step): + task_count = task_count + 1 + + # Each aggregator step should be executed for training rounds times + if ( + (func.aggregator_step is True) + and (task_count != flow_obj.training_rounds) + and (step.id != "end") + ): + validate_flow_error.append( + f"{bcolors.FAIL}... 
Error : Incorrect number of executions detected for " + f"Aggregator Step: {step} {bcolors.ENDC} \n" + ) + + # Each collaborator step is executed for (training rounds) * (number of collaborators) times + if (func.collaborator_step is True) and ( + task_count != len(flow_obj.collaborators) * flow_obj.training_rounds + ): + validate_flow_error.append( + f"{bcolors.FAIL}... Error : Incorrect number of executions detected for " + + f"Collaborator Step: {step}. Expected: " + + f"{flow_obj.training_rounds*len(flow_obj.collaborators)} " + + f"Actual: {task_count}{bcolors.ENDC} \n" + ) + + steps_present_in_cli = [ + step for step in expected_flow_steps if step in cli_step_names + ] + missing_steps_in_cli = [ + step for step in expected_flow_steps if step not in cli_step_names + ] + extra_steps_in_cli = [ + step for step in cli_step_names if step not in expected_flow_steps + ] + + if len(steps_present_in_cli) != len(expected_flow_steps): + validate_flow_error.append( + f"{bcolors.FAIL}... Error : Number of steps fetched from Datastore through CLI does not " + + f"match the expected steps provided {bcolors.ENDC} \n" + ) + + if len(missing_steps_in_cli) != 0: + validate_flow_error.append( + f"{bcolors.FAIL}... Error : Following steps missing from Datastore: " + + f"{missing_steps_in_cli} {bcolors.ENDC} \n" + ) + + if len(extra_steps_in_cli) != 0: + validate_flow_error.append( + f"{bcolors.FAIL}... Error : Following steps are extra in Datastore: " + + f"{extra_steps_in_cli} {bcolors.ENDC} \n" + ) + + if not flow_obj.end_count == 1: + validate_flow_error.append( + f"{bcolors.FAIL}... Error : End function was not called exactly one time...{bcolors.ENDC}" + ) + + if validate_flow_error: + display_validate_errors(validate_flow_error) + raise Exception(f"{bcolors.FAIL}Test for Internal Loop FAILED") + else: + print( + f"""{bcolors.OKGREEN}\n **** Summary of internal flow testing **** + No issues found and below are the tests that ran successfully + 1. Number of training rounds completed is equal to training rounds + 2. CLI steps and expected steps are matching + 3. Number of tasks are aligned with number of rounds and number of collaborators + 4. 
End function executed one time {bcolors.ENDC}""" + ) + + +def display_validate_errors(validate_flow_error): + """ + Function to display error that is captured during flow test + """ + print("".join(validate_flow_error)) diff --git a/tests/github/experimental/workspace/testcase_internalloop/src/utils.py b/tests/github/experimental/workspace/testcase_internalloop/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/tests/github/experimental/workspace/testcase_internalloop/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/tests/github/experimental/workspace/testcase_private_attributes/.workspace b/tests/github/experimental/workspace/testcase_private_attributes/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/tests/github/experimental/workspace/testcase_private_attributes/plan/cols.yaml b/tests/github/experimental/workspace/testcase_private_attributes/plan/cols.yaml new file mode 100644 index 0000000000..59d4f60bce --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/plan/cols.yaml @@ -0,0 +1,4 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_private_attributes/plan/data.yaml b/tests/github/experimental/workspace/testcase_private_attributes/plan/data.yaml new file mode 100644 index 0000000000..ccbf25acfb --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/plan/data.yaml @@ -0,0 +1,23 @@ +## Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. 
+ +col1: + callable_func: + settings: + index: 1 + template: src.collaborator_private_attrs.collaborator_private_attrs + +col2: + callable_func: + settings: + index: 2 + template: src.collaborator_private_attrs.collaborator_private_attrs + +aggregator: + callable_func: + settings: {} + template: src.aggregator_private_attrs.aggregator_private_attrs \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_private_attributes/plan/defaults b/tests/github/experimental/workspace/testcase_private_attributes/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/tests/github/experimental/workspace/testcase_private_attributes/plan/plan.yaml b/tests/github/experimental/workspace/testcase_private_attributes/plan/plan.yaml new file mode 100644 index 0000000000..b5ab688b84 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/plan/plan.yaml @@ -0,0 +1,26 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.aggregator.Aggregator + settings : + rounds_to_train : 10 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.collaborator.Collaborator + settings : {} + + +federated_flow: + template: src.testflow_privateattributes.TestFlowPrivateAttributes + settings: + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_private_attributes/requirements.txt b/tests/github/experimental/workspace/testcase_private_attributes/requirements.txt new file mode 100644 index 0000000000..16b349007c --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/requirements.txt @@ -0,0 +1,4 @@ +torch==1.13.1 +torchvision==0.14.1 +tensorboard +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability diff --git a/tests/github/experimental/workspace/testcase_private_attributes/src/__init__.py b/tests/github/experimental/workspace/testcase_private_attributes/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/tests/github/experimental/workspace/testcase_private_attributes/src/aggregator_private_attrs.py b/tests/github/experimental/workspace/testcase_private_attributes/src/aggregator_private_attrs.py new file mode 100644 index 0000000000..8e5756f71c --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/src/aggregator_private_attrs.py @@ -0,0 +1,7 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +import numpy as np + + +def aggregator_private_attrs(): + return {"test_loader": np.random.rand(10, 28, 28)} # Random data diff --git a/tests/github/experimental/workspace/testcase_private_attributes/src/collaborator_private_attrs.py b/tests/github/experimental/workspace/testcase_private_attributes/src/collaborator_private_attrs.py new file mode 
100644 index 0000000000..bf439d00f4 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/src/collaborator_private_attrs.py @@ -0,0 +1,10 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 +import numpy as np + + +def collaborator_private_attrs(index): + return { + "train_loader": np.random.rand(index * 50, 28, 28), + "test_loader": np.random.rand(index * 10, 28, 28), + } diff --git a/tests/github/experimental/workspace/testcase_private_attributes/src/testflow_privateattributes.py b/tests/github/experimental/workspace/testcase_private_attributes/src/testflow_privateattributes.py new file mode 100644 index 0000000000..3f19ed71c7 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/src/testflow_privateattributes.py @@ -0,0 +1,193 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from openfl.experimental.component import Aggregator +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator + + +class bcolors: # NOQA: N801 + HEADER = "\033[95m" + OKBLUE = "\033[94m" + OKCYAN = "\033[96m" + OKGREEN = "\033[92m" + WARNING = "\033[93m" + FAIL = "\033[91m" + ENDC = "\033[0m" + BOLD = "\033[1m" + UNDERLINE = "\033[4m" + + +class TestFlowPrivateAttributes(FLSpec): + """ + Testflow to validate Aggregator private attributes are not accessible to collaborators + and vice versa + """ + + ERROR_LIST = [] + + @aggregator + def start(self): + """ + Flow start. + """ + print( + f"{bcolors.OKBLUE}Testing FederatedFlow - Starting Test for accessibility of private " + + f"attributes {bcolors.ENDC}" + ) + self.collaborators = self.runtime.collaborators + + validate_collab_private_attr(self, "test_loader", "start") + + self.exclude_agg_to_agg = 10 + self.include_agg_to_agg = 100 + self.next(self.aggregator_step, exclude=["exclude_agg_to_agg"]) + + @aggregator + def aggregator_step(self): + """ + Testing whether Agg private attributes are accessible in next agg step. 
+ Collab private attributes should not be accessible here + """ + validate_collab_private_attr(self, "test_loader", "aggregator_step") + + self.include_agg_to_collab = 42 + self.exclude_agg_to_collab = 40 + self.next( + self.collaborator_step_a, + foreach="collaborators", + exclude=["exclude_agg_to_collab"], + ) + + @collaborator + def collaborator_step_a(self): + """ + Testing whether Collab private attributes are accessible in collab step + Aggregator private attributes should not be accessible here + """ + validate_agg_private_attrs( + self, "train_loader", "test_loader", "collaborator_step_a" + ) + + self.exclude_collab_to_collab = 2 + self.include_collab_to_collab = 22 + self.next(self.collaborator_step_b, exclude=["exclude_collab_to_collab"]) + + @collaborator + def collaborator_step_b(self): + """ + Testing whether Collab private attributes are accessible in collab step + Aggregator private attributes should not be accessible here + """ + + validate_agg_private_attrs( + self, "train_loader", "test_loader", "collaborator_step_b" + ) + self.exclude_collab_to_agg = 10 + self.include_collab_to_agg = 12 + self.next(self.join, exclude=["exclude_collab_to_agg"]) + + @aggregator + def join(self, inputs): + """ + Testing whether attributes are excluded from collab to agg + """ + # Aggregator should only be able to access its own attributes + if hasattr(self, "test_loader") is False: + TestFlowPrivateAttributes.ERROR_LIST.append( + "aggregator_join_aggregator_attributes_missing" + ) + print( + f"{bcolors.FAIL} ... Attribute test failed in join - aggregator private attributes" + + f" not accessible {bcolors.ENDC}" + ) + + for input in inputs: + collab = input.input + if ( + hasattr(input, "train_loader") is True + or hasattr(input, "test_loader") is True + ): + # Error - we are able to access collaborator attributes + TestFlowPrivateAttributes.ERROR_LIST.append( + "join_collaborator_attributes_found" + ) + print( + f"{bcolors.FAIL} ... Attribute test failed in Join - Collaborator: {collab}" + + f" private attributes accessible {bcolors.ENDC}" + ) + + self.next(self.end) + + @aggregator + def end(self): + """ + This is the 'end' step. All flows must have an 'end' step, which is the + last step in the flow. + + """ + print( + f"{bcolors.OKBLUE}Testing FederatedFlow - Ending Test for accessibility of private " + + f"attributes {bcolors.ENDC}" + ) + + if TestFlowPrivateAttributes.ERROR_LIST: + raise ( + AssertionError( + f"{bcolors.FAIL}\n ...Test case failed ... {bcolors.ENDC}" + ) + ) + else: + print(f"{bcolors.OKGREEN}\n ...Test case passed ... {bcolors.ENDC}") + + TestFlowPrivateAttributes.ERROR_LIST = [] + + +def validate_collab_private_attr(self, private_attr, step_name): + # Aggregator should only be able to access its own attributes + if hasattr(self, private_attr) is False: + TestFlowPrivateAttributes.ERROR_LIST.append( + step_name + "_aggregator_attributes_missing" + ) + print( + f"{bcolors.FAIL} ...Failed in {step_name} - aggregator private attributes not " + + f"accessible {bcolors.ENDC}" + ) + + for idx, collab in enumerate(self.collaborators): + # Collaborator private attributes should not be accessible + if ( + type(self.collaborators[idx]) is not str + or hasattr(self.runtime, "_collaborators") is True + or hasattr(self.runtime, "__collaborators") is True + ): + # Error - we are able to access collaborator attributes + TestFlowPrivateAttributes.ERROR_LIST.append( + step_name + "_collaborator_attributes_found" + ) + print( + f"{bcolors.FAIL} ... 
Attribute test failed in {step_name} - collaborator {collab} " + + f"private attributes accessible {bcolors.ENDC}" + ) + + +def validate_agg_private_attrs(self, private_attr_1, private_attr_2, step_name): + # Collaborator should only be able to access its own attributes + if not hasattr(self, private_attr_1) or not hasattr(self, private_attr_2): + TestFlowPrivateAttributes.ERROR_LIST.append( + step_name + "collab_attributes_not_found" + ) + print( + f"{bcolors.FAIL} ... Attribute test failed in {step_name} - Collab " + + f"private attributes not accessible {bcolors.ENDC}" + ) + + if hasattr(self.runtime, "_aggregator") and isinstance(self.runtime._aggregator, Aggregator): + # Error - we are able to access aggregator attributes + TestFlowPrivateAttributes.ERROR_LIST.append( + step_name + "_aggregator_attributes_found" + ) + print( + f"{bcolors.FAIL} ... Attribute test failed in {step_name} - Aggregator" + + f" private attributes accessible {bcolors.ENDC}" + ) diff --git a/tests/github/experimental/workspace/testcase_private_attributes/src/utils.py b/tests/github/experimental/workspace/testcase_private_attributes/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/tests/github/experimental/workspace/testcase_private_attributes/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/tests/github/experimental/workspace/testcase_reference/.workspace b/tests/github/experimental/workspace/testcase_reference/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/tests/github/experimental/workspace/testcase_reference/plan/cols.yaml b/tests/github/experimental/workspace/testcase_reference/plan/cols.yaml new file mode 100644 index 0000000000..59d4f60bce --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference/plan/cols.yaml @@ -0,0 +1,4 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_reference/plan/data.yaml b/tests/github/experimental/workspace/testcase_reference/plan/data.yaml new file mode 100644 index 0000000000..e97add9b78 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference/plan/data.yaml @@ -0,0 +1,18 @@ +## Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. 
+ +col1: + callable_func: + settings: + index: 1 + template: src.collaborator_private_attrs.collaborator_private_attrs + +col2: + callable_func: + settings: + index: 2 + template: src.collaborator_private_attrs.collaborator_private_attrs diff --git a/tests/github/experimental/workspace/testcase_reference/plan/defaults b/tests/github/experimental/workspace/testcase_reference/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/tests/github/experimental/workspace/testcase_reference/plan/plan.yaml b/tests/github/experimental/workspace/testcase_reference/plan/plan.yaml new file mode 100644 index 0000000000..9a62ceb8fb --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference/plan/plan.yaml @@ -0,0 +1,26 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.aggregator.Aggregator + settings : + rounds_to_train : 2 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.collaborator.Collaborator + settings : {} + + +federated_flow: + template: src.testflow_reference.TestFlowReference + settings: + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_reference/requirements.txt b/tests/github/experimental/workspace/testcase_reference/requirements.txt new file mode 100644 index 0000000000..16b349007c --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference/requirements.txt @@ -0,0 +1,4 @@ +torch==1.13.1 +torchvision==0.14.1 +tensorboard +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability diff --git a/tests/github/experimental/workspace/testcase_reference/src/__init__.py b/tests/github/experimental/workspace/testcase_reference/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/tests/github/experimental/workspace/testcase_reference/src/collaborator_private_attrs.py b/tests/github/experimental/workspace/testcase_reference/src/collaborator_private_attrs.py new file mode 100644 index 0000000000..d0520a3bcb --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference/src/collaborator_private_attrs.py @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +def collaborator_private_attrs(index): + return {"index": index + 1} diff --git a/tests/github/experimental/workspace/testcase_reference/src/testflow_reference.py b/tests/github/experimental/workspace/testcase_reference/src/testflow_reference.py new file mode 100644 index 0000000000..98ad6686b4 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference/src/testflow_reference.py @@ -0,0 +1,321 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator + +import io 
+import math +import logging +import torch.nn as nn +import torch.optim as optim +import inspect +from types import MethodType + + +class bcolors: # NOQA: N801 + HEADER = "\033[95m" + OKBLUE = "\033[94m" + OKCYAN = "\033[96m" + OKGREEN = "\033[92m" + WARNING = "\033[93m" + FAIL = "\033[91m" + ENDC = "\033[0m" + BOLD = "\033[1m" + UNDERLINE = "\033[4m" + + +class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.linear1 = nn.Linear(60, 100) + self.linear2 = nn.Linear(100, 10) + + def forward(self, x): + x = self.linear1(x) + x = self.linear2(x) + return x + + +class TestFlowReference(FLSpec): + + """ + Testflow to validate references of collaborator attributes in Federated Flow. + + """ + + step_one_collab_attrs = [] + step_two_collab_attrs = [] + all_ref_error_dict = {} + agg_attr_dict = {} + + @aggregator + def start(self): + """ + Flow start. + + """ + print( + f"{bcolors.OKBLUE}Testing FederatedFlow - Starting Test for validating references. " + + f"{bcolors.ENDC}" + ) + self.next(self.test_create_agg_attr) + + @aggregator + def test_create_agg_attr(self): + """ + Create different types of objects. + """ + + self.agg_attr_str = "Test string data" + self.agg_attr_list = [1, 2, 5, 6, 7, 8] + self.agg_attr_dict = {key: key for key in range(5)} + self.agg_attr_file = io.StringIO("Test file data in aggregator") + self.agg_attr_math = math.sqrt(2) + self.agg_attr_complex_num = complex(2, 3) + self.agg_attr_log = logging.getLogger("Test logger data in aggregator") + self.agg_attr_model = Net() + self.agg_attr_optimizer = optim.SGD( + self.agg_attr_model.parameters(), lr=1e-3, momentum=1e-2 + ) + self.collaborators = self.runtime.collaborators + + # get aggregator attributes + agg_attr_list = filter_attrs(inspect.getmembers(self)) + for attr in agg_attr_list: + agg_attr_id = id(getattr(self, attr)) + TestFlowReference.agg_attr_dict[attr] = agg_attr_id + self.next(self.test_create_collab_attr, foreach="collaborators") + + @collaborator + def test_create_collab_attr(self): + """ + Modify the attributes of aggregator to validate the references. + Create different types of objects. + """ + + self.agg_attr_str = self.agg_attr_str + " " + self.input + self.agg_attr_complex_num += complex(self.index, self.index) + self.agg_attr_math += self.index + self.agg_attr_log = " " + self.input + + self.collab_attr_str_one = "Test string data in collab " + self.input + self.collab_attr_list_one = [1, 2, 5, 6, 7, 8] + self.collab_attr_dict_one = {key: key for key in range(5)} + self.collab_attr_file_one = io.StringIO("Test file data in collaborator") + self.collab_attr_math_one = math.sqrt(self.index) + self.collab_attr_complex_num_one = complex(self.index, self.index) + + # append attributes of collaborator + TestFlowReference.step_one_collab_attrs.append(self) + + if len(TestFlowReference.step_one_collab_attrs) >= 2: + collab_attr_list = filter_attrs(inspect.getmembers(self)) + matched_ref_dict = find_matched_references( + collab_attr_list, TestFlowReference.step_one_collab_attrs + ) + validate_collab_references(matched_ref_dict) + + self.next(self.test_create_more_collab_attr) + + @collaborator + def test_create_more_collab_attr(self): + """ + Create different types of objects. 
+ """ + + self.collab_attr_str_two = "String reference three " + self.input + self.collab_attr_list_two = [1, 2, 3, 5, 6, 8] + self.collab_attr_dict_two = {key: key for key in range(5)} + self.collab_attr_file_two = io.StringIO("Test file reference one") + self.collab_attr_math_two = math.sqrt(2) + self.collab_attr_complex_num_two = complex(2, 3) + + TestFlowReference.step_two_collab_attrs.append(self) + + if len(TestFlowReference.step_two_collab_attrs) >= 2: + collab_attr_list = filter_attrs(inspect.getmembers(self)) + matched_ref_dict = find_matched_references( + collab_attr_list, TestFlowReference.step_two_collab_attrs + ) + validate_collab_references(matched_ref_dict) + + self.next(self.join) + + @aggregator + def join(self, inputs): + """ + Iterate over the references of collaborator attributes + validate uniqueness of attributes and raise assertion + """ + + all_attr_list = filter_attrs(inspect.getmembers(inputs[0])) + agg_attrs = filter_attrs(inspect.getmembers(self)) + + # validate aggregator references are intact after coming out of collaborators. + validate_agg_attr_ref(agg_attrs, self) + + # validate collaborators references are not shared in between. + matched_ref_dict = find_matched_references(all_attr_list, inputs) + validate_collab_references(matched_ref_dict) + + # validate aggregator references are not shared with any of the collaborators . + validate_agg_collab_references(inputs, self, agg_attrs) + + all_shared_attr = "" + print(f"\n{bcolors.UNDERLINE}Reference test summary: {bcolors.ENDC}\n") + for val in TestFlowReference.all_ref_error_dict.values(): + all_shared_attr = all_shared_attr + ",".join(val) + if all_shared_attr: + print( + f"{bcolors.FAIL}...Test case failed for {all_shared_attr} {bcolors.ENDC}" + ) + else: + print( + f"{bcolors.OKGREEN}...Test case passed for all the attributes.{bcolors.ENDC}" + ) + + self.next(self.end) + + @aggregator + def end(self): + """ + This is the 'end' step. All flows must have an 'end' step, which is the + last step in the flow. + + """ + print( + f"{bcolors.OKBLUE}Testing FederatedFlow - Ending test for validating the references. " + + f"{bcolors.ENDC}" + ) + if TestFlowReference.all_ref_error_dict: + raise ( + AssertionError( + f"{bcolors.FAIL}\n ...Test case failed ... {bcolors.ENDC}" + ) + ) + + TestFlowReference.step_one_collab_attrs = [] + TestFlowReference.step_two_collab_attrs = [] + TestFlowReference.all_ref_error_dict = {} + + +def filter_attrs(attr_list): + valid_attrs = [] + reserved_words = ["next", "runtime", "execute_next"] + for attr in attr_list: + if ( + not attr[0].startswith("_") + and attr[0] not in reserved_words + and not hasattr(TestFlowReference, attr[0]) + ): + if not isinstance(attr[1], MethodType): + valid_attrs.append(attr[0]) + return valid_attrs + + +def find_matched_references(collab_attr_list, all_collaborators): + """ + Iterate attributes of collborator and capture the duplicate reference + return: dict: { + 'Portland': ['failed attributes'], 'Seattle': [], + } + """ + matched_ref_dict = {} + for i in range(len(all_collaborators)): + matched_ref_dict[all_collaborators[i].input] = [] + + # For each attribute in the collaborator attribute list, check if any of the collaborator + # attributes are shared with another collaborator + for attr_name in collab_attr_list: + for i, curr_collab in enumerate(all_collaborators): + # Compare the current collaborator with the collaborator(s) that come(s) after it. 
+ for next_collab in all_collaborators[i + 1:]: + # Check if both collaborators have the current attribute + if hasattr(curr_collab, attr_name) and hasattr(next_collab, attr_name): + # Check if both collaborators are sharing same reference + if getattr(curr_collab, attr_name) is getattr( + next_collab, attr_name + ): + matched_ref_dict[curr_collab.input].append(attr_name) + print( + f"{bcolors.FAIL} ... Reference test failed - {curr_collab.input} \ + sharing same " + + f"{attr_name} reference with {next_collab.input} {bcolors.ENDC}" + ) + + return matched_ref_dict + + +def validate_collab_references(matched_ref_dict): + """ + Iterate reference list and raise assertion for conflicts + """ + collaborators_sharing_ref = [] + reference_flag = False + + for collab, val in matched_ref_dict.items(): + if val: + collaborators_sharing_ref.append(collab) + reference_flag = True + if collaborators_sharing_ref: + for collab in collaborators_sharing_ref: + if collab not in TestFlowReference.all_ref_error_dict: + TestFlowReference.all_ref_error_dict[collab] = matched_ref_dict.get( + collab + ) + + if not reference_flag: + print( + f"{bcolors.OKGREEN} Pass : Reference test passed for collaborators. {bcolors.ENDC}" + ) + + +def validate_agg_attr_ref(agg_attrs, agg_obj): + """ + Verifies aggregator attributes are retained after + collaborator execution + """ + attr_flag = False + for attr in agg_attrs: + if TestFlowReference.agg_attr_dict.get(attr) == id(getattr(agg_obj, attr)): + attr_flag = True + if not attr_flag: + print( + f"{bcolors.FAIL}...Aggregator references are not intact after coming out of " + + f"collaborators.{bcolors.ENDC}" + ) + else: + print( + f"{bcolors.OKGREEN} Pass : Aggregator references are intact after coming out of " + + f"collaborators.{bcolors.ENDC}" + ) + + +def validate_agg_collab_references(all_collaborators, agg_obj, agg_attrs): + """ + Iterate attributes of aggregator and collaborator to capture the mismatched references. + """ + + mis_matched_ref = {} + for collab in all_collaborators: + mis_matched_ref[collab.input] = [] + + attr_ref_flag = False + for attr in agg_attrs: + agg_attr_id = id(getattr(agg_obj, attr)) + for collab in all_collaborators: + collab_attr_id = id(getattr(collab, attr)) + if agg_attr_id == collab_attr_id: + attr_ref_flag = True + mis_matched_ref[collab.input].append(attr) + + if attr_ref_flag: + print( + f"{bcolors.FAIL}...Aggregator references are shared between collaborators." 
+ + f"{bcolors.ENDC}" + ) + else: + print( + f"{bcolors.OKGREEN} Pass : Reference test passed for aggregator.{bcolors.ENDC}" + ) diff --git a/tests/github/experimental/workspace/testcase_reference/src/utils.py b/tests/github/experimental/workspace/testcase_reference/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/tests/github/experimental/workspace/testcase_reference_with_include_exclude/.workspace b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/cols.yaml b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/cols.yaml new file mode 100644 index 0000000000..59d4f60bce --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/cols.yaml @@ -0,0 +1,4 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/data.yaml b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/data.yaml new file mode 100644 index 0000000000..43d733f864 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/data.yaml @@ -0,0 +1,10 @@ +## Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. 
+ +col1: + +col2: diff --git a/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/defaults b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/defaults new file mode 100644 index 0000000000..fb82f9c5b6 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/defaults @@ -0,0 +1,2 @@ +../../workspace/plan/defaults + diff --git a/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/plan.yaml b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/plan.yaml new file mode 100644 index 0000000000..d54aed913a --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/plan/plan.yaml @@ -0,0 +1,26 @@ +# Copyright (C) 2020-2023 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +aggregator : + defaults : plan/defaults/aggregator.yaml + template : openfl.experimental.component.aggregator.Aggregator + settings : + rounds_to_train : 2 + log_metric_callback : + template : src.utils.write_metric + + +collaborator : + defaults : plan/defaults/collaborator.yaml + template : openfl.experimental.component.collaborator.Collaborator + settings : {} + + +federated_flow: + template: src.testflow_reference_with_include_exclude.TestFlowReferenceWithIncludeExclude + settings: + checkpoint: true + + +network : + defaults : plan/defaults/network.yaml \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_reference_with_include_exclude/requirements.txt b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/requirements.txt new file mode 100644 index 0000000000..16b349007c --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/requirements.txt @@ -0,0 +1,4 @@ +torch==1.13.1 +torchvision==0.14.1 +tensorboard +wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability diff --git a/tests/github/experimental/workspace/testcase_reference_with_include_exclude/src/__init__.py b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/src/__init__.py new file mode 100644 index 0000000000..6e02c1c951 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/src/__init__.py @@ -0,0 +1,2 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 diff --git a/tests/github/experimental/workspace/testcase_reference_with_include_exclude/src/testflow_reference_with_include_exclude.py b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/src/testflow_reference_with_include_exclude.py new file mode 100644 index 0000000000..7799c493bc --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/src/testflow_reference_with_include_exclude.py @@ -0,0 +1,240 @@ +# Copyright (C) 2020-2023 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from openfl.experimental.interface import FLSpec +from openfl.experimental.placement import aggregator, collaborator + +import torch.nn as nn +import torch.optim as optim +import inspect +from types import MethodType + +MIN_COLLECTION_COUNT = 2 + + +class bcolors: # NOQA: N801 + HEADER = "\033[95m" + OKBLUE = "\033[94m" + OKCYAN = "\033[96m" + OKGREEN = "\033[92m" + WARNING = "\033[93m" + FAIL = "\033[91m" + ENDC = "\033[0m" + BOLD = "\033[1m" + UNDERLINE = 
"\033[4m" + + +class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.linear1 = nn.Linear(60, 100) + self.linear2 = nn.Linear(100, 10) + + def forward(self, x): + x = self.linear1(x) + x = self.linear2(x) + return x + + +class TestFlowReferenceWithIncludeExclude(FLSpec): + + """ + Testflow to validate references of collabartor attributes in Federated Flow with include. + + """ + step_one_collab_attrs = [] + step_two_collab_attrs = [] + all_ref_error_dict = {} + + @aggregator + def start(self): + """ + Flow start. + + """ + self.agg_agg_attr_dict = {key: key for key in range(5)} + print( + f"{bcolors.OKBLUE}Testing FederatedFlow - Starting Test for validating references " + + f"{bcolors.ENDC}" + ) + self.next(self.test_create_agg_attr, exclude=["agg_agg_attr_dict"]) + + @aggregator + def test_create_agg_attr(self): + """ + Create different types of objects + """ + + self.agg_attr_list = [1, 2, 5, 6, 7, 8] + self.agg_attr_dict = {key: key for key in range(5)} + + self.agg_attr_model = Net() + self.agg_attr_optimizer = optim.SGD( + self.agg_attr_model.parameters(), lr=1e-3, momentum=1e-2 + ) + self.collaborators = self.runtime.collaborators + self.next( + self.test_create_collab_attr, + foreach="collaborators", + include=["collaborators", "agg_attr_list"], + ) + + @collaborator + def test_create_collab_attr(self): + """ + Modify the attirbutes of aggregator to validate the references. + Create different types of objects. + """ + + self.collab_attr_list_one = [1, 2, 5, 6, 7, 8] + self.collab_attr_dict_one = {key: key for key in range(5)} + + # append self attributes of collaborators + TestFlowReferenceWithIncludeExclude.step_one_collab_attrs.append(self) + + if ( + len(TestFlowReferenceWithIncludeExclude.step_one_collab_attrs) + >= MIN_COLLECTION_COUNT + ): + collab_attr_list = filter_attrs(inspect.getmembers(self)) + matched_ref_dict = find_matched_references( + collab_attr_list, + TestFlowReferenceWithIncludeExclude.step_one_collab_attrs, + ) + validate_references(matched_ref_dict) + + self.next(self.test_create_more_collab_attr, exclude=["collab_attr_dict_one"]) + + @collaborator + def test_create_more_collab_attr(self): + """ + Create different types of objects. 
+ """ + + self.collab_attr_list_two = [1, 2, 3, 5, 6, 8] + self.collab_attr_dict_two = {key: key for key in range(5)} + + TestFlowReferenceWithIncludeExclude.step_two_collab_attrs.append(self) + + if ( + len(TestFlowReferenceWithIncludeExclude.step_two_collab_attrs) + >= MIN_COLLECTION_COUNT + ): + collab_attr_list = filter_attrs(inspect.getmembers(self)) + matched_ref_dict = find_matched_references( + collab_attr_list, + TestFlowReferenceWithIncludeExclude.step_two_collab_attrs, + ) + validate_references(matched_ref_dict) + + self.next(self.join, include=["collab_attr_dict_two"]) + + @aggregator + def join(self, inputs): + """ + Iterate over the references of collaborator attributes + validate uniqueness of attributes and raise assertion + """ + + all_attr_list = filter_attrs(inspect.getmembers(inputs[0])) + + matched_ref_dict = find_matched_references(all_attr_list, inputs) + validate_references(matched_ref_dict) + all_shared_attr = "" + print(f"\n{bcolors.UNDERLINE}Reference test summary: {bcolors.ENDC}\n") + for val in TestFlowReferenceWithIncludeExclude.all_ref_error_dict.values(): + all_shared_attr = all_shared_attr + ",".join(val) + if all_shared_attr: + print( + f"{bcolors.FAIL}...Test case failed for {all_shared_attr} {bcolors.ENDC}" + ) + else: + print(f"{bcolors.OKGREEN}...Test case passed for all the attributes.") + + self.next(self.end) + + @aggregator + def end(self): + print( + f"{bcolors.OKBLUE}Testing FederatedFlow - Ending test for validatng the references. " + + f"{bcolors.ENDC}" + ) + if TestFlowReferenceWithIncludeExclude.all_ref_error_dict: + raise ( + AssertionError( + f"{bcolors.FAIL}\n ...Test case failed ... {bcolors.ENDC}" + ) + ) + + TestFlowReferenceWithIncludeExclude.step_one_collab_attrs = [] + TestFlowReferenceWithIncludeExclude.step_two_collab_attrs = [] + TestFlowReferenceWithIncludeExclude.all_ref_error_dict = {} + + +def filter_attrs(attr_list): + valid_attrs = [] + reserved_words = ["next", "runtime", "execute_next"] + for attr in attr_list: + if ( + not attr[0].startswith("_") + and attr[0] not in reserved_words + and not hasattr(TestFlowReferenceWithIncludeExclude, attr[0]) + ): + if not isinstance(attr[1], MethodType): + valid_attrs.append(attr[0]) + return valid_attrs + + +def find_matched_references(collab_attr_list, all_collaborators): + """ + Iterate attributes of collborator and capture the duplicate reference + return: dict: { + 'Portland': ['failed attributes'], 'Seattle': [], + } + """ + matched_ref_dict = {} + for i in range(len(all_collaborators)): + matched_ref_dict[all_collaborators[i].input] = [] + + # For each attribute in the collaborator attribute list, check if any of the collaborator + # attributes are shared with another collaborator + for attr_name in collab_attr_list: + for i, curr_collab in enumerate(all_collaborators): + # Compare the current collaborator with the collaborator(s) that come(s) after it. + for next_collab in all_collaborators[i + 1:]: + # Check if both collaborators have the current attribute + if hasattr(curr_collab, attr_name) and hasattr(next_collab, attr_name): + # Check if both collaborators are sharing same reference + if getattr(curr_collab, attr_name) is getattr( + next_collab, attr_name + ): + matched_ref_dict[curr_collab.input].append(attr_name) + print( + f"{bcolors.FAIL} ... 
Reference test failed - {curr_collab.input} \ + sharing same " + + f"{attr_name} reference with {next_collab.input} {bcolors.ENDC}" + ) + + return matched_ref_dict + + +def validate_references(matched_ref_dict): + """ + Iterate reference list and raise assertion for conflicts + """ + collborators_sharing_ref = [] + reference_flag = False + + for collab, val in matched_ref_dict.items(): + if val: + collborators_sharing_ref.append(collab) + reference_flag = True + if collborators_sharing_ref: + for collab in collborators_sharing_ref: + if collab not in TestFlowReferenceWithIncludeExclude.all_ref_error_dict: + TestFlowReferenceWithIncludeExclude.all_ref_error_dict[ + collab + ] = matched_ref_dict.get(collab) + + if not reference_flag: + print(f"{bcolors.OKGREEN} Pass : Reference test passed {bcolors.ENDC}") diff --git a/tests/github/experimental/workspace/testcase_reference_with_include_exclude/src/utils.py b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/src/utils.py new file mode 100644 index 0000000000..1e56f3e68d --- /dev/null +++ b/tests/github/experimental/workspace/testcase_reference_with_include_exclude/src/utils.py @@ -0,0 +1,20 @@ +# Copyright (C) 2020-2021 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +from torch.utils.tensorboard import SummaryWriter + + +writer = None + + +def get_writer(): + """Create global writer object.""" + global writer + if not writer: + writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5) + + +def write_metric(node_name, task_name, metric_name, metric, round_number): + """Write metric callback.""" + get_writer() + writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number) diff --git a/tests/github/experimental/workspace/testcase_subset_of_collaborators/.workspace b/tests/github/experimental/workspace/testcase_subset_of_collaborators/.workspace new file mode 100644 index 0000000000..3c2c5d08b4 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_subset_of_collaborators/.workspace @@ -0,0 +1,2 @@ +current_plan_name: default + diff --git a/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/cols.yaml b/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/cols.yaml new file mode 100644 index 0000000000..95307de3bc --- /dev/null +++ b/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/cols.yaml @@ -0,0 +1,5 @@ +# Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +collaborators: + \ No newline at end of file diff --git a/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/data.yaml b/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/data.yaml new file mode 100644 index 0000000000..856ed96773 --- /dev/null +++ b/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/data.yaml @@ -0,0 +1,30 @@ +## Copyright (C) 2020-2021 Intel Corporation +# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you. + +# all keys under 'collaborators' corresponds to a specific colaborator name the corresponding dictionary has data_name, data_path pairs. +# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object +# construct the shard of the mnist dataset to be use for this collaborator. 
+
+col1:
+  callable_func:
+    settings:
+      collab_name: col1
+    template: src.collaborator_private_attrs.callable_to_initialize_collaborator_private_attributes
+
+col2:
+  callable_func:
+    settings:
+      collab_name: col2
+    template: src.collaborator_private_attrs.callable_to_initialize_collaborator_private_attributes
+
+col3:
+  callable_func:
+    settings:
+      collab_name: col3
+    template: src.collaborator_private_attrs.callable_to_initialize_collaborator_private_attributes
+
+col4:
+  callable_func:
+    settings:
+      collab_name: col4
+    template: src.collaborator_private_attrs.callable_to_initialize_collaborator_private_attributes
diff --git a/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/defaults b/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/defaults
new file mode 100644
index 0000000000..fb82f9c5b6
--- /dev/null
+++ b/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/defaults
@@ -0,0 +1,2 @@
+../../workspace/plan/defaults
+
diff --git a/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/plan.yaml b/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/plan.yaml
new file mode 100644
index 0000000000..ff8ae1a463
--- /dev/null
+++ b/tests/github/experimental/workspace/testcase_subset_of_collaborators/plan/plan.yaml
@@ -0,0 +1,26 @@
+# Copyright (C) 2020-2021 Intel Corporation
+# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.
+
+aggregator :
+  defaults : plan/defaults/aggregator.yaml
+  template : openfl.experimental.component.aggregator.Aggregator
+  settings :
+    rounds_to_train : 1
+    log_metric_callback :
+      template : src.utils.write_metric
+
+
+collaborator :
+  defaults : plan/defaults/collaborator.yaml
+  template : openfl.experimental.component.collaborator.Collaborator
+  settings : {}
+
+
+federated_flow:
+  template: src.testflow_subset_of_collaborators.TestFlowSubsetCollaborators
+  settings:
+    checkpoint: true
+
+
+network :
+  defaults : plan/defaults/network.yaml
diff --git a/tests/github/experimental/workspace/testcase_subset_of_collaborators/requirements.txt b/tests/github/experimental/workspace/testcase_subset_of_collaborators/requirements.txt
new file mode 100644
index 0000000000..32a96eaef3
--- /dev/null
+++ b/tests/github/experimental/workspace/testcase_subset_of_collaborators/requirements.txt
@@ -0,0 +1 @@
+wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability
diff --git a/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/__init__.py b/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/__init__.py
new file mode 100644
index 0000000000..6e02c1c951
--- /dev/null
+++ b/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/__init__.py
@@ -0,0 +1,2 @@
+# Copyright (C) 2020-2021 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
diff --git a/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/collaborator_private_attrs.py b/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/collaborator_private_attrs.py
new file mode 100644
index 0000000000..883cc7db87
--- /dev/null
+++ b/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/collaborator_private_attrs.py
@@ -0,0 +1,5 @@
+# Copyright (C) 2020-2021 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+def callable_to_initialize_collaborator_private_attributes(collab_name):
+    return {"name": collab_name}
diff --git a/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/testflow_subset_of_collaborators.py b/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/testflow_subset_of_collaborators.py
new file mode 100644
index 0000000000..a6aa013e4e
--- /dev/null
+++ b/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/testflow_subset_of_collaborators.py
@@ -0,0 +1,132 @@
+# Copyright (C) 2020-2023 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+from metaflow import Flow
+
+from openfl.experimental.interface.fl_spec import FLSpec
+from openfl.experimental.placement.placement import aggregator, collaborator
+
+
+class bcolors:  # NOQA: N801
+    OKBLUE = "\033[94m"
+    OKCYAN = "\033[96m"
+    OKGREEN = "\033[92m"
+    HEADER = "\033[95m"
+    WARNING = "\033[93m"
+    FAIL = "\033[91m"
+    BOLD = "\033[1m"
+    UNDERLINE = "\033[4m"
+    ENDC = "\033[0m"
+
+
+class TestFlowSubsetCollaborators(FLSpec):
+    """
+    Testflow to validate working of Subset Collaborators in Federated Flow.
+    """
+
+    def __init__(self, **kwargs) -> None:
+        super().__init__(**kwargs)
+
+    @aggregator
+    def start(self):
+        """
+        Start the flow with a subset of collaborators.
+        """
+        print(
+            f"{bcolors.OKBLUE}Testing FederatedFlow - Starting Test for "
+            + f"validating Subset of collaborators {bcolors.ENDC}"
+        )
+        self.collaborators = self.runtime.collaborators
+
+        # select subset of collaborators
+        self.subset_collaborators = self.collaborators[:2]
+
+        print(
+            f"... Executing flow for {len(self.subset_collaborators)} collaborators out of Total: "
+            + f"{len(self.collaborators)}"
+        )
+
+        self.next(self.test_valid_collaborators, foreach="subset_collaborators")
+
+    @collaborator
+    def test_valid_collaborators(self):
+        """
+        Set the collaborator name.
+        """
+        print("executing collaborator step test_valid_collaborators for "
+              + f"collaborator {self.name}.")
+        self.collaborator_ran = self.name
+        self.next(self.join)
+
+    @aggregator
+    def join(self, inputs):
+        """
+        Collect the list of collaborators that ran successfully.
+        """
+        print("inside join")
+        self.collaborators_ran = [input.collaborator_ran for input in inputs]
+        self.next(self.end)
+
+    @aggregator
+    def end(self):
+        """
+        End of the flow
+        """
+        print(f"End of the test case {TestFlowSubsetCollaborators.__name__} reached.")
+        testcase()
+
+
+def testcase():
+    tc_pass_fail = {
+        "passed": [], "failed": []
+    }
+    subset_collaborators = ["col1", "col2"]
+    f = Flow("TestFlowSubsetCollaborators/")
+    r = f.latest_run
+    # Collaborator test_valid_collaborators step
+    step = list(r)[1]
+    # Aggregator join step
+    join = list(r)[0]
+
+    collaborators_ran = list(join)[0].data.collaborators_ran
+    print(f"collaborators_ran: {collaborators_ran}")
+
+    if len(list(step)) != len(subset_collaborators):
+        tc_pass_fail["failed"].append(
+            f"{bcolors.FAIL}...Flow only ran for {len(list(step))} "
+            + f"instead of the {len(subset_collaborators)} expected "
+            + f"collaborators - Testcase Failed.{bcolors.ENDC} "
+        )
+    else:
+        tc_pass_fail["passed"].append(
+            f"{bcolors.OKGREEN}Found {len(list(step))} tasks for each of the "
+            + f"{len(subset_collaborators)} collaborators - "
+            + f"Testcase Passed.{bcolors.ENDC}"
+        )
+    passed = True
+    for collaborator_name in subset_collaborators:
+        if collaborator_name not in collaborators_ran:
+            passed = False
+            tc_pass_fail["failed"].append(
+                f"{bcolors.FAIL}...Flow did not execute for "
+                + f"collaborator {collaborator_name}"
+                + f" - Testcase Failed.{bcolors.ENDC}"
+            )
+
+    if passed:
+        tc_pass_fail["passed"].append(
+            f"{bcolors.OKGREEN}Flow executed for all collaborators "
+            + f"- Testcase Passed.{bcolors.ENDC}"
+        )
+    for values in tc_pass_fail.values():
+        print(*values, sep="\n")
+
+    print(
+        f"{bcolors.OKBLUE}Testing FederatedFlow - Ending test for validating "
+        + f"the subset of collaborators. {bcolors.ENDC}"
+    )
+    if tc_pass_fail.get("failed"):
+        tc_pass_fail_len = len(tc_pass_fail.get("failed"))
+        raise AssertionError(
+            f"{bcolors.FAIL}\n {tc_pass_fail_len} Test "
+            + f"case(s) failed ... {bcolors.ENDC}"
+        )
diff --git a/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/utils.py b/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/utils.py
new file mode 100644
index 0000000000..1e56f3e68d
--- /dev/null
+++ b/tests/github/experimental/workspace/testcase_subset_of_collaborators/src/utils.py
@@ -0,0 +1,20 @@
+# Copyright (C) 2020-2021 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+from torch.utils.tensorboard import SummaryWriter
+
+
+writer = None
+
+
+def get_writer():
+    """Create global writer object."""
+    global writer
+    if not writer:
+        writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5)
+
+
+def write_metric(node_name, task_name, metric_name, metric, round_number):
+    """Write metric callback."""
+    get_writer()
+    writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number)
diff --git a/tests/github/experimental/workspace/utils.py b/tests/github/experimental/workspace/utils.py
new file mode 100644
index 0000000000..7f7da4496f
--- /dev/null
+++ b/tests/github/experimental/workspace/utils.py
@@ -0,0 +1,143 @@
+# Copyright (C) 2020-2023 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+import shutil
+from subprocess import check_call
+import os
+from pathlib import Path
+import re
+import tarfile
+
+
+def create_collaborator(col, workspace_root, archive_name, fed_workspace):
+    # Copy workspace to collaborator directories (these can be on different machines)
+    col_path = workspace_root / col
+    shutil.rmtree(col_path, ignore_errors=True)  # Remove any existing directory
+    col_path.mkdir()  # Create a new directory for the collaborator
+
+    # Import the workspace to this collaborator
+    check_call(
+        ['fx', 'workspace', 'import', '--archive', workspace_root / archive_name],
+        cwd=col_path
+    )
+
+    # Create the collaborator certificate request
+    # Remove '--silent' if you run this manually
+    check_call(
+        ['fx', 'collaborator', 'generate-cert-request', '-n', col, '--silent'],
+        cwd=col_path / fed_workspace
+    )
+
+    # Sign collaborator certificate
+    # Remove '--silent' if you run this manually
+    request_pkg = col_path / fed_workspace / f'col_{col}_to_agg_cert_request.zip'
+    check_call(
+        ['fx', 'collaborator', 'certify', '--request-pkg', str(request_pkg), '--silent'],
+        cwd=workspace_root)
+
+    # Import the signed certificate from the aggregator
+    import_path = workspace_root / f'agg_to_col_{col}_signed_cert.zip'
+    check_call(
+        ['fx', 'collaborator', 'certify', '--import', import_path],
+        cwd=col_path / fed_workspace
+    )
+
+
+def create_certified_workspace(path, custom_template, template, fqdn, rounds_to_train):
+    shutil.rmtree(path, ignore_errors=True)
+    if template is not None:
+        check_call(
+            ['fx', 'workspace', 'create', '--prefix', path, '--template', template]
+        )
+    else:
+        check_call(
+            ['fx', 'workspace', 'create', '--prefix', path,
+             '--custom_template', custom_template]
+        )
+    os.chdir(path)
+
+    # Initialize FL plan
+    check_call(['fx', 'plan', 'initialize', '-a', fqdn])
+    plan_path = Path('plan/plan.yaml')
+    try:
+        rounds_to_train = int(rounds_to_train)
+        with open(plan_path, "r", encoding='utf-8') as sources:
+            lines = sources.readlines()
+        with open(plan_path, "w", encoding='utf-8') as sources:
+            for line in lines:
+                sources.write(
+                    re.sub(r'rounds_to_train.*', f'rounds_to_train: {rounds_to_train}', line)
+                )
+    except (ValueError, TypeError):
+        pass
+    # Create certificate authority for workspace
+    check_call(['fx', 'workspace', 'certify'])
+
+    # Export FL workspace
+    check_call(['fx', 'workspace', 'export'])
+
+
+def certify_aggregator(fqdn):
+    # Create aggregator certificate
+    check_call(['fx', 'aggregator', 'generate-cert-request', '--fqdn', fqdn])
+
+    # Sign aggregator certificate
+    check_call(['fx', 'aggregator', 'certify', '--fqdn', fqdn, '--silent'])
+
+
+def create_signed_cert_for_collaborator(col, data_path):
+    '''
+    We do the certificate exchange for all participants in a single workspace to speed up this test run.
+    Do not do this in real experiments in untrusted environments.
+    '''
+    print(f'Certifying collaborator {col} with data path {data_path}...')
+    # Create collaborator certificate request
+    check_call([
+        'fx', 'collaborator', 'generate-cert-request', '-n', col, '--silent'
+    ])
+    # Sign collaborator certificate
+    check_call([
+        'fx',
+        'collaborator',
+        'certify',
+        '--request-pkg',
+        f'col_{col}_to_agg_cert_request.zip',
+        '--silent'
+    ])
+
+    # Pack the collaborator's private key and the signed cert
+    # as well as its data.yaml into a tarball
+    tarfiles = ['plan/data.yaml', f'agg_to_col_{col}_signed_cert.zip']
+    with os.scandir('cert/client') as iterator:
+        for entry in iterator:
+            if entry.name.endswith('key'):
+                tarfiles.append(entry.path)
+    with tarfile.open(f'cert_col_{col}.tar', 'w') as t:
+        for f in tarfiles:
+            t.add(f)
+    for f in tarfiles:
+        os.remove(f)
+    # Remove request archive
+    os.remove(f'col_{col}_to_agg_cert_request.zip')
+
+
+def start_aggregator_container(workspace_image_name, aggregator_required_files):
+    check_call(
+        'docker run --rm '
+        '--network host '
+        f'-v {Path.cwd().resolve()}/{aggregator_required_files}:/certs.tar '
+        '-e \"CONTAINER_TYPE=aggregator\" '
+        f'{workspace_image_name} '
+        'bash /openfl/openfl-docker/start_actor_in_container.sh',
+        shell=True)
+
+
+def start_collaborator_container(workspace_image_name, col_name):
+    check_call(
+        'docker run --rm '
+        '--network host '
+        f'-v {Path.cwd()}/cert_col_{col_name}.tar:/certs.tar '
+        '-e \"CONTAINER_TYPE=collaborator\" '
+        f'-e \"COL={col_name}\" '
+        f'{workspace_image_name} '
+        'bash /openfl/openfl-docker/start_actor_in_container.sh',
+        shell=True)
diff --git a/tests/github/experimental/workspace/workspace/.workspace b/tests/github/experimental/workspace/workspace/.workspace
new file mode 100644
index 0000000000..3c2c5d08b4
--- /dev/null
+++ b/tests/github/experimental/workspace/workspace/.workspace
@@ -0,0 +1,2 @@
+current_plan_name: default
+
diff --git a/tests/github/experimental/workspace/workspace/__init__.py b/tests/github/experimental/workspace/workspace/__init__.py
new file mode 100644
index 0000000000..f1410b1298
--- /dev/null
+++ b/tests/github/experimental/workspace/workspace/__init__.py
@@ -0,0 +1,3 @@
+# Copyright (C) 2020-2021 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+"""You may copy this file as the starting point of your own model."""
diff --git a/tests/github/experimental/workspace/workspace/plan/defaults/aggregator.yaml b/tests/github/experimental/workspace/workspace/plan/defaults/aggregator.yaml
new file mode 100644
index 0000000000..78f0242dc6
--- /dev/null
+++ b/tests/github/experimental/workspace/workspace/plan/defaults/aggregator.yaml
@@ -0,0 +1 @@
+template : openfl.experimental.component.Aggregator
\ No newline at end of file
diff --git a/tests/github/experimental/workspace/workspace/plan/defaults/collaborator.yaml b/tests/github/experimental/workspace/workspace/plan/defaults/collaborator.yaml
new file mode 100644
index 0000000000..1c561cf5f5
--- /dev/null
+++ b/tests/github/experimental/workspace/workspace/plan/defaults/collaborator.yaml
@@ -0,0 +1 @@
+template : openfl.experimental.component.Collaborator
\ No newline at end of file
diff --git a/tests/github/experimental/workspace/workspace/plan/defaults/network.yaml b/tests/github/experimental/workspace/workspace/plan/defaults/network.yaml
new file mode 100644
index 0000000000..07d2e3aeec
--- /dev/null
+++ b/tests/github/experimental/workspace/workspace/plan/defaults/network.yaml
@@ -0,0 +1,9 @@
+template: openfl.federation.Network
+settings:
+    agg_addr : auto
+    agg_port : auto
+    hash_salt : auto
+    tls : True
+    client_reconnect_interval : 5
+    disable_client_auth : False
+    cert_folder : cert
diff --git a/tests/github/experimental/workspace/workspace/plan/plans/default/base_plan_interactive_api.yaml b/tests/github/experimental/workspace/workspace/plan/plans/default/base_plan_interactive_api.yaml
new file mode 100644
index 0000000000..06370bd272
--- /dev/null
+++ b/tests/github/experimental/workspace/workspace/plan/plans/default/base_plan_interactive_api.yaml
@@ -0,0 +1,36 @@
+# Copyright (C) 2020-2021 Intel Corporation
+# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.
+
+aggregator :
+  defaults : plan/defaults/aggregator.yaml
+  template : openfl.component.Aggregator
+  settings :
+    init_state_path : save/init.pbuf
+    best_state_path : save/best.pbuf
+    last_state_path : save/last.pbuf
+    rounds_to_train : 10
+
+collaborator :
+  defaults : plan/defaults/collaborator.yaml
+  template : openfl.component.Collaborator
+  settings :
+    delta_updates : false
+    opt_treatment : RESET
+
+data_loader :
+  defaults : plan/defaults/data_loader.yaml
+
+task_runner :
+  template : openfl.federated.task.task_runner.CoreTaskRunner
+
+network :
+  defaults : plan/defaults/network.yaml
+
+assigner :
+  defaults : plan/defaults/assigner.yaml
+
+tasks :
+  defaults : null
+
+compression_pipeline :
+  defaults : plan/defaults/compression_pipeline.yaml
\ No newline at end of file
diff --git a/tests/github/experimental/workspace/workspace/plan/plans/default/plan.yaml b/tests/github/experimental/workspace/workspace/plan/plans/default/plan.yaml
new file mode 100644
index 0000000000..af976f3f43
--- /dev/null
+++ b/tests/github/experimental/workspace/workspace/plan/plans/default/plan.yaml
@@ -0,0 +1,39 @@
+# Copyright (C) 2020-2021 Intel Corporation
+# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.
+
+aggregator :
+  defaults : plan/defaults/aggregator.yaml
+  template : openfl.component.Aggregator
+  settings :
+    init_state_path : save/init.pbuf
+    best_state_path : save/best.pbuf
+    last_state_path : save/last.pbuf
+    rounds_to_train : 10
+
+collaborator :
+  defaults : plan/defaults/collaborator.yaml
+  template : openfl.component.Collaborator
+  settings :
+    delta_updates : false
+    opt_treatment : RESET
+
+data_loader :
+  defaults : plan/defaults/data_loader.yaml
+  template : src.tfmnist_inmemory.TensorFlowMNISTInMemory
+  settings :
+    collaborator_count : 2
+    data_group_name : mnist
+    batch_size : 256
+
+task_runner :
+  defaults : plan/defaults/task_runner.yaml
+  template : src.keras_cnn.KerasCNN
+
+network :
+  defaults : plan/defaults/network.yaml
+
+assigner :
+  defaults : plan/defaults/assigner.yaml
+
+tasks :
+  defaults : plan/defaults/tasks_keras.yaml
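
Note: the sketch below is a minimal, hypothetical driver showing how the helpers added in tests/github/experimental/workspace/utils.py could be composed for a local end-to-end run. The workspace prefix, FQDN, archive name, custom template path, and collaborator names are illustrative assumptions and are not part of this change; only the function names and signatures come from the file above.

# Hypothetical local driver (illustrative only; names below are assumptions).
from pathlib import Path

from tests.github.experimental.workspace.utils import (
    certify_aggregator,
    create_certified_workspace,
    create_collaborator,
)

fed_workspace = 'aggregator'            # assumed workspace prefix
archive_name = f'{fed_workspace}.zip'   # assumed name of the archive written by 'fx workspace export'
fqdn = 'localhost'                      # assumed aggregator FQDN for a single-machine run
collaborators = ['col1', 'col2']        # assumed collaborator names matching plan/data.yaml

# Create, initialize, certify, and export the workspace from a custom template.
create_certified_workspace(
    path=fed_workspace,
    custom_template='tests/github/experimental/workspace/testcase_subset_of_collaborators',
    template=None,
    fqdn=fqdn,
    rounds_to_train=1,
)
certify_aggregator(fqdn)

# create_certified_workspace() changed into the workspace directory, so use it as the root.
workspace_root = Path.cwd().resolve()
for col in collaborators:
    # Import the exported archive for each collaborator and exchange certificates.
    create_collaborator(col, workspace_root, archive_name, fed_workspace)

After this setup, the aggregator and each collaborator would be started with the experimental `fx` commands from their respective directories, as in the other workspace-based tests.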