Commit

Merge branch 'kta/neuralchat_finetune' of https://github.com/kta-inte…
…l/openfl into kta/neuralchat_finetune
kta-intel committed Jan 8, 2024
2 parents 851f114 + 8770aaf commit 2f7868a
Showing 14 changed files with 538 additions and 7 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/fets-challenge.yml
@@ -17,14 +17,14 @@ jobs:

     steps:
     - uses: actions/checkout@v3
-    - name: Set up Python 3.8
+    - name: Set up Python 3.10
       uses: actions/setup-python@v3
       with:
-        python-version: "3.8"
+        python-version: "3.10"
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install torch==1.13.1+cpu torchvision==0.14.1+cpu torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cpu
+        pip install torch==2.1.0+cpu torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
         pip install .
     - name: Setup FeTS Challenge Prerequisites
       uses: actions/checkout@master
7 changes: 3 additions & 4 deletions README.md
@@ -67,12 +67,11 @@ OpenFL supports training with TensorFlow 2+ or PyTorch 1.3+ which should be inst


 ### Background
-OpenFL builds on a collaboration between Intel and the University of Pennsylvania (UPenn) to develop the [Federated Tumor Segmentation (FeTS, www.fets.ai)](https://www.fets.ai/) platform (grant award number: U01-CA242871).
+OpenFL builds on a collaboration between Intel and the Bakas lab at the University of Pennsylvania (UPenn) to develop the [Federated Tumor Segmentation (FeTS, www.fets.ai)](https://www.fets.ai/) platform (grant award number: U01-CA242871).

-The grant for FeTS was awarded to the [Center for Biomedical Image Computing and Analytics (CBICA)](https://www.cbica.upenn.edu/) at UPenn (PI: S. Bakas) from the [Informatics Technology for Cancer Research (ITCR)](https://itcr.cancer.gov/) program of the National Cancer Institute (NCI) of the National Institutes of Health (NIH).
+The grant for FeTS was awarded from the [Informatics Technology for Cancer Research (ITCR)](https://itcr.cancer.gov/) program of the National Cancer Institute (NCI) of the National Institutes of Health (NIH), to Dr Spyridon Bakas (Principal Investigator) when he was affiliated with the [Center for Biomedical Image Computing and Analytics (CBICA)](https://www.cbica.upenn.edu/) at UPenn and now heading up the [Division of Computational Pathology at Indiana University (IU)](https://medicine.iu.edu/pathology/research/computational-pathology).

-FeTS is a real-world medical federated learning platform with international collaborators. The original OpenFederatedLearning project and OpenFL are designed to serve as the backend for the FeTS platform,
-and OpenFL developers and researchers continue to work very closely with UPenn on the FeTS project. An example is the [FeTS-AI/Front-End](https://github.com/FETS-AI/Front-End), which integrates UPenn’s medical AI expertise with OpenFL framework to create a federated learning solution for medical imaging.
+FeTS is a real-world medical federated learning platform with international collaborators. The original OpenFederatedLearning project and OpenFL are designed to serve as the backend for the FeTS platform, and OpenFL developers and researchers continue to work very closely with IU on the FeTS project. An example is the [FeTS-AI/Front-End](https://github.com/FETS-AI/Front-End), which integrates the group’s medical AI expertise with OpenFL framework to create a federated learning solution for medical imaging.

 Although initially developed for use in medical imaging, OpenFL designed to be agnostic to the use-case, the industry, and the machine learning framework.

110 changes: 110 additions & 0 deletions docs/federated_evaluation.rst
@@ -0,0 +1,110 @@
Federated Evaluation with OpenFL
=================================

Introduction to Federated Evaluation
-------------------------------------

Model evaluation is an essential part of the machine learning development cycle. In a traditional centralized learning system, all evaluation data is collected on a single server, so evaluating a model is a fairly straightforward task. In a federated learning system, however, data is distributed across multiple decentralized devices or nodes. To preserve the security and privacy of the distributed data, it is infeasible to simply aggregate all the data into a centralized system. Federated evaluation offers a solution by assessing the model at the client side and aggregating the resulting accuracy without ever having to share the data. This is crucial for ensuring the model's effectiveness and reliability in diverse, real-world environments while respecting privacy and data locality.

OpenFL's Support for Federated Evaluation
-----------------------------------------

OpenFL, a flexible framework for Federated Learning, can perform federated evaluation through a modified federation plan. In this document, we show how OpenFL facilitates this process through its task runner API (aggregator-based workflow), where the model evaluation is distributed across various collaborators and the results are sent back to the aggregator. For the task runner API, this involves minor modifications to the ``plan.yaml`` file, which defines the workflow and tasks for the federation. In particular, the federation plan should be defined to run for a single round and perform only aggregated model validation.

In general, the pipeline is as follows:

1. **Setup**: Initialize the federation with the modified ``plan.yaml`` set to run for one round and only perform aggregated model validation.
2. **Execution**: Run the federation. The model is distributed across collaborators for evaluation.
3. **Evaluation**: Each collaborator evaluates the model on its local data.
4. **Aggregation**: The aggregator collects and aggregates these metrics to assess overall model performance.
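
The aggregation step above can be illustrated with a short Python sketch. It assumes the aggregator combines per-collaborator accuracies as a sample-weighted mean; the source does not state the exact rule, and the function name ``aggregate_accuracy`` and the numbers are hypothetical, not part of OpenFL.

```python
from typing import Sequence


def aggregate_accuracy(accuracies: Sequence[float],
                       sample_counts: Sequence[int]) -> float:
    """Combine per-collaborator accuracies into one sample-weighted mean.

    Assumption: each collaborator reports its accuracy together with the
    number of local validation samples it was computed on.
    """
    if len(accuracies) != len(sample_counts):
        raise ValueError("need one sample count per accuracy value")
    total = sum(sample_counts)
    if total == 0:
        raise ValueError("no validation samples reported")
    # Weight each accuracy by the share of samples it represents.
    return sum(a * n for a, n in zip(accuracies, sample_counts)) / total


# Two hypothetical collaborators with different amounts of validation data.
overall = aggregate_accuracy([0.90, 0.80], [100, 300])
print(f"aggregated accuracy: {overall:.3f}")
```

Weighting by sample count keeps a collaborator with very little data from dominating the final metric; only the accuracy and the count leave each node, never the data itself.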

Example Using the Task Runner API (Aggregator-based Workflow)
-------------------------------------------------------------------

To demonstrate usage of the task runner API (aggregator-based workflow) for federated evaluation, consider the `Hello Federation example <https://github.com/securefederatedai/openfl/blob/develop/tests/github/test_hello_federation.py>`_. This sample script creates a simple federation with two collaborator nodes and one aggregator node, and executes based on a user-specified workspace template. We provide a ``torch_cnn_mnist_fed_eval`` template, which is a federated evaluation template adapted from ``torch_cnn_mnist``.

This script can be directly executed as follows:

.. code-block:: console

    python test_hello_federation.py --template torch_cnn_mnist_fed_eval

In order to adapt this template for federated evaluation, the following modifications were made to ``plan.yaml``:

.. code-block:: yaml

    # Copyright (C) 2020-2023 Intel Corporation
    # Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

    aggregator :
      defaults : plan/defaults/aggregator.yaml
      template : openfl.component.Aggregator
      settings :
        init_state_path : save/torch_cnn_mnist_init.pbuf
        best_state_path : save/torch_cnn_mnist_best.pbuf
        last_state_path : save/torch_cnn_mnist_last.pbuf
        ########################
        rounds_to_train : 1
        ########################
        log_metric_callback :
          template : src.mnist_utils.write_metric

    collaborator :
      defaults : plan/defaults/collaborator.yaml
      template : openfl.component.Collaborator
      settings :
        delta_updates : false
        opt_treatment : RESET

    data_loader :
      defaults : plan/defaults/data_loader.yaml
      template : src.ptmnist_inmemory.PyTorchMNISTInMemory
      settings :
        collaborator_count : 2
        data_group_name : mnist
        batch_size : 256

    task_runner :
      defaults : plan/defaults/task_runner.yaml
      template : src.pt_cnn.PyTorchCNN

    network :
      defaults : plan/defaults/network.yaml

    assigner :
      ########################
      template : openfl.component.RandomGroupedAssigner
      settings :
        task_groups :
          - name : validate
            percentage : 1.0
            tasks :
              - aggregated_model_validation
      ########################

    tasks :
      ########################
      aggregated_model_validation:
        function : validate
        kwargs :
          apply : global
          metrics :
            - acc
      ########################

    compression_pipeline :
      defaults : plan/defaults/compression_pipeline.yaml

Key Changes for Federated Evaluation:

1. **aggregator.settings.rounds_to_train**: Set to 1
2. **assigner**: Assign to aggregated_model_validation instead of default assignments
3. **tasks**: Set to aggregated_model_validation instead of default tasks

**Optional**: modify ``src/pt_cnn.py`` to remove the optimizer initialization and the loss function definition, as these are not needed for evaluation.
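
The three key changes can be checked mechanically. The following Python sketch is a hypothetical helper, not part of OpenFL: it assumes the plan has already been loaded into a nested dictionary mirroring ``plan.yaml``, and that ``defaults`` and ``settings`` keys, if present under ``tasks``, are not task names.

```python
from typing import Any, Dict, List

EVAL_TASK = "aggregated_model_validation"


def eval_only_problems(plan: Dict[str, Any]) -> List[str]:
    """Return the reasons why `plan` is not an evaluation-only plan."""
    problems: List[str] = []

    # Key change 1: the federation runs for exactly one round.
    rounds = plan.get("aggregator", {}).get("settings", {}).get("rounds_to_train")
    if rounds != 1:
        problems.append(f"rounds_to_train is {rounds!r}, expected 1")

    # Key change 2: every task group assigns only the validation task.
    groups = plan.get("assigner", {}).get("settings", {}).get("task_groups", [])
    if not groups:
        problems.append("assigner defines no task groups")
    for group in groups:
        if group.get("tasks") != [EVAL_TASK]:
            problems.append(
                f"task group {group.get('name')!r} assigns {group.get('tasks')!r}")

    # Key change 3: the tasks section defines only the validation task.
    # Assumption: 'defaults' and 'settings' are configuration keys, not tasks.
    task_names = [k for k in plan.get("tasks", {}) if k not in ("defaults", "settings")]
    if task_names != [EVAL_TASK]:
        problems.append(f"tasks section defines {task_names!r}")

    return problems


# Dictionary form of the relevant parts of the plan shown above.
plan = {
    "aggregator": {"settings": {"rounds_to_train": 1}},
    "assigner": {"settings": {"task_groups": [
        {"name": "validate", "percentage": 1.0, "tasks": [EVAL_TASK]},
    ]}},
    "tasks": {EVAL_TASK: {"function": "validate",
                          "kwargs": {"apply": "global", "metrics": ["acc"]}}},
}

print(eval_only_problems(plan))  # an empty list means the plan is evaluation-only
```

Returning a list of problems rather than a boolean makes it easier to see which of the three settings still carries a training default.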

This sample script will create a federation based on the ``torch_cnn_mnist_fed_eval`` template using the ``plan.yaml`` file defined above, spawning two collaborator nodes and a single aggregator node. The model will be sent to the two collaborator nodes, where each collaborator will perform model validation on its own local data. The accuracy from this model validation will then be sent back to the aggregator, where it will be aggregated into a final accuracy metric. The federation will then be shut down.

----

Congratulations, you have successfully performed federated evaluation across two decentralized collaborator nodes.
1 change: 1 addition & 0 deletions docs/manual.rst
@@ -35,6 +35,7 @@ Explore new and experimental features:
    install
    running_the_federation
    running_the_federation_with_gandlf
+   federated_evaluation
    source/utilities/utilities
    advanced_topics
    source/workflow/running_the_federation.tutorial
2 changes: 2 additions & 0 deletions openfl-workspace/torch_cnn_mnist_fed_eval/.workspace
@@ -0,0 +1,2 @@
current_plan_name: default

5 changes: 5 additions & 0 deletions openfl-workspace/torch_cnn_mnist_fed_eval/plan/cols.yaml
@@ -0,0 +1,5 @@
# Copyright (C) 2020-2021 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

collaborators:

9 changes: 9 additions & 0 deletions openfl-workspace/torch_cnn_mnist_fed_eval/plan/data.yaml
@@ -0,0 +1,9 @@
## Copyright (C) 2020-2021 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

# Each key under 'collaborators' corresponds to a specific collaborator name; the corresponding dictionary has data_name, data_path pairs.
# Note that in the mnist case we do not store the data locally, and the data_path is used to pass an integer that helps the data object
# construct the shard of the mnist dataset to be used for this collaborator.

# collaborator_name ,data_directory_path
one,1
2 changes: 2 additions & 0 deletions openfl-workspace/torch_cnn_mnist_fed_eval/plan/defaults
@@ -0,0 +1,2 @@
../../workspace/plan/defaults

61 changes: 61 additions & 0 deletions openfl-workspace/torch_cnn_mnist_fed_eval/plan/plan.yaml
@@ -0,0 +1,61 @@
# Copyright (C) 2020-2023 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

aggregator :
  defaults : plan/defaults/aggregator.yaml
  template : openfl.component.Aggregator
  settings :
    init_state_path : save/torch_cnn_mnist_init.pbuf
    best_state_path : save/torch_cnn_mnist_best.pbuf
    last_state_path : save/torch_cnn_mnist_last.pbuf
    ######### SET ROUNDS TO 1 #############
    rounds_to_train : 1
    #######################################
    log_metric_callback :
      template : src.mnist_utils.write_metric

collaborator :
  defaults : plan/defaults/collaborator.yaml
  template : openfl.component.Collaborator
  settings :
    delta_updates : false
    opt_treatment : RESET

data_loader :
  defaults : plan/defaults/data_loader.yaml
  template : src.ptmnist_inmemory.PyTorchMNISTInMemory
  settings :
    collaborator_count : 2
    data_group_name : mnist
    batch_size : 256

task_runner :
  defaults : plan/defaults/task_runner.yaml
  template : src.pt_cnn.PyTorchCNN

network :
  defaults : plan/defaults/network.yaml

assigner :
  ######### SET ASSIGNER TO ONLY INCLUDE AGGREGATED MODEL VALIDATION #############
  template : openfl.component.RandomGroupedAssigner
  settings :
    task_groups :
      - name : validate
        percentage : 1.0
        tasks :
          - aggregated_model_validation
  ################################################################################

tasks :
  ######### SET AGGREGATED MODEL VALIDATION AS ONLY TASK #############
  aggregated_model_validation:
    function : validate
    kwargs :
      apply : global
      metrics :
        - acc
  ####################################################################

compression_pipeline :
  defaults : plan/defaults/compression_pipeline.yaml
4 changes: 4 additions & 0 deletions openfl-workspace/torch_cnn_mnist_fed_eval/requirements.txt
@@ -0,0 +1,4 @@
torch==1.13.1
torchvision==0.14.1
tensorboard
wheel>=0.38.0 # not directly required, pinned by Snyk to avoid a vulnerability
3 changes: 3 additions & 0 deletions openfl-workspace/torch_cnn_mnist_fed_eval/src/__init__.py
@@ -0,0 +1,3 @@
# Copyright (C) 2020-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
"""You may copy this file as the starting point of your own model."""
115 changes: 115 additions & 0 deletions openfl-workspace/torch_cnn_mnist_fed_eval/src/mnist_utils.py
@@ -0,0 +1,115 @@
# Copyright (C) 2020-2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""You may copy this file as the starting point of your own model."""

from logging import getLogger

import numpy as np
from torch.utils.tensorboard import SummaryWriter
from torchvision import datasets
from torchvision import transforms

logger = getLogger(__name__)

writer = None


def get_writer():
    """Create global writer object."""
    global writer
    if not writer:
        writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5)


def write_metric(node_name, task_name, metric_name, metric, round_number):
    """Write metric callback."""
    get_writer()
    writer.add_scalar(f'{node_name}/{task_name}/{metric_name}', metric, round_number)


def one_hot(labels, classes):
    """
    One Hot encode a vector.
    Args:
        labels (list): List of labels to onehot encode
        classes (int): Total number of categorical classes
    Returns:
        np.array: Matrix of one-hot encoded labels
    """
    return np.eye(classes)[labels]


def _load_raw_datashards(shard_num, collaborator_count, transform=None):
    """
    Load the raw data by shard.
    Returns tuples of the dataset shard divided into training and validation.
    Args:
        shard_num (int): The shard number to use
        collaborator_count (int): The number of collaborators in the federation
        transform: torchvision.transforms.Transform to apply to images
    Returns:
        2 tuples: (image, label) of the training, validation dataset
    """
    train_data, val_data = (
        datasets.MNIST('data', train=train, download=True, transform=transform)
        for train in (True, False)
    )
    X_train_tot, y_train_tot = train_data.train_data, train_data.train_labels
    X_valid_tot, y_valid_tot = val_data.test_data, val_data.test_labels

    # create the shards
    shard_num = int(shard_num)
    X_train = X_train_tot[shard_num::collaborator_count].unsqueeze(1).float()
    y_train = y_train_tot[shard_num::collaborator_count]

    X_valid = X_valid_tot[shard_num::collaborator_count].unsqueeze(1).float()
    y_valid = y_valid_tot[shard_num::collaborator_count]

    return (X_train, y_train), (X_valid, y_valid)


def load_mnist_shard(shard_num, collaborator_count,
                     categorical=False, channels_last=True, **kwargs):
    """
    Load the MNIST dataset.
    Args:
        shard_num (int): The shard to use from the dataset
        collaborator_count (int): The number of collaborators in the
                                  federation
        categorical (bool): True = convert the labels to one-hot encoded
                            vectors (Default = False)
        channels_last (bool): True = The input images have the channels
                              last (Default = True)
        **kwargs: Additional parameters to pass to the function
    Returns:
        int: The number of classes
        numpy.ndarray: The training data
        numpy.ndarray: The training labels
        numpy.ndarray: The validation data
        numpy.ndarray: The validation labels
    """
    num_classes = 10

    (X_train, y_train), (X_valid, y_valid) = _load_raw_datashards(
        shard_num, collaborator_count, transform=transforms.ToTensor())

    logger.info(f'MNIST > X_train Shape : {X_train.shape}')
    logger.info(f'MNIST > y_train Shape : {y_train.shape}')
    logger.info(f'MNIST > Train Samples : {X_train.shape[0]}')
    logger.info(f'MNIST > Valid Samples : {X_valid.shape[0]}')

    if categorical:
        # convert class vectors to binary class matrices
        y_train = one_hot(y_train, num_classes)
        y_valid = one_hot(y_valid, num_classes)

    return num_classes, X_train, y_train, X_valid, y_valid
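
The sharding in `_load_raw_datashards` relies on strided slicing (`X[shard_num::collaborator_count]`). A minimal sketch of that idea follows, using a plain Python list as a stand-in for the MNIST tensors; the helper name `shard` and the ten-element dataset are illustrative assumptions, not part of the commit.

```python
from typing import List, Sequence, Union


def shard(items: Sequence[int], shard_num: Union[int, str],
          collaborator_count: int) -> List[int]:
    """Return every `collaborator_count`-th item, starting at `shard_num`."""
    # data.yaml passes the shard number as text (e.g. "one,1"), hence int().
    return list(items[int(shard_num)::collaborator_count])


samples = list(range(10))  # stand-in for ten dataset rows
shards = [shard(samples, i, 2) for i in range(2)]

for i, part in enumerate(shards):
    print(f"shard {i}: {part}")

# Together the shards cover the dataset exactly once, with no overlap.
assert sorted(shards[0] + shards[1]) == samples
```

Because each collaborator starts at a different offset and steps by the collaborator count, the shards are disjoint and roughly equal in size without any coordination between nodes.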