WIP:Update TensorFlow workspaces #1204

2 changes: 2 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/.workspace
@@ -0,0 +1,2 @@
current_plan_name: default

44 changes: 44 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/README.md
@@ -0,0 +1,44 @@
Running steps:
1) Download and extract the data to any folder (`$DATA_PATH`). The output of `tree $DATA_PATH -L 2` should look like this:
```
.
├── MICCAI_BraTS_2019_Data_Training
│ ├── HGG
│ ├── LGG
│ ├── name_mapping.csv
│ └── survival_data.csv
```
To use the `tree` command, you may need to install it first: `sudo apt-get install tree`
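For example, assuming the archive was extracted under `~/brats_data` (the path is only an illustration; use wherever you downloaded the data):
```bash
# Example only: point DATA_PATH at the directory that contains
# MICCAI_BraTS_2019_Data_Training after extraction.
export DATA_PATH=~/brats_data
tree $DATA_PATH -L 2   # should match the layout shown above
```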

2) Choose a subfolder (`$SUBFOLDER`) corresponding to a scan subset:
- `HGG`: glioblastoma scans
- `LGG`: lower grade glioma scans

Let's pick `HGG`: `export SUBFOLDER=HGG`. The learning rate has already been tuned for this task, so you don't have to change it. If you pick `LGG` instead, all subsequent steps are the same.

3) In order for each collaborator to use a separate slice of the data, split the main folder into `n` subfolders:
```bash
cd $DATA_PATH/$SUBFOLDER
n=2  # set n to the number of data slices (the number of collaborators in the federation)
i=0
for f in *; do
    d=$((i % n))   # target slice directory: 0, 1, ..., n-1
    mkdir -p $d
    mv "$f" $d
    i=$((i + 1))
done
```
Output of `tree $DATA_PATH/$SUBFOLDER -L 1` when `n = 2`:
```
.
├── 0
└── 1
```
If BraTS20 has the same structure, it can be split in the same way.
Each slice contains subdirectories holding `*.nii.gz` files. According to the `load_from_NIfTI` function [docstring](https://github.com/securefederatedai/openfl/blob/2e6680fedcd4d99363c94792c4a9cc272e4eebc0/openfl-workspace/tf_2dunet/src/brats_utils.py#L68), `NIfTI files for whole brains are assumed to be contained in subdirectories of the parent directory`. So we can use these slice folders as collaborator data paths.
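As an optional sanity check, you can confirm that each slice folder contains whole-brain subdirectories with `*.nii.gz` volumes inside, for example:
```bash
# List a few brain subdirectories in the first slice...
ls $DATA_PATH/$SUBFOLDER/0 | head -n 3
# ...and the NIfTI files inside one of them (expected: *.nii.gz volumes)
ls "$DATA_PATH/$SUBFOLDER/0/$(ls $DATA_PATH/$SUBFOLDER/0 | head -n 1)"
```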

4) We are ready to train! Try executing the [Hello Federation](https://openfl.readthedocs.io/en/latest/running_the_federation.baremetal.html#hello-federation-your-first-federated-learning-training) steps. Make sure you have `openfl` installed in your Python virtual environment. All you have to do is point each collaborator's data path at its slice folder. We have combined all 'Hello Federation' steps into a single bash script to make testing easier:
```bash
bash tests/github/test_hello_federation.sh tf_2dunet fed_work12345alpha81671 one123dragons beta34unicorns localhost --col1-data-path $DATA_PATH/MICCAI_BraTS_2019_Data_Training/$SUBFOLDER/0 --col2-data-path $DATA_PATH/MICCAI_BraTS_2019_Data_Training/$SUBFOLDER/1 --rounds-to-train 5
```
Running the command above completes 5 training rounds.
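If the run succeeds, the aggregated model snapshots referenced in `plan/plan.yaml` should appear under the `save/` directory of the created workspace. A rough check, assuming the workspace named in the command above is created in the current working directory:
```bash
# The workspace name comes from the test command above; adjust the path if the
# script places the workspace elsewhere.
ls fed_work12345alpha81671/save/
# expected (per plan.yaml): tf_2dunet_brats_init.pbuf, tf_2dunet_brats_latest.pbuf,
# tf_2dunet_brats_best.pbuf
```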
4 changes: 4 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/plan/cols.yaml
@@ -0,0 +1,4 @@
# Copyright (C) 2020-2021 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

collaborators:
8 changes: 8 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/plan/data.yaml
@@ -0,0 +1,8 @@
# Copyright (C) 2020-2021 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

# Each key under 'collaborators' corresponds to a specific collaborator name; the corresponding dictionary holds data_name, data_path pairs.
# Note that in the MNIST case we do not store the data locally, and the data_path is used to pass an integer that helps the data object
# construct the shard of the MNIST dataset to be used for this collaborator.
one,/raid/datasets/MICCAI_BraTS_2019_Data_Training/HGG/0
two,/raid/datasets/MICCAI_BraTS_2019_Data_Training/HGG/1
2 changes: 2 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/plan/defaults
@@ -0,0 +1,2 @@
../../workspace/plan/defaults

44 changes: 44 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/plan/plan.yaml
@@ -0,0 +1,44 @@
# Copyright (C) 2020-2021 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

aggregator :
  defaults : plan/defaults/aggregator.yaml
  template : openfl.component.Aggregator
  settings :
    init_state_path : save/tf_2dunet_brats_init.pbuf
    last_state_path : save/tf_2dunet_brats_latest.pbuf
    best_state_path : save/tf_2dunet_brats_best.pbuf
    rounds_to_train : 10
    db_store_rounds : 2

collaborator :
  defaults : plan/defaults/collaborator.yaml
  template : openfl.component.Collaborator
  settings :
    delta_updates : true
    opt_treatment : RESET

data_loader :
  defaults : plan/defaults/data_loader.yaml
  template : src.tfbrats_inmemory.TensorFlowBratsInMemory
  settings :
    batch_size: 64
    percent_train: 0.8
    collaborator_count : 2
    data_group_name : brats

task_runner :
  defaults : plan/defaults/task_runner.yaml
  template : src.tf_2dunet.TensorFlow2DUNet

network :
  defaults : plan/defaults/network.yaml

assigner :
  defaults : plan/defaults/assigner.yaml

tasks :
  defaults : plan/defaults/tasks_tensorflow.yaml

compression_pipeline :
  defaults : plan/defaults/compression_pipeline.yaml
3 changes: 3 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/requirements.txt
@@ -0,0 +1,3 @@
nibabel
setuptools>=65.5.1 # not directly required, pinned by Snyk to avoid a vulnerability
tensorflow==2.13
3 changes: 3 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/src/__init__.py
@@ -0,0 +1,3 @@
# Copyright (C) 2020-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
"""You may copy this file as the starting point of your own model."""
137 changes: 137 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/src/brats_utils.py
@@ -0,0 +1,137 @@
# Copyright (C) 2020-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
"""You may copy this file as the starting point of your own model."""

import logging
import os

import numpy as np

from .nii_reader import nii_reader

logger = logging.getLogger(__name__)


def train_val_split(features, labels, percent_train, shuffle):
"""Train/validation splot of the BraTS dataset.

Splits incoming feature and labels into training and validation. The value
of shuffle determines whether shuffling occurs before the split is performed.

Args:
features: The input images
labels: The ground truth labels
percent_train (float): The percentage of the dataset that is training.
shuffle (bool): True = shuffle the dataset before the split

Returns:
train_features: The input images for the training dataset
train_labels: The ground truth labels for the training dataset
val_features: The input images for the validation dataset
val_labels: The ground truth labels for the validation dataset
"""

def split(lst, idx):
"""Split a Python list into 2 lists.

Args:
lst: The Python list to split
idx: The index where to split the list into 2 parts

Returns:
Two lists

"""
if idx < 0 or idx > len(lst):
raise ValueError('split was out of expected range.')
return lst[:idx], lst[idx:]

nb_features = len(features)
nb_labels = len(labels)
if nb_features != nb_labels:
raise RuntimeError('Number of features and labels do not match.')
if shuffle:
new_order = np.random.permutation(np.arange(nb_features))
features = features[new_order]
labels = labels[new_order]
split_idx = int(percent_train * nb_features)
train_features, val_features = split(lst=features, idx=split_idx)
train_labels, val_labels = split(lst=labels, idx=split_idx)
return train_features, train_labels, val_features, val_labels


def load_from_nifti(parent_dir,
percent_train,
shuffle,
channels_last=True,
task='whole_tumor',
**kwargs):
"""Load the BraTS dataset from the NiFTI file format.

Loads data from the parent directory (NIfTI files for whole brains are
assumed to be contained in subdirectories of the parent directory).
Performs a split of the data into training and validation, and the value
    of shuffle determines whether shuffling is performed before this split
occurs - both split and shuffle are done in a way to
keep whole brains intact. The kwargs are passed to nii_reader.

Args:
parent_dir: The parent directory for the BraTS data
percent_train (float): The percentage of the data to make the training dataset
shuffle (bool): True means shuffle the dataset order before the split
channels_last (bool): Input tensor uses channels as last dimension (Default is True)
task: Prediction task (Default is 'whole_tumor' prediction)
**kwargs: Variable arguments to pass to the function

Returns:
train_features: The input images for the training dataset
train_labels: The ground truth labels for the training dataset
val_features: The input images for the validation dataset
val_labels: The ground truth labels for the validation dataset

"""
path = os.path.join(parent_dir)
subdirs = os.listdir(path)
subdirs.sort()
if not subdirs:
raise SystemError(f'''{parent_dir} does not contain subdirectories.
Please make sure you have BraTS dataset downloaded
and located in data directory for this collaborator.
''')
subdir_paths = [os.path.join(path, subdir) for subdir in subdirs]

imgs_all = []
msks_all = []
for brain_path in subdir_paths:
these_imgs, these_msks = nii_reader(
brain_path=brain_path,
task=task,
channels_last=channels_last,
**kwargs
)
        # the needed files were not present if a tuple of None is returned
if these_imgs is None:
logger.debug(f'Brain subdirectory: {brain_path} did not contain the needed files.')
else:
imgs_all.append(these_imgs)
msks_all.append(these_msks)

# converting to arrays to allow for numpy indexing used during split
imgs_all = np.array(imgs_all)
msks_all = np.array(msks_all)

# note here that each is a list of 155 slices per brain, and so the
# split keeps brains intact
imgs_all_train, msks_all_train, imgs_all_val, msks_all_val = train_val_split(
features=imgs_all,
labels=msks_all,
percent_train=percent_train,
shuffle=shuffle
)
# now concatenate the lists
imgs_train = np.concatenate(imgs_all_train, axis=0)
msks_train = np.concatenate(msks_all_train, axis=0)
imgs_val = np.concatenate(imgs_all_val, axis=0)
msks_val = np.concatenate(msks_all_val, axis=0)

return imgs_train, msks_train, imgs_val, msks_val