WIP:Update TensorFlow workspaces #1204

2 changes: 2 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/.workspace
@@ -0,0 +1,2 @@
current_plan_name: default

44 changes: 44 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/README.md
@@ -0,0 +1,44 @@
Running steps:
1) Download and extract the data to any folder (`$DATA_PATH`). The output of `tree $DATA_PATH -L 2` should look like this:
```
.
├── MICCAI_BraTS_2019_Data_Training
│ ├── HGG
│ ├── LGG
│ ├── name_mapping.csv
│ └── survival_data.csv
```
To use the `tree` command, you may need to install it first: `sudo apt-get install tree`
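For example, assuming the archive was extracted under `~/brats_data` (the path is only an illustration; use wherever you downloaded the data):
```bash
# Example only: point DATA_PATH at the directory that contains
# MICCAI_BraTS_2019_Data_Training after extraction.
export DATA_PATH=~/brats_data
tree $DATA_PATH -L 2   # should match the layout shown above
```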

2) Choose a subfolder (`$SUBFOLDER`) corresponding to a scan subset:
- `HGG`: glioblastoma scans
- `LGG`: lower grade glioma scans

Let's pick `HGG`: `export SUBFOLDER=HGG`. The learning rate has already been tuned for this task, so you don't have to change it. If you pick `LGG` instead, all subsequent steps are the same.

3) In order for each collaborator to use a separate slice of the data, split the main folder into `n` subfolders:
```bash
cd $DATA_PATH/$SUBFOLDER
n=2  # set n to the number of data slices (the number of collaborators in the federation)
i=0
for f in *; do
    d=$((i % n))   # target slice directory: 0, 1, ..., n-1
    mkdir -p $d
    mv "$f" $d
    i=$((i + 1))
done
```
Output of `tree $DATA_PATH/$SUBFOLDER -L 1` when `n = 2`:
```
.
├── 0
└── 1
```
If BraTS20 has the same structure, it can be split in the same way.
Each slice contains subdirectories holding `*.nii.gz` files. According to the `load_from_NIfTI` function [docstring](https://github.com/securefederatedai/openfl/blob/2e6680fedcd4d99363c94792c4a9cc272e4eebc0/openfl-workspace/tf_2dunet/src/brats_utils.py#L68), `NIfTI files for whole brains are assumed to be contained in subdirectories of the parent directory`. So we can use these slice folders as collaborator data paths.
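As an optional sanity check, you can confirm that each slice folder contains whole-brain subdirectories with `*.nii.gz` volumes inside, for example:
```bash
# List a few brain subdirectories in the first slice...
ls $DATA_PATH/$SUBFOLDER/0 | head -n 3
# ...and the NIfTI files inside one of them (expected: *.nii.gz volumes)
ls "$DATA_PATH/$SUBFOLDER/0/$(ls $DATA_PATH/$SUBFOLDER/0 | head -n 1)"
```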

4) We are ready to train! Try executing the [Hello Federation](https://openfl.readthedocs.io/en/latest/running_the_federation.baremetal.html#hello-federation-your-first-federated-learning-training) steps. Make sure you have `openfl` installed in your Python virtual environment. All you have to do is point each collaborator's data path at its slice folder. We have combined all 'Hello Federation' steps into a single bash script to make testing easier:
```bash
bash tests/github/test_hello_federation.sh tf_2dunet fed_work12345alpha81671 one123dragons beta34unicorns localhost --col1-data-path $DATA_PATH/MICCAI_BraTS_2019_Data_Training/$SUBFOLDER/0 --col2-data-path $DATA_PATH/MICCAI_BraTS_2019_Data_Training/$SUBFOLDER/1 --rounds-to-train 5
```
Running the command above completes 5 training rounds.
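If the run succeeds, the aggregated model snapshots referenced in `plan/plan.yaml` should appear under the `save/` directory of the created workspace. A rough check, assuming the workspace named in the command above is created in the current working directory:
```bash
# The workspace name comes from the test command above; adjust the path if the
# script places the workspace elsewhere.
ls fed_work12345alpha81671/save/
# expected (per plan.yaml): tf_2dunet_brats_init.pbuf, tf_2dunet_brats_latest.pbuf,
# tf_2dunet_brats_best.pbuf
```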
4 changes: 4 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/plan/cols.yaml
@@ -0,0 +1,4 @@
# Copyright (C) 2020-2021 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

collaborators:
8 changes: 8 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/plan/data.yaml
@@ -0,0 +1,8 @@
# Copyright (C) 2020-2021 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

# Each key under 'collaborators' corresponds to a specific collaborator name; the corresponding dictionary holds data_name, data_path pairs.
# Note that in the MNIST case we do not store the data locally, and the data_path is used to pass an integer that helps the data object
# construct the shard of the MNIST dataset to be used for this collaborator.
one,/raid/datasets/MICCAI_BraTS_2019_Data_Training/HGG/0
two,/raid/datasets/MICCAI_BraTS_2019_Data_Training/HGG/1
2 changes: 2 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/plan/defaults
@@ -0,0 +1,2 @@
../../workspace/plan/defaults

44 changes: 44 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/plan/plan.yaml
@@ -0,0 +1,44 @@
# Copyright (C) 2020-2021 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

aggregator :
  defaults : plan/defaults/aggregator.yaml
  template : openfl.component.Aggregator
  settings :
    init_state_path : save/tf_2dunet_brats_init.pbuf
    last_state_path : save/tf_2dunet_brats_latest.pbuf
    best_state_path : save/tf_2dunet_brats_best.pbuf
    rounds_to_train : 10
    db_store_rounds : 2

collaborator :
  defaults : plan/defaults/collaborator.yaml
  template : openfl.component.Collaborator
  settings :
    delta_updates : true
    opt_treatment : RESET

data_loader :
  defaults : plan/defaults/data_loader.yaml
  template : src.tfbrats_inmemory.TensorFlowBratsInMemory
  settings :
    batch_size: 64
    percent_train: 0.8
    collaborator_count : 2
    data_group_name : brats

task_runner :
  defaults : plan/defaults/task_runner.yaml
  template : src.tf_2dunet.TensorFlow2DUNet

network :
  defaults : plan/defaults/network.yaml

assigner :
  defaults : plan/defaults/assigner.yaml

tasks :
  defaults : plan/defaults/tasks_tensorflow.yaml

compression_pipeline :
  defaults : plan/defaults/compression_pipeline.yaml
3 changes: 3 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/requirements.txt
@@ -0,0 +1,3 @@
nibabel
setuptools>=65.5.1 # not directly required, pinned by Snyk to avoid a vulnerability
tensorflow==2.13
3 changes: 3 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/src/__init__.py
@@ -0,0 +1,3 @@
# Copyright (C) 2020-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
"""You may copy this file as the starting point of your own model."""
137 changes: 137 additions & 0 deletions openfl-workspace/deprecated/tf_2dunet/src/brats_utils.py
@@ -0,0 +1,137 @@
# Copyright (C) 2020-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
"""You may copy this file as the starting point of your own model."""

import logging
import os

import numpy as np

from .nii_reader import nii_reader

logger = logging.getLogger(__name__)


def train_val_split(features, labels, percent_train, shuffle):
"""Train/validation splot of the BraTS dataset.

Splits incoming feature and labels into training and validation. The value
of shuffle determines whether shuffling occurs before the split is performed.

Args:
features: The input images
labels: The ground truth labels
percent_train (float): The percentage of the dataset that is training.
shuffle (bool): True = shuffle the dataset before the split

Returns:
train_features: The input images for the training dataset
train_labels: The ground truth labels for the training dataset
val_features: The input images for the validation dataset
val_labels: The ground truth labels for the validation dataset
"""

def split(lst, idx):
"""Split a Python list into 2 lists.

Args:
lst: The Python list to split
idx: The index where to split the list into 2 parts

Returns:
Two lists

"""
if idx < 0 or idx > len(lst):
raise ValueError('split was out of expected range.')
return lst[:idx], lst[idx:]

nb_features = len(features)
nb_labels = len(labels)
if nb_features != nb_labels:
raise RuntimeError('Number of features and labels do not match.')
if shuffle:
new_order = np.random.permutation(np.arange(nb_features))
features = features[new_order]
labels = labels[new_order]
split_idx = int(percent_train * nb_features)
train_features, val_features = split(lst=features, idx=split_idx)
train_labels, val_labels = split(lst=labels, idx=split_idx)
return train_features, train_labels, val_features, val_labels


def load_from_nifti(parent_dir,
percent_train,
shuffle,
channels_last=True,
task='whole_tumor',
**kwargs):
"""Load the BraTS dataset from the NiFTI file format.

Loads data from the parent directory (NIfTI files for whole brains are
assumed to be contained in subdirectories of the parent directory).
Performs a split of the data into training and validation, and the value
    of shuffle determines whether shuffling is performed before this split
occurs - both split and shuffle are done in a way to
keep whole brains intact. The kwargs are passed to nii_reader.

Args:
parent_dir: The parent directory for the BraTS data
percent_train (float): The percentage of the data to make the training dataset
shuffle (bool): True means shuffle the dataset order before the split
channels_last (bool): Input tensor uses channels as last dimension (Default is True)
task: Prediction task (Default is 'whole_tumor' prediction)
**kwargs: Variable arguments to pass to the function

Returns:
train_features: The input images for the training dataset
train_labels: The ground truth labels for the training dataset
val_features: The input images for the validation dataset
val_labels: The ground truth labels for the validation dataset

"""
path = os.path.join(parent_dir)
subdirs = os.listdir(path)
subdirs.sort()
if not subdirs:
raise SystemError(f'''{parent_dir} does not contain subdirectories.
Please make sure you have BraTS dataset downloaded
and located in data directory for this collaborator.
''')
subdir_paths = [os.path.join(path, subdir) for subdir in subdirs]

imgs_all = []
msks_all = []
for brain_path in subdir_paths:
these_imgs, these_msks = nii_reader(
brain_path=brain_path,
task=task,
channels_last=channels_last,
**kwargs
)
        # the needed files were not present if a tuple of None is returned
if these_imgs is None:
logger.debug(f'Brain subdirectory: {brain_path} did not contain the needed files.')
else:
imgs_all.append(these_imgs)
msks_all.append(these_msks)

# converting to arrays to allow for numpy indexing used during split
imgs_all = np.array(imgs_all)
msks_all = np.array(msks_all)

# note here that each is a list of 155 slices per brain, and so the
# split keeps brains intact
imgs_all_train, msks_all_train, imgs_all_val, msks_all_val = train_val_split(
features=imgs_all,
labels=msks_all,
percent_train=percent_train,
shuffle=shuffle
)
# now concatenate the lists
imgs_train = np.concatenate(imgs_all_train, axis=0)
msks_train = np.concatenate(msks_all_train, axis=0)
imgs_val = np.concatenate(imgs_all_val, axis=0)
msks_val = np.concatenate(msks_all_val, axis=0)

return imgs_train, msks_train, imgs_val, msks_val