Test merge: run collaborator in docker #341

Closed
49 commits
a0935f0
wip
dmitryagapov Dec 8, 2021
6a5651f
wip
dmitryagapov Dec 16, 2021
dacd4f1
wip
dmitryagapov Dec 16, 2021
f2afcf9
refactoring
dmitryagapov Dec 23, 2021
52b8284
add aiodocker requirements
dmitryagapov Dec 23, 2021
add5710
refactoring
dmitryagapov Jan 10, 2022
20e318e
refactoring
dmitryagapov Jan 11, 2022
96d9c7d
refactoring
dmitryagapov Jan 12, 2022
517494d
refactoring
dmitryagapov Jan 12, 2022
b6c1102
Merge branch 'develop' into feature/run_collaborator_in_docker
dmitryagapov Jan 12, 2022
51db0d0
refactoring
dmitryagapov Jan 12, 2022
05c18c1
refactoring
dmitryagapov Jan 12, 2022
bde6023
refactoring
dmitryagapov Jan 12, 2022
f3ed426
create docker module
dmitryagapov Jan 19, 2022
7e65c40
refactoring
dmitryagapov Feb 2, 2022
0dd2c7c
Merge branch 'develop' into feature/run_collaborator_in_docker
dmitryagapov Feb 16, 2022
9321627
add --use_docker to envoy
dmitryagapov Feb 16, 2022
7b9c625
Merge branch 'dockerezation-launch' into feature/run_collaborator_in_…
dmitryagapov Feb 16, 2022
41c8399
fix flake8
dmitryagapov Feb 16, 2022
a414cfb
add openfl.docker module to packages
dmitryagapov Feb 16, 2022
c9d0223
fix initial tensor path
dmitryagapov Feb 16, 2022
baa5be6
merge fix
dmitryagapov Feb 17, 2022
1cc73eb
add --use-docker flag for envoy
dmitryagapov Feb 17, 2022
898a22e
add --use-docker flag for director
dmitryagapov Feb 17, 2022
493eb96
fix
dmitryagapov Feb 17, 2022
3f907c1
Merge branch 'develop' into feature/run_collaborator_in_docker
dmitryagapov Mar 1, 2022
fe749ca
merge
dmitryagapov Mar 1, 2022
f5c4aac
fix
dmitryagapov Mar 3, 2022
bd3b31d
fix
dmitryagapov Mar 3, 2022
f3df364
fix
dmitryagapov Mar 3, 2022
33efbcc
fix
dmitryagapov Mar 9, 2022
82de52f
Merge branch 'develop' into feature/run_collaborator_in_docker
dmitryagapov Mar 10, 2022
9ffc06e
add docker proxy for director and envoy configs
dmitryagapov Mar 15, 2022
04a387e
add docker proxy for director and envoy configs
dmitryagapov Mar 16, 2022
c5457d0
fix
dmitryagapov Mar 18, 2022
814bc85
add buildargs config to envoy/director configs
dmitryagapov Mar 18, 2022
4923046
add buildargs config to envoy/director configs
dmitryagapov Mar 18, 2022
c335c78
docker config
dmitryagapov Mar 23, 2022
03ad397
Merge branch 'develop' into feature/run_collaborator_in_docker
dmitryagapov Mar 23, 2022
160b94c
Merge branch 'dockerezation-launch' into feature/run_collaborator_in_…
dmitryagapov Mar 23, 2022
b24b995
refactoring
dmitryagapov Mar 23, 2022
4d510fc
fixes
dmitryagapov Mar 25, 2022
e9a5a1c
fixes
dmitryagapov Mar 25, 2022
cfc178a
add volumes for PyTorch_Kvasir_UNet
dmitryagapov Mar 25, 2022
6e481be
fix
dmitryagapov Mar 25, 2022
425bc1a
send only one model to aggregator when last == best
dmitryagapov Mar 31, 2022
adf79c5
relative import to absolute
dmitryagapov Mar 31, 2022
79cf99a
fixes
dmitryagapov Apr 4, 2022
9d9a968
Diagrams
dmitryagapov Apr 5, 2022
22 changes: 22 additions & 0 deletions docs/mermaid/director_aggregator.mmd
@@ -0,0 +1,22 @@
sequenceDiagram
participant D as Director
participant A as Aggregator
rect rgb(0, 255, 0,.1)
Note over D: An Experiment's start
D->>D: Get next experiment<br>from Experiment Registry
opt Docker specific logic
D->>D: Create aggregator docker build<br>context from experiment workspace.<br>(Add Dockerfile and execution script<br>to context specific for aggregator)
D->>D: Build aggregator docker image
D->>D: Create aggregator docker container
D->>D: Run aggregator docker container
D->>D: Monitor aggregator docker container
end
loop every round
A->>D: Send last/best model to director
D->>D: Save model on director
end
opt Docker specific logic
D->>D: Delete aggregator docker container<br>when the experiment has finished
end
end
Note over D: The Experiment ended. <br> The Federation keeps existing.
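The "create docker build context" step in the diagram above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: the function name `make_build_context` and the script name `run.sh` are hypothetical, and it assumes the workspace does not already contain files with those names.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_build_context(workspace: Path, dockerfile: str, run_script: str) -> bytes:
    """Pack workspace files plus a Dockerfile and run script into a gzipped tar."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode='w:gz') as tar:
        # Copy every workspace file, keeping its path relative to the workspace.
        for path in sorted(workspace.rglob('*')):
            if path.is_file():
                tar.add(str(path), arcname=path.relative_to(workspace).as_posix())
        # Add the role-specific files straight from memory.
        for name, text, mode in (
            ('Dockerfile', dockerfile, 0o644),
            ('run.sh', run_script, 0o755),  # hypothetical script name
        ):
            data = text.encode('utf-8')
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mode = mode
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Demo: a throwaway workspace with a single file.
with tempfile.TemporaryDirectory() as tmp:
    demo_workspace = Path(tmp)
    (demo_workspace / 'plan.yaml').write_text('rounds: 1\n')
    context = make_build_context(
        demo_workspace, 'FROM python:3.8\n', '#!/bin/bash\nset -e\n'
    )

with tarfile.open(fileobj=io.BytesIO(context), mode='r:gz') as tar:
    names = sorted(tar.getnames())
print(names)
```

Building the archive in memory keeps the experiment workspace on disk untouched, which matters if the same workspace is reused for several experiments.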
55 changes: 55 additions & 0 deletions docs/mermaid/director_envoy.mmd
@@ -0,0 +1,55 @@
sequenceDiagram
participant N as NoteBook
participant D as Director
participant E as Envoy
rect rgb(0, 255, 0,.1)
Note over D,E: A Federation startup process
D->D: Starts
E->E: Adapting a dataset
E->E: Starts
Note over D,E: Exchange certs
E-->>D: Connects using FQDN and pwd
E-->>D: Communicates dataset info
D-->D: Keeps a list of connected Envoys
D-->D: Ensures unified data interface
end
Note over D,E: We consider a Federation set up
rect rgb(0, 255, 0,.1)
Note over N,D: Create new experiment
N->>N: Prepare experiment in Notebook
N->>N: Connect to federation
N->>N: Run experiment
N->>D: Send Experiment workspace
D->>D: Create new experiment
D->>D: Add experiment to registry
end
rect rgb(0, 255, 0,.1)
Note over D,E: An Experiment's start
D->>D: Get next experiment<br>from Experiment Registry
opt Docker specific logic
D->>D: Create aggregator docker build<br>context from experiment workspace.<br>(Add Dockerfile and execution script<br>to context specific for aggregator)
D->>D: Build aggregator docker image
D->>D: Create aggregator docker container
D->>D: Run aggregator docker container
D->>D: Monitor aggregator docker container
end
E->>D: WaitExperiment
D-->>E: Send Experiment name
E->>D: GetExperimentData(experiment_name)
D-->>E: Send Experiment workspace
opt Docker specific logic
E->>E: Create collaborator docker build<br>context from Experiment workspace.<br>(Add Dockerfile and execution script<br>to context specific for collaborator)
E->>E: Build collaborator docker image
E->>E: Create collaborator docker container
E->>E: Run collaborator docker container
E->>E: Monitor collaborator docker container
end
Note over D,E: Wait until the last round has finished
opt Docker specific logic
E->>E: Delete collaborator docker container<br>when the experiment has finished
D->>D: Delete aggregator docker container<br>when the experiment has finished
end
end
N->>D: Get best model
D-->>N: Send best model
Note over D,E: The Experiment ended. <br> The Federation keeps existing.
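The container lifecycle that both diagrams repeat (build, create, run, monitor, delete) might be structured as below. This is a sketch under stated assumptions: `FakeContainer` is a hypothetical in-memory stand-in, not a real Docker client and not the class used in this PR. The point it illustrates is that deletion sits in a `finally` block so the container is removed even if the experiment fails.

```python
import asyncio
from typing import List


class FakeContainer:
    """Hypothetical in-memory stand-in for a Docker container."""

    def __init__(self, name: str, events: List[str]) -> None:
        self.name = name
        self.events = events

    async def start(self) -> None:
        self.events.append(f'run {self.name}')

    async def wait(self) -> int:
        # A real client would block here until the container process exits.
        self.events.append(f'monitor {self.name}')
        return 0

    async def delete(self) -> None:
        self.events.append(f'delete {self.name}')


async def run_experiment(name: str, events: List[str]) -> int:
    """Walk through the lifecycle steps and return the container exit code."""
    events.append(f'build image for {name}')
    container = FakeContainer(name, events)
    events.append(f'create {name}')
    try:
        await container.start()
        return await container.wait()
    finally:
        # Clean up whether the experiment succeeded or raised.
        await container.delete()


events: List[str] = []
exit_code = asyncio.run(run_experiment('collaborator', events))
print(exit_code, events)
```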
7 changes: 7 additions & 0 deletions openfl-docker/Dockerfile.aggregator
@@ -0,0 +1,7 @@
FROM python:3.8

RUN pip install --upgrade pip
RUN pip install git+https://github.com/dmitryagapov/openfl.git@feature/run_collaborator_in_docker

COPY . /code
WORKDIR /code
10 changes: 10 additions & 0 deletions openfl-docker/Dockerfile.collaborator
@@ -0,0 +1,10 @@
FROM python:3.8

RUN pip install --upgrade pip
RUN pip install git+https://github.com/dmitryagapov/openfl.git@feature/run_collaborator_in_docker

WORKDIR /code
COPY ./requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx director start --disable-tls -c director_config.yaml --use-docker
@@ -0,0 +1,6 @@
#!/bin/bash
set -e
ENVOY_NAME=$1
SHARD_CONF=$2

fx envoy start -n "$ENVOY_NAME" --disable-tls --envoy-config-path "$SHARD_CONF" -dh localhost -dp 50051
@@ -1,13 +1,16 @@
params:
-cuda_devices: [0]
+cuda_devices: [ 0 ]

optional_plugin_components:
-cuda_device_monitor:
-template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
-settings: []
+cuda_device_monitor:
+template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
+settings: [ ]

shard_descriptor:
template: dogs_cats_shard_descriptor.DogsCatsShardDescriptor
+volumes:
+- '~/.kaggle/kaggle.json'
+- './data'
params:
data_folder: data
rank_worldsize: 1,2
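The new `volumes` list names host paths that the collaborator container needs. One way such entries could be turned into Docker bind strings (`host_path:container_path`) is sketched below. The helper `to_bind_specs` is hypothetical, and mounting each path at the same location inside the container is an assumption; the PR may map paths differently.

```python
import os
from typing import List


def to_bind_specs(volumes: List[str], base_dir: str) -> List[str]:
    """Turn host paths into 'host:container' bind strings."""
    binds = []
    for volume in volumes:
        # Expand '~' and resolve relative paths against the envoy directory.
        expanded = os.path.expanduser(volume)
        if not os.path.isabs(expanded):
            expanded = os.path.join(base_dir, expanded)
        host_path = os.path.normpath(expanded)
        # Assumption: the container sees the data at the same absolute path.
        binds.append(f'{host_path}:{host_path}')
    return binds


# '/workspace/envoy' is an illustrative base directory, not a path from the PR.
binds = to_bind_specs(['./data'], base_dir='/workspace/envoy')
print(binds)
```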
@@ -1,6 +1,15 @@
settings:
listen_host: localhost
listen_port: 50050
-sample_shape: ['300', '400', '3']
-target_shape: ['300', '400']
+sample_shape: [ '300', '400', '3' ]
+target_shape: [ '300', '400' ]
envoy_health_check_period: 5 # in seconds
+docker:
+env:
+http_proxy:
+https_proxy:
+no_proxy:
+buildargs:
+HTTP_PROXY:
+HTTPS_PROXY:
+NO_PROXY:
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx director start --disable-tls -c director_config.yaml --use-docker
@@ -1,14 +1,27 @@
params:
-cuda_devices: [0,2]
-
+cuda_devices: [ 0, 2 ]
+docker:
+env:
+http_proxy:
+https_proxy:
+no_proxy:
+buildargs:
+HTTP_PROXY:
+HTTPS_PROXY:
+NO_PROXY:
+volumes:
+- './kvasir_data'
+
optional_plugin_components:
-cuda_device_monitor:
-template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
-settings: []
+cuda_device_monitor:
+template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
+settings: [ ]

shard_descriptor:
template: kvasir_shard_descriptor.KvasirShardDescriptor
params:
data_folder: kvasir_data
rank_worldsize: 1,10
enforce_image_hw: '300,400'


@@ -1,7 +1,11 @@
params:
-cuda_devices: []
+cuda_devices: [ ]
+docker_env:
+http_proxy:
+https_proxy:
+no_proxy:

-optional_plugin_components: {}
+optional_plugin_components: { }

shard_descriptor:
template: kvasir_shard_descriptor.KvasirShardDescriptor
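The proxy keys in the director and envoy configs above are left blank, and a YAML loader reads a blank value as `None`. A consumer of these sections would probably want to skip unset keys rather than pass empty values to Docker. The sketch below shows one way to do that. `drop_empty` is a hypothetical helper and the proxy URL is a made-up example, neither comes from the PR.

```python
from typing import Dict, Optional


def drop_empty(values: Optional[Dict[str, Optional[str]]]) -> Dict[str, str]:
    """Keep only the keys that were given a non-empty value."""
    return {key: value for key, value in (values or {}).items() if value}


# What a YAML loader would return for the `docker` section with one proxy set.
docker_config = {
    'env': {
        'http_proxy': 'http://proxy.example.com:3128',  # made-up example value
        'https_proxy': None,
        'no_proxy': None,
    },
    'buildargs': {'HTTP_PROXY': None, 'HTTPS_PROXY': None, 'NO_PROXY': None},
}

env = drop_empty(docker_config.get('env'))
buildargs = drop_empty(docker_config.get('buildargs'))
print(env, buildargs)
```

Passing `docker_config.get(...)` through `drop_empty` also covers configs that omit the section entirely, since a missing section is treated as an empty mapping.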
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx envoy start -n env_one --disable-tls --envoy-config-path envoy_config.yaml -dh localhost -dp 50050 --use-docker
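For anyone launching envoys from Python rather than a shell script, the same command line can be assembled as an argument list. The sketch uses only the flags that appear in the scripts in this diff; the function `envoy_start_command` is a hypothetical wrapper, not part of the `fx` CLI.

```python
import shlex
from typing import List


def envoy_start_command(
    name: str,
    config_path: str,
    director_host: str,
    director_port: int,
    use_docker: bool = False,
) -> List[str]:
    """Build the argv for `fx envoy start` without going through a shell."""
    command = [
        'fx', 'envoy', 'start',
        '-n', name,
        '--disable-tls',
        '--envoy-config-path', config_path,
        '-dh', director_host,
        '-dp', str(director_port),
    ]
    if use_docker:
        # Flag added by this PR: run the collaborator inside a container.
        command.append('--use-docker')
    return command


command = envoy_start_command(
    'env_one', 'envoy_config.yaml', 'localhost', 50050, use_docker=True
)
# Quote each part so the printed line can be pasted into a shell safely.
command_line = ' '.join(shlex.quote(part) for part in command)
print(command_line)
```

The resulting list can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting issues with envoy names or paths that contain spaces.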
@@ -33,7 +33,7 @@ def get_dataset(self, dataset_type: str) -> np.ndarray:
"""
Return a shard dataset by type.

-A simple list with elements (x, y) implemets the Shard Dataset interface.
+A simple list with elements (x, y) implements the Shard Dataset interface.
"""
if dataset_type == 'train':
return self.data[:self.n_samples // 2]