Run collaborator in docker #280

dmitryagapov commented Dec 16, 2021

nvidia-container-runtime must be installed on the host so that containers can access the GPUs:
https://docs.docker.com/config/containers/resource_constraints/#gpu

  1. Add the GPG key and apt repository for nvidia-container-runtime
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
  2. Install nvidia-container-runtime
sudo apt-get install nvidia-container-runtime
  3. Ensure the nvidia-container-runtime-hook is accessible from $PATH
which nvidia-container-runtime-hook
  4. Restart the Docker daemon
sudo service docker restart
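
The `which nvidia-container-runtime-hook` check above can also be scripted, for example as a pre-flight test before an envoy starts. This is a minimal sketch assuming Python 3 on the host. The function name `find_runtime_hook` is illustrative and is not part of OpenFL or this PR.

```python
import shutil
from typing import Optional


def find_runtime_hook(name: str = "nvidia-container-runtime-hook") -> Optional[str]:
    """Return the absolute path of the hook binary, or None if it is not on $PATH.

    shutil.which performs the same lookup as the shell `which` command.
    """
    return shutil.which(name)
```

A caller could treat a `None` result as a signal to stop and ask the user to install nvidia-container-runtime before any GPU container is launched.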

Docker proxy:
To use Docker behind a proxy, define the proxy settings in director_config.yaml and envoy_config.yaml:

#director_config.yaml
settings:
  listen_host: localhost
  listen_port: 50050
  sample_shape: [ '300', '400', '3' ]
  target_shape: [ '300', '400' ]
  envoy_health_check_period: 5  # in seconds
  docker:
    env:
      http_proxy:
      https_proxy:
      no_proxy:
    buildargs:
      HTTP_PROXY:
      HTTPS_PROXY:
      NO_PROXY:

#envoy_config.yaml
params:
  cuda_devices: [ 0, 2 ]
  docker:
    env:
      http_proxy:
      https_proxy:
      no_proxy:
    buildargs:
      HTTP_PROXY:
      HTTPS_PROXY:
      NO_PROXY:

optional_plugin_components:
  cuda_device_monitor:
    template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
    settings: [ ]

shard_descriptor:
  template: kvasir_shard_descriptor.KvasirShardDescriptor
  params:
    data_folder: kvasir_data
    rank_worldsize: 1,10
    enforce_image_hw: '300,400'
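
To illustrate how settings like these might map onto Docker options, here is a hedged sketch that turns the `docker.env`, `docker.buildargs`, and `cuda_devices` values into `docker build` and `docker run` arguments. It does not reproduce the code in this PR, which may use the Docker SDK instead of the CLI. The function names `non_empty` and `docker_cli_args` are hypothetical. The config is assumed to be already parsed into a dict, with blank YAML values loaded as `None`.

```python
from typing import Any, Dict, List, Optional


def non_empty(mapping: Optional[Dict[str, Any]]) -> Dict[str, str]:
    """Drop keys whose value is None or an empty string (blank YAML entries)."""
    return {k: str(v) for k, v in (mapping or {}).items() if v not in (None, "")}


def docker_cli_args(docker_cfg: Dict[str, Any], cuda_devices: List[int]) -> Dict[str, List[str]]:
    """Split the settings into `docker build` and `docker run` arguments."""
    build: List[str] = []
    for key, value in non_empty(docker_cfg.get("buildargs")).items():
        build += ["--build-arg", f"{key}={value}"]

    run: List[str] = []
    for key, value in non_empty(docker_cfg.get("env")).items():
        run += ["--env", f"{key}={value}"]
    if cuda_devices:
        # The inner double quotes keep Docker from splitting the device list on commas.
        ids = ",".join(str(d) for d in cuda_devices)
        run += ["--gpus", f'"device={ids}"']
    return {"build": build, "run": run}
```

With `cuda_devices: [ 0, 2 ]` and only `http_proxy` set, the run arguments would contain one `--env` pair followed by `--gpus "device=0,2"`. Unset proxy keys are skipped instead of being passed as empty values.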

Manage Docker as a non-root user:
https://docs.docker.com/engine/install/linux-postinstall/

psfoley self-requested a review December 16, 2021 15:40
dmitryagapov and others added 16 commits January 11, 2022 18:34
* Implementation of director-aggregator gRPC communication

* Retry on unavailable aggregator for async client

* Check if the experiment is available

* Increase timeout

* Return instead of raising an error after timeout

* Fix test

* enforce_image_hw is string

* Col exp can be empty

* Artifacts were removed, less dependency from aggregator attribute

* Wait experiment readiness to get an aggregator client.

* Director requests validation

* Some enhancements

* Doc strings

* Update openfl/federated/plan/plan.py

Co-authored-by: Igor Davidyuk <[email protected]>

* Update plan.py

* Remove redundant method

* Additional error handling

Co-authored-by: Igor Davidyuk <[email protected]>

psfoley commented Feb 16, 2022

@dmitryagapov @alexey-gruzdev Can a tag be added to PRs like this to reflect that the feature is experimental or needs a design review before merge? WIP is used for PRs that aren't ready for review yet, but this one seems to belong in a different category.

cuda_devices: [ ]
docker_env:
http_proxy:
https_rpoxy:

spelling

@MasterSkepticista

Closing since the PR is stale. GPU support will be ported from OpenFL-Security with a new PR.

5 participants