Implementation of director-aggregator gRPC communication #306

Open, wants to merge 1 commit into base: develop

@aleksandr-mokrov (Contributor) commented Jan 31, 2022

This PR implements Director-Aggregator communication over gRPC so that the Aggregator can be launched separately from the Director.
1. The following APIs were implemented: GetMetricStream, GetTrainedModel, GetExperimentDescription.
2. All requests from the user are sent to the Director service. The Director service checks the caller, the experiment status, etc., and, if the request is allowed, forwards it to the Aggregator service. The request path is: User -> Federation -> director client -> Director service -> Director -> aggregator client -> Aggregator service -> Aggregator.
3. An asynchronous aggregator client (AsyncAggregatorGRPCClient) was implemented for the asynchronous Director, based on the existing synchronous client.
4. If the server is unavailable, the async aggregator client retries the connection (once per second for 30 seconds by default; the synchronous aggregator client used by collaborators keeps retrying indefinitely). The retries are implemented with gRPC interceptors.
5. The Aggregator's IP address and port are taken from the plan (the port is derived from the plan hash).
6. The pytorch_kvasir_unet test was fixed.
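
The check-then-forward step from item 2 can be sketched as follows, assuming an in-memory experiment registry and an aggregator client that exposes an async get_trained_model method. DirectorSketch, Experiment, the status strings, and both exception classes are hypothetical illustrations, not the actual classes in openfl/component/director.

```python
import asyncio
from dataclasses import dataclass


class PermissionDenied(Exception):
    """Caller is not registered for the experiment (hypothetical)."""


class ExperimentNotRunning(Exception):
    """The experiment's aggregator is not alive (hypothetical)."""


@dataclass
class Experiment:
    name: str
    users: set
    status: str  # hypothetical values: "pending", "in_progress", "finished"


class DirectorSketch:
    """Validates user requests before forwarding them to the aggregator client."""

    def __init__(self, aggregator_client):
        self.experiments = {}
        self.aggregator_client = aggregator_client

    async def get_trained_model(self, experiment_name: str, caller: str, model_type: str):
        experiment = self.experiments.get(experiment_name)
        # Unknown experiments and foreign callers are rejected the same way.
        if experiment is None or caller not in experiment.users:
            raise PermissionDenied(f"{caller} has no access to {experiment_name}")
        # The aggregator can only answer while it is alive.
        if experiment.status != "in_progress":
            raise ExperimentNotRunning(f"{experiment_name} is {experiment.status}")
        # Forward: Director -> aggregator client -> Aggregator service.
        return await self.aggregator_client.get_trained_model(experiment_name, model_type)
```

Keeping the permission and status checks in the Director means the Aggregator service never sees requests from unauthorized users.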
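
The retry policy from item 4 can be sketched without a gRPC dependency. In the PR it lives in gRPC client interceptors. Here ServerUnavailable is a hypothetical stand-in for an UNAVAILABLE status error, and RetryOnUnavailable only mimics an interceptor's shape: it receives the next handler (continuation) and decides whether to call it again. The defaults mirror the one-second interval and 30-second window described in item 4.

```python
import asyncio


class ServerUnavailable(Exception):
    """Hypothetical stand-in for a gRPC UNAVAILABLE status error."""


class RetryOnUnavailable:
    """Retry a call every `interval` seconds until `timeout` seconds have passed."""

    def __init__(self, interval: float = 1.0, timeout: float = 30.0):
        self.interval = interval
        self.timeout = timeout

    async def intercept(self, continuation, request):
        loop = asyncio.get_running_loop()
        deadline = loop.time() + self.timeout
        while True:
            try:
                return await continuation(request)
            except ServerUnavailable:
                # Re-raise once the next attempt would fall outside the window.
                if loop.time() + self.interval > deadline:
                    raise
                await asyncio.sleep(self.interval)
```

With grpc.aio, the same loop would sit inside an interceptor's intercept_unary_unary method and inspect the raised error's status code for grpc.StatusCode.UNAVAILABLE; an unbounded timeout would give the collaborator-side "retry forever" behaviour.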
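
Item 5 derives the port from the plan hash. One way to do this is sketched below, assuming the plan is a dict that can be serialized to canonical JSON. The SHA-384 digest, the 8-hex-digit prefix, the 49152 to 60999 range, and the helper names are assumptions for illustration; the exact formula in openfl/federated/plan/plan.py may differ.

```python
import hashlib
import json

# Assumed port range (inside the IANA dynamic/private range).
PORT_MIN = 49152
PORT_MAX = 60999


def plan_hash(plan_config: dict) -> str:
    """Hex digest of a canonical JSON dump of the plan (hypothetical helper)."""
    canonical = json.dumps(plan_config, sort_keys=True).encode("utf-8")
    return hashlib.sha384(canonical).hexdigest()


def port_from_plan(plan_config: dict) -> int:
    """Map the plan hash to a port in [PORT_MIN, PORT_MAX)."""
    digest = plan_hash(plan_config)
    return int(digest[:8], 16) % (PORT_MAX - PORT_MIN) + PORT_MIN
```

Because the port depends only on the plan contents, the Director and the Aggregator can compute the same value independently without exchanging it.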

There is a restriction: information can be obtained from an aggregator only while it is alive. This is acceptable for GetMetricStream, but it should be resolved for GetTrainedModel in the future (there are several ways to resolve it, but they need to be discussed).

Resolved review threads:
- openfl/transport/grpc/aggregator_server.py
- openfl/component/director/director.py (4 threads, 2 outdated)
- openfl/component/director/experiment.py (2 threads, outdated)
- openfl/transport/grpc/aggregator_client.py (3 threads, outdated)
@aleksandr-mokrov force-pushed the director-aggregator-communication-by-rpc branch from 5d66d6e to 954d675 on January 24, 2023 22:37
Signed-off-by: Aleksandr Mokrov <[email protected]>

Update openfl/federated/plan/plan.py

Co-authored-by: Igor Davidyuk <[email protected]>
@aleksandr-mokrov force-pushed the director-aggregator-communication-by-rpc branch from 33a14fb to d5ec206 on February 16, 2023 13:54
@theakshaypant (Collaborator)

@aleksandr-mokrov can you please resolve the conflicts and request a review again?

5 participants