Updates to straggler handling functionality #996

ParthMandaliya · 2024-07-04T10:25:12Z

This PR to resolves issue #981

Summary of changes

StragglerHandlingFunction: Added an interface method start_policy
CutoffTimeBasedStragglerHandling: start_policy method implements a timer to wait for cutoff-time and then call provided call callback method.
Aggregator:
- sendlocaltaskresults: update collaborators_done to keep track of collaborators that have finished ALL tasks
- _straggler_cutoff_time_elapsed: call back function that is called after cutoff time has elapsed and applies the straggler policy

Verification performed

Tested various scenarios for time based cutoff straggler handling policy
All Director based workflow and Aggregator based workflow test-cases

Signed-off-by: Parth Mandaliya <[email protected]>

…lingFunction abstract class. Renamed start_timer and __timer_expired functions to start_straggler_cutoff_timer and _straggler_cutoff_time_elapsed respectively. Added docstring to both functions mentioned above. Signed-off-by: Parth Mandaliya <[email protected]>

If one or more collaborator(s) does not even 1 task results in time, all tasks results sent by that collaborator is excluded from aggregation. Signed-off-by: Parth Mandaliya <[email protected]>

removing start_straggler_cutoff_timer function from parent class StragglerHandlingFunction Signed-off-by: Parth Mandaliya <[email protected]>

Signed-off-by: Parth Mandaliya <[email protected]>

…lts for all tasks Changed straggler handling logs Added docstring for functions in all straggler handling policies Signed-off-by: Parth Mandaliya <[email protected]>

2. CutoffTimeBasedStragglerHandling: start_policy method implements a timer to wait for cutoff-time and then call provided call callback method 3. Aggregator: - sendlocaltaskresults: update collaborators_done to keep track of collaborators that have finished ALL tasks - _straggler_cutoff_time_elapsed: call back function that is called after cutoff time has elapsed and applies the straggler policy Signed-off-by: Parth Mandaliya <[email protected]>

Added logger argument to get_straggler_handling_policy in plan.py Signed-off-by: Parth Mandaliya <[email protected]>

Signed-off-by: Parth Mandaliya <[email protected]>

minimum_reporting cannot be set 0 in any straggler policy Signed-off-by: Parth Mandaliya <[email protected]>

Signed-off-by: Parth Mandaliya <[email protected]>

Logs updated Logger argument removed from straggler handling policy classes Signed-off-by: Parth Mandaliya <[email protected]>

Signed-off-by: Parth Mandaliya <[email protected]>

Signed-off-by: Ishant Thakare <[email protected]>

teoparvanov

I have some additional questions and comments after the latest update.

openfl/component/aggregator/aggregator.py

openfl/component/straggler_handling_functions/cutoff_time_based_straggler_handling.py

tests/openfl/component/aggregator/test_aggregator.py

Signed-off-by: Ishant Thakare <[email protected]>

openfl/component/aggregator/aggregator.py

MasterSkepticista · 2024-09-04T09:24:08Z

openfl/component/aggregator/aggregator.py

+        # maintain a list of collaborators that have completed task and
+        # reported results in a given round
+        self.collaborators_done = []
+


If this variable is only used by straggler policies, it should be stored in the policy object. Intent is to not spill a particular component's needs into other component's code.

Example APIs: self.straggler_policy.reset() or self.straggler_policy.mark_as_done(...)

It is not only used by straggler policies but also to identify collaborators who have completed all their tasks in the
_compute_validation_related_task_metrics() function.

MasterSkepticista · 2024-09-04T09:25:32Z

openfl/component/aggregator/aggregator.py

Class attribute docstring is out of line with the actual arguments. Can you please fix that as well?

This change was introduced in seperate PR: #1003.
Unsure whether the changes to class attribute were intentional. Suggested to be discussed seperately as they do not seem to be related to straggler handling.

MasterSkepticista · 2024-09-04T09:31:55Z

openfl/component/__init__.py

@@ -14,5 +14,5 @@
    PercentageBasedStragglerHandling,
 )
 from openfl.component.straggler_handling_functions.straggler_handling_function import (


Even to a developer, this naming scheme is unintuitive. How can we expect users to remember?
Please eliminate redundancy in naming modules.

openfl.component.straggler_policy.CutOffPolicy # This is self-documenting, and easy to remember than openfl.component.straggler_handling_functions.straggler_handling_function.CutoffTimeBasedStragglerHandling

If this also means merging the three policy modules into one file, consider doing so. The code complexity of any policy does not justify having separate modules.

Teo had alternate naming suggestions #996 (comment).
Recommended that naming and other suggestions to be taken up in a seperate follow-up PR.

MasterSkepticista · 2024-09-04T09:36:46Z

openfl/component/straggler_handling_functions/cutoff_time_based_straggler_handling.py

-class CutoffTimeBasedStragglerHandling(StragglerHandlingFunction):
-    """Cutoff time based Straggler Handling function."""
-
+class CutoffTimeBasedStragglerHandling(StragglerHandlingPolicy):


Missing docstring (explaining usage and behavior, not the arguments).

openfl/component/straggler_handling_functions/cutoff_time_based_straggler_handling.py

MasterSkepticista · 2024-09-04T09:54:31Z

openfl/utilities/optimizers/numpy/adam_optimizer.py

Is this a breaking change? It seems unrelated to the PR.

Looks like an intermediate merge. Modified as per main line

MasterSkepticista · 2024-09-04T09:56:38Z

openfl/component/straggler_handling_functions/straggler_handling_function.py

@@ -7,15 +7,48 @@
 from abc import ABC, abstractmethod


-class StragglerHandlingFunction(ABC):
+class StragglerHandlingPolicy(ABC):


Eliminate redundancy, for a user would already know what is being imported by the time they type the module path.

Suggested: openfl.component.straggler_policy.BasePolicy
(+ prev comment on merging all classes to single module as it is simple logic)

This can be handled along with #996 (comment) recommended to be done as a seperate PR.

MasterSkepticista · 2024-09-04T10:02:57Z

openfl/component/straggler_handling_functions/percentage_based_straggler_handling.py

I think percentage based cutoff is using an incorrect logic.

Percentage-based cutoff should be a specialization of time-based cutoff.

If all collaborators take 1 second to finish each task, give or take, the policy should not go ahead with a x% cutoff "just-because" some collaborators won the race to call RPC. In this case there are no stragglers per-se.

Ideally, this policy should give a time budget to each collaborator, and then implement a percentage cutoff if the time budget is violated at each participant. What do you think?

Our understanding is that Percentage-based policy is not a subset of time-based policy, it is rather provided as an alternative to time-based straggler handling policy.
Due to this, suggestion of treating it as a specialization of cutoff policy cannot be considered
"Ideally, this policy should give a time budget to each collaborator, and then implement a percentage cutoff if the time budget is violated at each participant." - this seems quite similar to CutOffTimebased policy where Aggregator waits for a certain time for Collaborators to report before proceeding with early round end.

…afety Signed-off-by: Ishant Thakare <[email protected]>

openfl/component/aggregator/aggregator.py

Signed-off-by: Ishant Thakare <[email protected]>

openfl/component/straggler_handling_functions/cutoff_time_based_straggler_handling.py

* v1.0 straggler handling added Signed-off-by: Parth Mandaliya <[email protected]> * Added start_straggler_cutoff_timer abstract function in StragglerHandlingFunction abstract class. Renamed start_timer and __timer_expired functions to start_straggler_cutoff_timer and _straggler_cutoff_time_elapsed respectively. Added docstring to both functions mentioned above. Signed-off-by: Parth Mandaliya <[email protected]> * Updated logs straggler handling in aggregator.py. If one or more collaborator(s) does not even 1 task results in time, all tasks results sent by that collaborator is excluded from aggregation. Signed-off-by: Parth Mandaliya <[email protected]> * Only time based straggler handling policies require timer thread, removing start_straggler_cutoff_timer function from parent class StragglerHandlingFunction Signed-off-by: Parth Mandaliya <[email protected]> * Find all unfinished tasks and straggler collaborators Signed-off-by: Parth Mandaliya <[email protected]> * Review comments incorporated Signed-off-by: Parth Mandaliya <[email protected]> * Added inline comments Signed-off-by: Parth Mandaliya <[email protected]> * Changed logic to keep track of collaborators which have reported results for all tasks Changed straggler handling logs Added docstring for functions in all straggler handling policies Signed-off-by: Parth Mandaliya <[email protected]> * 1. StragglerHandlingFunction: Added an interface method start_policy 2. CutoffTimeBasedStragglerHandling: start_policy method implements a timer to wait for cutoff-time and then call provided call callback method 3. Aggregator: - sendlocaltaskresults: update collaborators_done to keep track of collaborators that have finished ALL tasks - _straggler_cutoff_time_elapsed: call back function that is called after cutoff time has elapsed and applies the straggler policy Signed-off-by: Parth Mandaliya <[email protected]> * Removed logger argument from start_policy & Added logger argument to get_straggler_handling_policy in plan.py Signed-off-by: Parth Mandaliya <[email protected]> * If cutoff time is set to infinite, do not start the timer thread. Signed-off-by: Parth Mandaliya <[email protected]> * Resolved lint issues. Signed-off-by: Parth Mandaliya <[email protected]> * Pytest and code coverage test case failure resolved Signed-off-by: Parth Mandaliya <[email protected]> * Default logger value is set to None Signed-off-by: Parth Mandaliya <[email protected]> * Redesigned percentage based straggler policy. minimum_reporting cannot be set 0 in any straggler policy Signed-off-by: Parth Mandaliya <[email protected]> * Code cleanup Signed-off-by: Parth Mandaliya <[email protected]> * Only collaborators_done are used for aggregation Signed-off-by: Parth Mandaliya <[email protected]> * Internal review comments incorporated Logs updated Logger argument removed from straggler handling policy classes Signed-off-by: Parth Mandaliya <[email protected]> * Resolving potential issues found during testing Signed-off-by: Parth Mandaliya <[email protected]> * Use _collaborator_task_completed method to check if all given tasks to collaborator are completed or not Signed-off-by: Parth Mandaliya <[email protected]> * Few test cases failing issue resolved Signed-off-by: Parth Mandaliya <[email protected]> * Potential issue in aggregator based workflow tutorial resolved Signed-off-by: Parth Mandaliya <[email protected]> * Corner case issue discovered during testing is patched. Signed-off-by: Parth Mandaliya <[email protected]> * Added reset_policy_for_round function in straggler handling policy base class. Signed-off-by: Parth Mandaliya <[email protected]> * This commit includes following changes: 1. In cutoff time based policy after cutoff time expires wait for all collaborators not just minimum required. 2. Irregardless of tasks assigned to collaborators if minimum required collaborators report resultsw in time apply straggler handling policy. Signed-off-by: Parth Mandaliya <[email protected]> * Code cleanup Signed-off-by: Parth Mandaliya <[email protected]> * Logs modified Signed-off-by: Parth Mandaliya <[email protected]> * Review comments on PR incorporated. Signed-off-by: Parth Mandaliya <[email protected]> * Log updated Signed-off-by: Parth Mandaliya <[email protected]> * Condition in straggler_cutoff_check fixed Signed-off-by: Parth Mandaliya <[email protected]> * If straggler cutoff time set to infinite only wait for minimum required collaborators to report results. Signed-off-by: Parth Mandaliya <[email protected]> * If straggler cutoff time is set to infinite wait for ALL collaborators not only for minimum_reporting collaborators Signed-off-by: Parth Mandaliya <[email protected]> * Teodor's review comments and internal review comments incorporated. flake8 issues resovled. Signed-off-by: Parth Mandaliya <[email protected]> * Changed minimum_reporting validation. Merged conditions in straggler_cutoff_check function in CutoffTimeBasedStragglerHandling class. Signed-off-by: Parth Mandaliya <[email protected]> * Resolved all flake8 issues. Signed-off-by: Parth Mandaliya <[email protected]> * Modified single inline comment Signed-off-by: Parth Mandaliya <[email protected]> * Review comments incorporated. Signed-off-by: Parth Mandaliya <[email protected]> * Lint fixes Signed-off-by: Ishant Thakare <[email protected]> * Incorporated Micah's review comments and removed unused code Signed-off-by: Ishant Thakare <[email protected]> * Incorporated Teo's review comments Signed-off-by: Ishant Thakare <[email protected]> * Incorporated review comments & added mutex in aggregator for thread safety Signed-off-by: Ishant Thakare <[email protected]> * Review comments incorporated and updated logs Signed-off-by: Ishant Thakare <[email protected]> * Reverted a comment for Pytest and code coverage fix Signed-off-by: Ishant Thakare <[email protected]> --------- Signed-off-by: Parth Mandaliya <[email protected]> Signed-off-by: Ishant Thakare <[email protected]> Co-authored-by: Ishant Thakare <[email protected]>

ParthMandaliya added 16 commits June 20, 2024 11:11

v1.0 straggler handling added

fe27f17

Signed-off-by: Parth Mandaliya <[email protected]>

Updated logs straggler handling in aggregator.py.

419abe1

If one or more collaborator(s) does not even 1 task results in time, all tasks results sent by that collaborator is excluded from aggregation. Signed-off-by: Parth Mandaliya <[email protected]>

Only time based straggler handling policies require timer thread,

f175657

removing start_straggler_cutoff_timer function from parent class StragglerHandlingFunction Signed-off-by: Parth Mandaliya <[email protected]>

Find all unfinished tasks and straggler collaborators

9dd88dc

Signed-off-by: Parth Mandaliya <[email protected]>

Merge branch 'securefederatedai:develop' into straggler-handling

4ccac38

Review comments incorporated

210cdf3

Signed-off-by: Parth Mandaliya <[email protected]>

Added inline comments

22e9ea1

Signed-off-by: Parth Mandaliya <[email protected]>

Changed logic to keep track of collaborators which have reported resu…

535dddb

…lts for all tasks Changed straggler handling logs Added docstring for functions in all straggler handling policies Signed-off-by: Parth Mandaliya <[email protected]>

Removed logger argument from start_policy &

9aaba8f

Added logger argument to get_straggler_handling_policy in plan.py Signed-off-by: Parth Mandaliya <[email protected]>

Merge branch 'develop' into straggler-handling

4e477d8

If cutoff time is set to infinite, do not start the timer thread.

37af42e

Signed-off-by: Parth Mandaliya <[email protected]>

Resolved lint issues.

21c0a23

Signed-off-by: Parth Mandaliya <[email protected]>

Pytest and code coverage test case failure resolved

7796414

Signed-off-by: Parth Mandaliya <[email protected]>

Default logger value is set to None

3343ecb

Signed-off-by: Parth Mandaliya <[email protected]>

ParthMandaliya marked this pull request as ready for review July 4, 2024 12:33

ParthMandaliya requested a review from psfoley July 4, 2024 12:33

ParthMandaliya marked this pull request as draft July 5, 2024 12:54

ParthMandaliya added 2 commits July 9, 2024 12:57

Redesigned percentage based straggler policy.

38d21c0

minimum_reporting cannot be set 0 in any straggler policy Signed-off-by: Parth Mandaliya <[email protected]>

Code cleanup

ae82b53

Signed-off-by: Parth Mandaliya <[email protected]>

ParthMandaliya linked an issue Jul 9, 2024 that may be closed by this pull request

Potential issues observed in straggler handling policy #981

Closed

ParthMandaliya self-assigned this Jul 9, 2024

Only collaborators_done are used for aggregation

7a028ff

Signed-off-by: Parth Mandaliya <[email protected]>

ParthMandaliya added workflow_interface and removed workflow_interface labels Jul 9, 2024

ParthMandaliya added 4 commits July 10, 2024 18:22

Internal review comments incorporated

dcf4a36

Logs updated Logger argument removed from straggler handling policy classes Signed-off-by: Parth Mandaliya <[email protected]>

Merge branch 'securefederatedai:develop' into straggler-handling

c5a84c2

Merge branch 'securefederatedai:develop' into straggler-handling

2995e39

Resolving potential issues found during testing

83c03d8

Signed-off-by: Parth Mandaliya <[email protected]>

ishant162 added 5 commits August 10, 2024 13:41

Merge branch 'securefederatedai:develop' into straggler-handling

3433e36

Merge branch 'develop' into straggler-handling

148d230

Lint fixes

474c2e1

Signed-off-by: Ishant Thakare <[email protected]>

Merge branch 'securefederatedai:develop' into straggler-handling

8a1cc8d

Incorporated Micah's review comments and removed unused code

75dd979

Signed-off-by: Ishant Thakare <[email protected]>

teoparvanov reviewed Aug 22, 2024

View reviewed changes

ishant162 added 5 commits August 25, 2024 18:50

Merge branch 'securefederatedai:develop' into straggler-handling

1d94a7e

Merge branch 'securefederatedai:develop' into straggler-handling

d4b9323

Incorporated Teo's review comments

55f71fd

Signed-off-by: Ishant Thakare <[email protected]>

Merge branch 'securefederatedai:develop' into straggler-handling

20073e5

Merge branch 'securefederatedai:develop' into straggler-handling

afa27ff

psfoley requested review from MasterSkepticista and bsenthilr September 3, 2024 16:31

Merge branch 'securefederatedai:develop' into straggler-handling

6c8b755

MasterSkepticista requested changes Sep 4, 2024

View reviewed changes

ishant162 added 2 commits September 10, 2024 16:50

Incorporated review comments & added mutex in aggregator for thread s…

185243f

…afety Signed-off-by: Ishant Thakare <[email protected]>

Merge branch 'securefederatedai:develop' into straggler-handling

65e3576

teoparvanov reviewed Sep 12, 2024

View reviewed changes

openfl/component/aggregator/aggregator.py Outdated Show resolved Hide resolved

Review comments incorporated and updated logs

7be92ea

Signed-off-by: Ishant Thakare <[email protected]>

teoparvanov approved these changes Sep 13, 2024

View reviewed changes

ishant162 added 2 commits September 19, 2024 08:46

Merge branch 'securefederatedai:develop' into straggler-handling

f07592e

Reverted a comment for Pytest and code coverage fix

3aca322

Signed-off-by: Ishant Thakare <[email protected]>

psfoley requested changes Sep 23, 2024

View reviewed changes

openfl/component/straggler_handling_functions/cutoff_time_based_straggler_handling.py Show resolved Hide resolved

psfoley self-requested a review September 23, 2024 18:21

psfoley approved these changes Sep 23, 2024

View reviewed changes

psfoley merged commit 7c33420 into securefederatedai:develop Sep 23, 2024
26 checks passed

psfoley deleted the straggler-handling branch September 23, 2024 18:22

ishant162 mentioned this pull request Oct 23, 2024

Straggler handling Follow-up #1097

Draft

1 task

teoparvanov mentioned this pull request Nov 15, 2024

tensor_db get_aggregated_tensor duplicates code, returns tuple in corner case #1087

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updates to straggler handling functionality #996

Updates to straggler handling functionality #996

ParthMandaliya commented Jul 4, 2024

teoparvanov left a comment

MasterSkepticista Sep 4, 2024

ishant162 Sep 10, 2024

MasterSkepticista Sep 4, 2024

ishant162 Sep 10, 2024

MasterSkepticista Sep 4, 2024

ishant162 Sep 10, 2024

MasterSkepticista Sep 4, 2024

ishant162 Sep 10, 2024

MasterSkepticista Sep 4, 2024

ishant162 Sep 10, 2024

MasterSkepticista Sep 4, 2024

ishant162 Sep 10, 2024

MasterSkepticista Sep 4, 2024 •

edited

Loading

ishant162 Sep 10, 2024

Updates to straggler handling functionality #996

Updates to straggler handling functionality #996

Conversation

ParthMandaliya commented Jul 4, 2024

teoparvanov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MasterSkepticista Sep 4, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MasterSkepticista Sep 4, 2024 •

edited

Loading