Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jean/efficient bootstrapping #3

Merged
merged 43 commits into from
Jan 23, 2024
Merged

Jean/efficient bootstrapping #3

merged 43 commits into from
Jan 23, 2024

Conversation

jeandut
Copy link
Collaborator

@jeandut jeandut commented Jan 9, 2024

This PR implements a more efficient way to do bootstrap in FL using substra using variations of @arthurPignetOwkin's code patterns.
While the details of this implementation are quite tricky the main idea is simple: hook all methods of algo and strategies so that they are executed n_bootstrap times producing n_bootstrap outputs. The rest is just boilerplate code to glue everything together respecting substra and substrafl's constraints.

This PR is a first attempt and will probably be simplified over time (next iteration will use different self for the different bootstrap runs aka the self will not merged, which will allow much cleaner code and avoid tricky side effects).

Note that the local bootstrap vs global bootstrap is an issue but is a detail of the PR and will be done in another PR.

setup.py Outdated Show resolved Hide resolved
@jeandut jeandut marked this pull request as ready for review January 22, 2024 16:04
@jeandut jeandut requested a review from a user January 22, 2024 16:04
@jeandut jeandut merged commit f94bc5d into main Jan 23, 2024
6 checks passed
@jeandut jeandut deleted the jean/efficient_bootstrapping branch January 23, 2024 10:59
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @jeandut . I have a few comments to better understand the logic


btst_algo = BtstAlgo(**strategy.algo.kwargs)

class BtstStrategy(strategy.__class__):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BootstrapStrategy would have been more explicit IMO (same for BootstrapAlgo vs BtstAlgo

Comment on lines +189 to +192
self.checkpoints_list = [
copy.deepcopy(self._get_state_to_save())
for _ in range(len(bootstrap_seeds_list))
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this part. @jeandut could you please explain why you do not need to pass a variable to indicate which bootstrap replicate to save?

datasamples.shape[0], replace=True, random_state=rng
)
# Loading the correct state into the current main algo
if self.checkpoints_list[idx] is not None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must be not None, no? Otherwise, there is an issue?

Comment on lines +375 to +394
def equality_check(a, b):
if type(a) != type(b): # noqa: E721
return False
else:
if isinstance(a, dict):
for k in a.keys():
if not equality_check(a[k], b[k]):
return False
return True
elif isinstance(a, list):
for i in range(len(a)):
if not equality_check(a[i], b[i]):
return False
return True
elif isinstance(a, np.ndarray):
return np.all(a == b)
elif isinstance(a, torch.Tensor):
return torch.all(a == b)
else:
return a == b
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting this function outside would easy readability

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant