Add Heterogeneity in FL glossary entry (#4236)
Co-authored-by: Yan Gao <[email protected]>
adam-narozniak and yan-gao-GY authored Oct 28, 2024
1 parent cfe5328 commit a98f069
Showing 1 changed file with 38 additions and 0 deletions.
38 changes: 38 additions & 0 deletions glossary/heterogeneity-in-federated-learning.mdx
@@ -0,0 +1,38 @@
---
title: "Heterogenity in Federated Learning"
description: "Heterogeneity is a core challenge in FL, and countering the problems that result from it is an active field of study. We distinguish statistical and structural heterogeneity."
date: "2024-05-24"
author:
name: "Adam Narożniak"
position: "ML Engineer at Flower Labs"
website: "https://discuss.flower.ai/u/adam.narozniak/summary"
---

Heterogeneity is a core challenge in federated learning (FL), and countering the problems that result from it is an active field of study. We can distinguish the following categories:
* statistical heterogeneity (related to data),
* structural heterogeneity (related to resources and infrastructure).

Real-world FL training can exhibit any combination of the problems described below.

### Statistical Heterogeneity
Statistical heterogeneity is the situation in which the clients' data distributions differ, which can be the result of one or more of the following (see the illustrative sketch after this list):
* feature distribution skew (covariate shift),
* label distribution skew (prior probability shift),
* same label, different features (concept drift),
* same features, different label (concept shift),
* quantity skew.
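
To make the categories above more concrete, here is a minimal, hypothetical Python sketch (client names and counts are made up) illustrating label distribution skew and quantity skew via per-client label counts:

```python
# Hypothetical per-client label counts for a 3-class task (made-up numbers).
client_label_counts = {
    "client_a": {"cat": 480, "dog": 15, "bird": 5},   # label skew: mostly "cat"
    "client_b": {"cat": 10, "dog": 470, "bird": 20},  # label skew: mostly "dog"
    "client_c": {"cat": 30, "dog": 25, "bird": 45},   # quantity skew: far fewer samples overall
}

# Print each client's total sample count and label proportions.
for client, counts in client_label_counts.items():
    total = sum(counts.values())
    proportions = {label: round(n / total, 2) for label, n in counts.items()}
    print(client, total, proportions)
```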

### Structural Heterogeneity
Structural heterogeneity results from the different types of devices that can take part in the same federation, which can differ in the following:
* computation resources (different chips), which lead to different training times,
* storage resources (available disk space), which can mean, e.g., insufficient space to store results (and can also imply a different number of samples per device),
* energy level/charging status and current resource consumption, which can change over time and can imply a lack of willingness or capability to join the training,
* network connection, e.g., an unstable connection can lead to more frequent dropouts and lack of availability.

### Simulating Heterogeneity - Flower Datasets
Flower Datasets is a library that enables you to simulate statistical heterogeneity according to various partitioning schemes (see all [here](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.html)).
It provides ways of simulating quantity skew, label distribution skew, and a mix of them, depending on the partitioner used. It also enables working with datasets that naturally exhibit different types of heterogeneity.
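
For example, here is a minimal sketch of simulating label distribution skew with a Dirichlet-based partitioner; the dataset name, number of partitions, and `alpha` value are illustrative assumptions:

```python
# A sketch assuming Flower Datasets is installed (`pip install flwr-datasets`).
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import DirichletPartitioner

# Lower `alpha` -> stronger label distribution skew across partitions.
partitioner = DirichletPartitioner(num_partitions=10, partition_by="label", alpha=0.5)
fds = FederatedDataset(dataset="cifar10", partitioners={"train": partitioner})

# Load the data assigned to one simulated client.
partition = fds.load_partition(partition_id=0)
```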

### Countering Heterogeneity - Strategies in Flower
Flower is a library that enables you to run federated learning both in deployment (real-world scenarios) and in simulation. It provides out-of-the-box weight aggregation
strategies (see them [here](https://flower.ai/docs/framework/ref-api/flwr.server.strategy.html)), which serve as the core measures for mitigating problems in heterogeneous environments.
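
As an illustrative sketch (one option among many), a strategy such as `FedProx`, which adds a proximal term to limit how far local updates drift on heterogeneous clients, can be configured and passed to the server; the address, round count, and hyperparameter values below are placeholder assumptions:

```python
import flwr as fl
from flwr.server.strategy import FedProx

# FedProx extends FedAvg with a proximal term controlled by `proximal_mu`.
strategy = FedProx(
    fraction_fit=0.5,          # sample 50% of available clients each round
    min_available_clients=10,  # wait until at least 10 clients are connected
    proximal_mu=0.1,           # strength of the proximal regularization
)

# Placeholder server start; address and number of rounds are assumptions.
fl.server.start_server(
    server_address="0.0.0.0:8080",
    config=fl.server.ServerConfig(num_rounds=3),
    strategy=strategy,
)
```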
