Add Heterogeneity in FL glossary entry (#4236)
Co-authored-by: Yan Gao <[email protected]>
adam-narozniak and yan-gao-GY authored Oct 28, 2024
1 parent cfe5328 commit a98f069
Showing 1 changed file with 38 additions and 0 deletions.
38 changes: 38 additions & 0 deletions glossary/heterogeneity-in-federated-learning.mdx
@@ -0,0 +1,38 @@
---
title: "Heterogenity in Federated Learning"
description: "Heterogeneity is a core challenge in FL, and countering the problems that result from it is an active field of study. We distinguish statistical and structural heterogeneity."
date: "2024-05-24"
author:
name: "Adam Narożniak"
position: "ML Engineer at Flower Labs"
website: "https://discuss.flower.ai/u/adam.narozniak/summary"
---

Heterogeneity is a core challenge in federated learning (FL), and countering the problems that result from it is an active field of study. We can distinguish the following categories:
* statistical heterogeneity (related to data),
* structural heterogeneity (related to resources and infrastructure).

Real-world FL training can exhibit any combination of the problems described below.

### Statistical Heterogeneity
Statistical heterogeneity is the situation in which the clients' data distributions differ, which can be the result of one or more of the following (see the illustrative sketch after this list):
* feature distribution skew (covariate shift),
* label distribution skew (prior probability shift),
* same label, different features (concept drift),
* same features, different label (concept shift),
* quantity skew.
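
To make the categories above more concrete, here is a minimal, hypothetical Python sketch (client names and counts are made up) illustrating label distribution skew and quantity skew via per-client label counts:

```python
# Hypothetical per-client label counts for a 3-class task (made-up numbers).
client_label_counts = {
    "client_a": {"cat": 480, "dog": 15, "bird": 5},   # label skew: mostly "cat"
    "client_b": {"cat": 10, "dog": 470, "bird": 20},  # label skew: mostly "dog"
    "client_c": {"cat": 30, "dog": 25, "bird": 45},   # quantity skew: far fewer samples overall
}

# Print each client's total sample count and label proportions.
for client, counts in client_label_counts.items():
    total = sum(counts.values())
    proportions = {label: round(n / total, 2) for label, n in counts.items()}
    print(client, total, proportions)
```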

### Structural Heterogeneity
Structural heterogeneity results from the different types of devices that can take part in the same federation, which can differ in the following:
* computation resources (different chips), which lead to different training times,
* storage resources (available disk space), which can mean, e.g., insufficient space to store results (and can also imply a different number of samples per device),
* energy level/charging status and current resource consumption, which can change over time and can imply a lack of willingness or capability to join the training,
* network connection, e.g., an unstable connection can lead to more frequent dropouts and lack of availability.

### Simulating Heterogeneity - Flower Datasets
Flower Datasets is a library that enables you to simulate statistical heterogeneity according to various partitioning schemes (see all [here](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.html)).
It provides ways of simulating quantity skew, label distribution skew, and a mix of them, depending on the partitioner used. It also enables working with datasets that naturally exhibit different types of heterogeneity.
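
For example, here is a minimal sketch of simulating label distribution skew with a Dirichlet-based partitioner; the dataset name, number of partitions, and `alpha` value are illustrative assumptions:

```python
# A sketch assuming Flower Datasets is installed (`pip install flwr-datasets`).
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import DirichletPartitioner

# Lower `alpha` -> stronger label distribution skew across partitions.
partitioner = DirichletPartitioner(num_partitions=10, partition_by="label", alpha=0.5)
fds = FederatedDataset(dataset="cifar10", partitioners={"train": partitioner})

# Load the data assigned to one simulated client.
partition = fds.load_partition(partition_id=0)
```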

### Countering Heterogeneity - Strategies in Flower
Flower is a library that enables you to run federated learning both in deployment (real-world scenarios) and in simulation. It provides out-of-the-box weight aggregation
strategies (see them [here](https://flower.ai/docs/framework/ref-api/flwr.server.strategy.html)), which serve as the core measures for mitigating problems in heterogeneous environments.
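
As an illustrative sketch (one option among many), a strategy such as `FedProx`, which adds a proximal term to limit how far local updates drift on heterogeneous clients, can be configured and passed to the server; the address, round count, and hyperparameter values below are placeholder assumptions:

```python
import flwr as fl
from flwr.server.strategy import FedProx

# FedProx extends FedAvg with a proximal term controlled by `proximal_mu`.
strategy = FedProx(
    fraction_fit=0.5,          # sample 50% of available clients each round
    min_available_clients=10,  # wait until at least 10 clients are connected
    proximal_mu=0.1,           # strength of the proximal regularization
)

# Placeholder server start; address and number of rounds are assumptions.
fl.server.start_server(
    server_address="0.0.0.0:8080",
    config=fl.server.ServerConfig(num_rounds=3),
    strategy=strategy,
)
```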
