Federated Learning
"A machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them. This approach stands in contrast to traditional centralized machine learning techniques where all the local datasets are uploaded to one server, as well as to more classical decentralized approaches which often assume that local data samples are identically distributed.
Federated learning enables multiple actors to build a common, robust machine learning model without sharing data, thus allowing to address critical issues such as data privacy, data security, data access rights and access to heterogeneous data."
-- Federated Learning, Wikipedia
Federated learning has several advantages over "traditional centralized machine learning." Two of the most commonly cited are:
In federated machine learning (FedML), edge devices do not share their private data samples. Instead, they influence an overarching model through various weight-update techniques. In this way, FedML seeks to build models that benefit from everyone's data without directly exposing anyone's private data.
Rather than having a single global model that everyone uses, a FedML system may allow local devices some degree of personalization, increasing accuracy for each local edge device. Edge devices may still contribute to a global model, but retain the freedom to further tune on their own data.
As defined in Federated Machine Learning: Concept and Applications (Q. Yang et al., 2019), there are two important kinds of Federated Learning to consider:
Horizontal Federated Learning: Same feature space, different samples. Think of two banks, each recording the same information types for each client (name, account balance, etc.), but each serving a different set of clients. They may wish to combine the information power of their respective client data to build a better machine learning model, without actually revealing client data to each other.
Vertical Federated Learning: Same samples, different feature space. Think of a bank and a grocery store. Bank clients are also customers at the grocery store. The bank has financial information about each client. The grocery store has data about these same people, but with different features (perhaps purchase types, coupon usage, etc.). The bank and grocery store would similarly like to build a model that takes advantage of both data sources, without explicitly exposing their own data to one another.
These two categories are not mutually exclusive. The aforementioned paper defines a mix of the two as "Federated Transfer Learning."
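The two partitioning schemes can be illustrated with a toy data layout. All of the institutions, clients, and values below are hypothetical, chosen only to mirror the bank/grocery examples above:

```python
# Horizontal FL: both banks record the same features (same columns),
# but for disjoint sets of clients (different rows).
bank_a = {"alice": {"balance": 1200, "age": 34},
          "bob":   {"balance": 800,  "age": 41}}
bank_b = {"carol": {"balance": 560,  "age": 29}}

# Same feature space across the two banks:
assert set(next(iter(bank_a.values()))) == set(next(iter(bank_b.values())))

# Vertical FL: the bank and the grocery store share clients (same rows),
# but each holds a different feature space for them (different columns).
bank    = {"alice": {"balance": 1200}, "carol": {"balance": 560}}
grocery = {"alice": {"coupons": 3},    "carol": {"coupons": 7}}

# Overlapping sample space, disjoint feature spaces:
shared_clients = bank.keys() & grocery.keys()
print(sorted(shared_clients))  # ['alice', 'carol']
```

In practice the vertical setting also requires privately aligning the shared clients (entity resolution) before any training can happen, which is itself a non-trivial privacy problem.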
There are many open challenges in Federated Learning. Some of these are challenges inherited from related fields, such as the broader field of machine learning, or from distributed systems. A few of these challenges are described below.
In many Federated Learning systems, each edge device contributes in some way to the overall model. This often occurs during "rounds" of training, in which criteria must be applied to decide which nodes will contribute. Will all nodes contribute every round? This could become an expensive operation. Is it really necessary? What happens if some nodes become unreachable and cannot communicate?
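One common answer to these questions is to sample only a small random fraction of the currently reachable devices each round. A minimal sketch, in which the device names and the `select_clients` helper are assumptions for illustration:

```python
import random

def select_clients(available, fraction=0.1, seed=None):
    """Pick a random subset of the currently reachable clients for one round.

    Unreachable devices are simply excluded from `available` before
    sampling, so they cannot stall the round.
    """
    rng = random.Random(seed)
    k = max(1, int(len(available) * fraction))  # at least one participant
    return rng.sample(sorted(available), k)     # sorted for determinism

# 100 registered devices, one of which is currently offline.
online = {f"device_{i}" for i in range(100)} - {"device_7"}
round_participants = select_clients(online, fraction=0.05, seed=0)
print(len(round_participants))  # 4
```

Sampling a fraction per round keeps communication costs bounded, at the price of noisier per-round updates.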
Once edge nodes have been selected, the system must decide how they will contribute to the model. This often occurs through updates to the weights of a neural network, with each node making a small adjustment. How big of an adjustment should each node be allowed to make? Should all nodes be weighted equally?
The simplest approach is to weight every node equally and average the contributions evenly (see FedAvg). More advanced approaches attempt to assess the quality of the proposed changes from each node, whether by test performance, analysis of the underlying data distribution, or other criteria; "higher quality" contributions are then allowed a larger influence on the global model. Still other approaches employ stratification or similar techniques to guarantee some metric of fairness, ensuring that key demographics within the sample space are represented.
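The FedAvg-style aggregation step can be sketched in a few lines. This toy version weights each client's proposed parameters by its local dataset size (as in the original FedAvg formulation); with equal dataset sizes it reduces to the plain even average described above. The client values are hypothetical:

```python
def fed_avg(client_weights, client_sizes):
    """Average client model parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * n / total for w, n in zip(client_weights, client_sizes))
            for i in range(n_params)]

# Two clients propose updated parameters for a 3-parameter model.
updates = [[1.0, 2.0, 3.0],   # client A, trained on 100 local samples
           [3.0, 4.0, 5.0]]   # client B, trained on 300 local samples
new_global = fed_avg(updates, [100, 300])
print(new_global)  # [2.5, 3.5, 4.5] -- client B pulls the average toward itself
```

A quality-aware variant would simply replace `client_sizes` with any other non-negative score per client (validation accuracy, trust score, etc.).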
This is a challenge inherited from distributed systems in general. Whom do we trust in a Federated Learning system? Do the edge devices trust a "central authority" that aggregates inputs and forms a global model? Does a central authority always trust edge nodes to make honest parameter updates? What if a malicious player attempts to attack the system by providing deliberately curated "weight updates," either to influence the global model or to discover information about other edge devices' data (which should be private)?
With a centralized system, one might have a curated repository of labelled data to train on beforehand. In an edge-based system, however, edge devices may collect new data points without any way to obtain good classification labels for them. We may also not trust edge devices to generate labels even where this is possible.
One clever approach to this problem involves having edge devices only perform unsupervised learning to "learn a feature space," while leaving the supervised task to a central authority.
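The shape of that split can be sketched with toy stand-ins. Here the "unsupervised feature learning" on each device is reduced to fitting per-feature means for centering (a deliberately simple proxy; real systems would learn an encoder), and the server aggregates those statistics without ever seeing raw data. All function names and values are assumptions for illustration:

```python
def edge_fit_features(local_data):
    """On-device unsupervised step: learn per-feature means for centering.

    No labels are needed, and the raw rows never leave the device --
    only the summary statistics do.
    """
    n = len(local_data)
    dims = len(local_data[0])
    return [sum(row[d] for row in local_data) / n for d in range(dims)]

def aggregate_feature_maps(per_device_means, device_sizes):
    """Server-side step: combine the statistics, weighted by dataset size."""
    total = sum(device_sizes)
    dims = len(per_device_means[0])
    return [sum(m[d] * n / total for m, n in zip(per_device_means, device_sizes))
            for d in range(dims)]

device_1 = [[1.0, 10.0], [3.0, 14.0]]   # 2 unlabelled local samples
device_2 = [[5.0, 20.0]]                # 1 unlabelled local sample
means = [edge_fit_features(device_1), edge_fit_features(device_2)]
global_mean = aggregate_feature_maps(means, [2, 1])
print(global_mean)
```

The central authority would then train its supervised classifier on data expressed in this shared, centered feature space, keeping the labelled task entirely on the server side.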
One interesting case to note is a Federated Learning system that is designed to predict future events - in this case, labels are naturally available and known to be accurate (since the true label is simply what happens next, which can be easily recorded).
Please visit the Papers section of this wiki for a list of relevant papers for further reading.