"Open Source Platform Enables Research on Privacy-Preserving Machine Learning"
Researchers at the University of Michigan (U-M) have released the largest benchmarking data set to date for a Machine Learning (ML) method created with data privacy in mind. The method, known as federated learning, trains ML models on end-user devices such as laptops and smartphones instead of requiring the transfer of private data to central servers. Researchers can train on larger real-world data by training where the data is generated, according to Fan Lai, a U-M doctoral student in computer science and engineering who presented the FedScale training environment at the 2022 International Conference on Machine Learning. This approach also reduces privacy risks as well as the high communication and storage costs associated with collecting raw data from end-user devices and storing it in the cloud.
Federated learning, which is still in its early stages, is based on an algorithm that acts as a centralized coordinator. The coordinator sends the model to the devices, where it is trained locally on the relevant user data, and then collects each partially trained model to build the final global model. This workflow adds an extra layer of data privacy and security to various applications. Models can be improved with data from messaging apps, health care records, personal documents, and other sensitive but useful training materials, without fear of data center vulnerabilities. In addition to protecting privacy, federated learning has the potential to make model training more resource-efficient by reducing and, in some cases, eliminating large data transfers.
However, training across multiple devices means there are no guarantees about the computing resources available, and uncertainties such as user connection speeds and device specs result in a pool of data options of varying quality. Federated learning is a rapidly growing research area, but most of the work uses a handful of small data sets that do not represent many aspects of this type of ML technique.
This is where FedScale comes into play. The platform can simulate the behavior of millions of user devices on a few GPUs and CPUs, allowing ML model developers to see how their federated learning program will perform without requiring large-scale deployment. It supports tasks including image classification, object detection, language modeling, speech recognition, and machine translation. This article continues to discuss the concept of federated learning as well as the FedScale training environment.
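To make the coordinator workflow described above concrete, here is a minimal sketch in Python that simulates a handful of devices in a single process. It is not FedScale's API. It assumes sample-weighted federated averaging as the aggregation rule, which the article does not specify, and every name in it (local_train, federated_average, make_clients, run_rounds) and the toy task (fitting y = 3x + 1) is hypothetical and chosen only for illustration.

```python
import random


def local_train(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps of 1-D linear regression on one client's data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates):
    """Average client models, weighting each by its number of local samples."""
    total = sum(n for _, n in updates)
    w = sum(model[0] * n for model, n in updates) / total
    b = sum(model[1] * n for model, n in updates) / total
    return (w, b)


def make_clients(num_clients, rng):
    """Hypothetical clients: each holds private samples of y = 3x + 1 plus noise."""
    clients = []
    for _ in range(num_clients):
        size = rng.randint(5, 40)  # devices hold uneven amounts of data
        data = []
        for _ in range(size):
            x = rng.uniform(-1, 1)
            data.append((x, 3 * x + 1 + rng.gauss(0, 0.05)))
        clients.append(data)
    return clients


def run_rounds(clients, rounds, clients_per_round, rng):
    """Coordinator loop: send the model out, train locally, aggregate the results."""
    global_model = (0.0, 0.0)
    for _ in range(rounds):
        # Only a subset of devices takes part in each round.
        selected = rng.sample(clients, clients_per_round)
        # Raw data stays with the client; only model parameters are returned.
        updates = [(local_train(global_model, data), len(data)) for data in selected]
        global_model = federated_average(updates)
    return global_model


rng = random.Random(0)
clients = make_clients(20, rng)
w, b = run_rounds(clients, rounds=50, clients_per_round=5, rng=rng)
print(f"learned w={w:.2f}, b={b:.2f}")
```

The coordinator only ever sees model parameters and sample counts, never the (x, y) pairs themselves. A realistic simulator would also need to model the uneven connection speeds and device capabilities mentioned above, which this sketch leaves out.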