Federated Learning in Malware Detection
Author
Abstract

Malware detection constitutes a fundamental step in safe and secure computational systems, including industrial systems and the Internet of Things (IoT). Modern malware detection is based on machine learning methods that classify software samples as malware or benign, based on features that are extracted from the samples through static and/or dynamic analysis. State-of-the-art malware detection systems employ Deep Neural Networks (DNNs) whose accuracy increases as more data are analyzed and exploited. However, organizations also have significant privacy constraints and concerns which limit the data that they share with centralized security providers or other organizations, despite the malware detection accuracy improvements that can be achieved with the aggregated data. In this paper we investigate the effectiveness of federated learning (FL) methods for developing and distributing aggregated DNNs among autonomous interconnected organizations. We analyze a solution where multiple organizations use independent malware analysis platforms as part of their Security Operations Centers (SOCs) and train their own local DNN model on their own private data. Exploiting cross-silo FL, we combine these DNNs into a global one which is then distributed to all organizations, achieving the distribution of combined malware detection models using data from multiple sources without sample or feature sharing. We evaluate the approach using the EMBER benchmark dataset and demonstrate that our approach effectively reaches the same accuracy as the non-federated centralized DNN model, which is above 93\%.

Year of Publication
2023
Date Published
sep
Publisher
IEEE
Conference Location
Sinaia, Romania
ISBN Number
9798350339918
URL
https://ieeexplore.ieee.org/document/10275578/
DOI
10.1109/ETFA54631.2023.10275578
Google Scholar | BibTeX | DOI