Evaluating Synthetic Datasets for Training Machine Learning Models to Detect Malicious Commands
Author
Abstract

Electrical substations in power grid act as the critical interface points for the transmission and distribution networks. Over the years, digital technology has been integrated into the substations for remote control and automation. As a result, substations are more prone to cyber attacks and exposed to digital vulnerabilities. One of the notable cyber attack vectors is the malicious command injection, which can lead to shutting down of substations and subsequently power outages as demonstrated in Ukraine Power Plant Attack in 2015. Prevailing measures based on cyber rules (e.g., firewalls and intrusion detection systems) are often inadequate to detect advanced and stealthy attacks that use legitimate-looking measurements or control messages to cause physical damage. Additionally, defenses that use physics-based approaches (e.g., power flow simulation, state estimation, etc.) to detect malicious commands suffer from high latency. Machine learning serves as a potential solution in detecting command injection attacks with high accuracy and low latency. However, sufficient datasets are not readily available to train and evaluate the machine learning models. In this paper, focusing on this particular challenge, we discuss various approaches for the generation of synthetic data that can be used to train the machine learning models. Further, we evaluate the models trained with the synthetic data against attack datasets that simulates malicious commands injections with different levels of sophistication. Our findings show that synthetic data generated with some level of power grid domain knowledge helps train robust machine learning models against different types of attacks.

Year of Publication
2022
Conference Name
2022 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm)
Google Scholar | BibTeX