Spotlight on Lablet Research #34 - Foundations of CPS Resilience

Lablet: Vanderbilt University

The goal of this project is to develop principles and methods for designing and analyzing resilient CPS architectures that deliver the required service even when some components are compromised. A fundamental challenge is to understand the basic tenets of CPS resilience and how they can be applied in developing resilient architectures.

As CPS become more prevalent in critical application domains, ensuring security and resilience in the face of cyber-attacks is an issue of paramount importance. Cyber-attacks against critical infrastructure, such as smart water-distribution and transportation systems, pose serious threats to public health and safety. A variety of security techniques have been developed in response, but no single technique can address the whole spectrum of attacks that a determined and resourceful attacker may launch. The research team, led by Principal Investigator Xenofon Koutsoukos, is therefore pursuing a multi-pronged approach for designing secure and resilient CPS. The approach integrates redundancy, diversity, and hardening techniques to design both passive resilience methods, which are inherently robust against attacks, and active resilience methods, which allow the system to respond to attacks. The team is also introducing a framework for quantifying cyber-security risks and optimizing the system design by determining security investments in redundancy, diversity, and hardening. To demonstrate the applicability of the framework, the team uses a modeling and simulation integration platform to experiment with and evaluate resilient CPS in application domains such as power, transportation, and water-distribution systems.

Activities and accomplishments over the past year include the following:

Modeling and Simulating Attacks in Power Systems
Due to the increased deployment of novel communication, control, and protection functions, the grid has become vulnerable to a variety of attacks. Robust machine learning-based attack detection and mitigation algorithms require large amounts of data, which in turn depend on a representative environment in which different attacks can be simulated. Researchers have developed a comprehensive tool-chain for modeling and simulating attacks in power systems. First, they present a probabilistic domain-specific language for defining multiple attack scenarios and simulation configuration parameters. Second, they extend the PyPower-dynamics simulator with protection system components to simulate cyber-attacks in the control and protection layers of a power system. They demonstrate multiple attack scenarios with a case study based on the IEEE 39-bus system.
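The spotlight does not reproduce the team's domain-specific language, but the idea of drawing concrete attack scenarios from a probabilistic specification can be sketched as follows. The spec format, attack names, and parameters here are illustrative assumptions, not the actual DSL:

```python
import random

def sample_scenario(spec, seed=0):
    """Sample one concrete attack scenario from a probabilistic spec.

    `spec` maps an attack name to (activation probability, lo, hi),
    where [lo, hi] is the range of an attack parameter such as onset
    time. All names and fields are illustrative assumptions.
    """
    rng = random.Random(seed)
    scenario = {}
    for attack, (prob, lo, hi) in spec.items():
        if rng.random() < prob:  # activate this attack with probability `prob`
            scenario[attack] = rng.uniform(lo, hi)  # draw its parameter
    return scenario
```

A simulator could then replay each sampled scenario against the grid model to generate the labeled data that the detection algorithms require.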

Byzantine Resilient Aggregation in Distributed Reinforcement Learning
Recent distributed Reinforcement Learning (RL) techniques utilize networked agents to accelerate exploration and speed up learning. However, such techniques are not resilient in the presence of Byzantine agents, which can disturb convergence. In this work, researchers present a Byzantine resilient aggregation rule for distributed RL with networked agents that incorporates the idea of optimizing the objective function in designing the aggregation rules. The research team evaluates their approach using multiple RL environments for both value-based and policy-based methods with homogeneous and heterogeneous agents. The results show that cooperation using the proposed approach exhibits better learning performance than the non-cooperative case and is resilient in the presence of an arbitrary number of Byzantine agents.
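The team's objective-function-based aggregation rule is not detailed in this summary. As a simplified stand-in, the sketch below shows a standard Byzantine-resilient rule, the coordinate-wise trimmed mean, which tolerates up to f adversarial parameter vectors by discarding extremes in each coordinate:

```python
import numpy as np

def trimmed_mean(param_vectors, f):
    """Coordinate-wise trimmed mean over parameter vectors shared by
    neighbors: sort each coordinate and drop the f smallest and f
    largest values before averaging, so up to f Byzantine vectors
    cannot arbitrarily shift the aggregate."""
    p = np.sort(np.asarray(param_vectors, dtype=float), axis=0)
    return p[f:len(p) - f].mean(axis=0)
```

With f = 1, a single outlier vector such as (100, -100) is clipped away in every coordinate and no longer influences the aggregate.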

Assurance Monitoring of Learning-enabled CPS
Machine learning components such as deep neural networks are used extensively in CPS, but they may introduce new types of hazards with potentially disastrous consequences, which must be addressed to engineer trustworthy systems. Although deep neural networks offer advanced capabilities, they must be complemented by engineering methods and practices that allow effective integration into CPS. Researchers proposed an approach for assurance monitoring of learning-enabled CPS based on the conformal prediction framework. To enable real-time assurance monitoring, the approach employs distance learning to transform high-dimensional inputs into lower-dimensional embedding representations. By leveraging conformal prediction, the approach provides well-calibrated confidence and ensures a small, bounded error rate while limiting the number of inputs for which an accurate prediction cannot be made. Experimental results demonstrate that the error rates are well calibrated and the number of alarms is small. The method is computationally efficient and allows real-time monitoring. Current and future work includes using the approach for detection and classification of attacks in CPS/IoT.
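The conformal-prediction step behind such a monitor can be sketched as follows: given nonconformity scores for a held-out calibration set, the monitor keeps every candidate label whose conformal p-value exceeds the significance level, and an alarm is raised when the resulting set does not contain exactly one label. The nonconformity scores themselves would come from the learned embedding, which is not reproduced here:

```python
import numpy as np

def conformal_set(cal_scores, test_scores, epsilon=0.25):
    """Inductive conformal prediction set.

    cal_scores:  nonconformity scores of calibration examples.
    test_scores: dict mapping each candidate label to the test input's
                 nonconformity score under that label.
    Keeps labels with p-value > epsilon; the true label is excluded
    with probability at most epsilon.
    """
    cal = np.asarray(cal_scores, dtype=float)
    kept = []
    for label, score in test_scores.items():
        # p-value: fraction of calibration scores at least as nonconforming
        p = (np.sum(cal >= score) + 1) / (len(cal) + 1)
        if p > epsilon:
            kept.append(label)
    return kept

# An assurance monitor would raise an alarm when len(conformal_set(...)) != 1.
```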

Reliable Probability Intervals for Classification
Deep neural networks are frequently used in autonomous systems for their ability to learn complex, non-linear data patterns and make accurate predictions in dynamic environments. However, their use as black boxes introduces risks, since the confidence in each prediction is unknown. Different frameworks have been proposed to compute accurate confidence measures alongside the predictions, but they are constrained by limitations such as execution-time overhead or the inability to handle high-dimensional data. In this research, the team uses Inductive Venn Predictors to compute, in real time, probability intervals on the correctness of each prediction. The researchers propose taxonomies based on distance metric learning to compute informative probability intervals in applications involving high-dimensional inputs. Empirical evaluation on botnet attack detection in Internet-of-Things (IoT) applications demonstrates improved accuracy and calibration. The proposed method is computationally efficient and can therefore be used in real time.
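Under the simplifying assumption that the taxonomy depends only on the input (for instance, its nearest category in the learned embedding), a Venn predictor's probability interval reduces to counting labels in the test point's category, as in this sketch:

```python
def venn_interval(cal_categories, cal_labels, test_category, label):
    """Probability interval for `label` from an Inductive Venn Predictor
    with a feature-only taxonomy: the test point joins its category once
    per possible label, so the empirical frequency of `label` can differ
    by at most one count between the two extremes."""
    in_cat = [l for c, l in zip(cal_categories, cal_labels)
              if c == test_category]
    n, hits = len(in_cat), sum(1 for l in in_cat if l == label)
    return hits / (n + 1), (hits + 1) / (n + 1)
```

The gap between the lower and upper bounds shrinks as the category accumulates calibration examples, which is why informative taxonomies matter.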

Inductive Conformal Out-of-distribution Detection
Machine learning components are used extensively to cope with various complex tasks in highly uncertain environments. However, Out-Of-Distribution (OOD) data may lead to predictions with large errors and degrade performance considerably. This research first introduces different types of OOD data and then presents an efficient approach for OOD detection in classification problems. The approach utilizes an Adversarial Autoencoder (AAE) to represent the training distribution and Inductive Conformal Anomaly Detection (ICAD) for online detection of high-dimensional OOD data. Experimental results on several datasets demonstrate that the approach can detect various types of OOD data with a small number of false alarms. Moreover, the execution time is very short, allowing for online detection.
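The ICAD step admits a compact sketch: the detector compares the test input's nonconformity score (in the paper this would be derived from the AAE; a generic score is assumed below) against scores from a calibration set, and a small p-value flags the input as OOD:

```python
import numpy as np

def icad_p_value(cal_scores, test_score):
    """Inductive conformal anomaly detection p-value: the fraction of
    calibration nonconformity scores at least as large as the test
    score. In-distribution inputs yield roughly uniform p-values;
    OOD inputs yield p-values near zero."""
    cal = np.asarray(cal_scores, dtype=float)
    return (np.sum(cal >= test_score) + 1) / (len(cal) + 1)

# Flag an input as OOD when icad_p_value(...) falls below a chosen threshold.
```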

Resilient Distributed Vector Consensus Using Centerpoint
The research team studied the resilient vector consensus problem in networks with adversarial agents and improved the resilience guarantees of existing algorithms. A common approach to achieving resilient vector consensus is for every non-adversarial (or normal) agent in the network to update its state by moving towards a point in the convex hull of its normal neighbors' states. Since an agent cannot distinguish between its normal and adversarial neighbors, computing such a point, often called a safe point, is a challenging task. To compute a safe point, the team proposed using the notion of a centerpoint, an extension of the median to higher dimensions, instead of the Tverberg partition of points that is often used for this purpose. The research shows that the centerpoint provides a complete characterization of safe points in d-dimensional space: a safe point is essentially an interior centerpoint if the number of adversaries in the neighborhood of a normal agent i is less than Ni/(d+1), where d is the dimension of the state vector and Ni is the total number of agents in the neighborhood of i. Consequently, the team obtained necessary and sufficient conditions on the number of adversarial agents to guarantee resilient vector consensus. Further, by considering the complexity of computing centerpoints, they discuss improvements in the resilience guarantees of vector consensus algorithms and compare them with existing approaches. Finally, they numerically evaluate the approach.
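In one dimension the centerpoint is simply the median, and the Ni/(d+1) bound becomes Ni/2. A scalar consensus step along these lines can be sketched as follows (the actual algorithm computes a centerpoint in d dimensions, which this sketch does not attempt):

```python
import numpy as np

def resilient_step(own_state, neighbor_states, alpha=0.5):
    """One resilient consensus update in d = 1: move a fraction alpha
    toward the median of the neighbors' states. The median lies in the
    convex hull of the normal neighbors' states whenever fewer than
    half the neighbors are adversarial (the N_i/(d+1) bound with d = 1)."""
    return own_state + alpha * (np.median(neighbor_states) - own_state)
```

Note how an adversarial neighbor reporting an extreme value (say 1000) cannot pull the update outside the range spanned by the normal neighbors.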

Resilient Multi-Agent Reinforcement Learning Using Medoid and Soft-Medoid Based Aggregation
A network of RL agents that cooperate by sharing information can improve the learning performance of control and coordination tasks compared to non-cooperative agents. However, networked multi-agent RL (MARL) is vulnerable to adversaries that can compromise some agents and send malicious information to the network. In this work, researchers consider the problem of resilient MARL in the presence of adversarial agents that aim to compromise the learning algorithm. First, the research presents an attack model that aims to degrade the performance of a target agent by modifying the parameters shared by an attacked agent. To improve resilience, the researchers develop aggregation methods using medoid and soft-medoid functions. Their analysis shows that the medoid-based MARL algorithms converge to an optimal solution under standard assumptions and improve overall learning performance and robustness. Simulation results show the effectiveness of the aggregation methods compared with average- and median-based aggregation.
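The medoid itself is straightforward to compute, and its appeal for resilience is that, unlike the mean, it is always one of the actually received vectors, so a single extreme (possibly malicious) update cannot drag it arbitrarily far. A minimal sketch:

```python
import numpy as np

def medoid(param_vectors):
    """Return the vector minimizing the sum of Euclidean distances to
    all the others. The result is always one of the inputs, so one
    arbitrarily corrupted vector cannot shift it arbitrarily."""
    p = np.asarray(param_vectors, dtype=float)
    dists = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    return p[dists.sum(axis=1).argmin()]
```

The soft-medoid used in the paper replaces this hard selection with a distance-weighted average, trading some robustness for smoother gradients; that variant is not sketched here.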

Moving Target Defense for Cyber-Physical Systems
Memory corruption attacks such as code injection, code reuse, and non-control data attacks have become widely popular means of compromising safety-critical CPS. Moving Target Defense (MTD) techniques such as Instruction Set Randomization (ISR), Address Space Randomization (ASR), and Data Space Randomization (DSR) can protect systems against such attacks. However, CPS often rely on time-triggered architectures to guarantee predictable and reliable operation, and MTD techniques can introduce unpredictable timing delays. To protect CPS against memory corruption attacks, MTD techniques can instead be implemented in a mixed time- and event-triggered architecture that maintains safety and availability during an attack. This work presents a mixed time- and event-triggered MTD security approach based on the ARINC 653 architecture that provides predictable and reliable behavior during normal operation and rapid detection and reconfiguration when an attack occurs. Researchers leverage a hardware-in-the-loop testbed and an Advanced Emergency Braking System (AEBS) case study to show the effectiveness of the approach.

Exploiting EM Side-Channel Information of GPUs to Eavesdrop on Your Neighbors
As the popularity of Graphics Processing Units (GPUs) has grown rapidly in recent years, it has become critical to study and understand the security implications they impose. This research shows that modern GPUs can "broadcast" sensitive information over the air, making a number of attacks practical. Specifically, the researchers present a new electromagnetic (EM) side-channel vulnerability that they have discovered in many NVIDIA and AMD GPUs. They show that this vulnerability can be exploited to mount realistic attacks through two case studies: website fingerprinting and keystroke timing inference. The investigation identifies the commonly used Dynamic Voltage and Frequency Scaling (DVFS) feature in GPUs as the root cause of the vulnerability. Nevertheless, the research also shows that simply disabling DVFS may not be an effective countermeasure, since doing so introduces another highly exploitable EM side-channel vulnerability. To the best of the research team's knowledge, this is the first work to study realistic physical side-channel attacks on non-shared GPUs at a distance.

Submitted by Anonymous