Spotlight on Lablet Research #39 - Multi-Model Test Bed for the Simulation-Based Evaluation of Resilience

Lablet: Vanderbilt University

The goal of the Multi-model Testbed is to provide a collaborative design tool for evaluating various cyberattack/defense strategies and their effects on the physical infrastructure. The web-based, cloud-hosted environment integrates state-of-the-art simulation engines for the different Cyber-Physical System (CPS) domains and presents interesting research challenges as ready-to-use scenarios. Input data, model parameters, and simulation results are archived and versioned, with a strong emphasis on repeatability and provenance.

Under the direction of Principal Investigator (PI) Peter Volgyesi and Co-PI Himanshu Neema, the researchers developed the SURE platform, a modeling and simulation integration testbed for evaluating the resilience of complex CPS. Previous efforts resulted in a web-based collaborative design environment for attack-defense scenarios, supported by a cloud-deployed simulation engine for executing and evaluating the scenarios. The goal of this project is to significantly extend these design and simulation capabilities for a better understanding of the security and resilience aspects of CPS. These improvements include providing first-class support for the design of experiments (exploring different parameters and/or strategies), targeting alternative CPS domains (connected vehicles, railway systems, and smart grids), incorporating models of human behavior, and executing multistage games. Researchers also integrate state-of-the-art Machine Learning (ML) libraries and workflows to support security research with Artificial Intelligence (AI)-assisted CPS applications. To achieve these goals, they introduced significant changes to the SURE testbed architecture, replacing an HLA-based C2 Windtunnel federated simulation engine with a more lightweight integration approach within WebGME and DeepForge.

Deep Learning Testbed Infrastructure and Graph Neural Networks on AWS

The research team made significant improvements to DeepForge, the web-based collaborative design and experimentation platform for deep neural network-oriented research. These include adding graph neural network support to DeepForge to create a more accessible design and evaluation environment in this domain. Research on novel graph descriptor representations has been supported by a scalable deployment on Amazon Web Services (AWS). This work borrows some ideas from the controllability of Laplacian dynamics and obtains more expressive representations of the graph structure (graph embeddings) based on how some phenomenon spreads, propagates, or evolves in the structure. Network training and evaluation require significant computational power, so the effort relies on customized on-demand AWS instances.
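
To make the idea of a spread-based graph descriptor concrete, the following is a minimal sketch, not the team's actual descriptor. It assumes heat diffusion under the Laplacian dynamics x' = -Lx as the spreading phenomenon and records, for each node, how much heat remains at the source at a few sample times. The function names, the sample times, and the Euler step size are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def laplacian(n: int, edges: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Build the combinatorial Laplacian L = D - A of an undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def diffuse(L: List[List[float]], x: List[float], dt: float, steps: int) -> List[float]:
    """Integrate x' = -L x with explicit Euler steps (dt must be small)."""
    n = len(x)
    for _ in range(steps):
        x = [x[i] - dt * sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


def node_descriptors(
    n: int,
    edges: Sequence[Tuple[int, int]],
    times: Sequence[float] = (0.5, 1.0, 2.0),  # assumed sample times
    dt: float = 0.01,                          # assumed step size
) -> List[List[float]]:
    """For each node, release unit heat there and record the heat left at that node."""
    L = laplacian(n, edges)
    result = []
    for source in range(n):
        x = [0.0] * n
        x[source] = 1.0
        elapsed = 0.0
        row = []
        for t in times:
            x = diffuse(L, x, dt, round((t - elapsed) / dt))
            elapsed = t
            row.append(x[source])
        result.append(row)
    return result
```

On a three-node path graph, `node_descriptors(3, [(0, 1), (1, 2)])` gives the two end nodes matching descriptors, while the middle node loses heat faster because it has two neighbors. Structurally equivalent nodes therefore receive the same embedding, which is the property a descriptor of this kind is meant to capture.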

Threat Modeling and Risk Analysis in Industrial Control Systems

In this effort, the research team is developing a modeling and analysis framework for threats and cybersecurity risks in Industrial Control Systems (ICS). Identifying system vulnerabilities and implementing appropriate risk mitigation strategies are crucial for ensuring the cybersecurity of ICS. These vulnerabilities must be evaluated according to their exploitability, impact, mitigation status, and target platforms and environments. To assess system vulnerabilities and risk mitigation strategies quantitatively, the team is focusing on threat modeling and risk analysis methods for the cybersecurity of Railway Transportation Systems (RTS), which are real-world ICS and have become increasingly vulnerable to cyberattacks due to their growing reliance on networked physical and computational components. Another interesting aspect of RTS is that their network topology changes continuously as locomotives move. In general, these systems are cyber-physical systems with integral but non-stationary components. The key challenge posed by non-stationarity is that threats and vulnerability propagation evolve over time, because dynamic network connections form and disappear as components move.

The framework developed in this effort is called the Risk Analysis Framework (RAF). RAF has seven major components. The first component is a modeling environment for system architecture, where the ICS can be modeled with a complete component hierarchy and the communication network topology. The second component allows for modeling cyber vulnerabilities, specifying attack ports and risk mitigation actions, and modeling risk flows across components through attack ports. It also enables creating a library of cyber exploits and mitigations. The third component provides validation of all models. The fourth component performs vulnerability assessment: it propagates risk within the system through network connections and hierarchy composition, and generates the component attack trees and system attack graphs. It also rank-orders the system vulnerabilities in decreasing order of their impact on the overall system's cyber risk. The fifth component generates code and artifacts from the risk assessments. The sixth component is a major tool for risk management planning, which allows for cyber gaming of the available risk mitigation actions against potential cyber exploits. The seventh component is for visualization and analysis of results. The research team already visualizes component attack trees and system attack graphs. Work on the visualization of risk management analysis is ongoing.
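
The vulnerability assessment step can be illustrated with a small sketch. This is not RAF's algorithm: it assumes a simple noisy-OR propagation model in which each component has a probability of direct exploit and each network link spreads a compromise with a fixed probability. The component names, probabilities, and impact weights used with it are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Link = Tuple[str, str, float]  # (source, destination, spread probability)


def propagate_risk(
    base: Dict[str, float],
    links: Sequence[Link],
    max_rounds: int = 100,
    tol: float = 1e-12,
) -> Dict[str, float]:
    """Estimate each component's compromise likelihood by noisy-OR iteration.

    base maps a component to its probability of direct exploit. Incoming
    links are treated as independent causes, which is an assumption that
    over-approximates risk when the network contains cycles.
    """
    risk = dict(base)
    for _ in range(max_rounds):
        updated = {}
        for comp, direct in base.items():
            p_safe = 1.0 - direct
            for src, dst, p_spread in links:
                if dst == comp:
                    p_safe *= 1.0 - p_spread * risk[src]
            updated[comp] = 1.0 - p_safe
        converged = all(abs(updated[c] - risk[c]) < tol for c in base)
        risk = updated
        if converged:
            break
    return risk


def rank_by_impact(risk: Dict[str, float], impact: Dict[str, float]) -> List[str]:
    """Order components by likelihood times impact, highest first."""
    return sorted(risk, key=lambda c: risk[c] * impact[c], reverse=True)
```

For example, with a hypothetical chain wayside radio -> onboard controller -> brake actuator, a component with no direct exposure can still rank first once propagated likelihood is weighted by impact:

```python
base = {"wayside_radio": 0.3, "onboard_controller": 0.05, "brake_actuator": 0.0}
links = [("wayside_radio", "onboard_controller", 0.5),
         ("onboard_controller", "brake_actuator", 0.8)]
impact = {"wayside_radio": 2.0, "onboard_controller": 5.0, "brake_actuator": 10.0}
ranking = rank_by_impact(propagate_risk(base, links), impact)
```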

The research team has successfully modeled the dynamic network connections and integrated them into dynamic vulnerability propagation algorithms. The researchers extended the framework to incorporate cyber-gaming of exploits versus mitigations to plan for worst-case attacks. They also developed methods to handle dynamic network connections, where the vulnerabilities and their propagation continually change as network connectivity changes. They further improved the methods and algorithms for dynamic risk management using cyber scenarios, and worked on integrating this framework with tools that enable integrated simulation-based quantitative evaluation of the cybersecurity of CPS. Researchers have designed new methods to connect RAF with the integrated simulation testbed. In one of the demonstrated scenarios, they used the vulnerability scores of system components in RAF to design cybersecurity assessment scenarios in the integrated simulation environment. In addition, they analyzed the security mechanisms in the simulation environment against cyber threats that mimic the vulnerabilities modeled in RAF, and derived mitigation scores corresponding to these security mechanisms. The mitigation scores are then fed back to RAF to update the models and recalculate the updated risk profile of the system. The researchers believe that this bidirectional link between the risk analysis framework and the integrated simulation testbed is a powerful new method for more accurately assessing the cybersecurity risks of CPS.
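
The feedback direction of this bidirectional link can be sketched with simple arithmetic. The scoring rule below is an assumption made for illustration only: it takes a mitigation score to be the fraction of simulated attacks a security mechanism blocked, and scales each component's vulnerability score by the unmitigated fraction. The source does not specify how RAF computes or applies these scores.

```python
from typing import Dict


def mitigation_score(attacks_launched: int, attacks_blocked: int) -> float:
    """Fraction of simulated attacks that the security mechanism blocked."""
    if attacks_launched <= 0:
        raise ValueError("attacks_launched must be positive")
    if not 0 <= attacks_blocked <= attacks_launched:
        raise ValueError("attacks_blocked must be between 0 and attacks_launched")
    return attacks_blocked / attacks_launched


def update_risk_profile(
    vulnerability: Dict[str, float],
    mitigation: Dict[str, float],
) -> Dict[str, float]:
    """Scale each vulnerability score by the fraction left unmitigated.

    Components without a measured mitigation score keep their original score.
    """
    return {
        comp: score * (1.0 - mitigation.get(comp, 0.0))
        for comp, score in vulnerability.items()
    }
```

Under these assumptions, a component with a vulnerability score of 8.0 whose mechanism blocked 15 of 20 simulated attacks would be updated to 8.0 × (1 − 0.75) = 2.0, while components that were not exercised in simulation are left unchanged.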

Physics-guided Learning and Surrogate Modeling - Resilient CPS Applications

The team continued experimentation work on structural design and health monitoring for CPS applications, and developed two alternative methods for auto-generating FEA static stress simulation results from relatively simple parametric CAD models (pressure vessel capsules). The generated datasets are used for developing physics-guided ML models and for experimenting with a graph machine learning-based approach to FEM surrogate modeling and/or topology optimization. While the physics-guided learning approach has broad use cases in CPS (e.g., buildings, transportation infrastructure), the team successfully applied the results in the design process of Unmanned Underwater Vehicles (UUV) as part of the DARPA Symbiotic program.

Resilient Consensus using Centerpoint Algorithm and Hashgraph Based Communication

Unmanned Aerial Vehicles (UAVs) are used for a variety of tasks, such as inspection of dangerous environments, surveillance, and pursuit of a target. These systems use distributed machine learning algorithms to cooperate toward an objective and are prone to Denial of Service (DoS) and integrity attacks. The team investigated an approach that integrates a messaging mechanism and a coordination algorithm based on Stochastic Gradient Descent (SGD) in a multi-agent network for target pursuit that is resilient against such attacks. The network consists of agents that send messages containing local data and state estimates, and it uses the SGD algorithm to optimize the global loss by aggregating state estimates from immediate neighbors. The network can suffer a DoS attack that disrupts the ordering of messages, or an integrity attack in which one agent sends arbitrary estimates to its neighbors to disrupt the convergence of normal agents toward an optimal state. The messaging mechanism uses Hashgraph, a distributed ledger technology, to guarantee a correct ordering of messages. The SGD algorithm uses centerpoint-based aggregation to converge to a target in the presence of compromised agents. The team evaluated the approach on target pursuit scenarios for multi-UAV systems, using simulations in Microsoft AirSim with PX4 flight controllers. The evaluation results demonstrate cases in which the multi-agent system under attack is resilient and converges to an approximately optimal state. The team has finished the project for the power testbed, demonstrating it with the integration of Hashgraph and Centerpoint, and is currently focused on finalizing work with the multi-UAV system. Furthermore, they have continued to design analytics for the resilient multi-UAV target pursuit, checking that the data distributed to each agent is consistent and that each agent has a consistent global view of the target. They evaluated these analytics in both non-adversarial and attack scenarios and concluded that the system maintains a globally consistent view and converges to the target.
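
The role of centerpoint-based aggregation can be sketched in two dimensions. Computing an exact centerpoint is more involved, so the sketch below swaps in a simpler heuristic: it approximates the Tukey (halfspace) depth of each received estimate over a finite set of sampled directions and picks the deepest input point. This is an illustrative stand-in, not the team's algorithm, and the function names, direction count, and step rule are assumptions. Message ordering via Hashgraph is not modeled.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tukey_depth(p: Point, points: List[Point], n_dirs: int = 180) -> int:
    """Approximate halfspace depth of p using n_dirs sampled directions.

    For each direction, count the points in the closed halfplane through p,
    and return the smallest count found.
    """
    depth = len(points)
    for k in range(n_dirs):
        angle = 2.0 * math.pi * k / n_dirs
        ux, uy = math.cos(angle), math.sin(angle)
        count = sum(
            1 for (x, y) in points
            if (x - p[0]) * ux + (y - p[1]) * uy >= -1e-12
        )
        depth = min(depth, count)
    return depth


def approx_centerpoint(points: List[Point]) -> Point:
    """Return the input point with the largest approximate depth."""
    return max(points, key=lambda p: tukey_depth(p, points))


def resilient_sgd_step(
    own: Point, received: List[Point], grad: Point, lr: float = 0.1
) -> Point:
    """Aggregate own and neighbor estimates robustly, then take a gradient step."""
    agg = approx_centerpoint(received + [own])
    return (agg[0] - lr * grad[0], agg[1] - lr * grad[1])
```

An arbitrary estimate sent by a compromised neighbor lies on the fringe of the point set and so has low depth. As long as the honest estimates form the majority, the aggregate stays among them, whereas a plain average would be dragged toward the outlier.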

General-Purpose ML Attack Library

Building on previous work in the CPSWT framework on a general-purpose cyberattack library and its use in resilience evaluation with courses of action, researchers started investigating the idea of creating a general-purpose ML attack library. The idea is that these ML attacks will be generic by design, so that they can be quickly adapted, through simple configuration, to attack and test the resilience of different ML models. This work is in the initial stages, and the researchers are planning to use the DeepForge platform to develop the configurable, reusable ML attack library. The DeepForge platform uses WebGME as the metamodeling environment and supports the Keras ML library for developing ML pipelines.
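
As one illustration of what a configurable, model-agnostic attack might look like, the sketch below implements the Fast Gradient Sign Method (FGSM), a well-known adversarial-example technique. The source does not name any specific attack, so the choice of FGSM, the `AttackConfig` fields, and the tiny logistic model used to supply gradients are all assumptions. The attack only needs the gradient of the loss with respect to the input, so any model that can provide it could be plugged in.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AttackConfig:
    """Hypothetical configuration for a reusable attack."""
    epsilon: float = 0.1   # maximum change applied to each feature
    clip_min: float = 0.0  # lower bound of valid feature values
    clip_max: float = 1.0  # upper bound of valid feature values


def fgsm(x: Sequence[float], grad: Sequence[float], cfg: AttackConfig) -> List[float]:
    """Shift each feature by epsilon in the direction that increases the loss."""
    def sign(g: float) -> int:
        return (g > 0) - (g < 0)

    return [
        min(cfg.clip_max, max(cfg.clip_min, xi + cfg.epsilon * sign(gi)))
        for xi, gi in zip(x, grad)
    ]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Logistic model: probability that x belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w: Sequence[float], b: float, x: Sequence[float], y: int) -> List[float]:
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]
```

With weights `[2.0, -1.0]`, zero bias, input `[0.5, 0.5]`, and true label 1, the default configuration moves the input to roughly `[0.4, 0.6]` and lowers the model's confidence in the correct class. Raising `epsilon` to 0.3 pushes the prediction below 0.5, which is the kind of adjustment the configuration is meant to expose.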

Submitted by Anonymous on Wed, 02/22/2023 - 11:33