Spotlight on Lablet Research #30 - An Automated Synthesis Framework for Network Security and Resilience
Lablet: University of Illinois at Urbana-Champaign
Participating Sub-Lablet: University of Arkansas
The goal of this project is to develop the analysis methodology needed to support scientific reasoning about the resilience and security of networks, with a particular focus on network control and information/data flow.
The core of this vision is an Automated Synthesis Framework (ASF), which will automatically derive network state and repairs from a set of specified correctness requirements and security policies. ASF consists of a set of techniques for performing and integrating security and resilience analyses applied at different layers (i.e., data forwarding, network control, programming language, and application software) in a real-time and automated fashion. The ASF approach is exciting because developing it adds to the theoretical underpinnings of the Science of Security (SoS), while using it supports the practice of SoS.
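To make the idea of checking network state against policies and deriving a repair concrete, here is a minimal sketch in Python. It is not the ASF implementation: the topology, the policy format ("reach" / "isolate" tuples), and the function names are all assumptions chosen for illustration, and the "repair" is a brute-force search for a single forwarding edge whose removal satisfies every policy.

```python
from collections import deque

def reachable(forwarding, src):
    """Return the set of nodes reachable from src by following forwarding edges."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in forwarding.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def check_policies(forwarding, policies):
    """Return the policies that the forwarding state violates.

    Each policy is a (kind, src, dst) tuple; kind is "reach" or "isolate".
    This policy format is hypothetical, not taken from the project.
    """
    violations = []
    for kind, src, dst in policies:
        can_reach = dst in reachable(forwarding, src)
        if kind == "reach" and not can_reach:
            violations.append((kind, src, dst))
        elif kind == "isolate" and can_reach:
            violations.append((kind, src, dst))
    return violations

def find_single_edge_repair(forwarding, policies):
    """Try removing one edge at a time; return the first edge whose removal
    leaves no policy violated, or None if no single removal works."""
    for node, nexts in forwarding.items():
        for nxt in nexts:
            candidate = {
                n: [x for x in outs if (n, x) != (node, nxt)]
                for n, outs in forwarding.items()
            }
            if not check_policies(candidate, policies):
                return (node, nxt)
    return None

# Hypothetical four-hop topology and two policies.
forwarding = {
    "web": ["fw"],
    "fw": ["app"],
    "app": ["db"],
    "guest": ["fw"],
}
policies = [
    ("reach", "web", "db"),      # web traffic must be able to reach the database
    ("isolate", "guest", "db"),  # guest traffic must never reach the database
]

violations = check_policies(forwarding, policies)
repair = find_single_edge_repair(forwarding, policies)
print(violations)  # [('isolate', 'guest', 'db')]
print(repair)      # ('guest', 'fw')
```

A real verifier would work on packet header spaces and forwarding rules rather than a plain graph, and a real synthesizer would search a much richer space of repairs; the sketch only shows the check-then-repair loop in its simplest form.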
Led by Principal Investigator (PI) Matt Caesar and Co-PI Dong (Kevin) Jin, researchers continued the transfer of technology to industry through interactions with Veriflow and VMware. Veriflow is a startup company commercializing verification technology that came out of this project's SoS Lablet funding. Recent collaborations targeted the enhancement of the verification technology to operate on real-time traffic data, as well as the development of a "high-speed" variant of the approach that can perform verification and quickly answer queries on large environments while requiring only a small memory and CPU footprint. The researchers worked on approaches to parallelize computations, making them amenable to deployment across clouds and other decentralized environments, and the team is now evaluating performance.
Researchers studied the interdependence between the power system and the communication network to improve resilience in critical energy infrastructures. This work addressed the resilient architecture hard problem and proposed a two-layer distribution system model with both power and communication components. Based on the model, the researchers formulated the Distribution Service Restoration (DSR) process as a routing problem and developed a simulation-based method to quantitatively evaluate the DSR process on large-scale power systems (e.g., the IEEE 123-node system and the Ckt-7 system). The experimental results show that, by considering the power-communication interdependency, this method improves the total restored energy by up to 57.6% and reduces the recovery time by up to 63%. The team also added a dysfunctional switch model to ensure system stability under new operational constraints (such as voltage and capacity limits), and formulated a stochastic optimization model to address communication uncertainties.
The researchers developed a testing platform for cyber-physical system resilience and security evaluation. The platform combines a container-based network emulator, a network/power system simulator, and real hardware, providing a high-fidelity, scalable testing environment for rapid prototyping of network applications. Virtual time advancement within the platform is subject to statistical error caused by process waiting time, including disk I/O time, network I/O time, and GPU computational time. To overcome this error, the team formulated an analytical model of virtual time, proposed a time compensation mechanism, and implemented it in the Linux kernel to precisely control time advancement by accounting for the waiting time of non-CPU tasks. They then conducted extensive experiments for error analysis and system evaluation.
Researchers developed a general and interpretable framework for analyzing phasor measurement unit (PMU) data in real time; the proposed framework enables grid operators to understand changes to the current state and to identify anomalies in the PMU measurement data. They first learned an effective dynamical model to describe the current behavior of the system by applying statistical learning tools to the streaming PMU data, and then used the probabilistic predictions of the learned model to define, in a principled way, an efficient anomaly detection tool. Their framework classifies the detected anomalies into common occurrence classes in real time. They demonstrated the efficacy of their proposed framework through numerical experiments on real PMU data collected from a transmission operator.
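The two-step pattern described above (learn a dynamical model, then use its probabilistic predictions to flag anomalies) can be illustrated with a deliberately simple stand-in. The sketch below assumes a first-order autoregressive model with Gaussian residuals and a fixed z-score threshold; the project's actual model, features, and thresholds are not stated in this article, and the frequency trace is synthetic, not real PMU data. Anomaly classification is not covered.

```python
import math

def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b + noise.

    Returns (a, b, sigma), where sigma is the residual standard deviation.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 samples to fit the model")
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x > 0 else 0.0
    b = mean_y - a * mean_x
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / max(n - 2, 1))
    return a, b, sigma

def detect_anomalies(history, stream, threshold=4.0):
    """Return indices in stream whose one-step prediction error exceeds
    threshold standard deviations under the model fitted on history."""
    a, b, sigma = fit_ar1(history)
    sigma = max(sigma, 1e-9)  # avoid division by zero on a perfectly fitted series
    flagged = []
    prev = history[-1]
    for i, value in enumerate(stream):
        predicted = a * prev + b
        if abs(value - predicted) / sigma > threshold:
            flagged.append(i)
            prev = predicted  # do not feed the anomalous sample back into the model
        else:
            prev = value
    return flagged

# Synthetic frequency trace (Hz): a small oscillation around 60 Hz.
def sample(t):
    return 60.0 + 0.01 * math.sin(0.5 * t)

history = [sample(t) for t in range(200)]
stream = [sample(t) for t in range(200, 210)]
stream[5] += 0.5  # inject a hypothetical 0.5 Hz excursion

flagged = detect_anomalies(history, stream)
print(flagged)  # [5]
```

Replacing an anomalous sample with the model's own prediction keeps one bad measurement from also corrupting the next prediction; a streaming deployment would additionally refit or update the model as the grid's operating point drifts.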
The team's collaboration with Boeing on constructing a resilient IoT platform for the battlefield continued. The team is exploring an approach that leverages deep learning to dynamically relocate drone-mounted access points to evade the adversary. They formulated a placement algorithm that uses Model-Agnostic Meta-Learning (MAML) to remain resilient to an adversary attempting to disrupt the learning process. Researchers have developed new deep learning mechanisms that are resilient to data sets that are "constructed" by adversaries, and early simulation results show benefits to these approaches in practical settings.
The team has also developed a design and evaluation framework for a self-driving "service provider infrastructure" that leverages prior work on verification and synthesis to automatically self-configure to become resilient to attacks. The initial focus is on network and container orchestration systems, and the first implementation targets Kubernetes. The platform leverages Artificial Intelligence (AI) planning algorithms to synthesize steps the system needs to take to protect itself against incoming attacks from an intelligent adversary.
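As an illustration of how a planner can synthesize protective steps, the sketch below runs a breadth-first forward search over STRIPS-style actions. Everything in it is an assumption made for the example: the action names, the facts, and the goal are hypothetical, no Kubernetes API is called, and the project's actual planning algorithm is not described in this article.

```python
from collections import deque

# Hypothetical actions: name -> (preconditions, facts added, facts removed).
ACTIONS = {
    "isolate_pod": (
        frozenset({"pod_compromised"}),
        frozenset({"pod_isolated"}),
        frozenset(),
    ),
    "patch_image": (
        frozenset({"pod_isolated"}),
        frozenset({"image_patched"}),
        frozenset(),
    ),
    "redeploy_pod": (
        frozenset({"image_patched"}),
        frozenset({"service_healthy"}),
        frozenset({"pod_compromised", "pod_isolated"}),
    ),
}

def plan(initial, goal, actions=ACTIONS):
    """Breadth-first search for a shortest action sequence that reaches the goal.

    States are frozensets of facts. Returns a list of action names,
    or None when the goal cannot be reached.
    """
    start = frozenset(initial)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for name, (pre, add, remove) in actions.items():
            if pre <= state:  # action is applicable in this state
                nxt = (state - remove) | add
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None

steps = plan({"pod_compromised"}, {"service_healthy"})
print(steps)  # ['isolate_pod', 'patch_image', 'redeploy_pod']
```

Breadth-first search returns a shortest plan but does not scale to large state spaces and ignores the adversary's moves; planning against an intelligent adversary would need heuristic or adversarial search over a model of both sides.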
Background on this project can be found here.