Publications | Science of Security Virtual Organization

Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers

Graph-based Semi-Supervised Learning (GSSL) is a practical solution to learn from a limited amount of labelled data together with a vast amount of unlabelled data. However, due to their reliance on the known labels to infer the unknown labels, these algorithms are sensitive to data quality. It is therefore essential to study the potential threats related to the labelled data, more specifically, label poisoning. In this paper, we propose a novel data poisoning method which efficiently approximates the result of label inference to identify the inputs which, if poisoned, would produce the highest number of incorrectly inferred labels. We extensively evaluate our approach on three classification problems under 24 different experimental settings each. Compared to the state of the art, our influence-driven attack produces an average increase in error rate that is 50% higher, while being faster by multiple orders of magnitude. Moreover, our method can inform engineers of inputs that deserve investigation (and possible relabelling) before training the learning model. We show that relabelling one-third of the poisoned inputs (selected based on their influence) reduces the poisoning effect by 50%. ACM Reference Format: Adriano Franci, Maxime Cordy, Martin Gubri, Mike Papadakis, and Yves Le Traon. 2022. Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers. In 1st Conference on AI Engineering - Software Engineering for AI (CAIN’22), May 16–24, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3522664.3528606

Authored by Adriano Franci, Maxime Cordy, Martin Gubri, Mike Papadakis, Yves Le Traon

Poisoning Attack against Online Regression Learning with Maximum Loss for Edge Intelligence

Recent trends in the convergence of edge computing and artificial intelligence (AI) have led to a new paradigm of “edge intelligence”, which is more vulnerable to attacks such as data poisoning, model poisoning, and evasion. This paper proposes a white-box poisoning attack against online regression models in edge intelligence environments, with the aim of informing future protection methods. First, the new method selects the data points with maximum loss from the original stream using two selection strategies. Second, it pollutes these points with a gradient ascent strategy. Finally, it injects the polluted points into the original stream being sent to the target model to complete the attack. We extensively evaluate the proposed attack on an open dataset; the results demonstrate the effectiveness of the novel attack method and the real implications of poisoning attacks in a case study of an electric energy prediction application.

Authored by Yanxu Zhu, Hong Wen, Peng Zhang, Wen Han, Fan Sun, Jia Jia
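
The three steps named in the abstract above (select maximum-loss points, pollute them by gradient ascent, inject them into the stream) can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional linear model, the single top-k selection rule, the step size, and the function names `select_max_loss`, `pollute`, and `poison_stream` are all assumptions made for illustration, and appending the polluted points to the end of the stream is a simplification.

```python
# Hypothetical sketch of a max-loss poisoning attack on a 1-D linear
# regressor y = w * x + b. The attacker is assumed to know w and b (white box).

def squared_loss(w, b, x, y):
    """Squared error of the model on a single point."""
    return (w * x + b - y) ** 2


def select_max_loss(points, w, b, k):
    """Pick the k points on which the current model has the largest loss."""
    return sorted(points, key=lambda p: squared_loss(w, b, *p), reverse=True)[:k]


def pollute(point, w, b, step=0.05, iters=3):
    """Move a point uphill on the loss surface (gradient ascent on x and y)."""
    x, y = point
    for _ in range(iters):
        residual = w * x + b - y
        grad_x = 2.0 * residual * w   # d(loss)/dx
        grad_y = -2.0 * residual      # d(loss)/dy
        x += step * grad_x
        y += step * grad_y
    return (x, y)


def poison_stream(stream, w, b, k=2):
    """Return the original stream followed by k polluted copies (assumed injection rule)."""
    chosen = select_max_loss(stream, w, b, k)
    return list(stream) + [pollute(p, w, b) for p in chosen]
```

For example, with `w, b = 2.0, 0.0` and the stream `[(0.0, 0.1), (1.0, 2.5), (2.0, 4.0), (3.0, 9.0)]`, the selection step picks `(3.0, 9.0)` and `(1.0, 2.5)`, and each polluted copy has a strictly larger loss than the point it was derived from.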

A Survey on Data Poisoning Attacks and Defenses

With the widespread deployment of data-driven services, the demand for data continues to grow. At present, many applications lack reliable human supervision during data collection, so the collected data may contain low-quality or even malicious data. Such data exposes AI systems to many security challenges. One of the main security threats in the training phase of machine learning is the data poisoning attack, which compromises model integrity by contaminating the training data so that the resulting model is skewed or unusable. This paper reviews the relevant research on data poisoning attacks in various task environments: first, the classification of attacks is summarized; then, the defense methods against data poisoning attacks are organized; and finally, possible future research directions are discussed.

Authored by Jiaxin Fan, Qi Yan, Mohan Li, Guanqun Qu, Yang Xiao

FIBA: Frequency-Injection based Backdoor Attack in Medical Image Analysis

In recent years, the security of AI systems has drawn increasing research attention, especially in the medical imaging realm. To develop a secure medical image analysis (MIA) system, it is a must to study possible backdoor attacks (BAs), which can embed hidden malicious behaviors into the system. However, designing a unified BA method that can be applied to various MIA systems is challenging due to the diversity of imaging modalities (e.g., X-Ray, CT, and MRI) and analysis tasks (e.g., classification, detection, and segmentation). Most existing BA methods are designed to attack natural image classification models, which apply spatial triggers to training images and inevitably corrupt the semantics of poisoned pixels, leading to the failures of attacking dense prediction models. To address this issue, we propose a novel Frequency-Injection based Backdoor Attack method (FIBA) that is capable of delivering attacks in various MIA tasks. Specifically, FIBA leverages a trigger function in the frequency domain that can inject the low-frequency information of a trigger image into the poisoned image by linearly combining the spectral amplitude of both images. Since it preserves the semantics of the poisoned image pixels, FIBA can perform attacks on both classification and dense prediction models. Experiments on three benchmarks in MIA (i.e., ISIC-2019 [4] for skin lesion classification, KiTS-19 [17] for kidney tumor segmentation, and EAD-2019 [1] for endoscopic artifact detection), validate the effectiveness of FIBA and its superiority over state-of-the-art methods in attacking MIA models and bypassing backdoor defense. Source code will be available at code.

Authored by Yu Feng, Benteng Ma, Jing Zhang, Shanshan Zhao, Yong Xia, Dacheng Tao

Detection and Mitigation of Targeted Data Poisoning Attacks in Federated Learning

Federated learning (FL) has emerged as a promising paradigm for distributed training of machine learning models. In FL, several participants train a global model collaboratively by only sharing model parameter updates while keeping their training data local. However, FL was recently shown to be vulnerable to data poisoning attacks, in which malicious participants send parameter updates derived from poisoned training data. In this paper, we focus on defending against targeted data poisoning attacks, where the attacker’s goal is to make the model misbehave for a small subset of classes while the rest of the model is relatively unaffected. To defend against such attacks, we first propose a method called MAPPS for separating malicious updates from benign ones. Using MAPPS, we propose three methods for attack detection: MAPPS + X-Means, MAPPS + VAT, and their Ensemble. Then, we propose an attack mitigation approach in which a "clean" model (i.e., a model that is not negatively impacted by an attack) can be trained despite the existence of a poisoning attempt. We empirically evaluate all of our methods using popular image classification datasets. Results show that we can achieve >95% true positive rates while incurring only <2% false positive rate. Furthermore, the clean models that are trained using our proposed methods have accuracy comparable to models trained in an attack-free scenario.

Authored by Pinar Erbil, Emre Gursoy

Robust and Resilient Federated Learning for Securing Future Networks

Machine Learning (ML) and Artificial Intelligence (AI) techniques are widely adopted in the telecommunication industry, especially to automate beyond 5G networks. Federated Learning (FL) recently emerged as a distributed ML approach that enables localized model training to keep data decentralized to ensure data privacy. In this paper, we identify the applicability of FL for securing future networks and its limitations due to the vulnerability to poisoning attacks. First, we investigate the shortcomings of state-of-the-art security algorithms for FL and perform an attack to circumvent the FoolsGold algorithm, which is known as one of the most promising defense techniques currently available. The attack is launched by adding intelligent noise to the poisonous model updates. Then we propose a more sophisticated defense strategy, a threshold-based clustering mechanism, to complement FoolsGold. Moreover, we provide a comprehensive analysis of the impact of the attack scenario and the performance of the defense mechanism.

Authored by Yushan Siriwardhana, Pawani Porambage, Madhusanka Liyanage, Mika Ylianttila

A Robust Framework for Adaptive Selection of Filter Ensembles to Detect Adversarial Inputs

Existing defense strategies against adversarial attacks (AAs) on AI/ML are primarily focused on examining the input data streams using a wide variety of filtering techniques. For instance, input filters are used to remove noisy, misleading, and out-of-class inputs along with a variety of attacks on learning systems. However, a single filter may not be able to detect all types of AAs. To address this issue, in the current work, we propose a robust, transferable, distribution-independent, and cross-domain supported framework for selecting Adaptive Filter Ensembles (AFEs) to minimize the impact of data poisoning on learning systems. The optimal filter ensembles are determined through a Multi-Objective Bi-Level Programming Problem (MOBLPP) that provides a subset of diverse filter sequences, each exhibiting fair detection accuracy. The proposed framework of AFE is trained to model the pristine data distribution to identify the corrupted inputs and converges to the optimal AFE without vanishing gradients and mode collapses irrespective of input data distributions. We presented preliminary experiments to show the proposed defense outperforms the existing defenses in terms of robustness and accuracy.

Authored by Arunava Roy, Dipankar Dasgupta

Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a particular trigger. Several works attempt to detect whether a given DNN has been injected with a specific trigger during the training. In a parallel line of research, the lottery ticket hypothesis reveals the existence of sparse sub-networks which are capable of reaching competitive performance as the dense network after independent training. Connecting these two dots, we investigate the problem of Trojan DNN detection from the brand new lens of sparsity, even when no clean training data is available. Our crucial observation is that the Trojan features are significantly more stable to network pruning than benign features. Leveraging that, we propose a novel Trojan network detection regime: first locating a “winning Trojan lottery ticket” which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated sub-network. Extensive experiments on various datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet, with different network architectures, i.e., VGG-16, ResNet-18, ResNet-20s, and DenseNet-100 demonstrate the effectiveness of our proposal. Codes are available at https://github.com/VITA-Group/Backdoor-LTH.

Authored by Tianlong Chen, Zhenyu Zhang, Yihua Zhang, Shiyu Chang, Sijia Liu, Zhangyang Wang

Remote Disaster Recovery and Backup of Rehabilitation Medical Archives Information System Construction under the Background of Big Data

This work aims to realize same-city and remote disaster recovery for the infectious disease network direct reporting system of the China Medical Archives Information Center. Method: A three-tier B/S/DBMS architecture is used in the disaster recovery center to deploy the infectious disease network direct reporting system. Data-level disaster recovery is realized through remote replication technology, application-level disaster recovery of key business systems is realized through asynchronous data technology, and the disaster recovery data of medical files in the network direct reporting system is transmitted in asynchronous mode. The establishment of same-city and remote disaster recovery centers ensures the security of the infectious disease direct reporting system and its data, and ensures that continuity work proceeds effectively. The results show that the efficiency of remote disaster recovery and backup based on big data has increased by 9.2%.

Authored by Yingjue Wang, Lei Gong, Min Zhang

Design and Implementation of a Software Disaster Recovery Service for Cloud Computing-Based Aerospace Ground Systems

The data centers of cloud computing-based aerospace ground systems and the businesses running on them are extremely vulnerable to man-made disasters, emergencies, and other disasters, which seriously threatens their security. Thus, cloud centers need to provide effective disaster recovery services for software and data. However, the disaster recovery methods of current cloud centers for aerospace ground systems have long lagged behind, and their disaster tolerance and anti-destruction capabilities are weak. To address these problems, in this paper we design a disaster recovery service for aerospace ground systems based on cloud computing. Based on the software warehouse, this service adopts the main-standby mode to achieve backup, local disaster recovery, and remote disaster recovery of software and data. As a result, this service can respond to disasters in a timely manner, ensure the continuous running of businesses, and improve the disaster tolerance and anti-destruction capability of aerospace ground systems. Extensive simulation experiments validate the effectiveness of the proposed disaster recovery service.

Authored by Xiao Yu, Dong Wang, Xiaojuan Sun, Bingbing Zheng, Yankai Du

Reliability and Timeliness of Servicing Requests in Infocommunication Systems, Taking into Account the Physical and Information Recovery of Redundant Storage Devices

Markov models of reliability of fault-tolerant computer systems are proposed, taking into account two stages of recovery of redundant memory devices. At the first stage, the memory devices are physically recovered; at the second, information recovery is performed, which consists in entering the data necessary to perform the required functions. Memory redundancy is carried out to increase the stability of the system to the loss of unique data generated during the operation of the system. Data replication is implemented in all functional memory devices. Information recovery is carried out using replicas of data stored in working memory devices. The model takes into account the criticality of the system to the timeliness of calculations in real time and to the impossibility of restoring information after multiple memory failures, leading to the loss of all stored replicas of unique data. The system readiness coefficient and the probability of its transition to a non-recoverable state are determined. The readiness of the system for the timely execution of requests is evaluated, taking into account the influence of the shares of the distribution of the performance of the computer allocated for the maintenance of requests and for the entry of information into memory after its physical recovery.

Authored by Vladimir Bogatyrev, Stanislav Bogatyrev, Anatoly Bogatyrev
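
To make the idea of a two-stage recovery and a readiness (availability) coefficient concrete, here is a deliberately reduced sketch. It is not the model from the paper: it assumes a single memory device with no redundancy and no non-recoverable state, cycling through three states (operational, physical recovery, information recovery) with constant rates. The function name `steady_state` and its parameters are hypothetical. For such a cyclic chain, each steady-state probability is proportional to the mean time spent in that state.

```python
# Hypothetical three-state cyclic Markov chain:
#   operational --failure_rate--> physical recovery
#   physical recovery --physical_rate--> information recovery
#   information recovery --info_rate--> operational

def steady_state(failure_rate, physical_rate, info_rate):
    """Return (p_operational, p_physical, p_information) in steady state."""
    # Mean sojourn time in each state is the reciprocal of its exit rate.
    t_up = 1.0 / failure_rate
    t_phys = 1.0 / physical_rate
    t_info = 1.0 / info_rate
    total = t_up + t_phys + t_info
    return (t_up / total, t_phys / total, t_info / total)


def readiness(failure_rate, physical_rate, info_rate):
    """Readiness coefficient: long-run fraction of time spent operational."""
    return steady_state(failure_rate, physical_rate, info_rate)[0]
```

With a failure rate of 0.001 per hour, a physical recovery rate of 0.5 per hour, and an information recovery rate of 1.0 per hour, the mean cycle is 1000 + 2 + 1 hours, so the readiness coefficient in this reduced model is 1000/1003. Speeding up the information recovery stage raises it.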

Secure Communication Protocol for Network-on-Chip with Authenticated Encryption and Recovery Mechanism

In recent times, Network-on-Chip (NoC) has become state of the art for communication in Multiprocessor System-on-Chip due to the existing scalability issues in this area. However, these systems are exposed to security threats such as extraction of secret information. Therefore, the need for secure communication arises in such environments. In this work, we present a communication protocol based on authenticated encryption with recovery mechanisms to establish secure end-to-end communication between the NoC nodes. In addition, a selected key agreement approach required for secure communication is implemented. The security functionality is located in the network adapter of each processing element. If data is tampered with or deleted during transmission, recovery mechanisms ensure that the corrupted data is retransmitted by the network adapter without the need of interference from the processing element. We simulated and implemented the complete system with SystemC TLM using the NoC simulation platform PANACA. Our results show that we can keep a high rate of correctly transmitted information even when attackers infiltrated the NoC system.

Authored by Julian Haase, Sebastian Jaster, Elke Franz, Diana Göhringer

The Influence of the Use of Fail-Safe Archives of Magnetic Media on the Reliability Indicators of Distributed Systems

A critical property of distributed data processing systems is the high level of reliability of such systems. A practical solution to this problem is to place copies of archives of magnetic media in the nodes of the system. These archives are used to restore data destroyed during the processing of requests to this data. The paper shows the impact of the use of archives on the reliability indicators of distributed systems.

Authored by Sergey Somov, Larisa Bogatyryova

Choosing the Discipline of Restoring Computer Systems with Acceptable Degradation with Consolidation of Node Resources Saved After Failures

An approach to substantiating the choice of a discipline for the maintenance of a redundant computer system, with the possible use of node resources saved after failures, is considered. The choice is aimed at improving the reliability and profitability of the system, taking into account the operational costs of restoring nodes. Models of reliability of systems with service disciplines are proposed, providing both the possibility of immediate recovery of nodes after failures, and allowing degradation of the system when using node resources saved after failures. The models take into account the conditions of the admissibility or inadmissibility of the loss of information accumulated during the operation of the system. The operating costs are determined, taking into account the costs of restoring nodes for the system maintenance disciplines under consideration.

Authored by Vladimir Bogatyrev, Stanislav Bogatyrev, Anatoly Bogatyrev

Research on Cooperative Black-Start Strategy of Internal and External Power Supply in the Large Power Grid

At present, the black-start mode of the large power grid is mostly limited to relying on the black-start power supply inside the system, or only to the recovery mode that regards the transmission power of tie lines between systems as the black-start power supply. The starting power supply involved in the situation of the large power outage is incomplete and it is difficult to give full play to the respective advantages of internal and external power sources. In this paper, a method of coordinated black-start of large power grid internal and external power sources is proposed by combining the two modes. Firstly, the black-start capability evaluation system is built to screen out the internal black-start power supply, and the external black-start power supply is determined by analyzing the connection relationship between the systems. Then, based on the specific implementation principles, the black-start power supply coordination strategy is formulated by using the Dijkstra shortest path algorithm. Based on the condensation idea, the black-start zoning and path optimization method applicable to this strategy is proposed. Finally, the black-start security verification and corresponding control measures are adopted to obtain a scheme of black-start cooperation between internal and external power sources in the large power grid. The above method is applied in a real large power grid and compared with the conventional restoration strategy to verify the feasibility and efficiency of this method.

Authored by Liang Guili, Zhang Dongying, Wang Wei, Gong Cheng, Cui Duo, Tian Yichun, Wang Yan
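
The abstract above names the Dijkstra shortest path algorithm as the basis for the coordination strategy. The sketch below shows only the textbook algorithm on a small weighted graph; the node names, the edge weights, and the interpretation of a weight as a restoration cost are illustrative assumptions, not data or design details from the paper.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> {neighbour: weight}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for v, weight in graph.get(u, {}).items():
            nd = d + weight
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the route from source to target (raises KeyError if unreachable)."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical grid: a black-start unit, two substations, and a unit to be restarted.
GRID = {
    "hydro": {"sub_a": 2.0, "sub_b": 5.0},
    "sub_a": {"sub_b": 1.0, "thermal": 6.0},
    "sub_b": {"thermal": 2.0},
    "thermal": {},
}
```

Calling `dijkstra(GRID, "hydro")` and then `path_to(prev, "hydro", "thermal")` yields the route hydro, sub_a, sub_b, thermal with a total weight of 5.0.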

Tolerating Resource Exhaustion Attacks in the Time-Triggered Architecture

The Time-Triggered Architecture (TTA) presents a blueprint for building safe and real-time constrained distributed systems, based on a set of orthogonal concepts that make extensive use of the availability of a globally consistent notion of time and a priori knowledge of events. Although the TTA tolerates arbitrary failures of any of its nodes by architectural means (active node replication, a membership service, and bus guardians), the design of these means considers only accidental faults. However, distributed safety- and real-time critical systems have been emerging into more open and interconnected systems, operating autonomously for prolonged times and interfacing with other possibly non-real-time systems. Therefore, the existence of vulnerabilities that adversaries may exploit to compromise system safety cannot be ruled out. In this paper, we discuss potential targeted attacks capable of bypassing TTA's fault-tolerance mechanisms and demonstrate how two well-known recovery techniques - proactive and reactive rejuvenation - can be incorporated into TTA to reduce the window of vulnerability for attacks without introducing extensive and costly changes.

Authored by Mohammad Alkoudsi, Gerhard Fohler, Marcus Völp

AlphaSOC: Reinforcement Learning-based Cybersecurity Automation for Cyber-Physical Systems

Achieving agile and resilient autonomous capabilities for cyber defense requires moving past indicators and situational awareness into automated response and recovery capabilities. The objective of the AlphaSOC project is to use state of the art sequential decision-making methods to automatically investigate and mitigate attacks on cyber physical systems (CPS). To demonstrate this, we developed a simulation environment that models the distributed navigation control system and physics of a large ship with two rudders and thrusters for propulsion. Defending this control network requires processing large volumes of cyber and physical signals to coordinate defensive actions over many devices with minimal disruption to nominal operation. We are developing a Reinforcement Learning (RL)-based approach to solve the resulting sequential decision-making problem that has large observation and action spaces.

Authored by Ryan Silva, Cameron Hickert, Nicolas Sarfaraz, Jeff Brush, Josh Silbermann, Tamim Sookoor

A Digital Twin Based Fault Location Method for Transmission Lines Using the Recovery Information of Instrument Transformers

The parameters of a transmission line vary with environmental and operating conditions; this paper therefore proposes a digital twin-based transmission line model. Based on synchrophasor measurements from phasor measurement units, the proposed model can use the maximum likelihood estimation (MLE) to reduce uncertainty between the digital twin and its physical counterpart. A case study has been conducted in the paper to present the influence of the uncertainty in the measurements on the digital twin for the transmission line and analyze the effectiveness of the MLE method. The results show that the proposed digital twin-based model is effective in reducing the influence of the uncertainty in the measurements and improving the fault location accuracy.

Authored by Han Zhang, Xiaoxiao Luo, Yongfu Li, Wenxia Sima, Ming Yang

IronMask: Versatile Verification of Masking Security

This paper introduces IronMask, a new versatile verification tool for masking security. IronMask is the first to offer the verification of standard simulation-based security notions in the probing model as well as recent composition and expandability notions in the random probing model. It supports any masking gadgets with linear randomness (e.g. addition, copy and refresh gadgets) as well as quadratic gadgets (e.g. multiplication gadgets) that might include non-linear randomness (e.g. by refreshing their inputs), while providing complete verification results for both types of gadgets. We achieve this complete verifiability by introducing a new algebraic characterization for such quadratic gadgets and exhibiting a complete method to determine the sets of input shares which are necessary and sufficient to perform a perfect simulation of any set of probes. We report various benchmarks which show that IronMask is competitive with state-of-the-art verification tools in the probing model (maskVerif, scVerif, SILVER, matverif). IronMask is also several orders of magnitude faster than VRAPS (the only previous tool verifying random probing composability and expandability) as well as SILVER (the only previous tool providing complete verification for quadratic gadgets with nonlinear randomness). Thanks to this completeness and increased performance, we obtain better bounds for the tolerated leakage probability of state-of-the-art random probing secure compilers.

Authored by Sonia Belaïd, Darius Mercadier, Matthieu Rivain, Abdul Taleb

Default: Mutual Information-based Crash Triage for Massive Crashes

With the considerable success achieved by modern fuzzing infrastructures, more crashes are produced than ever before. To dig out the root cause, rapid and faithful crash triage for large numbers of crashes has always been attractive. However, hindered by the practical difficulty of reducing analysis imprecision without compromising efficiency, this goal has not been accomplished. In this paper, we present an end-to-end crash triage solution Default, for accurately and quickly pinpointing unique root cause from large numbers of crashes. In particular, we quantify the “crash relevance” of program entities based on mutual information, which serves as the criterion of unique crash bucketing and allows us to bucket massive crashes without pre-analyzing their root cause. The quantification of “crash relevance” is also used in the shortening of long crashing traces. On this basis, we use the interpretability of neural networks to precisely pinpoint the root cause in the shortened traces by evaluating each basic block's impact on the crash label. Evaluated with 20 programs with 22216 crashes in total, Default demonstrates remarkable accuracy and performance, which is way beyond what the state-of-the-art techniques can achieve: crash de-duplication was achieved at a super-fast processing speed - 0.017 seconds per crashing trace, without missing any unique bugs. After that, it identifies the root cause of 43 unique crashes with no false negatives and an average false positive rate of 9.2%.

Authored by Xing Zhang, Jiongyi Chen, Chao Feng, Ruilin Li, Wenrui Diao, Kehuan Zhang, Jing Lei, Chaojing Tang

PortSec: Securing Port Knocking System using Sequence Mechanism in SDN Environment

Port knocking provides an added layer of security on top of the existing security systems of a network. A predefined port knocking sequence is used to open ports, which are closed by the firewall by default. If the knocking sequence is correct, the server treats the request as valid and opens the desired port. However, this sequence poses a security threat due to its static nature. This paper presents a port knock sequence-based communication protocol in the Software-Defined Network (SDN). SDN provides better management by separating the control plane and the data plane. At the same time, it causes a communication overhead between the switches and the controller. To avoid this overhead, we use the port knocking concept in the data plane without any involvement of the SDN controller. This study proposes three port knock sequence-based protocols (static, partial dynamic, and dynamic) in the data plane. To test the protocols in an SDN environment, the underlying model is implemented in P4 on the BMV2 (behavioral model version 2) virtual switch. To check the security of the protocols, an informal security analysis is performed, which shows that the proposed protocols are secure enough to be implemented in the SDN data plane.

Authored by Isha Pali, Ruhul Amin
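
The static variant of port knocking described in the abstract above amounts to a small per-source state machine. The sketch below illustrates that idea in Python only; it is not the paper's P4 data-plane implementation, and the class name `KnockTracker`, the example ports, and the reset rule on a wrong knock are assumptions. The sequence is assumed to be non-empty.

```python
class KnockTracker:
    """Tracks, per source address, how far through the knock sequence it is."""

    def __init__(self, sequence):
        self.sequence = tuple(sequence)  # assumed non-empty
        self.progress = {}  # source address -> number of correct knocks so far

    def knock(self, src, port):
        """Record one knock; return True when the full sequence has been seen."""
        i = self.progress.get(src, 0)
        if port == self.sequence[i]:
            i += 1
        else:
            # Wrong port: restart, counting this knock if it matches the first port.
            i = 1 if port == self.sequence[0] else 0
        if i == len(self.sequence):
            self.progress.pop(src, None)  # sequence complete, forget the state
            return True
        self.progress[src] = i
        return False
```

With the hypothetical sequence `[7000, 8000, 9000]`, a source that knocks on 7000, 8000, and 9000 in order gets `True` on the third knock, while a source that knocks on 7000 and then 9000 is reset and must start over. Because the sequence never changes, anyone who observes it can replay it, which is the weakness of the static scheme that the abstract points out.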

Analysis of Dynamic Host Control Protocol Implementation to Assess DoS Attacks

Dynamic Host Control Protocol (DHCP) is a protocol that provides IP addresses and network configuration parameters to the hosts in a network. The protocol is deployed in small, medium, and large organizations, relieving network administrators of the burden of manually assigning network parameters to every host in order to establish communication. Every vendor that plans to incorporate a DHCP service in its devices follows the workflow defined in the Request for Comments (RFC). DHCP starvation and DHCP flooding are Denial of Service (DoS) attacks that prevent the provision of IP addresses by DHCP. Port security and DHCP snooping are built-in security features that prevent these DoS attacks. However, novel techniques that use the ARP and ICMP protocols have been devised to bypass these security features. The purpose of this research is to analyze the implementation of DHCP in multiple devices, to verify the involvement of both ARP and ICMP in the DHCP address acquisition process as per the RFC, and to validate the results of prior research, which assumes that ARP or ICMP is used by default in all devices.

Authored by Shameel Syed, Faheem Khuhawar, Shahnawaz Talpur, Aftab Memon, Miquel-Angel Luque-Nieto, Sanam Narejo

A Unique Deep Intrusion Detection Approach (UDIDA) for Detecting the Complex Attacks

An Intrusion Detection System (IDS) is an application that detects intrusions in a network. An IDS aims to detect malicious activities in order to protect computer networks from unknown persons or users, called attackers. Network security is a significant task that should provide secure data transfer. Virtualization of networks becomes more complex with IoT technology. Deep Learning (DL) is widely used in many networks to detect complex patterns, which makes it a suitable approach for detecting malicious nodes or attacks. Software-Defined Network (SDN) is the default virtualization computer network. Attackers are continually developing new technologies to attack networks, and new protocols are required to prevent these attacks. In this paper, a unique deep intrusion detection approach (UDIDA) is developed to detect attacks in SDN. The results show that the proposed approach achieves higher accuracy than existing approaches.

Authored by Vamsi Krishna, Venkata Matta

Behaviour Analysis of Open-Source Firewalls Under Security Crisis

Nowadays, in this COVID era, working from home is often preferred to working from the office. As a result, the need for firewalls has increased day by day. Every organization uses a firewall to secure its network and creates VPN servers to allow its employees to work from home. The security of the firewall therefore plays a crucial role. In this paper, we compare the two most popular open-source firewalls, pfSense and OPNSense. We examine the security they provide by default, without any additional components. To do this, we performed four different attacks on the firewalls and compared the results. We observed that both provide the same level of security, although pfSense has a slight edge over OPNSense when an attacker attempts a brute-force attack.

Authored by Harsh Kiratsata, Deep Raval, Payal Viras, Punit Lalwani, Himanshu Patel, Panchal D.

A Secret-Free Hypervisor: Rethinking Isolation in the Age of Speculative Vulnerabilities

In recent years, the epidemic of speculative side channels significantly increases the difficulty in enforcing domain isolation boundaries in a virtualized cloud environment. Although mitigations exist, the approach taken by the industry is neither a long-term nor a scalable solution, as we target each vulnerability with specific mitigations that add up to substantial performance penalties. We propose a different approach to secret isolation: guaranteeing that the hypervisor is Secret-Free (SF). A Secret-Free design partitions memory into secrets and non-secrets and reconstructs hypervisor isolation. It enforces that all domains have a minimal and secret-free view of the address space. In contrast to state-of-the-art, a Secret-Free hypervisor does not identify secrets to be hidden, but instead identifies non-secrets that can be shared, and only grants access necessary for the current operation, an allow-list approach. SF designs function with existing hardware and do not exhibit noticeable performance penalties in production workloads versus the unmitigated baseline, and outperform state-of-the-art techniques by allowing speculative execution where secrets are invisible. We implement SF in Xen (a Type-I hypervisor) to demonstrate that the design applies well to a commercial hypervisor. Evaluation shows performance comparable to baseline and up to 37% improvement in certain hypervisor paths compared with Xen default mitigations. Further, we demonstrate Secret-Free is a generic kernel isolation infrastructure for a variety of systems, not limited to Type-I hypervisors. We apply the same model in Hyper-V (Type-I), bhyve (Type-II) and FreeBSD (UNIX kernel) to evaluate its applicability and effectiveness. The successful implementations on these systems prove the generality of SF, and reveal the specific adaptations and optimizations required for each type of kernel.

Authored by Hongyan Xia, David Zhang, Wei Liu, Istvan Haller, Bruce Sherwin, David Chisnall