Probabilistic model checking is a useful technique for specifying and verifying properties of stochastic systems, including randomized protocols and reinforcement learning models. However, these methods rely on the assumed structure and probabilities of certain system transitions. These assumptions may be incorrect, and may even be violated by an adversary who gains control of some system components. In this paper, we develop a formal framework for adversarial robustness in systems modeled as discrete-time Markov chains (DTMCs). We base our framework on existing methods for verifying probabilistic temporal logic properties and extend it to include deterministic, memoryless policies acting in Markov decision processes (MDPs). Our framework includes a flexible approach for specifying structure-preserving and non-structure-preserving adversarial models. We outline a class of threat models under which adversaries can perturb system transitions, constrained by an ε-ball around the original transition probabilities. We define three main DTMC adversarial robustness problems: adversarial robustness verification, maximal-δ synthesis, and worst-case attack synthesis. We present two optimization-based solutions to these three problems, leveraging traditional and parametric probabilistic model checking techniques. We then evaluate our solutions on two stochastic protocols and a collection of Grid World case studies, which model an agent acting in an environment described as an MDP. We find that the parametric solution yields fast computation for small parameter spaces, while for less restrictive (stronger) adversaries, where the number of parameters increases, directly computing property satisfaction probabilities is more scalable. We demonstrate the usefulness of our definitions and solutions by comparing system outcomes over various properties, threat models, and case studies.
Authored by Lisa Oakley, Alina Oprea, Stavros Tripakis
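As a concrete illustration of the threat model above, here is a minimal sketch, not the authors' tool: the toy chain, the choice of perturbed row, and ε are our assumptions, and a one-parameter scan stands in for the paper's optimization- and parametric-model-checking solutions. It computes the worst-case probability of reaching a target state when an adversary shifts up to ε of probability mass within one row while preserving the chain's structure (no zero entry becomes nonzero).

```python
# A minimal sketch, assuming a toy 4-state chain: worst-case reachability
# under a structure-preserving epsilon-ball perturbation of one DTMC row.
import numpy as np

def reach_prob(P, transient, target):
    """Probability of eventually reaching `target` from each transient state."""
    Q = P[np.ix_(transient, transient)]   # transient -> transient block
    b = P[transient, target]              # one-step transient -> target
    return np.linalg.solve(np.eye(len(transient)) - Q, b)

# States: 0, 1 transient; 2 = success (absorbing); 3 = failure (absorbing).
P = np.array([[0.0, 0.5, 0.3, 0.2],
              [0.4, 0.0, 0.5, 0.1],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
eps = 0.1
nominal = reach_prob(P, [0, 1], 2)[0]

# Adversary shifts up to eps of mass in row 0 from success to failure,
# keeping every zero entry zero (structure-preserving).
worst = nominal
for d in np.linspace(0.0, eps, 101):
    Pd = P.copy()
    Pd[0, 2] -= d
    Pd[0, 3] += d
    worst = min(worst, reach_prob(Pd, [0, 1], 2)[0])

print(f"nominal P(reach success) = {nominal:.4f}, worst in eps-ball = {worst:.4f}")
```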
Volumetric Distributed Denial of Service attacks forcefully disrupt the availability of online services by congesting network links with arbitrary high-volume traffic. This brute-force approach has collateral impact on the upstream network infrastructure, making early attack-traffic removal a key objective. To reduce infrastructure load and maintain service availability, we introduce ReCEIF, a topology-independent mitigation strategy for early, rule-based ingress filtering leveraging deep reinforcement learning. ReCEIF utilizes hierarchical heavy hitters to monitor traffic distribution and detect subnets that are sending high-volume traffic. Deep reinforcement learning subsequently serves to refine hierarchical heavy hitters into effective filter rules that can be propagated upstream to discard traffic originating from attacking systems. Evaluating all filter rules requires only a single clock cycle when utilizing fast ternary content-addressable memory, which is commonly available in software-defined networks. To demonstrate the effectiveness of our approach, we conduct a comparative evaluation against reinforcement learning-based router throttling.
Authored by Hauke Heseding, Martina Zitterbart
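The refinement of heavy-hitter subnets into filter rules is the core mechanical step here. The sketch below is an illustrative simplification with assumed inputs (a per-source byte-count dictionary, a fixed threshold, and three prefix lengths), not ReCEIF itself, and it omits the reinforcement learning refinement: it simply surfaces the most specific prefixes whose aggregated volume crosses a threshold, which is the hierarchical-heavy-hitter signal the RL agent would start from.

```python
# An illustrative simplification, not ReCEIF: surface the most specific
# prefixes whose aggregated traffic volume crosses a threshold.
from collections import defaultdict
import ipaddress

def heavy_prefixes(byte_counts, threshold, prefix_lens=(8, 16, 24)):
    """Return (network, volume) pairs exceeding `threshold`, most specific first."""
    hits, residual = [], dict(byte_counts)
    for plen in sorted(prefix_lens, reverse=True):   # /24 before /16 before /8
        agg = defaultdict(int)
        for ip, nbytes in residual.items():
            agg[ipaddress.ip_network(f"{ip}/{plen}", strict=False)] += nbytes
        for net, vol in agg.items():
            if vol >= threshold:
                hits.append((net, vol))
                # Drop traffic already covered before trying coarser prefixes.
                residual = {ip: b for ip, b in residual.items()
                            if ipaddress.ip_address(ip) not in net}
    return hits

traffic = {"10.0.1.5": 9e8, "10.0.1.9": 8e8, "10.0.2.7": 2e7, "192.0.2.1": 1e6}
for net, vol in heavy_prefixes(traffic, threshold=5e8):
    print(f"candidate filter rule: drop {net} ({vol / 1e6:.0f} MB)")
```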
When deploying additional network security equipment at new locations, network service providers face difficulties such as precisely managing a large amount of network security equipment and high network operation costs. Accordingly, a method is needed for security-aware network service provisioning that uses the existing network security equipment. An existing reinforcement learning-based routing method addresses this problem with a fixed decision at each node, repeating until a routing decision satisfying end-to-end security constraints is achieved; this has the disadvantage of longer network service provisioning times. In this paper, we propose the security-constraints reinforcement learning-based routing (SCRR) algorithm, which generates routing decisions satisfying end-to-end security constraints by assigning conditional reward values according to the agent's state-action pairs during reinforcement learning.
Authored by Hyeonjun Jo, Kyungbaek Kim
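A toy rendering of the conditional-reward idea may help: in the sketch below, the topology, costs, security levels, and reward constants are all invented for illustration, and the tabular node state is a simplification of the paper's state-action formulation. The agent receives a large negative reward if it reaches the destination over a path whose weakest hop violates the end-to-end security constraint, and a bonus otherwise.

```python
# A toy sketch, assuming an invented 4-node topology: Q-learning with a
# reward conditioned on an end-to-end security constraint.
import random

# node -> {next_node: (link cost, security level of that hop)}
graph = {"A": {"B": (1, 2), "C": (4, 5)},
         "B": {"D": (1, 1)},
         "C": {"D": (1, 5)},
         "D": {}}
SRC, DST, MIN_SECURITY = "A", "D", 4      # constraint: weakest hop >= 4

Q = {(s, a): 0.0 for s in graph for a in graph[s]}
alpha, gamma, explore = 0.5, 0.9, 0.2

for episode in range(2000):
    s, path_sec = SRC, float("inf")
    while s != DST:
        acts = list(graph[s])
        a = (random.choice(acts) if random.random() < explore
             else max(acts, key=lambda x: Q[(s, x)]))
        cost, sec = graph[s][a]
        path_sec = min(path_sec, sec)
        if a == DST:
            # Conditional reward: bonus if the security constraint holds
            # end-to-end, a large penalty if it is violated.
            target = -cost + (10 if path_sec >= MIN_SECURITY else -100)
        else:
            target = -cost + gamma * max(Q[(a, b)] for b in graph[a])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = a

s, route = SRC, [SRC]
while s != DST:
    s = max(graph[s], key=lambda a: Q[(s, a)])
    route.append(s)
print("learned route:", " -> ".join(route))
```

With these numbers, the cheaper route A -> B -> D violates the constraint (weakest hop 1 < 4), so the learned greedy route is A -> C -> D.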
Trip planning, which aims to plan a trip consisting of several ordered Points of Interest (POIs) under user-provided constraints, has long been treated as an important application for location-based services. The goal of trip planning is to maximize the chance that users will follow the planned trip, though this chance is difficult to quantify and optimize directly. Conventional methods bypass this problem either by leveraging statistical analysis to rank POIs to form a trip or by generating trips that follow pre-defined objectives based on constraint programming. However, these methods may fail to reflect the complex latent patterns hidden in human mobility data. On the other hand, although a few deep learning-based trip recommendation methods exist, they still cannot handle the time-budget constraint. To this end, we propose a TIme-aware Neural Trip Planning (TINT) framework to tackle the above challenges. First, we devise a novel attention-based encoder-decoder trip generator that can learn the correlations among POIs and generate trips under given constraints. Then, we propose a specially-designed reinforcement learning (RL) paradigm to directly optimize the objective and obtain an optimal trip generator. For this purpose, we introduce a discriminator, which distinguishes the generated trips from real-life trips taken by users, to provide reward signals for optimizing the generator. Subsequently, to ensure the feedback from the discriminator is always instructive, we integrate an adversarial learning strategy into the RL paradigm to update the trip generator and the discriminator alternately. Moreover, we devise a novel pre-training schema to speed up convergence for an efficient training process. Extensive experiments on four real-world datasets validate the effectiveness and efficiency of our framework, showing that TINT remarkably outperforms state-of-the-art baselines within a short response time.
Authored by Linlang Jiang, Jingbo Zhou, Tong Xu, Yanyan Li, Hao Chen, Dejing Dou
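A schematic sketch of the adversarial RL loop follows; the toy generator (a table of per-step POI logits), the MLP discriminator, the sizes, and the single stand-in "real" trip are all our assumptions, and the attention-based encoder-decoder, time-budget handling, and pre-training schema are omitted. It shows only the alternation: the discriminator is trained to separate real from generated trips, and its score is the REINFORCE reward for the generator.

```python
# A schematic sketch of the alternating adversarial RL loop; the toy
# generator, MLP discriminator, sizes, and "real" trip are assumptions.
import torch
import torch.nn as nn

N_POI, TRIP_LEN = 20, 4
gen_logits = nn.Parameter(torch.zeros(TRIP_LEN, N_POI))   # toy trip generator
disc = nn.Sequential(nn.Linear(TRIP_LEN * N_POI, 32), nn.ReLU(),
                     nn.Linear(32, 1), nn.Sigmoid())      # toy discriminator
g_opt = torch.optim.Adam([gen_logits], lr=0.05)
d_opt = torch.optim.Adam(disc.parameters(), lr=0.01)
bce = nn.BCELoss()

def sample_trip():
    dist = torch.distributions.Categorical(logits=gen_logits)
    trip = dist.sample()                                  # one POI id per step
    return trip, dist.log_prob(trip).sum()

def one_hot(trip):
    return nn.functional.one_hot(trip, N_POI).float().flatten()

real_trip = torch.tensor([3, 7, 2, 11])                   # stand-in user trip

for step in range(200):
    fake, logp = sample_trip()
    fake_x, real_x = one_hot(fake), one_hot(real_trip)
    # Discriminator step: separate real user trips from generated ones.
    d_loss = bce(disc(real_x), torch.ones(1)) + bce(disc(fake_x), torch.zeros(1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator step: REINFORCE, with the discriminator score as the reward.
    reward = disc(fake_x).detach()
    g_loss = -(reward * logp)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```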
A botnet is a type of attack method developed and integrated on the basis of traditional malicious code such as network worms and backdoor tools, and it is extremely threatening. This work combines deep learning and neural network methods from machine learning to detect and classify botnets. The approach does not rely on any prior features, and the final multi-class classification accuracy exceeds 98.7%, a significant result.
Authored by Xiaoran Yang, Zhen Guo, Zetian Mai
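The abstract gives no architectural details, so the following is only a generic sketch of a multi-class neural traffic classifier of the kind described; the feature dimensionality, layer sizes, class count, and the random stand-in batch are all assumptions.

```python
# A generic sketch only; feature width, layers, class count, and the
# random stand-in batch are assumptions (the abstract gives no details).
import torch
import torch.nn as nn

N_FEATURES, N_CLASSES = 64, 5   # e.g. flow statistics; botnet families + benign
model = nn.Sequential(nn.Linear(N_FEATURES, 128), nn.ReLU(),
                      nn.Linear(128, 64), nn.ReLU(),
                      nn.Linear(64, N_CLASSES))   # class logits
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch: replace with real labelled traffic records.
x = torch.randn(32, N_FEATURES)
y = torch.randint(0, N_CLASSES, (32,))
for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("batch accuracy:", (model(x).argmax(1) == y).float().mean().item())
```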
The robustness of supply chain networks (SCNs) against sequential topology attacks is important for maintaining firm relationships and activities. Although SCNs have experienced many emergencies demonstrating that mixed failures exacerbate the impact of cascading failures, existing studies of sequential attacks rarely consider the influence of mixed failure modes on cascading failures. In this paper, a reinforcement learning (RL)-based sequential attack strategy is applied to SCNs with cascading failures that consider mixed failure modes. To address the large state-space search problem in SCNs, a deep Q-network (DQN) optimization framework combining deep neural networks (DNNs) and RL is proposed to extract features of the state space. The strategy is then compared with traditional random-based, degree-based, and load-based sequential attack strategies. Simulation results on Barabasi-Albert (BA), Erdos-Renyi (ER), and Watts-Strogatz (WS) networks show that the proposed RL-based sequential attack strategy outperforms the three existing strategies, triggering cascading failures with greater impact. This work provides insights for effectively reducing failure propagation and improving the robustness of SCNs.
Authored by Lei Zhang, Jian Zhou, Yizhong Ma, Lijuan Shen
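A stripped-down sketch of the DQN attack loop follows. The toy cascade rule (a node fails once more than half of its neighbours have failed), the BA test graph, the attack budget, and the omission of experience replay and a target network are all our simplifications; the state is the failure mask, an action fails one more node, and the reward is the number of newly failed nodes.

```python
# A stripped-down sketch: toy cascade rule, BA test graph, no replay
# buffer or target network; all constants are illustrative assumptions.
import random
import networkx as nx
import torch
import torch.nn as nn

G = nx.barabasi_albert_graph(30, 2, seed=0)
N = G.number_of_nodes()

def cascade(failed):
    """Toy mixed-failure rule: a node fails once >50% of its neighbours are down."""
    failed, changed = set(failed), True
    while changed:
        changed = False
        for v in G.nodes:
            nbrs = list(G[v])
            if v not in failed and nbrs and \
                    sum(u in failed for u in nbrs) / len(nbrs) > 0.5:
                failed.add(v)
                changed = True
    return failed

qnet = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, N))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

for episode in range(200):
    failed = set()
    for step in range(5):                       # sequential attack budget
        alive = [v for v in range(N) if v not in failed]
        if not alive:
            break
        s = torch.tensor([float(v in failed) for v in range(N)])
        if random.random() < 0.1:               # epsilon-greedy exploration
            a = random.choice(alive)
        else:
            q = qnet(s).detach()
            a = max(alive, key=lambda v: q[v].item())
        new_failed = cascade(failed | {a})
        r = len(new_failed) - len(failed)       # reward: newly failed nodes
        s2 = torch.tensor([float(v in new_failed) for v in range(N)])
        td_target = r + 0.9 * qnet(s2).detach().max()
        loss = (qnet(s)[a] - td_target) ** 2    # one-step TD error
        opt.zero_grad(); loss.backward(); opt.step()
        failed = new_failed
```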
This paper aims to discover vulnerabilities through the application of supervisory control theory and to design a defensive supervisor against vulnerability attacks. Supervisory control restricts the system behavior to satisfy the control specifications. The existence condition of the supervisor sometimes results in undesirable plant behavior, which can be regarded as a vulnerability of the control specifications. We aim to design a supervisor that is more robust against this vulnerability.
Authored by Kanta Ogawa, Kenji Sawada, Kosei Sakata
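For context, the existence condition mentioned above is, in standard Ramadge-Wonham notation (a textbook recap, not necessarily the paper's formulation), the controllability of the specification language K with respect to the plant G and the uncontrollable event set Σu:

```latex
% Controllability of specification K w.r.t. plant G with uncontrollable
% events \Sigma_u (standard condition; notation is a textbook assumption):
\overline{K}\,\Sigma_u \cap L(G) \subseteq \overline{K}
```

When K fails this condition, a supervisor can enforce only the supremal controllable sublanguage of K; the gap between what is specified and what is enforceable is plausibly the kind of specification-level weakness the abstract characterizes as a vulnerability.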
Cyber threats can cause severe damage to computing infrastructure and systems, as well as data breaches that leave sensitive data vulnerable to attackers and adversaries. It is therefore imperative to discover those threats and stop them before bad actors penetrate information systems. Threat-hunting algorithms based on machine learning have shown great advantages over classical methods. Reinforcement learning models are becoming more accurate at identifying not only signature-based but also behavior-based threats. Quantum computing brings a new dimension to improving classification speed, with a potential exponential advantage. The accuracy of AI/ML algorithms can be affected by many factors, from the algorithm and the data to prejudicial or even intentional bias; as a result, AI/ML applications need to be unbiased and trustworthy. In this research, we developed a machine learning-based cyber threat detection and assessment tool. It uses a two-stage (unsupervised and supervised learning) analysis method on 822,226 log entries recorded from a web server on the AWS cloud. The results show that the algorithm can identify threats with high confidence.
Authored by Shuangbao Wang, Md Arafin, Onyema Osuagwu, Ketchiozo Wandji
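A simplified sketch of such a two-stage pipeline appears below; the synthetic feature matrix, cluster count, "small cluster = suspicious" heuristic, and classifier choice are all assumptions standing in for details the abstract does not give.

```python
# A simplified sketch (synthetic features, assumed labelling heuristic) of
# a two-stage pipeline: unsupervised clustering, then supervised learning.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for numeric features extracted from web-server log lines
# (request rate, URI length, status-code ratios, ...).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (5000, 8)),   # benign-looking traffic
               rng.normal(4, 1, (200, 8))])   # rare, anomalous traffic

# Stage 1 (unsupervised): small clusters far from the bulk are threat candidates.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
sizes = np.bincount(km.labels_)
suspect = sizes < 0.05 * len(X)               # tiny clusters = suspicious
y = suspect[km.labels_].astype(int)

# Stage 2 (supervised): train a classifier on the stage-1 labels,
# which an analyst could refine before deployment.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
print("held-out accuracy:", clf.score(Xte, yte))
```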
Cyber threats have been a major issue in the cyber security domain. Every hacker follows a series of cyber-attack stages known as the cyber kill chain. Each stage has its own norms and limitations. For a decade, researchers have focused on detecting these attacks, but mere monitoring tools are no longer optimal solutions. Everything in the computer science field is becoming autonomous, which leads to the Autonomous Cyber Resilience Defense algorithm designed in this work. Resilience has two aspects: response and recovery. Response requires actions to mitigate attacks; recovery is patching the flawed code or backdoor vulnerability. Both aspects have traditionally relied on human assistance in the cybersecurity defense field. This work aims to develop an algorithm based on Reinforcement Learning (RL) with a Convolutional Neural Network (CNN), much closer to the human learning process, for malware images. RL learns through a reward mechanism for every performed attack: every action has some output that can be classified as a positive or negative reward. To enhance this learning process, a Markov Decision Process (MDP) formulation is integrated with the RL approach. RL impact and induction measures for malware images were evaluated to obtain optimal results. Successful automated actions are obtained on the Malimg malware image dataset. The proposed work shows 98% accuracy in classification, detection, and the deployment of autonomous resilience actions.
Authored by Kainat Rizwan, Mudassar Ahmad, Muhammad Habib
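A loose sketch of the reward-driven classification idea follows; the image size, network shape, and random stand-in batch are assumptions, and a contextual-bandit REINFORCE update (+1 for a correct family label, -1 otherwise) stands in for the paper's full RL/MDP setup on Malimg images.

```python
# A loose sketch; image size, network shape, and the random stand-in
# batch are assumptions, and a bandit-style REINFORCE update stands in
# for the paper's full RL/MDP setup.
import torch
import torch.nn as nn

N_CLASSES = 25                                  # Malimg has 25 malware families
cnn = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(16 * 16 * 16, N_CLASSES))
opt = torch.optim.Adam(cnn.parameters(), lr=1e-3)

# Stand-in batch: replace with real 64x64 grayscale malware images.
imgs = torch.randn(16, 1, 64, 64)
labels = torch.randint(0, N_CLASSES, (16,))

for step in range(20):
    dist = torch.distributions.Categorical(logits=cnn(imgs))
    actions = dist.sample()                     # predicted family per image
    reward = (actions == labels).float() * 2 - 1   # +1 correct, -1 wrong
    loss = -(reward * dist.log_prob(actions)).mean()   # REINFORCE
    opt.zero_grad(); loss.backward(); opt.step()
```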
Video summarization aims to improve the efficiency of large-scale video browsing by producing concise summaries. It is popular in many scenarios, such as video surveillance, video review, and data annotation. Traditional video summarization techniques focus on filtering in the image-feature or image-semantics dimension. However, such techniques can lose a large amount of potentially useful information, especially for videos with rich text semantics such as interviews and teaching videos, because only information relevant to the image dimension is retained. To solve this problem, this paper treats video summarization as a continuous multi-dimensional decision-making process. Specifically, the summarization model predicts a probability for each frame and its corresponding text, and we design reward methods for each of them. Finally, comprehensive summaries in two dimensions, i.e., images and semantics, are generated. This approach is not only unsupervised, relying on neither labels nor user interaction, but also decouples the semantic and image summarization models to provide more usable interfaces for subsequent engineering use.
Authored by Haoran Sun, Xiaolong Zhu, Conghua Zhou
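A condensed sketch of the two-dimensional decision process may clarify the idea; the random frame/text features, the linear scoring heads, and the diversity-plus-brevity reward are illustrative assumptions, not the paper's reward design. Each stream gets a per-frame keep probability, a selection is sampled, and each stream's reward updates its own head via REINFORCE.

```python
# A condensed sketch; random stand-in features, linear heads, and the
# diversity-plus-brevity rewards are assumptions, not the paper's design.
import torch
import torch.nn as nn

T, D = 120, 256                                 # frames, feature size
img_feat, txt_feat = torch.randn(T, D), torch.randn(T, D)
img_head, txt_head = nn.Linear(D, 1), nn.Linear(D, 1)
opt = torch.optim.Adam([*img_head.parameters(), *txt_head.parameters()], lr=1e-3)

def diversity_reward(feat, pick):
    """Reward selections whose chosen features are mutually dissimilar."""
    sel = feat[pick.bool()]
    if len(sel) < 2:
        return torch.tensor(0.0)
    sim = nn.functional.cosine_similarity(sel.unsqueeze(1), sel.unsqueeze(0), dim=-1)
    return 1 - (sim.sum() - len(sel)) / (len(sel) * (len(sel) - 1))

for step in range(50):
    loss = torch.tensor(0.0)
    for feat, head in ((img_feat, img_head), (txt_feat, txt_head)):
        p = torch.sigmoid(head(feat)).squeeze(-1)      # per-frame keep probability
        dist = torch.distributions.Bernoulli(p)
        pick = dist.sample()
        r = diversity_reward(feat, pick) - 0.05 * pick.mean()  # also favour brevity
        loss = loss - r.detach() * dist.log_prob(pick).sum()   # REINFORCE per stream
    opt.zero_grad(); loss.backward(); opt.step()
```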
Axie Infinity is a complicated card game with a huge action space, which makes it difficult to solve with generic Reinforcement Learning (RL) algorithms. We propose a hybrid RL framework to learn action representations and game strategies. To avoid evaluating every action in the large feasible action set, our method evaluates actions in a fixed-size set determined using action representations. We compare the performance of our method with two baseline methods in terms of sample efficiency and the winning rates of the trained models. We empirically show that our method achieves both the best overall winning rate and the best sample efficiency among the three methods.
Authored by Zhiyuan Yao, Tianyu Shi, Site Li, Yiting Xie, Yuanyuan Qin, Xiongjie Xie, Huan Lu, Yan Zhang
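A small sketch of the fixed-size candidate-set idea follows, in the spirit of Wolpertinger-style action embeddings; the sizes, the linear actor and critic, and the random embeddings are our assumptions, not the paper's architecture. Instead of scoring all actions, the critic scores only the k nearest neighbours of a proposed action embedding.

```python
# A small sketch in the spirit of Wolpertinger-style action embeddings;
# sizes, the linear actor/critic, and random embeddings are assumptions.
import torch
import torch.nn as nn

N_ACTIONS, EMB, STATE, K = 100_000, 32, 64, 16
action_emb = torch.randn(N_ACTIONS, EMB)    # learned action representations
actor = nn.Linear(STATE, EMB)               # state -> proto-action embedding
critic = nn.Linear(STATE + EMB, 1)          # Q(state, action embedding)

def select_action(state):
    proto = actor(state)                                        # (EMB,)
    dists = torch.cdist(proto.unsqueeze(0), action_emb)[0]      # to all actions
    cand = dists.topk(K, largest=False).indices                 # fixed-size set
    q = critic(torch.cat([state.unsqueeze(0).expand(K, -1),
                          action_emb[cand]], dim=1))
    return cand[q.squeeze(-1).argmax()].item()                  # best candidate

print("chosen action id:", select_action(torch.randn(STATE)))
```

The nearest-neighbour lookup makes the per-step evaluation cost depend on K rather than on the full action-set size, which is the point of learning action representations here.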