Probabilistic model checking is a useful technique for specifying and verifying properties of stochastic systems, including randomized protocols and reinforcement learning models. However, these methods rely on the assumed structure and probabilities of certain system transitions. These assumptions may be incorrect, and may even be violated by an adversary who gains control of some system components. In this paper, we develop a formal framework for adversarial robustness in systems modeled as discrete-time Markov chains (DTMCs). We base our framework on existing methods for verifying probabilistic temporal logic properties and extend it to include deterministic, memoryless policies acting in Markov decision processes (MDPs). Our framework includes a flexible approach for specifying structure-preserving and non-structure-preserving adversarial models. We outline a class of threat models under which adversaries can perturb system transitions, constrained by an ε-ball around the original transition probabilities. We define three main DTMC adversarial robustness problems: adversarial robustness verification, maximal-δ synthesis, and worst-case attack synthesis. We present two optimization-based solutions to these three problems, leveraging traditional and parametric probabilistic model checking techniques. We then evaluate our solutions on two stochastic protocols and a collection of Grid World case studies, which model an agent acting in an environment described as an MDP. We find that the parametric solution yields fast computation for small parameter spaces, while for less restrictive (stronger) adversaries, where the number of parameters increases, directly computing property satisfaction probabilities is more scalable. We demonstrate the usefulness of our definitions and solutions by comparing system outcomes over various properties, threat models, and case studies.
Authored by Lisa Oakley, Alina Oprea, Stavros Tripakis
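As a concrete illustration of the threat model above, here is a minimal sketch, not the authors' tool: the toy chain, the choice of perturbed row, and ε are our assumptions, and a one-parameter scan stands in for the paper's optimization- and parametric-model-checking solutions. It computes the worst-case probability of reaching a target state when an adversary shifts up to ε of probability mass within one row while preserving the chain's structure (no zero entry becomes nonzero).

```python
# A minimal sketch, assuming a toy 4-state chain: worst-case reachability
# under a structure-preserving epsilon-ball perturbation of one DTMC row.
import numpy as np

def reach_prob(P, transient, target):
    """Probability of eventually reaching `target` from each transient state."""
    Q = P[np.ix_(transient, transient)]   # transient -> transient block
    b = P[transient, target]              # one-step transient -> target
    return np.linalg.solve(np.eye(len(transient)) - Q, b)

# States: 0, 1 transient; 2 = success (absorbing); 3 = failure (absorbing).
P = np.array([[0.0, 0.5, 0.3, 0.2],
              [0.4, 0.0, 0.5, 0.1],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
eps = 0.1
nominal = reach_prob(P, [0, 1], 2)[0]

# Adversary shifts up to eps of mass in row 0 from success to failure,
# keeping every zero entry zero (structure-preserving).
worst = nominal
for d in np.linspace(0.0, eps, 101):
    Pd = P.copy()
    Pd[0, 2] -= d
    Pd[0, 3] += d
    worst = min(worst, reach_prob(Pd, [0, 1], 2)[0])

print(f"nominal P(reach success) = {nominal:.4f}, worst in eps-ball = {worst:.4f}")
```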
Volumetric Distributed Denial of Service attacks forcefully disrupt the availability of online services by congesting network links with arbitrary high-volume traffic. This brute-force approach has collateral impact on the upstream network infrastructure, making early attack-traffic removal a key objective. To reduce infrastructure load and maintain service availability, we introduce ReCEIF, a topology-independent mitigation strategy for early, rule-based ingress filtering leveraging deep reinforcement learning. ReCEIF utilizes hierarchical heavy hitters to monitor traffic distribution and detect subnets that are sending high-volume traffic. Deep reinforcement learning subsequently serves to refine hierarchical heavy hitters into effective filter rules that can be propagated upstream to discard traffic originating from attacking systems. Evaluating all filter rules requires only a single clock cycle when utilizing fast ternary content-addressable memory, which is commonly available in software-defined networks. To demonstrate the effectiveness of our approach, we conduct a comparative evaluation against reinforcement learning-based router throttling.
Authored by Hauke Heseding, Martina Zitterbart
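The refinement of heavy-hitter subnets into filter rules is the core mechanical step here. The sketch below is an illustrative simplification with assumed inputs (a per-source byte-count dictionary, a fixed threshold, and three prefix lengths), not ReCEIF itself, and it omits the reinforcement learning refinement: it simply surfaces the most specific prefixes whose aggregated volume crosses a threshold, which is the hierarchical-heavy-hitter signal the RL agent would start from.

```python
# An illustrative simplification, not ReCEIF: surface the most specific
# prefixes whose aggregated traffic volume crosses a threshold.
from collections import defaultdict
import ipaddress

def heavy_prefixes(byte_counts, threshold, prefix_lens=(8, 16, 24)):
    """Return (network, volume) pairs exceeding `threshold`, most specific first."""
    hits, residual = [], dict(byte_counts)
    for plen in sorted(prefix_lens, reverse=True):   # /24 before /16 before /8
        agg = defaultdict(int)
        for ip, nbytes in residual.items():
            agg[ipaddress.ip_network(f"{ip}/{plen}", strict=False)] += nbytes
        for net, vol in agg.items():
            if vol >= threshold:
                hits.append((net, vol))
                # Drop traffic already covered before trying coarser prefixes.
                residual = {ip: b for ip, b in residual.items()
                            if ipaddress.ip_address(ip) not in net}
    return hits

traffic = {"10.0.1.5": 9e8, "10.0.1.9": 8e8, "10.0.2.7": 2e7, "192.0.2.1": 1e6}
for net, vol in heavy_prefixes(traffic, threshold=5e8):
    print(f"candidate filter rule: drop {net} ({vol / 1e6:.0f} MB)")
```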
When deploying additional network security equipment at new locations, network service providers face difficulties such as precisely managing a large amount of network security equipment and high network operation costs. Accordingly, a method is needed for security-aware network service provisioning that uses the existing network security equipment. An existing reinforcement learning-based routing method addresses this problem with a fixed decision at each node, repeating until a routing decision satisfying end-to-end security constraints is achieved; this has the disadvantage of longer network service provisioning times. In this paper, we propose the security-constraints reinforcement learning-based routing (SCRR) algorithm, which generates routing decisions satisfying end-to-end security constraints by assigning conditional reward values according to the agent's state-action pairs during reinforcement learning.
Authored by Hyeonjun Jo, Kyungbaek Kim
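A toy rendering of the conditional-reward idea may help: in the sketch below, the topology, costs, security levels, and reward constants are all invented for illustration, and the tabular node state is a simplification of the paper's state-action formulation. The agent receives a large negative reward if it reaches the destination over a path whose weakest hop violates the end-to-end security constraint, and a bonus otherwise.

```python
# A toy sketch, assuming an invented 4-node topology: Q-learning with a
# reward conditioned on an end-to-end security constraint.
import random

# node -> {next_node: (link cost, security level of that hop)}
graph = {"A": {"B": (1, 2), "C": (4, 5)},
         "B": {"D": (1, 1)},
         "C": {"D": (1, 5)},
         "D": {}}
SRC, DST, MIN_SECURITY = "A", "D", 4      # constraint: weakest hop >= 4

Q = {(s, a): 0.0 for s in graph for a in graph[s]}
alpha, gamma, explore = 0.5, 0.9, 0.2

for episode in range(2000):
    s, path_sec = SRC, float("inf")
    while s != DST:
        acts = list(graph[s])
        a = (random.choice(acts) if random.random() < explore
             else max(acts, key=lambda x: Q[(s, x)]))
        cost, sec = graph[s][a]
        path_sec = min(path_sec, sec)
        if a == DST:
            # Conditional reward: bonus if the security constraint holds
            # end-to-end, a large penalty if it is violated.
            target = -cost + (10 if path_sec >= MIN_SECURITY else -100)
        else:
            target = -cost + gamma * max(Q[(a, b)] for b in graph[a])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = a

s, route = SRC, [SRC]
while s != DST:
    s = max(graph[s], key=lambda a: Q[(s, a)])
    route.append(s)
print("learned route:", " -> ".join(route))
```

With these numbers, the cheaper route A -> B -> D violates the constraint (weakest hop 1 < 4), so the learned greedy route is A -> C -> D.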
Trip planning, which aims to plan a trip consisting of several ordered Points of Interest (POIs) under user-provided constraints, has long been treated as an important application for location-based services. The goal of trip planning is to maximize the chance that users will follow the planned trip, though this chance is difficult to quantify and optimize directly. Conventional methods bypass this problem either by leveraging statistical analysis to rank POIs to form a trip or by generating trips that follow pre-defined objectives based on constraint programming. However, these methods may fail to reflect the complex latent patterns hidden in human mobility data. On the other hand, although a few deep learning-based trip recommendation methods exist, they still cannot handle the time-budget constraint. To this end, we propose a TIme-aware Neural Trip Planning (TINT) framework to tackle the above challenges. First, we devise a novel attention-based encoder-decoder trip generator that can learn the correlations among POIs and generate trips under given constraints. Then, we propose a specially-designed reinforcement learning (RL) paradigm to directly optimize the objective and obtain an optimal trip generator. For this purpose, we introduce a discriminator, which distinguishes the generated trips from real-life trips taken by users, to provide reward signals for optimizing the generator. Subsequently, to ensure the feedback from the discriminator is always instructive, we integrate an adversarial learning strategy into the RL paradigm to update the trip generator and the discriminator alternately. Moreover, we devise a novel pre-training schema to speed up convergence for an efficient training process. Extensive experiments on four real-world datasets validate the effectiveness and efficiency of our framework, showing that TINT remarkably outperforms state-of-the-art baselines within a short response time.
Authored by Linlang Jiang, Jingbo Zhou, Tong Xu, Yanyan Li, Hao Chen, Dejing Dou
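A schematic sketch of the adversarial RL loop follows; the toy generator (a table of per-step POI logits), the MLP discriminator, the sizes, and the single stand-in "real" trip are all our assumptions, and the attention-based encoder-decoder, time-budget handling, and pre-training schema are omitted. It shows only the alternation: the discriminator is trained to separate real from generated trips, and its score is the REINFORCE reward for the generator.

```python
# A schematic sketch of the alternating adversarial RL loop; the toy
# generator, MLP discriminator, sizes, and "real" trip are assumptions.
import torch
import torch.nn as nn

N_POI, TRIP_LEN = 20, 4
gen_logits = nn.Parameter(torch.zeros(TRIP_LEN, N_POI))   # toy trip generator
disc = nn.Sequential(nn.Linear(TRIP_LEN * N_POI, 32), nn.ReLU(),
                     nn.Linear(32, 1), nn.Sigmoid())      # toy discriminator
g_opt = torch.optim.Adam([gen_logits], lr=0.05)
d_opt = torch.optim.Adam(disc.parameters(), lr=0.01)
bce = nn.BCELoss()

def sample_trip():
    dist = torch.distributions.Categorical(logits=gen_logits)
    trip = dist.sample()                                  # one POI id per step
    return trip, dist.log_prob(trip).sum()

def one_hot(trip):
    return nn.functional.one_hot(trip, N_POI).float().flatten()

real_trip = torch.tensor([3, 7, 2, 11])                   # stand-in user trip

for step in range(200):
    fake, logp = sample_trip()
    fake_x, real_x = one_hot(fake), one_hot(real_trip)
    # Discriminator step: separate real user trips from generated ones.
    d_loss = bce(disc(real_x), torch.ones(1)) + bce(disc(fake_x), torch.zeros(1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator step: REINFORCE, with the discriminator score as the reward.
    reward = disc(fake_x).detach()
    g_loss = -(reward * logp)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```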
A botnet is a type of attack method developed and integrated on the basis of traditional malicious code such as network worms and backdoor tools, and it is extremely threatening. This work combines deep learning and neural network methods from machine learning to detect and classify botnets. The approach does not rely on any prior features, and the final multi-class classification accuracy exceeds 98.7%, a significant result.
Authored by Xiaoran Yang, Zhen Guo, Zetian Mai
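The abstract gives no architectural details, so the following is only a generic sketch of a multi-class neural traffic classifier of the kind described; the feature dimensionality, layer sizes, class count, and the random stand-in batch are all assumptions.

```python
# A generic sketch only; feature width, layers, class count, and the
# random stand-in batch are assumptions (the abstract gives no details).
import torch
import torch.nn as nn

N_FEATURES, N_CLASSES = 64, 5   # e.g. flow statistics; botnet families + benign
model = nn.Sequential(nn.Linear(N_FEATURES, 128), nn.ReLU(),
                      nn.Linear(128, 64), nn.ReLU(),
                      nn.Linear(64, N_CLASSES))   # class logits
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch: replace with real labelled traffic records.
x = torch.randn(32, N_FEATURES)
y = torch.randint(0, N_CLASSES, (32,))
for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("batch accuracy:", (model(x).argmax(1) == y).float().mean().item())
```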
The robustness of supply chain networks (SCNs) against sequential topology attacks is important for maintaining firm relationships and activities. Although SCNs have experienced many emergencies demonstrating that mixed failures exacerbate the impact of cascading failures, existing studies of sequential attacks rarely consider the influence of mixed failure modes on cascading failures. In this paper, a reinforcement learning (RL)-based sequential attack strategy is applied to SCNs with cascading failures that consider mixed failure modes. To address the large state-space search problem in SCNs, a deep Q-network (DQN) optimization framework combining deep neural networks (DNNs) and RL is proposed to extract features of the state space. The strategy is then compared with traditional random-based, degree-based, and load-based sequential attack strategies. Simulation results on Barabasi-Albert (BA), Erdos-Renyi (ER), and Watts-Strogatz (WS) networks show that the proposed RL-based sequential attack strategy outperforms the three existing strategies, triggering cascading failures with greater impact. This work provides insights for effectively reducing failure propagation and improving the robustness of SCNs.
Authored by Lei Zhang, Jian Zhou, Yizhong Ma, Lijuan Shen
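A stripped-down sketch of the DQN attack loop follows. The toy cascade rule (a node fails once more than half of its neighbours have failed), the BA test graph, the attack budget, and the omission of experience replay and a target network are all our simplifications; the state is the failure mask, an action fails one more node, and the reward is the number of newly failed nodes.

```python
# A stripped-down sketch: toy cascade rule, BA test graph, no replay
# buffer or target network; all constants are illustrative assumptions.
import random
import networkx as nx
import torch
import torch.nn as nn

G = nx.barabasi_albert_graph(30, 2, seed=0)
N = G.number_of_nodes()

def cascade(failed):
    """Toy mixed-failure rule: a node fails once >50% of its neighbours are down."""
    failed, changed = set(failed), True
    while changed:
        changed = False
        for v in G.nodes:
            nbrs = list(G[v])
            if v not in failed and nbrs and \
                    sum(u in failed for u in nbrs) / len(nbrs) > 0.5:
                failed.add(v)
                changed = True
    return failed

qnet = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, N))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

for episode in range(200):
    failed = set()
    for step in range(5):                       # sequential attack budget
        alive = [v for v in range(N) if v not in failed]
        if not alive:
            break
        s = torch.tensor([float(v in failed) for v in range(N)])
        if random.random() < 0.1:               # epsilon-greedy exploration
            a = random.choice(alive)
        else:
            q = qnet(s).detach()
            a = max(alive, key=lambda v: q[v].item())
        new_failed = cascade(failed | {a})
        r = len(new_failed) - len(failed)       # reward: newly failed nodes
        s2 = torch.tensor([float(v in new_failed) for v in range(N)])
        td_target = r + 0.9 * qnet(s2).detach().max()
        loss = (qnet(s)[a] - td_target) ** 2    # one-step TD error
        opt.zero_grad(); loss.backward(); opt.step()
        failed = new_failed
```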
This paper aims to discover vulnerabilities through the application of supervisory control theory and to design a defensive supervisor against vulnerability attacks. Supervisory control restricts the system behavior to satisfy the control specifications. The existence condition of the supervisor sometimes results in undesirable plant behavior, which can be regarded as a vulnerability of the control specifications. We aim to design a supervisor that is more robust against this vulnerability.
Authored by Kanta Ogawa, Kenji Sawada, Kosei Sakata
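For context, the existence condition mentioned above is, in standard Ramadge-Wonham notation (a textbook recap, not necessarily the paper's formulation), the controllability of the specification language K with respect to the plant G and the uncontrollable event set Σu:

```latex
% Controllability of specification K w.r.t. plant G with uncontrollable
% events \Sigma_u (standard condition; notation is a textbook assumption):
\overline{K}\,\Sigma_u \cap L(G) \subseteq \overline{K}
```

When K fails this condition, a supervisor can enforce only the supremal controllable sublanguage of K; the gap between what is specified and what is enforceable is plausibly the kind of specification-level weakness the abstract characterizes as a vulnerability.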
Cyber threats can cause severe damage to computing infrastructure and systems, as well as data breaches that leave sensitive data vulnerable to attackers and adversaries. It is therefore imperative to discover those threats and stop them before bad actors penetrate information systems. Threat-hunting algorithms based on machine learning have shown great advantages over classical methods. Reinforcement learning models are becoming more accurate at identifying not only signature-based but also behavior-based threats. Quantum computing brings a new dimension to improving classification speed, with a potential exponential advantage. The accuracy of AI/ML algorithms can be affected by many factors, from the algorithm and the data to prejudicial or even intentional bias; as a result, AI/ML applications need to be unbiased and trustworthy. In this research, we developed a machine learning-based cyber threat detection and assessment tool. It uses a two-stage (unsupervised and supervised learning) analysis method on 822,226 log entries recorded from a web server on the AWS cloud. The results show that the algorithm can identify threats with high confidence.
Authored by Shuangbao Wang, Md Arafin, Onyema Osuagwu, Ketchiozo Wandji
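A simplified sketch of such a two-stage pipeline appears below; the synthetic feature matrix, cluster count, "small cluster = suspicious" heuristic, and classifier choice are all assumptions standing in for details the abstract does not give.

```python
# A simplified sketch (synthetic features, assumed labelling heuristic) of
# a two-stage pipeline: unsupervised clustering, then supervised learning.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for numeric features extracted from web-server log lines
# (request rate, URI length, status-code ratios, ...).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (5000, 8)),   # benign-looking traffic
               rng.normal(4, 1, (200, 8))])   # rare, anomalous traffic

# Stage 1 (unsupervised): small clusters far from the bulk are threat candidates.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
sizes = np.bincount(km.labels_)
suspect = sizes < 0.05 * len(X)               # tiny clusters = suspicious
y = suspect[km.labels_].astype(int)

# Stage 2 (supervised): train a classifier on the stage-1 labels,
# which an analyst could refine before deployment.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
print("held-out accuracy:", clf.score(Xte, yte))
```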
Cyber threats have been a major issue in the cyber security domain. Every hacker follows a series of cyber-attack stages known as the cyber kill chain. Each stage has its own norms and limitations. For a decade, researchers have focused on detecting these attacks, but mere monitoring tools are no longer optimal solutions. Everything in the computer science field is becoming autonomous, which leads to the Autonomous Cyber Resilience Defense algorithm designed in this work. Resilience has two aspects: response and recovery. Response requires actions to mitigate attacks; recovery is patching the flawed code or backdoor vulnerability. Both aspects have traditionally relied on human assistance in the cybersecurity defense field. This work aims to develop an algorithm based on Reinforcement Learning (RL) with a Convolutional Neural Network (CNN), much closer to the human learning process, for malware images. RL learns through a reward mechanism for every performed attack: every action has some output that can be classified as a positive or negative reward. To enhance this learning process, a Markov Decision Process (MDP) formulation is integrated with the RL approach. RL impact and induction measures for malware images were evaluated to obtain optimal results. Successful automated actions are obtained on the Malimg malware image dataset. The proposed work shows 98% accuracy in classification, detection, and the deployment of autonomous resilience actions.
Authored by Kainat Rizwan, Mudassar Ahmad, Muhammad Habib
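A loose sketch of the reward-driven classification idea follows; the image size, network shape, and random stand-in batch are assumptions, and a contextual-bandit REINFORCE update (+1 for a correct family label, -1 otherwise) stands in for the paper's full RL/MDP setup on Malimg images.

```python
# A loose sketch; image size, network shape, and the random stand-in
# batch are assumptions, and a bandit-style REINFORCE update stands in
# for the paper's full RL/MDP setup.
import torch
import torch.nn as nn

N_CLASSES = 25                                  # Malimg has 25 malware families
cnn = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(16 * 16 * 16, N_CLASSES))
opt = torch.optim.Adam(cnn.parameters(), lr=1e-3)

# Stand-in batch: replace with real 64x64 grayscale malware images.
imgs = torch.randn(16, 1, 64, 64)
labels = torch.randint(0, N_CLASSES, (16,))

for step in range(20):
    dist = torch.distributions.Categorical(logits=cnn(imgs))
    actions = dist.sample()                     # predicted family per image
    reward = (actions == labels).float() * 2 - 1   # +1 correct, -1 wrong
    loss = -(reward * dist.log_prob(actions)).mean()   # REINFORCE
    opt.zero_grad(); loss.backward(); opt.step()
```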
Video summarization aims to improve the efficiency of large-scale video browsing by producing concise summaries. It is popular in many scenarios, such as video surveillance, video review, and data annotation. Traditional video summarization techniques focus on filtering in the image-feature or image-semantics dimension. However, such techniques can lose a large amount of potentially useful information, especially for videos with rich text semantics such as interviews and teaching videos, because only information relevant to the image dimension is retained. To solve this problem, this paper treats video summarization as a continuous multi-dimensional decision-making process. Specifically, the summarization model predicts a probability for each frame and its corresponding text, and we design reward methods for each of them. Finally, comprehensive summaries in two dimensions, i.e., images and semantics, are generated. This approach is not only unsupervised, relying on neither labels nor user interaction, but also decouples the semantic and image summarization models to provide more usable interfaces for subsequent engineering use.
Authored by Haoran Sun, Xiaolong Zhu, Conghua Zhou
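A condensed sketch of the two-dimensional decision process may clarify the idea; the random frame/text features, the linear scoring heads, and the diversity-plus-brevity reward are illustrative assumptions, not the paper's reward design. Each stream gets a per-frame keep probability, a selection is sampled, and each stream's reward updates its own head via REINFORCE.

```python
# A condensed sketch; random stand-in features, linear heads, and the
# diversity-plus-brevity rewards are assumptions, not the paper's design.
import torch
import torch.nn as nn

T, D = 120, 256                                 # frames, feature size
img_feat, txt_feat = torch.randn(T, D), torch.randn(T, D)
img_head, txt_head = nn.Linear(D, 1), nn.Linear(D, 1)
opt = torch.optim.Adam([*img_head.parameters(), *txt_head.parameters()], lr=1e-3)

def diversity_reward(feat, pick):
    """Reward selections whose chosen features are mutually dissimilar."""
    sel = feat[pick.bool()]
    if len(sel) < 2:
        return torch.tensor(0.0)
    sim = nn.functional.cosine_similarity(sel.unsqueeze(1), sel.unsqueeze(0), dim=-1)
    return 1 - (sim.sum() - len(sel)) / (len(sel) * (len(sel) - 1))

for step in range(50):
    loss = torch.tensor(0.0)
    for feat, head in ((img_feat, img_head), (txt_feat, txt_head)):
        p = torch.sigmoid(head(feat)).squeeze(-1)      # per-frame keep probability
        dist = torch.distributions.Bernoulli(p)
        pick = dist.sample()
        r = diversity_reward(feat, pick) - 0.05 * pick.mean()  # also favour brevity
        loss = loss - r.detach() * dist.log_prob(pick).sum()   # REINFORCE per stream
    opt.zero_grad(); loss.backward(); opt.step()
```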
Axie Infinity is a complicated card game with a huge action space, which makes it difficult to solve with generic Reinforcement Learning (RL) algorithms. We propose a hybrid RL framework to learn action representations and game strategies. To avoid evaluating every action in the large feasible action set, our method evaluates actions in a fixed-size set determined using action representations. We compare the performance of our method with two baseline methods in terms of sample efficiency and the winning rates of the trained models. We empirically show that our method achieves both the best overall winning rate and the best sample efficiency among the three methods.
Authored by Zhiyuan Yao, Tianyu Shi, Site Li, Yiting Xie, Yuanyuan Qin, Xiongjie Xie, Huan Lu, Yan Zhang
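A small sketch of the fixed-size candidate-set idea follows, in the spirit of Wolpertinger-style action embeddings; the sizes, the linear actor and critic, and the random embeddings are our assumptions, not the paper's architecture. Instead of scoring all actions, the critic scores only the k nearest neighbours of a proposed action embedding.

```python
# A small sketch in the spirit of Wolpertinger-style action embeddings;
# sizes, the linear actor/critic, and random embeddings are assumptions.
import torch
import torch.nn as nn

N_ACTIONS, EMB, STATE, K = 100_000, 32, 64, 16
action_emb = torch.randn(N_ACTIONS, EMB)    # learned action representations
actor = nn.Linear(STATE, EMB)               # state -> proto-action embedding
critic = nn.Linear(STATE + EMB, 1)          # Q(state, action embedding)

def select_action(state):
    proto = actor(state)                                        # (EMB,)
    dists = torch.cdist(proto.unsqueeze(0), action_emb)[0]      # to all actions
    cand = dists.topk(K, largest=False).indices                 # fixed-size set
    q = critic(torch.cat([state.unsqueeze(0).expand(K, -1),
                          action_emb[cand]], dim=1))
    return cand[q.squeeze(-1).argmax()].item()                  # best candidate

print("chosen action id:", select_action(torch.randn(STATE)))
```

The nearest-neighbour lookup makes the per-step evaluation cost depend on K rather than on the full action-set size, which is the point of learning action representations here.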