The pervasive proliferation of digital technologies and interconnected systems has heightened the necessity for comprehensive cybersecurity measures in computer science. While deep learning (DL) has become a powerful tool for bolstering security, its effectiveness is being tested by malicious hacking. Cybersecurity has become an issue of critical importance in the modern digital world. By making it possible to identify and respond to threats in real time, deep learning is a crucial component of improved security. Adversarial attacks, model interpretability, and a lack of labeled data are all obstacles that need to be studied further in order to strengthen DL-based security solutions. The safety and reliability of DL in cyberspace depend on overcoming these limitations. The present research introduces a novel method for strengthening DL-based cybersecurity, called dynamic adversarial resilience for deep learning-based cybersecurity (DARDL-C). DARDL-C offers a dynamic and adaptable framework for countering adversarial attacks by combining adaptive neural network architectures with ensemble learning, real-time threat monitoring, threat intelligence integration, explainable AI (XAI) for model interpretability, and reinforcement learning for adaptive defense strategies. The purpose of this technology is to make DL models more secure and resistant to the constantly shifting nature of online threats. Simulation analysis is essential for determining DARDL-C's effectiveness in practical settings without compromising real-world safety. Professionals and researchers can evaluate the efficacy and versatility of DARDL-C by simulating realistic threats in controlled settings, which provides valuable insight into the system's strengths and areas for improvement.
Authored by D. Poornima, A. Sheela, Shamreen Ahamed, P. Kathambari
As of 2024, the landscape of infrastructure Distributed Denial of Service (DDoS) attacks continues to evolve with increasing complexity and sophistication. These attacks are not only increasing in volume but also in their ability to evade traditional defenses due to advancements in AI, which enables adversaries to dynamically adapt their attack targets and tactics to maximize damage. The emergence of high-performance botnets utilizing virtual machines allows attackers to launch large-scale attacks with fewer resources. Consequently, defense strategies must adapt by integrating AI-driven anomaly detection and robust multi-layered defenses to keep pace with these evolving threats. In this paper, we introduce a novel deep reinforcement learning (DRL) framework for mitigating Infrastructure DDoS attacks. Our framework features an actor-critic-based DRL network, integrated with variational autoencoders (VAE) to improve learning efficiency and scalability. The VAE assesses the risk of each traffic flow by analyzing various traffic features, while the actor-critic networks use the current link load and the VAE-generated flow risk scores to determine the probability of DDoS mitigation actions, such as traffic limiting, redirecting, or sending puzzles to verify traffic sources. The puzzle inquiry results are fed back to the VAE to refine the risk assessment process. The key strengths of our framework are: (1) the VAE serves as an adaptive anomaly detector, evolving based on DRL agent actions instead of relying on static IDS rules that may quickly become outdated; (2) by separating traffic behavior characterization (handled by VAE) from action selection (handled by DRL), we significantly reduce the DRL state space, enhancing scalability; and (3) the dynamic collaboration between the DRL engine and the VAE allows for real-time adaptation to evolving attack patterns with high efficiency. We show the feasibility and effectiveness of the framework with various attack scenarios.
Our approach uniquely integrates an actor-critic learning algorithm with the VAE to understand traffic flow properties and determine optimal actions through a continuous learning process. Our evaluation demonstrates that this framework effectively identifies attack traffic flows, achieving a true positive rate exceeding 95% and a false positive rate below 4%. Additionally, it learns the optimal strategy in a reasonable time, under 20,000 episodes in most experimental settings.
Authored by Qi Duan
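The mitigation loop described in the abstract above, where a VAE-derived risk score and the current link load drive a probabilistic choice among mitigation actions, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the VAE is replaced by a simple reconstruction-error stand-in, and all feature values and weights are invented.

```python
import math

ACTIONS = ["forward", "limit", "redirect", "puzzle"]

def risk_score(flow, mean, tol=1.0):
    # Stand-in for the VAE: score a flow by its distance from the
    # learned "normal" profile, squashed into (0, 1).
    err = sum((f - m) ** 2 for f, m in zip(flow, mean))
    return 1.0 - math.exp(-err / tol)

def action_probs(link_load, risk, weights):
    # Stand-in for the actor network: map (link load, flow risk) to a
    # probability distribution over mitigation actions via a softmax.
    logits = [w_l * link_load + w_r * risk for (w_l, w_r) in weights]
    z = max(logits)
    exps = [math.exp(l - z) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical numbers: a benign-looking flow vs. an anomalous one.
normal_profile = [0.5, 0.5, 0.5]
weights = [(-2.0, -3.0), (1.0, 1.0), (1.0, 1.5), (0.5, 3.0)]  # one (load, risk) pair per action

benign = risk_score([0.52, 0.48, 0.5], normal_profile)
attack = risk_score([3.0, 0.1, 2.5], normal_profile)
print(benign < attack)  # True: the anomalous flow scores higher risk
probs = action_probs(link_load=0.8, risk=attack, weights=weights)
print(ACTIONS[probs.index(max(probs))])  # puzzle
```

Under these illustrative weights, a high-risk flow on a loaded link is most likely to be challenged with a puzzle, whose result would then feed back into the risk scorer.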
Healthcare systems have recently utilized the Internet of Medical Things (IoMT) to assist intelligent data collection and decision-making. However, the volume of malicious threats, particularly new variants of malware attacks to the connected medical devices and their connected system, has risen significantly in recent years, which poses a critical threat to patients’ confidential data and the safety of the healthcare systems. To address the high complexity of conventional software-based detection techniques, Hardware-supported Malware Detection (HMD) has proved to be efficient for detecting malware at the processors’ micro-architecture level with the aid of Machine Learning (ML) techniques applied to Hardware Performance Counter (HPC) data. In this work, we examine the suitability of various standard ML classifiers for zero-day malware detection on new data streams in the real-world operation of IoMT devices and demonstrate that such methods are not capable of detecting unknown malware signatures with a high detection rate. In response, we propose a hybrid and adaptive image-based framework based on Deep Learning and Deep Reinforcement Learning (DRL) for online hardware-assisted zero-day malware detection in IoMT devices. Our proposed method dynamically selects the best DNN-based malware detector at run-time customized for each device from a pool of highly efficient models continuously trained on all stream data. It first converts tabular hardware-based data (HPC events) into small-size images and then leverages a transfer learning technique to retrain and enhance the Deep Neural Network (DNN) based model’s performance for unknown malware detection. Multiple DNN models are trained on various stream data continuously to form an inclusive model pool. 
Next, a DRL-based agent constructed with two Multi-Layer Perceptrons (MLPs) is trained (one acts as an Actor and another acts as a Critic) to align the decision of selecting the most optimal DNN model for highly accurate zero-day malware detection at run-time using a limited number of hardware events. The experimental results demonstrate that our proposed AI-enabled method achieves a 99% detection rate in both F1-score and AUC, with only a 0.01% false positive rate and a 1% false negative rate.
Authored by Zhangying He, Hossein Sayadi
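The tabular-to-image conversion step described above can be illustrated with a small sketch. The padding scheme, image size, and counter values here are assumptions for illustration, not the authors' exact encoding:

```python
def hpc_to_image(events, side=4):
    # Normalize a flat vector of HPC counter readings into a
    # side x side grayscale "image" with pixel values in [0, 255].
    hi = max(events) or 1
    scaled = [round(255 * e / hi) for e in events]
    scaled += [0] * (side * side - len(scaled))  # zero-pad to a full image
    return [scaled[r * side:(r + 1) * side] for r in range(side)]

sample = [120, 40, 800, 5, 300, 12, 60, 0, 950, 210]  # hypothetical counter values
img = hpc_to_image(sample)
print(len(img), len(img[0]))  # 4 4
print(img[2][0])              # 255: the largest counter maps to the brightest pixel
```

Once in this form, the vectors can be fed to image-oriented DNNs and transfer learning pipelines, as the abstract describes.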
The last decade has shown that networked cyber-physical systems (NCPS) are the future of critical infrastructure such as transportation systems and energy production. However, they have introduced an uncharted territory of security vulnerabilities and a wider attack surface, mainly due to network openness and the deeply integrated physical and cyber spaces. On the other hand, relying on manual analysis of intrusion detection alarms might be effective in stopping run-of-the-mill automated probes but remains useless against the growing number of targeted, persistent, and often AI-enabled attacks on large-scale NCPS. Hence, there is a pressing need for new research directions to provide advanced protection. This paper introduces a novel security paradigm for emerging NCPS, namely Autonomous Cyber-Physical Defense (ACPD). We lay out the theoretical foundations and describe the methods for building autonomous and stealthy cyber-physical defense agents that are able to dynamically hunt, detect, and respond to intelligent and sophisticated adversaries in real time without human intervention. By leveraging the power of game theory and multi-agent reinforcement learning, these self-learning agents will be able to deploy complex cyber-physical deception scenarios on the fly, generate optimal and adaptive security policies without prior knowledge of potential threats, and defend themselves against adversarial learning. Nonetheless, serious challenges including trustworthiness, scalability, and transfer learning are yet to be addressed for these autonomous agents to become the next-generation tools of cyber-physical defense.
Authored by Talal Halabi, Mohammad Zulkernine
Developing network intrusion detection systems (IDS) presents significant challenges due to the evolving nature of threats and the diverse range of network applications. Existing IDSs often struggle to detect dynamic attack patterns and covert attacks, leading to misidentified network vulnerabilities and degraded system performance. These requirements must be met via dependable, scalable, effective, and adaptable IDS designs. Our IDS can recognise and classify complex network threats by combining the Deep Q-Network (DQN) algorithm with distributed agents and attention techniques. Our proposed distributed multi-agent IDS architecture has many advantages for guiding an all-encompassing security approach, including scalability, fault tolerance, and multi-view analysis. We conducted experiments using industry-standard datasets including NSL-KDD and CICIDS2017 to determine how well our model performed. The results show that our IDS outperforms others in terms of accuracy, precision, recall, F1-score, and false-positive rate. Additionally, we evaluated our model's resistance to black-box adversarial attacks, which are commonly used to take advantage of flaws in machine learning. Under these difficult circumstances, our model performed quite well. We used a denoising autoencoder (DAE) for further model strengthening to improve the IDS's robustness. Lastly, we evaluated the effectiveness of our zero-day defenses, which are designed to mitigate attacks exploiting unknown vulnerabilities. Through our research, we have developed an advanced IDS solution that addresses the limitations of traditional approaches. Our model demonstrates superior performance, robustness against adversarial attacks, and effective zero-day defenses. By combining deep reinforcement learning, distributed agents, attention techniques, and other enhancements, we provide a reliable and comprehensive solution for network security.
Authored by Malika Malik, Kamaljit Saini
The rise in autonomous Unmanned Aerial Vehicles (UAVs) for objectives requiring long-term navigation in diverse environments is attributed to their compact, agile, and accessible nature. Specifically, problems exploring dynamic obstacle and collision avoidance are of increasing interest as UAVs become more popular for tasks such as transportation of goods, formation control, and search and rescue routines. Prioritizing safety in the design of autonomous UAVs is crucial to prevent costly collisions that endanger pedestrians, mission success, and property. Safety must be ensured in these systems whose behavior emerges from multiple software components including learning-enabled components. Learning-enabled components, optimized through machine learning (ML) or reinforcement learning (RL) require adherence to safety constraints while interacting with the environment during training and deployment, as well as adaptation to new unknown environments. In this paper, we safeguard autonomous UAV navigation by designing agents based on behavior trees with learning-enabled components, referred to as Evolving Behavior Trees (EBTs). We learn the structure of EBTs with explicit safety components, optimize learning-enabled components with safe hierarchical RL, deploy, and update specific components for transfer to unknown environments. Safe and successful navigation is evaluated using a realistic UAV simulation environment. The results demonstrate the design of an explainable learned EBT structure, incurring near-zero collisions during training and deployment, with safe time-efficient transfer to an unknown environment.
Authored by Nicholas Potteiger, Xenofon Koutsoukos
Cyber threats have been a major issue in the cyber security domain. Every hacker follows a series of cyber-attack stages known as the cyber kill chain. Each stage has its own norms and limitations to be deployed. For a decade, researchers have focused on detecting these attacks, but mere monitoring tools are no longer optimal solutions. Everything in the computer science field is becoming autonomous, which leads to the idea of an Autonomous Cyber Resilience Defense algorithm design in this work. Resilience has two aspects: response and recovery. Response requires actions to be performed to mitigate attacks; recovery is patching the flawed code or back-door vulnerability. Both aspects have traditionally required human assistance in the cybersecurity defense field. This work aims to develop an algorithm based on Reinforcement Learning (RL) with a Convolutional Neural Network (CNN), much closer to the human learning process, for malware images. RL learns through a reward mechanism for every performed attack: every action produces some output that can be classified as a positive or negative reward. To enhance this learning process, a Markov Decision Process (MDP) is integrated with the RL approach. RL impact and induction measures for malware images were evaluated to obtain optimal results. Successful automated actions are obtained on the Malimg malware image dataset. The proposed work has shown 98% accuracy in classification, detection, and the deployment of autonomous resilience actions.
Authored by Kainat Rizwan, Mudassar Ahmad, Muhammad Habib
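The reward mechanism sketched in this abstract, where each defensive action yields a positive or negative reward and an MDP drives the learning, can be illustrated with a tiny tabular Q-learning loop. All states, actions, rewards, and transitions below are invented for illustration; the paper's agent operates on malware images with a CNN:

```python
import random

# Toy MDP: states are host conditions, actions are defensive responses.
ACTIONS = ["monitor", "isolate", "patch"]

def reward(state, action):
    # Positive reward for mitigating an infection, negative for ignoring it.
    if state == "infected":
        return {"monitor": -1.0, "isolate": 0.5, "patch": 1.0}[action]
    return 0.1 if action == "monitor" else -0.1

def step(state, action):
    if state == "infected" and action in ("isolate", "patch"):
        return "clean"  # both responses remove the malware
    if state == "clean" and action == "monitor":
        return random.choice(["clean", "infected"])  # new attacks may arrive
    return state

random.seed(7)
Q = {(s, a): 0.0 for s in ("clean", "infected") for a in ACTIONS}
alpha, gamma = 0.5, 0.9
for _ in range(500):
    s = random.choice(["clean", "infected"])
    a = random.choice(ACTIONS)  # pure exploration, for brevity
    s2 = step(s, a)
    target = reward(s, a) + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

best = max(ACTIONS, key=lambda a: Q[("infected", a)])
print(best)  # patch
```

The learned policy prefers patching an infected host because it earns the highest immediate reward while reaching the same clean successor state as isolation.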
Current research focuses on the physical security of UAVs, while there are few studies on UAV information security. Moreover, the frequency of security problems caused by UAVs has been increasing in recent years, so research on UAV information security is urgent. To address the high cost of UAV experiments, the complexity of protocol types, and hidden security problems, we design a UAV cyber range and analyze the attack and defense scenarios of three types of honeypot deployment. On this basis, we propose a UAV honeypot active defense strategy based on reinforcement learning. The active defense model of the UAV honeypot is described in four dimensions: state, action, reward, and strategy. The simulation results show that the UAV honeypot strategy can maximize the capture of attacker data, which has important theoretical significance for research on UAV information security.
Authored by Shangting Miao, Yang Li, Quan Pan
Dynamic Infrastructural Distributed Denial of Service (I-DDoS) attacks constantly change attack vectors to congest core backhaul links and disrupt critical network availability while evading end-system defenses. To effectively counter these highly dynamic attacks, defense mechanisms need to exhibit adaptive decision strategies for real-time mitigation. This paper presents a novel Autonomous DDoS Defense framework that employs model-based reinforcement agents. The framework continuously learns attack strategies, predicts attack actions, and dynamically determines the optimal composition of defense tactics such as filtering, limiting, and rerouting for flow diversion. Our contributions include extending the underlying formulation of the Markov Decision Process (MDP) to address simultaneous DDoS attack and defense behavior, and accounting for environmental uncertainties. We also propose a fine-grained action mitigation approach robust to classification inaccuracies in Intrusion Detection Systems (IDS). Additionally, our reinforcement learning model demonstrates resilience against evasion and deceptive attacks. Evaluation experiments using real-world and simulated DDoS traces demonstrate that our autonomous defense framework ensures the delivery of approximately 96 – 98% of benign traffic despite the diverse range of attack strategies.
Authored by Ashutosh Dutta, Ehab Al-Shaer, Samrat Chatterjee, Qi Duan
As vehicles increasingly embed digital systems, new security vulnerabilities are also being introduced. Computational constraints make it challenging to add security oversight layers on top of core vehicle systems, especially when the security layers rely on additional deep learning models for anomaly detection. To improve security-aware decision-making for autonomous vehicles (AV), this paper proposes a bi-level security framework. The first security level consists of a one-shot resource allocation game that enables a single vehicle to fend off an attacker by optimizing the configuration of its intrusion prevention system based on risk estimation. The second level relies on a reinforcement learning (RL) environment where an agent is responsible for forming and managing a platoon of vehicles on the fly while also dealing with a potential attacker. We solve the first problem using a minimax algorithm to identify optimal strategies for each player. Then, we train RL agents and analyze their performance in forming security-aware platoons. The trained agents demonstrate superior performance compared to our baseline strategies that do not consider security risk.
Authored by Dominic Phillips, Talal Halabi, Mohammad Zulkernine
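The first security level described above is solved with a minimax algorithm over a one-shot game. A minimal sketch with a hypothetical defender payoff matrix (the configurations, attacker strategies, and payoffs are invented, not taken from the paper):

```python
def minimax_value(payoff):
    # The defender picks the row maximizing its worst-case payoff,
    # assuming the attacker answers with the minimizing column.
    worst_case = [min(row) for row in payoff]
    best_row = max(range(len(payoff)), key=lambda r: worst_case[r])
    return best_row, worst_case[best_row]

# Hypothetical defender payoffs: rows = IPS configurations,
# columns = attacker strategies.
payoff = [
    [3, -2,  1],   # config A: strong vs. strategy 0, weak vs. strategy 1
    [1,  0,  1],   # config B: balanced
    [2, -1, -3],   # config C
]
row, value = minimax_value(payoff)
print(row, value)  # 1 0: config B guarantees a payoff of at least 0
```

Here the balanced configuration wins because pure-strategy minimax rewards the option with the best worst case, mirroring how the vehicle would configure its intrusion prevention system against an unknown attacker.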
Advanced Persistent Threats (APTs) have significantly impacted organizations over an extended period with their coordinated and sophisticated cyberattacks. Unlike signature-based tools such as antivirus and firewalls that can detect and block other types of malware, APTs exploit zero-day vulnerabilities to generate new variants of undetectable malware. Additionally, APT adversaries engage in complex relationships and interactions within network entities, necessitating the learning of interactions in network traffic flows, such as hosts, users, or IP addresses, for effective detection. However, traditional deep neural networks often fail to capture the inherent graph structure and overlook crucial contextual information in network traffic flows. To address these issues, this research models APTs as heterogeneous graphs, capturing the diverse features and complex interactions in network flows. Consequently, a heterogeneous graph transformer (HGT) model is used to accurately distinguish between benign and malicious network connections. Experiment results reveal that the HGT model achieves better performance, with 100% accuracy and accelerated learning time, outperforming homogeneous graph neural network models.
Authored by Kazeem Saheed, Shagufta Henna
The resource-constrained IPV6-based low power and lossy network (6LowPAN) is connected through the routing protocol for low power and lossy networks (RPL). This protocol is subject to a routing protocol attack called a rank attack (RA). This paper presents a performance evaluation where leveraging model-free reinforcement-learning (RL) algorithms helps the software-defined network (SDN) controller achieve a cost-efficient solution to prevent the harmful effects of RA. Experimental results demonstrate that the state action reward state action (SARSA) algorithm is more effective than the Q-learning (QL) algorithm, facilitating the implementation of intrusion prevention systems (IPSs) in software-defined 6LowPANs.
Authored by Christian Moreira, Georges Kaddoum
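The difference between the two algorithms compared above comes down to the update target: Q-learning bootstraps off-policy from the greedy next action, while SARSA bootstraps on-policy from the action actually taken. A minimal sketch with hypothetical states and actions:

```python
def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the greedy action in the next state.
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the action a2 actually taken in the next state.
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

actions = ["allow", "drop"]
Q1 = {(s, a): 0.0 for s in (0, 1) for a in actions}
Q2 = dict(Q1)
Q1[(1, "drop")] = 1.0   # both learners already value dropping in state 1
Q2[(1, "drop")] = 1.0

q_learning_update(Q1, 0, "allow", 0.0, 1, actions)
sarsa_update(Q2, 0, "allow", 0.0, 1, "allow")  # behavior policy explored "allow"
print(round(Q1[(0, "allow")], 2), Q2[(0, "allow")])  # 0.09 0.0
```

The same transition produces different updates: Q-learning credits state 0 for the best available follow-up, while SARSA only credits what the exploring policy actually did, which is why the two can rank differently in an intrusion prevention setting like the one evaluated above.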
Probabilistic model checking is a useful technique for specifying and verifying properties of stochastic systems including randomized protocols and reinforcement learning models. However, these methods rely on the assumed structure and probabilities of certain system transitions. These assumptions may be incorrect, and may even be violated by an adversary who gains control of some system components.
Authored by Lisa Oakley, Alina Oprea, Stavros Tripakis
Data management systems in smart grids have to address advanced persistent threats (APTs), where malware injection methods are performed by the attacker to launch stealthy attacks and thus steal more data for illegal advantages. In this paper, we present a hierarchical deep reinforcement learning based APT detection scheme for smart grids, which enables the control center of the data management system to choose the APT detection policy to reduce the detection delay and improve the data protection level without knowing the attack model. Based on the state that consists of the size of the gathered power usage data, the priority level of the data, and the detection history, this scheme develops a two-level hierarchical structure to compress the high-dimensional action space and designs four deep dueling networks to accelerate the optimization speed with less over-estimation. Detection performance bound is provided and simulation results show that the proposed scheme improves both the data protection level and the utility of the control center with less detection delay.
Authored by Shi Yu
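The dueling-network design mentioned above splits the Q-value into a state value and per-action advantages, recombined so that the advantage stream is zero-mean, which helps curb over-estimation. The aggregation step can be sketched as follows (the value and advantages are illustrative):

```python
def dueling_q(value, advantages):
    # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'),
    # which keeps the value stream identifiable.
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

q = dueling_q(value=2.0, advantages=[1.0, -1.0, 0.0])
print(q)  # [3.0, 1.0, 2.0]
```

Subtracting the mean advantage forces the network to attribute shared value to V(s) rather than inflating every action's estimate, which is the over-estimation reduction the abstract refers to.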
The emergence of mobile edge computing (MEC) imposes unprecedented pressure on privacy protection, even though computation offloading improves computation performance, including energy consumption and computation delay. To this end, we propose a deep reinforcement learning (DRL)-based computation offloading scheme to jointly optimize privacy protection and computation performance. The privacy exposure risk caused by offloading history is investigated, and an analysis metric is defined to evaluate the privacy level. To find the optimal offloading strategy, an algorithm combining actor-critic, off-policy, and maximum entropy is proposed to accelerate the learning rate. Simulation results show that the proposed scheme has better performance compared with other benchmarks.
Authored by Zhengjun Gao, Guowen Wu, Yizhou Shen, Hong Zhang, Shigen Shen, Qiying Cao
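The maximum-entropy ingredient mentioned above can be illustrated in isolation: a softmax (Boltzmann) policy over action values whose temperature controls the trade-off between exploiting high-value offloading actions and exploring. The Q-values below are invented for illustration:

```python
import math

def entropy(probs):
    # Shannon entropy of a discrete distribution, in nats.
    return -sum(p * math.log(p) for p in probs if p > 0)

def soft_policy(q_values, temperature):
    # Maximum-entropy style policy: softmax over Q-values; a higher
    # temperature trades expected reward for exploration (higher entropy).
    z = max(q / temperature for q in q_values)
    exps = [math.exp(q / temperature - z) for q in q_values]
    s = sum(exps)
    return [e / s for e in exps]

q = [1.0, 0.5, 0.2]   # illustrative values of three offloading actions
cool = soft_policy(q, temperature=0.1)
warm = soft_policy(q, temperature=5.0)
print(entropy(warm) > entropy(cool))  # True: the hotter policy explores more
```

Entropy-regularized methods such as the one in the abstract optimize reward plus policy entropy, which keeps exploration alive and tends to accelerate learning.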
Collaborative software development platforms like GitHub have gained tremendous popularity. Unfortunately, many users have reportedly leaked authentication secrets (e.g., textual passwords and API keys) in public Git repositories, causing security incidents and financial loss. Recently, several tools were built to investigate secret leakage in GitHub. However, these tools could only discover and scan a limited portion of files in GitHub due to platform API restrictions and bandwidth limitations. In this paper, we present SecretHunter, a real-time large-scale comprehensive secret scanner for GitHub. SecretHunter resolves the file discovery and retrieval difficulty via two major improvements to the Git cloning process. Firstly, our system retrieves file metadata from repositories before cloning file contents. The early metadata access helps identify newly committed files and enables many bandwidth optimizations such as filename filtering and object deduplication. Secondly, SecretHunter adopts a reinforcement learning model to analyze file contents being downloaded and infer whether the file is sensitive. If not, the download process can be aborted to conserve bandwidth. We conduct a one-month empirical study to evaluate SecretHunter. Our results show that SecretHunter discovers 57% more leaked secrets than state-of-the-art tools. SecretHunter also reduces bandwidth consumption in the object retrieval process by 85% and can be used in low-bandwidth settings (e.g., 4G connections).
Authored by Elliott Wen, Jia Wang, Jens Dietrich
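The second improvement described above, aborting a download once the content looks non-sensitive, can be sketched as follows. The chunked interface and the keyword-based classifier stand-in are assumptions for illustration; SecretHunter itself uses a reinforcement learning model for this decision:

```python
def scan_stream(chunks, classifier, threshold=0.5):
    # Download a file chunk by chunk; abort the transfer as soon as the
    # model judges the content unlikely to be sensitive, saving bandwidth.
    consumed = []
    for chunk in chunks:
        consumed.append(chunk)
        if classifier(b"".join(consumed)) < threshold:
            return "aborted", len(consumed)
    return "kept", len(consumed)

# Stand-in classifier: flag content containing an AWS-style key prefix.
classify = lambda data: 0.9 if b"AKIA" in data else 0.1

doc = [b"# README\n", b"build instructions ...\n"]
cfg = [b"aws_access_key_id = AKIA", b"XXXXEXAMPLE\n"]
print(scan_stream(doc, classify))  # ('aborted', 1): only one chunk downloaded
print(scan_stream(cfg, classify))  # ('kept', 2): fully retrieved for scanning
```

A non-sensitive README is dropped after its first chunk, while a file that looks like it holds credentials is retrieved in full, which is the bandwidth/recall trade-off the abstract quantifies.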
In software regression testing, newly added test cases are more likely to fail and should therefore be prioritized for execution. In software regression testing for continuous integration, reinforcement learning-based approaches are promising, and the RETECS (Reinforced Test Case Prioritization and Selection) framework is a successful application case. RETECS uses an agent composed of a neural network to predict the priority of test cases, and the agent needs to learn from historical information to make improvements. However, newly added test cases have no historical execution information, so using RETECS to predict their priority is more like 'random'. In this paper, we focus on new test cases in continuous integration testing. On the basis of the RETECS framework, we first propose a priority assignment method for new test cases to ensure that they can be executed first. Secondly, continuous integration is a fast iterative integration method in which new test cases have strong fault detection capability within the latest periods; we therefore further propose an additional reward method for new test cases. Finally, based on full lifecycle management, the 'new' additional rewards need to be terminated within a certain period, and this paper implements an empirical study. We conducted 30 iterations of the experiment on 12 datasets, and our best results were 19.24%, 10.67%, and 34.05 positions better compared to the best parameter combination in RETECS for the NAPFD (Normalized Average Percentage of Faults Detected), RECALL, and TTF (Test to Fail) metrics, respectively.
Authored by Fanliang Chen, Zheng Li, Ying Shang, Yang Yang
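The priority-assignment and additional-reward ideas above can be sketched as a bonus that expires after a fixed number of integration cycles. The field names, bonus size, and horizon are illustrative assumptions, not the paper's parameters:

```python
def priority(test, current_cycle, base_priority, bonus=1.0, horizon=3):
    # New test cases receive an additive bonus so they run first; the
    # bonus expires after `horizon` cycles, implementing the lifecycle
    # termination of the "new" additional reward.
    age = current_cycle - test["added_cycle"]
    if age < horizon:
        return base_priority + bonus
    return base_priority

new_test = {"added_cycle": 10}
old_test = {"added_cycle": 1}
print(priority(new_test, 11, 0.5))  # 1.5: still inside the bonus window
print(priority(old_test, 11, 0.5))  # 0.5: bonus has expired
```

The same shape applies to the reward side: a recent test that fails would earn the agent a larger reward than an old one, until its window closes.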
Information Centric Networks - Traffic in a backbone network has high forwarding rate requirements, and as the network gets larger, traffic increases and forwarding rates decrease. In a Software Defined Network (SDN), the controller can manage a global view of the network and control the forwarding of network traffic. A deterministic network has different forwarding requirements for the traffic of different priority levels. Static traffic load balancing is not flexible enough to meet the needs of users and may lead to the overloading of individual links and even network collapse. In this paper, we propose a new backbone network load balancing architecture - EDQN (Edge Deep Q-learning Network), which implements queue-based gate-shaping algorithms at the edge devices and load balancing of traffic on the backbone links. With the advantages of SDN, the link utilization of the backbone network can be improved, the delay in traffic transmission can be reduced and the throughput of traffic during transmission can be increased.
Authored by Xue Zhang, Liang Wei, Shan Jing, Chuan Zhao, Zhenxiang Chen
Resilience and antifragility under duress present significant challenges for autonomic and self-adaptive systems operating in contested environments. In such settings, the system has to continually plan ahead, accounting for either an adversary or an environment that may negate its actions or degrade its capabilities. This will involve projecting future states, as well as assessing recovery options, counter-measures, and progress towards system goals. For antifragile systems to be effective, we envision three self-* properties to be of key importance: self-exploration, self-learning and self-training. Systems should be able to efficiently self-explore – using adversarial search – the potential impact of the adversary’s attacks and compute the most resilient responses. The exploration can be assisted by prior knowledge of the adversary’s capabilities and attack strategies, which can be self-learned – using opponent modelling – from previous attacks and interactions. The system can self-train – using reinforcement learning – such that it evolves and improves itself as a result of being attacked. This paper discusses those visions and outlines their realisation in AWaRE, a cyber-resilient and self-adaptive multi-agent system.
Authored by Saad Hashmi, Hoa Dam, Peter Smet, Mohan Chhetri
Advanced metamorphic malware and ransomware use techniques like obfuscation to alter their internal structure with every attack. Therefore, any signature extracted from such an attack, and used to bolster endpoint defense, cannot avert subsequent attacks, and if even a single such malware intrudes into even a single device of an IoT network, it will continue to infect the entire network. Scenarios where an entire network is targeted by a coordinated swarm of such malware are not beyond imagination. Therefore, the IoT era also requires Industry-4.0-grade AI-based solutions against such advanced attacks. But AI-based solutions need a large repository of data extracted from similar attacks to learn robust representations, whereas developing metamorphic malware is a very complex task that requires extreme human ingenuity. Hence, abundant metamorphic malware does not exist to train AI-based defensive solutions, and there is currently no system that can generate enough functionality-preserving metamorphic variants of multiple malware to train AI-based defensive systems. To this end, we design and develop a novel system, named X-Swarm. X-Swarm uses deep policy-based adversarial reinforcement learning to generate swarms of metamorphic instances of any malware by obfuscating them at the opcode level, ensuring that they can evade even capable, adversarial-attack-immune endpoint defense systems.
Authored by Mohit Sewak, Sanjay Sahay, Hemant Rathore
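The opcode-level obfuscation mentioned above relies on semantics-preserving rewrites. A toy mutation pass over textual opcodes is shown below; the substitution table is hypothetical, and X-Swarm learns such rewrites with adversarial RL rather than applying a fixed table:

```python
# Illustrative semantics-preserving substitution table: each entry
# rewrites an instruction to an equivalent sequence, changing the byte
# signature without changing behavior.
SUBSTITUTIONS = {
    "xor eax, eax": ["mov eax, 0"],
    "add eax, 1": ["sub eax, -1"],
    "mov ebx, eax": ["push eax", "pop ebx"],
}

def metamorphose(opcodes):
    # One mutation pass: rewrite every opcode that has a known
    # equivalent, leaving the rest untouched.
    out = []
    for op in opcodes:
        out.extend(SUBSTITUTIONS.get(op, [op]))
    return out

original = ["xor eax, eax", "add eax, 1", "mov ebx, eax"]
variant = metamorphose(original)
print(variant != original)  # True: different signature, same behavior
```

Each variant defeats the original's signature while preserving functionality, which is exactly the property that makes signature extraction useless against this malware class.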
Achieving agile and resilient autonomous capabilities for cyber defense requires moving past indicators and situational awareness into automated response and recovery capabilities. The objective of the AlphaSOC project is to use state of the art sequential decision-making methods to automatically investigate and mitigate attacks on cyber physical systems (CPS). To demonstrate this, we developed a simulation environment that models the distributed navigation control system and physics of a large ship with two rudders and thrusters for propulsion. Defending this control network requires processing large volumes of cyber and physical signals to coordinate defensive actions over many devices with minimal disruption to nominal operation. We are developing a Reinforcement Learning (RL)-based approach to solve the resulting sequential decision-making problem that has large observation and action spaces.
Authored by Ryan Silva, Cameron Hickert, Nicolas Sarfaraz, Jeff Brush, Josh Silbermann, Tamim Sookoor
We present a system for interactive examination of learned security policies. It allows a user to traverse episodes of Markov decision processes in a controlled manner and to track the actions triggered by security policies. Similar to a software debugger, a user can continue or halt an episode at any time step and inspect parameters and probability distributions of interest. The system enables insight into the structure of a given policy and into the behavior of a policy in edge cases. We demonstrate the system with a network intrusion use case. We examine the evolution of an IT infrastructure’s state and the actions prescribed by security policies while an attack occurs. The policies for the demonstration have been obtained through a reinforcement learning approach that includes a simulation system where policies are incrementally learned and an emulation system that produces statistics that drive the simulation runs.
Authored by Kim Hammar, Rolf Stadler
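The debugger-style traversal described above, running a policy through an episode but halting at chosen states for inspection, can be sketched as follows. The toy policy, transition function, and breakpoint set are invented for illustration:

```python
def run_episode(policy, step_fn, start, breakpoints=(), max_steps=10):
    # Debugger-style traversal: execute the policy step by step, but
    # halt at any state listed in `breakpoints` so the user can inspect
    # parameters before continuing.
    trace, state = [], start
    for t in range(max_steps):
        if state in breakpoints:
            return trace, state, "halted"
        action = policy(state)
        trace.append((t, state, action))
        state = step_fn(state, action)
    return trace, state, "done"

# Toy intrusion scenario: states 0..3, where state 3 = "attacker detected".
policy = lambda s: "scan" if s < 2 else "block"
step_fn = lambda s, a: s + 1
trace, state, status = run_episode(policy, step_fn, start=0, breakpoints={3})
print(status, state, len(trace))  # halted 3 3
```

The returned trace plays the role of the debugger's call stack: each (time step, state, action) triple can be inspected at the breakpoint before the episode is resumed.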
Control room video surveillance is an important source of information for ensuring public safety. To facilitate the process, a Decision-Support System (DSS) designed for the security task force is vital and necessary for taking decisions rapidly from a sea of information. In mission-critical operations, Situational Awareness (SA), which consists of knowing what is going on around you at any given time, plays a crucial role across a variety of industries and should be placed at the center of our DSS. In our approach, the SA system takes advantage of the human factor thanks to the reinforcement signal, whereas previous work in this field focuses first on improving the knowledge level of the DSS and then uses the human factor only for decision-making. In this paper, we propose a situational awareness-centric decision-support system framework for mission-critical operations driven by Quality of Experience (QoE). Our idea is inspired by the reinforcement learning feedback process, which updates the environment understanding of our DSS. The feedback is injected by a QoE built on user perception. Our approach allows our DSS to evolve according to the context with an up-to-date SA.
Authored by Abhishek Djeachandrane, Said Hoceini, Serge Delmas, Jean-Michel Duquerrois, Abdelhamid Mellouk