Research Team Status
- David Garlan, Professor, School of Computer Science (PI)
- Ehab Al-Shaer, Distinguished Research Fellow, School of Computer Science (Co-PI)
- Bradley Schmerl, Principal Systems Scientist, School of Computer Science (Co-PI)
- Qi Duan, Research Scientist, School of Computer Science (Senior Researcher)
- Ryan Wagner, PhD Student, School of Computer Science
- Any new collaborations with other universities/researchers?
Project Goals
- What is the current project goal?
- Designing a formal specification for cyber threat mitigation playbooks to enable flexible and formally verifiable intrusion response strategies, and testing and evaluating this specification using real-life use cases (an illustrative sketch of a playbook entry appears after this list).
- Developing techniques and tools for verifying the correctness and evaluating the effectiveness of mitigation playbooks.
- Developing new models, frameworks and techniques for autonomous cyber defense agents using Deep Reinforcement Learning (DRL) agents to enable real-time, adaptive, and scalable response against dynamic APT adversaries. Testing and evaluating our models and techniques using various use-cases such as stealthy DDoS and multi-stage exfiltration attacks.
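To illustrate what a machine-checkable playbook entry could look like, the following Python sketch encodes a mitigation step with explicit preconditions and expected effects, which is what makes automated verification of a playbook possible. The field names, the example scenario, and the applicability check are illustrative assumptions for this report and do not reproduce the project's actual specification language.

```python
from dataclasses import dataclass, field

@dataclass
class PlaybookStep:
    """One mitigation action and the conditions under which it is valid."""
    action: str                                               # e.g., "isolate_host"
    preconditions: list[str] = field(default_factory=list)    # must hold before execution
    effects: list[str] = field(default_factory=list)          # expected to hold afterwards

@dataclass
class Playbook:
    """An ordered response strategy for a given threat scenario."""
    threat: str
    steps: list[PlaybookStep] = field(default_factory=list)

    def is_applicable(self, observed_facts: set[str]) -> bool:
        # Simple check: the first step's preconditions must be satisfied
        # by the facts observed so far.
        return bool(self.steps) and set(self.steps[0].preconditions) <= observed_facts

# Illustrative instance for a multi-stage exfiltration scenario.
exfil_playbook = Playbook(
    threat="multi-stage exfiltration",
    steps=[
        PlaybookStep(action="isolate_host",
                     preconditions=["host_compromised"],
                     effects=["host_quarantined"]),
        PlaybookStep(action="block_c2_domain",
                     preconditions=["c2_domain_identified"],
                     effects=["c2_traffic_blocked"]),
    ],
)
print(exfil_playbook.is_applicable({"host_compromised"}))  # True
```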
- How does the current goal factor into the long-term goal of the project?
- Establishing a formal specification for playbooks is essential, as it lays the groundwork for achieving long-term goals. This includes ensuring the accuracy and efficiency of these playbooks and enabling the real-time dynamic generation of playbooks through reinforcement learning to effectively counteract attackers.
- Developing techniques for verifying and evaluating playbooks will provide provably safe courses of action, which are crucial for autonomous cyber defense, a primary long-term goal of this project.
- Creating autonomous agents using Deep Reinforcement Learning (DRL) is vital for exploring new models of self-adaptive systems designed to deliver optimal real-time responses against dynamic attackers in large-scale environments, such as DoD networks. Extending and tailoring the existing theoretical foundations of adaptive systems, such as POMDPs and DRL, to meet the demands of real-time, large-scale intrusion response is a prerequisite for developing autonomous cyber defense agents for DoD networks.
Accomplishments
Address whether project milestones were met. If milestones were not met, explain why and describe the next steps.
We have achieved several milestones that significantly contribute to the project's various objectives:
- Defense strategies must integrate AI-driven anomaly detection and robust multi-layered defenses to keep pace with high-performance botnets and other evolving threats. We are developing a novel deep reinforcement learning (DRL) framework for mitigating Infrastructure DDoS attacks. Our framework features an actor-critic-based DRL network, integrated with a variational auto-encoder (VAE) to improve learning efficiency and scalability. The VAE assesses the risk of each traffic flow by analyzing various traffic features, while the actor-critic networks use the current link load and the VAE-generated flow risk scores to determine the probability of DDoS mitigation actions, such as traffic limiting, redirecting, or sending puzzles to verify traffic sources. The puzzle inquiry results are fed back to the VAE to refine the risk assessment process.
The key strengths of our framework are: (1) the VAE serves as an adaptive anomaly detector, evolving based on DRL agent actions instead of relying on static IDS rules that may quickly become outdated; (2) by separating traffic behavior characterization (handled by VAE) from action selection (handled by DRL), we significantly reduce the DRL state space, enhancing scalability; and (3) the dynamic collaboration between the DRL engine and the VAE allows for real-time adaptation to evolving attack patterns with high efficiency.
Currently, our framework identifies attack traffic flows with a true positive rate exceeding 95% and a false positive rate below 4%. It also learns the optimal strategy in reasonable time, within 20,000 episodes in most experimental settings.
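To make the architecture above concrete, the following is a minimal sketch of how the VAE-based flow risk scorer and the actor-critic policy could be wired together, assuming a PyTorch implementation. The network sizes, feature counts, and the four-action set are illustrative assumptions; only the overall data flow (flow features, to risk score, to mitigation action, with puzzle results fed back into the VAE) follows the description above.

```python
import torch
import torch.nn as nn

NUM_FLOW_FEATURES = 16   # assumed per-flow feature count
NUM_ACTIONS = 4          # e.g., no-op, rate-limit, redirect, send puzzle

class FlowVAE(nn.Module):
    """Variational auto-encoder that scores the risk of a traffic flow."""
    def __init__(self, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(NUM_FLOW_FEATURES, 32), nn.ReLU())
        self.mu = nn.Linear(32, latent_dim)
        self.logvar = nn.Linear(32, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, NUM_FLOW_FEATURES))

    def forward(self, flow_features):
        h = self.encoder(flow_features)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(z)
        # High reconstruction error => anomalous flow => high risk score.
        risk = ((recon - flow_features) ** 2).mean(dim=-1, keepdim=True)
        return risk

class ActorCritic(nn.Module):
    """Maps (link load, flow risk score) to mitigation-action probabilities."""
    def __init__(self, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(2, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, NUM_ACTIONS)   # action logits
        self.critic = nn.Linear(hidden, 1)            # state value

    def forward(self, link_load, risk):
        state = torch.cat([link_load, risk], dim=-1)
        h = self.shared(state)
        return torch.softmax(self.actor(h), dim=-1), self.critic(h)

# One decision step: score the flow, then pick a mitigation action.
vae, policy = FlowVAE(), ActorCritic()
flow = torch.rand(1, NUM_FLOW_FEATURES)
link_load = torch.rand(1, 1)
risk = vae(flow)
action_probs, value = policy(link_load, risk)
action = torch.multinomial(action_probs, 1)  # e.g., 3 => send puzzle to traffic source
# Puzzle responses would then be fed back as labels/weights when updating the VAE.
```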
- Neural networks are often overconfident in their predictions, which undermines their reliability and trustworthiness, especially when recommending security interventions. We are developing a novel technique, named Error-Driven Uncertainty Aware Training (EUAT), which aims to improve a neural classifier's ability to estimate its own uncertainty, namely to report high uncertainty when its predictions are inaccurate and low uncertainty when they are accurate. EUAT operates during the model's training phase by selectively employing one of two loss functions depending on whether a training example is correctly or incorrectly predicted by the model. This pursues the twofold goal of (i) minimizing model uncertainty for correctly predicted inputs and (ii) maximizing uncertainty for mispredicted inputs, while preserving the model's misprediction rate.
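The heart of EUAT is the per-example switch between two loss terms during training. The sketch below shows one way such a training step could be written in PyTorch, using predictive entropy as the uncertainty measure; the entropy weighting and the exact loss formulation are assumptions made for illustration and are not the published EUAT losses.

```python
import torch
import torch.nn.functional as F

def euat_style_loss(logits, targets, uncertainty_weight=0.5):
    """Error-driven uncertainty-aware loss (illustrative sketch).

    Correctly predicted examples: minimize cross-entropy and predictive
    entropy (be confident where the model is right).
    Mispredicted examples: minimize cross-entropy but maximize predictive
    entropy (signal uncertainty where the model is wrong).
    """
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)
    ce = F.cross_entropy(logits, targets, reduction="none")

    correct = logits.argmax(dim=-1).eq(targets)
    loss_correct = ce + uncertainty_weight * entropy   # push uncertainty down
    loss_wrong = ce - uncertainty_weight * entropy      # push uncertainty up
    return torch.where(correct, loss_correct, loss_wrong).mean()

# Example training step; `model`, `optimizer`, `inputs`, and `targets` stand in
# for the user's own classifier, optimizer, and data batch.
def train_step(model, optimizer, inputs, targets):
    optimizer.zero_grad()
    loss = euat_style_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```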
What is the contribution to foundational cybersecurity research? Was there something discovered or confirmed?
1. Developing new foundations for model-based Deep Reinforcement Learning (DRL) that support two-player agents (a defender and an attacker) without relying on stochastic game theory, which often does not scale well in the context of cyber defense.
2. Training neural networks to estimate their classification uncertainty more accurately will help provide more trustworthy and accurate algorithms for use in automated security response.
- Impact of research
- The PIs have used many of the materials and results generated by this project in their courses and research seminars. We taught a graduate-level course on Self-Adaptive Systems employing Deep Reinforcement Learning at the School of Computer Science at Carnegie Mellon University. Various use cases and examples from this project were incorporated into the class material and presentations, and one of the course's final projects involves creating a dynamic playbook tailored to advanced lateral movement attacks.
Publications and presentations
- Add the publication reference in the publications section below. An author's copy or the final version should be added in the report file(s) section. This is for NSA's review only.
- Optionally, upload technical presentation slides that may go into greater detail. For NSA's review only.