Research Team Status

  • Names of researchers and positions
    • Ehab Al-Shaer, Distinguished Research Fellow, School of Computer Science (PI)
    • David Garlan, Professor, School of Computer Science (Co-PI)
    • Bradley Schmerl, Principal Systems Scientist, School of Computer Science (Co-PI)
    • Qi Duan, Research Scientist, School of Computer Science (Senior Researcher)
    • Ryan Wagner, School of Computer Science (PhD Student)
       
  • Any new collaborations with other universities/researchers?
    • None yet

Project Goals

  • What is the current project goal?
    • (1) Designing a formal specification for cyber threat mitigation playbooks to enable flexible and formally verifiable intrusion response strategies. Testing and evaluating this playbook specification using real-life use cases.
    • (2) Developing techniques and tools for verifying the correctness and evaluating the effectiveness of mitigation playbooks. 
    • (3) Developing new models, frameworks, and techniques for autonomous cyber defense agents using Deep Reinforcement Learning (DRL) to enable real-time, adaptive, and scalable response against dynamic APT adversaries. Testing and evaluating our models and techniques using various use cases, such as stealthy DDoS and multi-stage exfiltration attacks.
       
  • How does the current goal factor into the long-term goal of the project?
    • Establishing a formal specification for playbooks is essential, as it lays the groundwork for achieving long-term goals. This includes ensuring the accuracy and efficiency of these playbooks and enabling the real-time dynamic generation of playbooks through reinforcement learning to effectively counteract attackers.
    • Developing techniques for verifying and evaluating playbooks will provide provably safe courses of action, which are crucial for autonomous cyber defense, a primary long-term goal of this project.
    • Creating autonomous agents using Deep Reinforcement Learning (DRL) is vital for exploring new models of self-adaptive systems designed to deliver optimal real-time responses against dynamic attackers in large-scale environments, such as DoD networks. Extending and tailoring the existing theoretical foundations of adaptive systems, such as POMDPs and DRL, to meet the demands of real-time, large-scale intrusion response is a prerequisite for developing autonomous cyber defense agents for DoD networks.

Accomplishments

  • Address whether project milestones were met. If milestones were not met, explain why, and what are the next steps?

    We have achieved several milestones that significantly contribute to the project's various objectives:

    1. We are finalizing the design of the Playbook Formal Specification (PFS), which includes several critical criteria:

    • Flexibility: It defines arbitrary courses of action for cyber defense.
    • Verifiability: It includes constructs that enable the verification of the playbook's correctness.
    • Adaptability: It allows courses of action to be adaptive based on system observations.

    2. We are investigating the use of Generative AI and Large Language Models to produce and reason about playbooks, focusing on developing the Target Breach exemplar.

    3. We have developed a new theory and architecture for self-adaptive agents using Deep Reinforcement Learning (DRL) to provide scalable, real-time cyber responses. Initially, we extended the Partially Observable Markov Decision Process (POMDP) formulation to consider two agents, a defender and an attacker, while preserving the integrity of standard POMDP solvers. Subsequently, we are developing a new multi-agent architecture based on hierarchical DRL using autoencoder GenAI techniques (see the technical report named autoencoder-DRL) to distribute and scale defense actions, achieving both optimal and real-time responses.

    4. We developed Error-Driven Uncertainty-Aware Training (EUAT), a novel technique designed to improve neural models' ability to estimate uncertainty accurately. EUAT operates during training by switching between two loss functions based on whether the model's predictions are correct or incorrect. This method aims to decrease uncertainty for accurate predictions and increase uncertainty for inaccurate ones, while maintaining overall prediction accuracy. EUAT can be integrated with the gradient-descent optimization of DRL agents to improve their efficiency during simulation and training.

    5. We developed techniques for optimizing system utility in Machine Learning (ML) systems by tackling ML mispredictions. We propose a probabilistic modeling framework that evaluates the cost-benefit trade-offs of adapting ML components, for example through model retraining. The key approach is to separate the estimation of model performance improvement from its impact on overall system utility.
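    The two-branch training rule of EUAT in item 4 above can be sketched as follows. This is an illustrative sketch only, not the implementation: the choice of cross-entropy plus predictive entropy as the uncertainty term, and the simple add/subtract weighting, are our assumptions.

    ```python
    import math

    def predictive_entropy(probs):
        """Shannon entropy of a predictive distribution (uncertainty proxy)."""
        return -sum(p * math.log(p) for p in probs if p > 0)

    def euat_loss(probs, label):
        """Error-driven loss sketch: keep a cross-entropy term for accuracy,
        then push uncertainty (entropy) DOWN when the prediction is correct
        and UP when it is incorrect."""
        pred = max(range(len(probs)), key=probs.__getitem__)
        ce = -math.log(max(probs[label], 1e-12))  # standard cross-entropy
        h = predictive_entropy(probs)
        # correct prediction -> penalize entropy; wrong -> reward entropy
        return ce + h if pred == label else ce - h
    ```

    Applied per example during training, this rewards confident-and-right predictions with low loss while making confident-and-wrong ones costly, which is the calibration behavior EUAT targets.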

    The next steps (one-year plan) include the following:

    (1) Playbook Adaptation

    • Exploring Cooperative, Multi-agent (MADDPG), and Hierarchical DRL for DDoS
    • Generalizing our approach to APT attack mitigation response, using lateral movement and exfiltration as a case study (e.g., Target Breach)
    • Making our approach robust against adversarial/deceptive attacks

    (2) Playbook Specification & Verification

    • Completing the Playbook Specification and integrating it with existing adaptation frameworks (e.g., Stitch and Rainbow).

    (3) Playbook Synthesis

    • Exploring the use of Large Language Models (LLMs) to generate and evaluate mitigation playbooks given a network configuration and attack scenario

     

  • What is the contribution to foundational cybersecurity research? Was there something discovered or confirmed?
    • We developed new foundations for model-based Deep Reinforcement Learning (DRL) that allow for two agents (defender and attacker) without relying on stochastic game theory, which often does not scale well in the context of cyber defense.

     

  • Impact of research
    • Internal to the university (coursework/curriculum)

      The PI and Co-PIs have used many of the materials and results generated by this project in their courses and research seminars. The principal investigator (PI) has taught a new graduate-level course on Self-Adaptive Systems employing Deep Reinforcement Learning at the School of Computer Science at Carnegie Mellon University. Within this course, the PI incorporates various use cases and examples from this project into the class material and presentations. Furthermore, one of the course's final projects involves creating a dynamic playbook tailored for advanced lateral movement attacks.

    • External to the university (transition to industry/government (local/federal); patents, start-ups, software, etc.)

      We developed AI models and code, and generated training data, that can be used by the research community as well as industry and government partners. We plan to create a GitHub repository for this project in the near future for distribution.

    • Any acknowledgements, awards, or references in media?

Publications and presentations

  • Add publication references in the publications section below. An author's copy or final version should be added in the report file(s) section. This is for NSA's review only.
  • Optionally, upload technical presentation slides that may go into greater detail. For NSA's review only.