Adaptive Dual-Layer DDoS Mitigation using Autoencoder and Reinforcement Learning

Author	Qi Duan
Abstract	As of 2024, infrastructure Distributed Denial of Service (DDoS) attacks continue to grow in complexity and sophistication. They are increasing not only in volume but also in their ability to evade traditional defenses, because advances in AI enable adversaries to adapt their attack targets and tactics dynamically to maximize damage. The emergence of high-performance botnets built on virtual machines allows attackers to launch large-scale attacks with fewer resources. Defense strategies must therefore adapt by integrating AI-driven anomaly detection and robust multi-layered defenses to keep pace with these evolving threats. In this paper, we introduce a novel deep reinforcement learning (DRL) framework for mitigating infrastructure DDoS attacks. Our framework features an actor-critic-based DRL network integrated with a variational autoencoder (VAE) to improve learning efficiency and scalability. The VAE assesses the risk of each traffic flow by analyzing various traffic features, while the actor-critic networks use the current link load and the VAE-generated flow risk scores to assign probabilities to DDoS mitigation actions, such as limiting traffic, redirecting traffic, or sending puzzles to verify traffic sources. The puzzle inquiry results are fed back to the VAE to refine the risk assessment process. The key strengths of our framework are: (1) the VAE serves as an adaptive anomaly detector that evolves with the DRL agent's actions instead of relying on static IDS rules that may quickly become outdated; (2) by separating traffic behavior characterization (handled by the VAE) from action selection (handled by the DRL agent), we significantly reduce the DRL state space, enhancing scalability; and (3) the dynamic collaboration between the DRL engine and the VAE allows efficient real-time adaptation to evolving attack patterns.
We demonstrate the feasibility and effectiveness of the framework across various attack scenarios. Our approach uniquely integrates an actor-critic learning algorithm with the VAE to understand traffic flow properties and determine optimal actions through a continuous learning process. Our evaluation shows that the framework effectively identifies attack traffic flows, achieving a true positive rate exceeding 95% and a false positive rate below 4%. It also learns the optimal strategy in a reasonable time: under 20,000 episodes in most experimental settings.
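The data flow described in the abstract (risk scoring, then action selection from link load and risk, then puzzle feedback) can be illustrated with a minimal sketch. It is not the paper's implementation: a running-statistics scorer stands in for the VAE, a fixed linear softmax stands in for the trained actor, and the action names, feature choices, and weights are all assumptions made for illustration.

```python
import math
import random

# Hypothetical action set, loosely following the actions named in the abstract.
ACTIONS = ["allow", "limit", "redirect", "puzzle"]

# Hypothetical (bias, link_load weight, risk weight) per action.
# A trained actor network would learn this mapping; these values are made up.
WEIGHTS = [
    (2.0, -1.0, -6.0),  # allow
    (0.0, 2.0, 2.0),    # limit
    (-1.0, 1.0, 3.0),   # redirect
    (-0.5, 0.0, 3.0),   # puzzle
]


class RiskScorer:
    """Stand-in for the VAE: scores a flow by its distance from a benign baseline."""

    def __init__(self, n_features):
        self.count = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations (Welford)

    def update_benign(self, features):
        """Feedback step: fold in a flow confirmed benign (e.g. it solved a puzzle)."""
        self.count += 1
        for i, x in enumerate(features):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def risk(self, features):
        """Return a score in [0, 1]; higher means further from the baseline."""
        if self.count < 2:
            return 0.5  # no baseline yet: stay neutral
        z2 = 0.0
        for i, x in enumerate(features):
            var = self.m2[i] / (self.count - 1) + 1e-6
            z2 += (x - self.mean[i]) ** 2 / var
        return 1.0 - math.exp(-(z2 / len(features)) / 10.0)


def action_probabilities(link_load, risk, weights=WEIGHTS):
    """Softmax over actions computed from the compact state (link_load, risk)."""
    logits = [b + wl * link_load + wr * risk for (b, wl, wr) in weights]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def demo():
    """Build a baseline from puzzle-confirmed flows, then score two new flows."""
    random.seed(0)
    scorer = RiskScorer(n_features=2)
    # Made-up features: (packets per second, mean packet size in bytes).
    for _ in range(200):
        scorer.update_benign([random.gauss(100, 10), random.gauss(500, 50)])
    benign_risk = scorer.risk([105, 480])
    attack_risk = scorer.risk([5000, 60])
    return benign_risk, attack_risk


if __name__ == "__main__":
    for risk in demo():
        probs = action_probabilities(link_load=0.8, risk=risk)
        print(round(risk, 3), dict(zip(ACTIONS, (round(p, 3) for p in probs))))
```

The policy sees only two numbers per flow (link load and risk score) rather than the raw traffic features, which is the state-space reduction the abstract attributes to separating characterization from action selection.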
Year of Publication	2024
Conference Name	IEEE Conference on Software Engineering for Adaptive and Self-Managing Systems
Refereed Designation	Submitted for review.