PBP: Post-training Backdoor Purification for Malware Classifiers

ABSTRACT

In recent years, the rise of machine learning (ML) in cybersecurity has brought new challenges, including the increasing threat of backdoor poisoning attacks on ML malware classifiers. These attacks aim to manipulate model behavior when a particular input trigger is present. For instance, adversaries could inject malicious samples into public malware repositories, contaminating the training data and causing the trained model to misclassify malware. Current countermeasures predominantly focus on detecting poisoned samples by leveraging disagreements among the outputs of a diverse ensemble of models on training data points. However, these methods are not suitable for scenarios where Machine-Learning-as-a-Service (MLaaS) is used or where users aim to remove backdoors from a model after it has been trained.

Addressing this scenario, we introduce PBP, a post-training defense for malware classifiers that mitigates various types of backdoor embeddings without assuming any specific backdoor embedding mechanism. Our method exploits the influence of backdoor attacks on the activation distribution of neural networks, independent of the trigger-embedding method. In the presence of a backdoor attack, the activation distribution of each layer is distorted into a mixture of distributions. By regulating the statistics of the batch normalization layers, we can guide a backdoored model to perform similarly to a clean one.
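To make the intuition concrete, below is a minimal sketch (in PyTorch) of how batch normalization statistics can be re-estimated on a small clean subset, pulling the layer-wise activation statistics of a backdoored model back toward those of a clean one. This is only an illustration of the underlying idea, not the published PBP implementation; the model, clean data loader, and device arguments are placeholders.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def reestimate_bn_stats(model: nn.Module, clean_loader, device="cpu"):
    """Reset and re-collect BatchNorm running mean/variance using clean data only."""
    model.to(device)
    for m in model.modules():
        # Discard the running statistics accumulated during (possibly poisoned) training.
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.reset_running_stats()
            m.momentum = None  # None => cumulative moving average over all clean batches
    # Forward passes in train mode update the BN buffers; no gradients are computed,
    # so the learned weights stay untouched.
    model.train()
    for x, _ in clean_loader:
        model(x.to(device))
    model.eval()
    return model
```

The complete PBP procedure is described in the paper and in the repository linked below.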

Our method demonstrates substantial advantages over several state-of-the-art methods, as evidenced by experiments on two datasets, two types of backdoor methods, and various attack configurations. Our experiments showcase that PBP can mitigate even state-of-the-art backdoor attacks against malware classifiers, e.g., Jigsaw Puzzle, which was previously shown to be stealthy against existing backdoor defenses. Notably, our approach requires only a small portion of the training data (just 1%) to purify the backdoor and reduce the attack success rate from 100% to almost 0%, a 100-fold improvement over the baseline methods. Our code is available at https://github.com/judydnguyen/pbp-backdoor-purification-official.


BIO


Dung (Judy) Nguyen is a second-year PhD student in Computer Science at Vanderbilt University, advised by Prof. Taylor Johnson and Prof. Kevin Leach. She previously earned her B.S. in Computer Science from Hanoi University of Science and Technology (Vietnam).

Judy is passionate about building secure and trustworthy AI systems, driven by her curiosity and skepticism toward systems that lack perfect explainability. Her research explores adversarial, privacy, and ethical vulnerabilities in both centralized and decentralized ML/AI systems, with a focus on developing practical solutions to mitigate these risks. She strongly believes that AI should serve humanity by making people feel both happy and safe. Her work has been published at NeurIPS and NDSS, in the Engineering Applications of Artificial Intelligence journal, and in IEEE Transactions journals.

Beyond research, Judy has a deep love for photography, books, and guitar, and she never says no to insightful discussions with friends and fellow academics.

 

License: CC-3.0