Research Team Status

  • Names of researchers and positions
     
    • Dung Thuy "Judy" Nguyen, PhD student
    • Kailani "Cai" Lemieux-Mack, PhD student
    • Yifan Zhang, PhD student
    • Preston Robinette, PhD student
    • Eli Jiang, undergraduate student
    • Evelyn Guo, undergraduate student
       
  • Any new collaborations with other universities/researchers?
    • No new collaborations to report. The previously reported collaboration with ASU is ongoing.

Project Goals

  • What is the current project goal?
    • This quarter, we focused primarily on enhancing AI model robustness in general, which in turn applies to malware classification. We developed defenses against adversarial and backdoor attacks that preserve generative model performance, particularly in machine unlearning contexts, to enhance the security and privacy of such models.
       
  • How does the current goal factor into the long-term goal of the project?
    • The broader focus on AI model robustness will translate to malware classifiers. We previously developed augmentation and purification techniques to enhance malware classification performance; however, those techniques were primarily data-focused. By turning attention to the model itself, we can improve robustness more generally, which in turn benefits malware classification tasks.

Accomplishments

  • Address whether project milestones were met. If milestones were not met, explain why, and what are the next steps.
    • We are on track to meet the milestones specified for Year 2. Aside from the previously reported MalMixer and PBP techniques specific to malware classifiers, the broader techniques we have been developing this quarter contribute to the overall goal of improving AI robustness and generalizability, especially when models are poisoned or tampered with.
       
  • What is the contribution to foundational cybersecurity research? Was there something discovered or confirmed?
    • We have been developing a machine unlearning technique for generative models, focusing specifically on sequential unlearning. Existing unlearning approaches work well only when presented with a single request, or a single batch of requests, to unlearn data points. In contrast, our approach can unlearn multiple data points in sequence. This supports use cases in which a model provider or operator must service unlearning requests that arrive sequentially over time, and it enhances the privacy of user data incorporated into machine learning models. In the context of malware classifiers, this approach could be used to remove poisoned samples from training, providing an alternative to purification.
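The sequential setting described above can be illustrated with a minimal, purely hypothetical sketch (not the technique under development): a toy linear model services forget requests one at a time, in arrival order, by taking gradient-ascent steps on each forgotten sample's loss, pushing the model away from what it learned about that data point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = w . x, trained on small synthetic data
X = rng.normal(size=(8, 3))
y = X @ np.array([1.0, -2.0, 0.5])

def grad(w, x, t):
    # Gradient of the squared-error loss 0.5 * (w.x - t)^2 w.r.t. w
    return (w @ x - t) * x

# Ordinary SGD training pass over the full dataset
w = np.zeros(3)
for _ in range(50):
    for x, t in zip(X, y):
        w -= 0.05 * grad(w, x, t)

def unlearn_sequentially(w, requests, lr=0.05, steps=10):
    # Service unlearning requests one at a time, as they arrive,
    # by taking gradient-ASCENT steps on each forgotten sample's
    # loss -- the opposite direction from training.
    for x, t in requests:
        for _ in range(steps):
            w = w + lr * grad(w, x, t)
    return w

# Two forget requests arrive sequentially for the first two training points
forget = [(X[0], y[0]), (X[1], y[1])]
w_forgot = unlearn_sequentially(w.copy(), forget)

def forget_mse(w):
    # Mean squared error on the forgotten points
    return float(np.mean([(w @ x - t) ** 2 for x, t in forget]))
```

After servicing the requests, the unlearned model fits the forgotten points worse than the trained model did, which is the intended effect; real unlearning methods must additionally preserve performance on the retained data, which this toy sketch does not address.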

  • Impact of research
    • Internal to the university (coursework/curriculum)
      • Findings from this quarter have been incorporated into CS6380, a graduate computer security seminar course.  
    • External to the university (transition to industry/government (local/federal); patents, start-ups, software, etc.)
      • None to report this quarter.
    • Any acknowledgements, awards, or references in media?
      • None to report this quarter.

Publications and Presentations

  • Add publication references in the publications section below. An author's copy or the final version should be added in the report file(s) section. This is for NSA's review only.
  • Optionally, upload technical presentation slides that may go into greater detail. For NSA's review only.

Report Materials