Research Team Status

  • Names of researchers and positions
    (e.g., Research Scientist, PostDoc, Student (Undergrad/Masters/PhD))
     

    - Skyler Grandel, PhD student
    - Dung Thuy "Judy" Nguyen, PhD student
    - Kailani "Cai" Lemieux-Mack, PhD student
    - Yifan Zhang, PhD student
    - Zack Silver, undergraduate student
    - Evelyn Guo, undergraduate student

  • Any new collaborations with other universities/researchers?
    • N/A

Project Goals

  • What is the current project goal?
    • There are two current goals for this project:
      • Improved decompilation.  We are seeking to develop techniques,
         assisted by large language models, that improve the quality of
         source code obtained via decompilation.  Specifically, we want to
         ensure that decompiled source is comprehensible and recompilable,
         to improve its utility in reverse engineering pipelines (e.g., for
         malware analysis).  Doing so will help improve techniques for
         subsequently modeling novel, plausible samples (and downstream
         classifiers).
      • Improving neural network robustness.  We are seeking to improve our
         ability to validate that neural networks adhere to a given
         specification with stronger probabilistic guarantees.  We are also
         improving how neural models can be made more resilient against
         adversarial manipulation (e.g., due to backdoor attacks).  This goal
         aligns with our overall vision of ensuring that classifiers
         (especially security-critical malware classifiers) can withstand
         interference from adversaries seeking to undermine them.
  • How does the current goal factor into the long-term goal of the project?
    • This quarter's goals are well-aligned with our overall goal of
       improving malware classifiers.  Improving decompilation is critically
       relevant to comprehension and explainability, not only by humans but
       also by the neural models used to generate novel variants for
       improving classification.  Moreover, our foundational improvements to
       classifier robustness will help ensure that we can quantify their
       performance characteristics and provide stronger guarantees about the
       limits of their behavior.

Accomplishments

  • Address whether project milestones were met. If milestones were not met, explain why and describe the next steps.
    • We continue to be on track for our Year 2 goals.  Our work has
       culminated in ESORICS 2025 and NeurIPS 2025 publications, new
       submissions under review, and several invited talks.
       
  • What is the contribution to foundational cybersecurity research? Was there something discovered or confirmed?
    • On decompilation: using large language models, we have improved the
        re-executability of decompiled binaries by as much as 10% across
        two different datasets.  Decompiled source is typically of poor
        quality, lacking the semantics present in the original source code.
        Moreover, decompiled source often fails to re-compile due to
        missing headers, missing libraries, linking issues, and (when LLMs
        are involved) syntax errors.  These issues prevent full
        understanding of program behavior, especially when probative
        testing is performed (e.g., rewriting or changing portions of the
        decompiled source).  Furthermore, even when decompilation yields
        re-compilable source, the re-compiled program is almost never
        semantically equivalent to the original (i.e., it produces
        different outputs or crashes).  Building upon our previously
        reported COMCAT approach, we have leveraged in-context learning
        and retrieval-augmented generation to further refine decompiled
        source, yielding these gains in re-executability rates (an
        illustrative sketch of this refine-and-recheck loop appears after
        this list).
    • On neural network robustness and verification: we developed a new
        conformal prediction-based method that can provide probabilistic
        guarantees -- i.e., that a specification holds for a neural
        network with a certain probability.  The approach is scalable,
        nearly architecture-agnostic, and culminated in an accepted
        NeurIPS 2025 paper.  In a subsequent investigation, we observed
        that we can still find adversarial perturbations even when a given
        network has a very high probability (i.e., >99.999%) of being
        locally adversarially robust.  This provides guidance about the
        coverage of the model's input space and about how robustness
        should be evaluated (an illustrative conformal prediction sketch
        also appears after this list).
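    • Illustrative sketch (decompilation refinement): the following is a
        minimal, hypothetical Python sketch of the refine-and-recheck loop
        described above; it is not our actual COMCAT pipeline.  TF-IDF
        retrieval stands in for our retriever, the LLM client is left
        abstract, and a refinement is accepted only if the result
        recompiles and re-executes.  All function, parameter, and variable
        names here are illustrative assumptions.

        # Hypothetical sketch: refine decompiler output with retrieval-augmented
        # in-context examples, then check that the result recompiles and re-executes.
        import os
        import subprocess
        import tempfile
        from typing import Callable, List, Tuple

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def retrieve_examples(query_src: str,
                              corpus: List[Tuple[str, str]],  # (decompiled, original) pairs
                              k: int = 3) -> List[Tuple[str, str]]:
            """Return the k corpus pairs whose decompiled half is most similar to query_src."""
            docs = [dec for dec, _ in corpus] + [query_src]
            tfidf = TfidfVectorizer(token_pattern=r"\w+").fit_transform(docs)
            sims = cosine_similarity(tfidf[-1], tfidf[:-1]).ravel()
            return [corpus[i] for i in sims.argsort()[::-1][:k]]

        def build_prompt(decompiled: str, examples: List[Tuple[str, str]]) -> str:
            """In-context learning prompt: retrieved (decompiled -> cleaned) pairs, then the query."""
            shots = "\n\n".join(f"Decompiled:\n{d}\nCleaned, recompilable C:\n{o}"
                                for d, o in examples)
            return (f"{shots}\n\nDecompiled:\n{decompiled}\n"
                    "Cleaned, recompilable C (add missing #includes, fix syntax):\n")

        def recompiles_and_runs(src: str, test_input: bytes) -> bool:
            """Compile with gcc and run once on a test input; True only if both succeed."""
            with tempfile.TemporaryDirectory() as tmp:
                c_path, exe = os.path.join(tmp, "prog.c"), os.path.join(tmp, "prog")
                with open(c_path, "w") as f:
                    f.write(src)
                if subprocess.run(["gcc", c_path, "-o", exe],
                                  capture_output=True).returncode != 0:
                    return False
                try:
                    run = subprocess.run([exe], input=test_input,
                                         capture_output=True, timeout=5)
                except subprocess.TimeoutExpired:
                    return False
                return run.returncode == 0

        def refine_decompiled(decompiled: str,
                              corpus: List[Tuple[str, str]],
                              llm: Callable[[str], str],  # any text-completion client
                              test_input: bytes = b"",
                              max_rounds: int = 3) -> str:
            """Iteratively ask the LLM to repair the source until it recompiles and runs."""
            src = decompiled
            for _ in range(max_rounds):
                src = llm(build_prompt(src, retrieve_examples(src, corpus)))
                if recompiles_and_runs(src, test_input):
                    break
            return src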
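    • Illustrative sketch (probabilistic guarantees): a minimal,
        hypothetical Python sketch of a split-conformal check of the kind
        described above; it is not the NeurIPS 2025 implementation.  The
        score function, sampler, and toy linear "network" are assumptions
        chosen purely for illustration.

        # Hypothetical split-conformal probabilistic robustness check (toy example).
        # Convention: score_fn(x) <= 0 means "the specification holds at input x".
        import math

        import numpy as np

        def conformal_quantile(scores: np.ndarray, alpha: float) -> float:
            """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
            n = len(scores)
            rank = math.ceil((n + 1) * (1 - alpha))  # conformal rank
            if rank > n:                             # too few samples for this alpha
                return float("inf")
            return float(np.sort(scores)[rank - 1])

        def probabilistic_certificate(score_fn, sampler, n: int, alpha: float):
            """Split-conformal argument: draw n i.i.d. inputs, score them, and return a
            threshold q such that a fresh input X from the same distribution satisfies
            P[score_fn(X) <= q] >= 1 - alpha.  If q <= 0, the specification holds with
            probability at least 1 - alpha."""
            scores = np.array([score_fn(sampler()) for _ in range(n)])
            q = conformal_quantile(scores, alpha)
            return q, q <= 0.0

        # Toy usage: local robustness of a linear "network" around a fixed input.
        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            w, center = rng.normal(size=10), rng.normal(size=10)
            f = lambda x: float(w @ x)                     # toy scalar "network"
            score = lambda x: abs(f(x) - f(center)) - 0.5  # <= 0 iff |f(x) - f(center)| <= 0.5
            sample = lambda: center + rng.uniform(-0.01, 0.01, size=10)
            q, ok = probabilistic_certificate(score, sample, n=2000, alpha=1e-3)
            print(f"conformal threshold q = {q:.4f}; certified at level 1 - alpha: {ok}")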
       
  • Impact of research
    • Internal to the university (coursework/curriculum)
      • N/A
    • External to the university (transition to industry/government (local/federal); patents, start-ups, software, etc.)
      • Improvements integrated into the Neural Network Verification tool: https://github.com/verivital/nnv/
      • PARDON source code: github.com/judydnguyen/PARDON-FedDG
      • Organized VNN-COMP, co-located with CAV/SAIV 2025
    • Any acknowledgements, awards, or references in media?
      • N/A

 

Publications and presentations

  • Add publication references in the publications section below. An author's copy or final version should be added in the report file(s) section. This is for NSA's review only.

- Preston Robinette, Thuy Dung Nguyen, Samuel Sasaki, and Taylor T.
 Johnson.  Trigger-Based Fragile Model Watermarking for Image
 Transformation Networks.  In ESORICS 2025.  Preprint: https://arxiv.org/pdf/2409.19442

- Navid Hashemi, Samuel Sasaki, Ipek Oguz, Meiyi Ma, Taylor Johnson.
 Scaling Data-Driven Probabilistic Robustness Analysis for Semantic
 Segmentation Neural Networks.  In NeurIPS 2025: https://neurips.cc/virtual/2025/poster/116265. 

- Invited paper/presentation at Allerton 2025: Is Neural Network
 Verification Useful and What Is Next?

- MIT LIDS Seminar, October 10, 2025: From Neural Network Verification
 to Formally Verifying Neuro-Symbolic Artificial Intelligence (AI).

- RMIT SCT DSAI Seminar (online), September 30, 2025: From Neural
 Network Verification to Formally Verifying Neuro-Symbolic Artificial
 Intelligence (AI)

- Dagstuhl Seminar 25392, September 21-26, 2025, with talk: Taylor Johnson,
 "Let's Verify ChatGPT: What Would We Verify & How Could We Get There?"

- Previously reported ICDCS 2025 paper, presented since last quarter:
 PARDON: Privacy-Aware and Robust Federated Domain Generalization.

(NB: the publication portal on sos-vo is not functioning as of this report; please see the references above.)

  • Optionally, upload technical presentation slides that may go into greater detail. For NSA's review only.