WIP: Assessing the Effects of Source Language on Binary Similarity Tools

ABSTRACT

State-of-the-art binary similarity tools have been developed to account for, and are evaluated against, variations in compilers, compiler flags, optimization levels, architectures, and even obfuscations. Although these tools aim to measure and detect binary code segments generated from similar or identical source code segments, they have yet to be evaluated on source languages other than C/C++. We present a work in progress to assess the effects of source language on modern binary similarity tools. Specifically, we provide a comparative investigation of the efficacy of BSim, a recently released component of the Ghidra framework, when comparing binaries compiled from C and from Rust. Using a benchmark of 800 binaries and more than one million functions, we investigate the overall accuracy and differentiating ability of BSim and find that the source language introduces a significant degree of imprecision not previously documented. We also provide a technical overview of the BSim utility, which provides context for our assessment results and a clear direction for addressing the shortcomings highlighted by our findings.

BIO

Landen Doty is a Master's student at the University of Kansas studying Computer Science with an emphasis in Systems Software Security. He is advised by Dr. Prasad Kulkarni and is a recipient of the CyberCorps Scholarship for Service. He also works as a year-round Research and Development Intern at Sandia National Laboratories and will be joining as full-time staff in June.

License: CC-3.0

Submitted by Regan Williams on Sat, 03/29/2025 - 11:27

Hot Topics in the Science of Security Symposium (HotSoS)
