Leveraging Machine Learning for Binary Software Understanding

Co-PI:

Abstract

The goal of our research is to achieve binary software understanding: the ability to recreate the semantic meaning of the original source code as well as the intention of the developer. We envision that this will help human analysts reverse engineer software, identify vulnerabilities, decode/defang malware, and patch legacy software, and will help automated techniques make sense of the vast amount of binary software.

We aim to significantly advance the state of the art in decompilation by leveraging machine learning techniques to achieve semantically equivalent decompilation of binary software. Our focus here is on generating a representative dataset of original source code and the corresponding compiled binary code to train machine learning models. Our preliminary work in this area has shown that existing techniques created improper datasets, which significantly affected both the resulting models and their evaluation. We also aim to fundamentally improve how decompilation techniques are evaluated.

Yan Shoshitaishvili

Yan Shoshitaishvili is an Associate Professor at Arizona State University, where he pursues parallel passions of cybersecurity research, real-world impact, and education. His research focuses on automated program analysis and vulnerability detection techniques. Aside from publishing dozens of research papers in top academic venues, Yan led Shellphish’s participation in the DARPA Cyber Grand Challenge, building a fully autonomous hacking system that won third place in the competition.

Underpinning much of his research is angr, the open-source program analysis framework created by Yan and his collaborators. This framework has powered hundreds of research papers, helped find thousands of security bugs, and continues to be used in research labs and companies around the world.

When he is not doing research, Yan participates in the enthusiast and educational cybersecurity communities. He is a Captain Emeritus of Shellphish, one of the oldest ethical hacking groups in the world, and a founder of the Order of the Overflow, with whom he ran DEF CON CTF, the “world championship” of cybersecurity competitions, from 2018 through 2021. Now, he helps demystify the hacking scene as a co-host of the CTF RadiOOO podcast and helps forge connections between the government and the hacking community through his participation on CISA’s Technical Advisory Council. To inspire students to pursue cybersecurity (and, ultimately, compete at DEF CON!), Yan created pwn.college, an open practice-makes-perfect learning platform that is revolutionizing cybersecurity education for aspiring hackers around the world.

Education

PhD, Computer Science, UC Santa Barbara. BS, Computer Science, Rensselaer Polytechnic Institute.

Institution: Arizona State University

Sponsor: NSA
