Capabilities Labeling


Abstract

The goal of reverse engineering (RE) is to determine the purpose and intent of software, such as legacy binaries, malware, or COTS components of unknown provenance. While RE tools have improved, the task is still daunting, especially for stripped binaries with no function or variable names. Understanding such code is a time-consuming, attention-demanding, and error-prone task, and the skills experts apply can take years of experience to develop. Many state-of-the-art RE tools provide primarily generic information, such as entry points or reachability. Recent advancements have used machine learning (ML) to assist the analyst with meaningful information about code functionality, but ML approaches rely on large semantically tagged training corpora. While such corpora can be inferred for source code, they are less common for binaries and far too sparse for embedded (firmware) binaries.

We are developing a form of RE guidance we call "capability labeling", which labels functions and modules with high-level semantic categories from a predefined hierarchy. Labels such as "authentication/credentials", "buffer/copy", "bus/modbus", "encryption/aes", and "network/protocol/http" (a few examples from a list of over 180 identified to date) help the analyst understand the purpose and intent of specific portions of the binary. These labels allow engineers to focus their attention on areas of specific interest (see the sketch following this list), including:

  • Cryptographic functions relevant to security assessments.
  • Communication protocols and interfaces that may be relevant fuzzing targets.
  • Library functions, even ones that are statically linked without symbols.
  • Hardware-tied functions that must be addressed for re-hosting or emulation.
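
To make the hierarchy concrete, here is a minimal Python sketch of labels as slash-delimited paths, where matching against a higher-level category lets the analyst filter functions down to an area of interest. The helper functions and function names are hypothetical illustrations, not the tool's actual representation or API; only the label strings come from the examples above.

    # Minimal sketch: capability labels as slash-delimited paths in a
    # hierarchy. Label strings come from the examples above; the helpers
    # are hypothetical illustrations, not the tool's API.

    def matches(label: str, query: str) -> bool:
        """True if `label` falls at or under `query` in the hierarchy."""
        return label == query or label.startswith(query + "/")

    def functions_of_interest(labeled: dict[str, str], query: str) -> list[str]:
        """Filter labeled functions down to an area the analyst cares about."""
        return [fn for fn, label in labeled.items() if matches(label, query)]

    # Example: surface cryptographic functions for a security assessment.
    labeled = {"sub_4011a0": "encryption/aes",
               "sub_4020f0": "network/protocol/http"}
    print(functions_of_interest(labeled, "encryption"))  # ['sub_4011a0']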

We synthesize these labels starting from static analysis results that identify constants (numerical, string, and address literals) embedded in the binary. Several heuristics then dissect and combine this information. Certain numeric constants and tables (for example, a CRC16-CCITT polynomial) are indicative of commonly used algorithms, such as encryption and checksum routines. Hardware addresses representing MMIO operations appear in patterns that can help infer both the hardware platform and the attached peripherals. Some string literals, including log and error messages, carry information about software functionality in natural-language form; others represent fixed portions of standards for file formats or network protocols. We also heuristically propagate this information up from individual functions to clusters (inferred from control and data flows) that represent larger functional modules.
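As an illustration of the constant-based heuristics, the following Python sketch matches a function's extracted literals against a small signature table. The CRC16-CCITT and CRC-32 generator polynomials and the MD5 initialization constant are well-known values; the table, the MMIO window, and the function itself are simplified illustrations, not the tool's actual implementation.

    # Sketch of the constant-matching heuristic, assuming static analysis
    # has already extracted the numeric and address literals used by each
    # function. The signature table is deliberately tiny; a real one would
    # be much larger.

    SIGNATURES = {
        0x1021:     "checksum/crc16-ccitt",  # CRC16-CCITT generator polynomial
        0x04C11DB7: "checksum/crc32",        # CRC-32 generator polynomial
        0x67452301: "hash/md5",              # MD5 initialization constant A
    }

    # Address literals falling inside a known MMIO window suggest hardware
    # access. The range below is the generic ARM Cortex-M peripheral region,
    # used here only to illustrate the idea.
    MMIO_RANGES = [(0x40000000, 0x5FFFFFFF, "hardware/peripheral")]

    def label_function(constants: set[int], addresses: set[int]) -> set[str]:
        """Propose capability labels for one function from its literals."""
        labels = {label for value, label in SIGNATURES.items()
                  if value in constants}
        labels |= {label for lo, hi, label in MMIO_RANGES
                   if any(lo <= a <= hi for a in addresses)}
        return labels

    # A function that loads 0x1021 likely computes a CRC16-CCITT checksum.
    print(label_function({0x1021, 0xFFFF}, set()))  # {'checksum/crc16-ccitt'}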

In this presentation, we will describe several use cases for this functionality, showing how the tool can help a human analyst identify the parts of a binary they care most about. We will demonstrate to HCSS attendees how the tool can help with assessments of the correctness, reliability, and integrity of software, including cyber-physical systems (CPS), even when no source code is available. We will elaborate on how we extract information for individual functions and integrate the function-level information to understand larger software modules. We will also show how we have validated our technology by asking a human engineer to manually annotate a subset of functions in a real-world CPS binary (the ArduCopter autopilot) with capability labels. To simplify the task and achieve better accuracy, we allowed the engineer to use symbol information as well as to reference the source code.
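As a sketch of the module-level integration step, the Python fragment below assumes function-level labels are already assigned and uses connected components of an undirected call graph as a simple stand-in for the tool's richer clustering over control and data flows; all names and edges are hypothetical.

    # Sketch of module-level label integration, assuming function-level
    # labels are already assigned. Connected components of an undirected
    # call graph stand in for the tool's clustering over control and
    # data flows.

    from collections import defaultdict

    def clusters(call_edges: list[tuple[str, str]]) -> list[set[str]]:
        """Group functions into connected components of the call graph."""
        adj = defaultdict(set)
        for a, b in call_edges:
            adj[a].add(b)
            adj[b].add(a)
        seen, components = set(), []
        for start in adj:
            if start in seen:
                continue
            comp, stack = set(), [start]
            while stack:
                fn = stack.pop()
                if fn in comp:
                    continue
                comp.add(fn)
                stack.extend(adj[fn] - comp)
            seen |= comp
            components.append(comp)
        return components

    def module_labels(component: set[str],
                      fn_labels: dict[str, set[str]]) -> set[str]:
        """A module inherits the union of its member functions' labels."""
        return set().union(*(fn_labels.get(fn, set()) for fn in component))

    edges = [("recv_frame", "parse_http"), ("parse_http", "log_error")]
    fn_labels = {"parse_http": {"network/protocol/http"}}
    for comp in clusters(edges):
        print(sorted(comp), module_labels(comp, fn_labels))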

In 30 hours, the human analyst was able to review the source code and assign labels to only a sample (11%) of the functions; in contrast, the automated approach evaluated every function in the binary in only 44 minutes. We then compared the human's labels against those produced by our automated analysis, revealing interesting insights about the errors made by both human and tool, which we will also share. After the human analyst was given an opportunity to correct his errors, the agreement between human and tool was 71%, which compares favorably to what might be expected if two different humans were assigned the same task (inter-rater reliability).
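For concreteness, the agreement figure can be read as simple percent agreement over the functions labeled by both raters. The sketch below assumes one label per function; the study's exact matching criteria (e.g., partial credit for agreement higher in the hierarchy) may differ, and the data shown are invented.

    # Sketch of the agreement measurement, assuming one label per function
    # from each rater.

    def agreement(human: dict[str, str], tool: dict[str, str]) -> float:
        """Fraction of commonly labeled functions on which the raters agree."""
        common = human.keys() & tool.keys()
        if not common:
            return 0.0
        return sum(human[fn] == tool[fn] for fn in common) / len(common)

    human = {"f1": "encryption/aes", "f2": "buffer/copy", "f3": "bus/modbus"}
    tool  = {"f1": "encryption/aes", "f2": "buffer/copy",
             "f3": "network/protocol/http"}
    print(f"{agreement(human, tool):.0%}")  # 67%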

Authors

Greg Nelson is a Software Engineer at GrammaTech, Inc. He has over 25 years of experience in embedded system and IoT device design, including network-connected high-speed imaging devices, Bluetooth 4.0 networked dosimetry for nuclear power, and HVAC-integrated building protection systems. He is a named inventor on six patents on devices relating to nuclear energy, several of which target industrial and IoT devices. Previously, he was Vice President of Research and Development at PGT Instruments of Princeton, NJ. He received his MS in computer science from Carnegie Mellon, with a specialization in machine learning (focused on speech and language understanding), and a BS in engineering (computer science) from Princeton University. In his spare time, he likes to build things, including net-zero-energy homes.

Denis Gopan is a senior scientist at GrammaTech, Inc. His research interests focus on applying formal methods and program analysis techniques to the fields of software engineering and computer security. At GrammaTech, he has worked on various aspects of machine-code analysis, high-level system modeling, and configuration security. He has been a technical lead on a number of DoD-sponsored research projects. Gopan received a B.Sc. in Computer Science from the University of Wisconsin-Milwaukee (1996) and a Ph.D. in Computer Science from the University of Wisconsin-Madison (2007). His Ph.D. thesis focused on numeric program analysis.

License: CC-2.5