Verifiable Binary Lifting
Presented as part of the 2021 HCSS conference.
ABSTRACT
This talk describes work at Galois on building a verifiable decompiler from machine code to LLVM. The goal is to support new uses of binary analysis such as program recompilation for optimization and security improvements, security patching, and retrofitting legacy applications for new architectures or mission requirements. These new applications have different requirements than traditional uses of decompilation tools such as reverse engineering or malware analysis. In particular, these new applications place less emphasis on helping people understand binaries, and more on helping compilers and verification tools analyze binaries. In particular with these new applications, new binaries may be generated that are intended to run critical programs, and it is imperative that the decompilation framework does no harm by introducing new bugs. Without the source and support engineering team, it will be very difficult to recertify a translated program, and skipping certification is not acceptable in safety critical environments. In this talk, we describe our experience building our decompiler to LLVM and how we are working on providing assured binary lifting. In our experience, there are always improvements and changes to be made in terms of supporting new features for ISAs, operating systems, and compilers, and it would be extremely difficult to build and maintain a verified decompiler with proofs robust to changes. Instead, we have focused on a two phased approach:
- Our decompiler produces untrusted annotations that relate the machine code to the LLVM.
- We have an independently written decompiler verifier that uses the annotations and a compositional strategy and SMT solving to automatically discharge a large number of small program equivalence checks.
We will describe our compositional approach for automated verification of the generated programs, and how we have modified our decompiler to record the relationships between machine code and LLVM at the level of individual basic blocks. This allows us to decompose proofs about overall decompilation of a program into many efficiently solvable SMT-queries about operations within individual basic blocks. An additional benefit of our approach is that the decompiler and verifier are written independently. The decompiler and verifier are written in different languages and use machine code semantics obtained from different sources. This provides an informal form of additional assurance that a mistake in either semantics must be present in precisely the same way in the other to successfully invalidate a successful verification result. Our verifier is written within the Lean theorem prover, and the eventual goal is to formally show that the compositional strategy used is sound for all programs. The techniques described in this presentation are implemented in the Galois open source reopt program recompilation tool. We will show this tool, and discuss how the HCSS 2021 composition and proof robustness themes have been addressed in the design of the reopt. The project depicted is sponsored by the Office of Naval Research under Contract No. N68335-17-C-0558. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Office of Naval Research.
Dr. Joe Hendrix is a Principal Researcher at Galois where he focuses on helping transition formal methods technologies to address real world problems. His technical areas of interest include binary analysis, programing language semantics, decision procedures and interactive theorem proving.