Security Analysis of LLVM Bitcode Files for Mobile
Presented as part of the 2013 HCSS conference.
ABSTRACT:
The LLVM bitcode file format contains an abstract representation of code and data that is at a level below high-level code formats like Java bytecodes and above low-level formats like x86 binaries. It is an attractive option for distributing binary executables for native applications intended to run on a range of platforms. This is especially true for mobile platforms that support both web applications and native applications. While web applications offer several benefits from the viewpoints of developer productivity and platform security, there are important cases when ISVs prefer to deploy native code instead. Distribution of a native mobile application in LLVM bitcode format allows ISVs to tap into native capabilities and platform vendors to retarget the same binary to a range of devices including smartphones, tablets, game consoles and set-top boxes. A real-world motivation for this approach is Samsung's Tizen mobile platform that uses LLVM bitcode as the distribution format for C/C++ applications. There are multiple benefits that arise when using LLVM bitcode for this purpose. First, it offers the Tizen store the ability to compile LLVM bitcode to machine code in different ways for different devices using device profile information. Second, it offers an opportunity for security analysis at the LLVM bitcode level in the Tizen Store.
In this talk, we focus on the second opportunity, and summarize work under way at Rice University to develop security analysis tools for mobile applications distributed as LLVM bitcode files. The foundation for these tools is a rule-based static analysis framework to assist security experts in determining whether or not a given native application is "safe" or "unsafe". The formal definition of "safety" is embodied in a set of rules (encoded in XML format) that capture critical security properties. The static analysis framework is implemented in LLVM, and leverages a taint analysis approach to analyze information flow. The initial demonstration of these rules is for Privacy Leak analysis, in which certain APIs are modeled as "sources" of privacy information and certain APIs as "sinks". Our framework tracks control and data flow paths for information flow, and allows for certain APIs to be flagged as "sanitizers".
A key aspect of our approach is efficient static analysis using SSA form and the control dependence graph, so that large numbers of applications can be analyzed quickly with precise taint propagation through control and data flow. Another aspect is to perform a ranking on privacy leak reports so that false positive and false negative rates can be controlled by setting appropriate thresholds on the ranking. Finally, a unique aspect of our approach is extension of SSA form to support effective static analysis in the presence of aliased objects, aliased arrays and virtual method calls.
This work was done in collaboration with Michael Burke, Deepak Majeti, Dragos Sbirlea, Dan Wallach, Jisheng Zhao at Rice University.
BIO:
Vivek Sarkar conducts research in multiple aspects of parallel software including programming languages, program analysis, compiler optimizations and runtimes for parallel and high performance computer systems. He currently leads the Habanero Multicore Software Research project at Rice University, and serves as Associate Director of the NSF-funded Center for Domain-Specific Computing. Prior to joining Rice in July 2007, Vivek was Senior Manager of Programming Technologies at IBM Research. His responsibilities at IBM included leading IBM's research efforts in programming model, tools, and productivity in the PERCS project during 2002- 2007 as part of the DARPA High Productivity Computing System program. His past projects include the X10programming language, the Jikes Research Virtual Machine for the Java language, the ASTI optimizer used in IBM's XL Fortran product compilers, the PTRAN automatic parallelization system, and profile-directed partitioning and scheduling of Sisal programs. In 1997, he was on sabbatical as a visiting associate professor at MIT, where he was a founding member of the MIT Raw multicore project. Vivek became a member of the IBM Academy of Technology in 1995, the E.D. Butcher Chair in Engineering at Rice University in 2007, and was inducted as anACM Fellow in 2008. He holds a B.Tech. degree from the Indian Institute of Technology, Kanpur, an M.S. degree from University of Wisconsin-Madison, and a Ph.D. from Stanford University. Vivek has been serving as a member of the US Department of Energy's Advanced ScientificComputing Advisory Committee (ASCAC) since 2009.