A String, Regular Expression, and Integer Solver for Bug-finding and Security
Presented as part of the 2016 HCSS conference.
ABSTRACT
In recent years, string solvers have become an essential component in many formal-verification, security-analysis and bug-finding tools. Such solvers typically support a theory of string equations, the length function as well as the regular-expression membership predicate. These enable considerable expressive power, which comes at the cost of slow solving time, and in some cases even nontermination. We present two techniques, designed for word-based SMT string solvers, to mitigate these problems: (i) sound and complete detection of overlapping variables, which is essential to avoiding common cases of nontermination; and (ii) pruning of the search space via bi-directional integration between the string and integer theories, enabling new cross-domain heuristics. We have implemented both techniques atop the Z3-str solver, resulting in a significantly more robust and efficient solver, dubbed Z3str2, for the quantifier-free theory of string equations, the regular-expression membership predicate and linear arithmetic over the length function. We report on a series of experiments over four sets of challenging real-world benchmarks, where we compared Z3str2 with five different string solvers: S3, CVC4, Kaluza, PISA and Stranger. Each of these tools utilizes a different solving strategy and/or string representation (based e.g. on words, bit vectors or automata). The results point to the efficacy of our proposed techniques, which yield dramatic performance improvement. We argue that the techniques presented here are of broad applicability, and can be integrated into other SMT-backed string solvers to improve their performance.
BIO
Yunhui Zheng is a Research Staff Member at the IBM T.J. Watson Research Center. His interest lies in program analysis of the web and mobile applications for testing, debugging, verification and vulnerability detection. He is also interested in string analysis that integrates (string) constraint modeling and solving techniques into program analysis. He is the main author of the string constraint solver Z3-str/Z3str2.
He received his PhD in Computer Science from Purdue University. For his thesis, he investigated techniques for static web application analysis and string constraint solving.