Combining Property-based Testing and Fuzzing

pdf

Presented as part of the 2019 HCSS conference.

Property-based random testing, popularized by QuickCheck [2], is a form of random testing in which the developer writes properties about input/output pairs that the system under test (SUT) should satisfy. Properties can be fairly sophisticated. For example, suppose we have a function f :: Int -> Int-> (Int, Int) that takes a high (secret) and low (public) argument and produces a pair of high,low outputs. QuickCheck can check whether f satisfies the noninterference property, which is expressed as a function p_nonintf that returns true if and only if two calls to f produce indistinguishable outputs when given indistinguishable inputs. Any two high values are indistinguishable, while only equal low values are. QuickCheck will repeatedly generate pairs of inputs (each input is, itself, a pair) and using them call p_nonintf to confirm that f maps indistinguishable inputs to indistinguishable outputs.

QuickCheck produces random inputs by repeatedly calling a generator function. Where does the generator come from? Sometimes it is easily written manually, or even completely automated [5]. However, for interesting target programs and properties, automated generators fail to find bugs, and good ones written by hand are labor-intensive. Oftentimes, the root of the problem is that random data of the required type fails to satisfy certain well-formedness conditions assumed by the property or program. For example, a function that operates on lists might implicitly assume it never receives an empty list. To test p_nonintf, the generator must produce indistinguishable pairs of high and low inputs. But doing so is very unlikely: it requires randomly generating two low inputs that are exactly the same.

We have been working on what appears to be a promising solution to this problem: exploit recent advances in the area of fuzz testing [8]. Fuzz testing works for file-processing programs and focuses on the property that the SUT does not crash due to an OS- level exception. Fuzzing has the same basic problem as any form of random testing: some target programs may expect inputs with structural constraints that random binary data are very unlikely to satisfy. Modern fuzzers, such as AFL [1], address this problem by employing coverage-guided, mutation-based input generation (CBMBIG). Roughly speaking, control- flow coverage information gathered during each run is used to determine whether an input is “interesting”— whether it has executed previously unreached parts of the program—and if so that input is used as the basis of subsequent mutation. CBMBIG can allow the fuzzer to discover fairly complicated input formats [7].

We wonder: Can CBMBIG be applied to property testing in a way that reduces or eliminates the need to manually write good generators?

To answer that, we have produced QCFuzz, which extends QuickChick [3], a modern variant of QuickCheck for the Coq proof assistant [6], with CBMBIG support. QCFuzz’s basic idea is to use both automatically derived QuickChick generators to produce inputs, as well as high-level mutation operators to slightly “fuzz” such inputs. These mutation operators are also derived automatically, adapting the style of AFL’s bit-level mutators to particular types. Coverage information is then used to decide whether to keep inputs for subsequent mutation, or whether to resort to standard, from-scratch input generation.

We evaluated QCFuzz by using it to automatically test two Coq developments of secure machines which aim to enforce flavors of noninterference. We systematically introduced bugs in the security checks these machines perform. For property testing to find such bugs would require generating a pair of indistinguishable programs and seeing that their outputs are not indistinguishable. But such programs have many well-formedness assumptions that are tedious to ensure by hand, in a generator. With QCFuzz, we hope that automated generators and mutators will be sufficient when coupled with coverage information.

We compared QCFuzz against QuickChick by measuring mean time to failure, to see which finds bugs fastest (if at all). In both cases we used fully automatic generators, and also by-hand optimized ones. We found that the fuzzing-assisted approach dramatically outperforms random testing when using the naive, automatically derived generators to produce testing inputs. However, if we spend the time and effort to write a significantly smarter generator by hand (like the ones that were considered a research contribution on their own in [4]), bugs can be discovered faster still. The open question: Can we further advance automation to achieve the same performance?

References:

[1] American fuzzing lop (afl). http://lcamtuf. coredump.cx/afl/, 2019.

[2] Koen Claessen and John Hughes. QuickCheck:a lightweight tool for random testing of Haskell programs. In 5th ACM SIGPLAN International Conference on Functional Programming (ICFP), pages 268–279. ACM, 2000.

[3] Maxime Dénès, C˘at˘alin Hri¸tcu, Leonidas Lampropoulos, Zoe Paraskevopoulou, and Benjamin C. Pierce. QuickChick: Property-based testing for Coq. The Coq Workshop, July 2014.

[4] Cătălin Hriţcu, Leonidas Lampropoulos, Antal Spector-Zabusky, Arthur Azevedo de Amorim, Maxime Dénès, John Hughes, Benjamin C. Pierce, and Dimitrios Vytiniotis. Testing noninterference, quickly. Journal of Functional Programming (JFP); Special issue for ICFP 2013, 26:e4 (62 pages), April 2016. Technical Report available as arXiv:1409.0393.

[5] Leonidas Lampropoulos, Zoe Paraskevopoulou, and Benjamin C. Pierce. Generating good generators for inductive relations. PACMPL, 2(POPL):45:1–45:30, 2018.

[6] Leonidas Lampropoulos and Benjamin C. Pierce. QuickCHick: Property-Based Testing In Coq. Software Foundations series, volume 4. Electronic textbook, August 2018.

[7] lcamtuf. afl-generated, minimized image test sets (partial). http://lcamtuf.coredump.cx/afl/ demo/, April 2019.

[8] Barton P. Miller, Louis Fredriksen, and Bryan So. An empirical study of the reliability of unix utilities. Commun. ACM, 33(12):32–44, December 1990.

Benjamin Pierce is Henry Salvatori Professor of Computer and Information Science at the University of Pennsylvania and a Fellow of the ACM. His research interests include programming languages, type systems, language-based security, computer-assisted formal verification, differential privacy, and synchronization technologies. He is the author of the widely used graduate textbooks Types and Programming Languages and Software Foundations. He has served as co-Editor in Chief of the Journal of Functional Programming, as Managing Editor for Logical Methods in Computer Science, and as editorial board member of Mathematical Structures in Computer Science, Formal Aspects of Computing, and ACM Transactions on Programming Languages and Systems. He holds a doctorate honoris causa from Chalmers University. He is also the lead designer of the popular Unison file synchronizer.

Tags:
License: CC-2.5
Submitted by Katie Dey on