Formalizing and Evaluating Checked C
Presented as part of the 2021 HCSS conference.
ABSTRACT
The C programming language remains extremely popular despite the emergence of new, modern languages. Unfortunately, C programs lack spatial memory safety, which has long made them susceptible to a host of devastating vulnerabilities, including buffer overflows and out-of-bounds reads/writes. Checked C, developed open-source by Microsoft starting in 2015, extends the C language with new types and annotations whose use can ensure a program’s spatial safety. Importantly, Checked C supports development that is incremental and compositional. Individual code regions (e.g., functions or whole files) designated as checked are sure to enforce spatial safety, a property which is preserved via composition with other checked regions. But not all regions must be checked: Checked C’s checked pointers are binary-compatible with legacy pointers, and may coexist in the same code, which permits a deliberate refactoring process. In prior work, we developed a core formalization of Checked C in which we proved that checked code cannot be blamed: any spatial safety violation can only be attributed to code that is not yet fully checked.
Recently, we have been developing a NULL-terminated (NT) checked pointer type in Checked C, whose syntax is nt_array_ptr<T> (for any type T ). Programmers familiar with C’s standard string library will be quite familiar with NT pointers. But spatially safe NT pointers are not present in modern type-safe languages (such as Java, Scala, Haskell, Rust, etc.), which typically pair dynamically-sized buffers with metadata to indicate the buffer’s length and capacity. While some prior safe-C implementations (such as CCured, Cyclone, and Deputy) have supported NT pointers, no prior work of which we are aware has formalized and proved them safe, nor has it assessed their performance against non-NT alternatives. We fill these gaps in our work.
Normal pointers to arrays have a fixed length, e.g., array_ptr<char> p:count(n) indicates that p has length n (where n is another variable). Declaration nt_array_ptr<char> p:count(n) indicates that p’ s length is at least n, but further capacity is present if p[n] != NULL. As such, in statements if (*p) { … } and p’ s type is nt_array_ptr<T> count(0), then the compiler type-checks the branch … with p typed as nt_array_ptr<T> count(1), i .e., 1 more than it was, since the character at the current-known length is non-NULL. Such flow sensitivity supports checking common idioms involving pointer arithmetic, e.g., as in the implementation of strlen. An nt_array_ptr<T> count(n) can be safely converted to an array_ptr<T> count(n-1), for better library code reuse, as well as the reuse of the bounds-check elimination optimization in the Checked C compiler itself.
In our work we extend our earlier formalism with checked NT pointers and re-prove our compositional proof of blame (done i n the Coq proof assistant); our formalism also includes lightweight dependent and function types, and other extensions. We have also implemented checked NT pointers in the Checked C compiler, and evaluated their effectiveness on a series of benchmarks; we use these benchmarks to show the advantages of our support compared to refactoring the code to use normal arrays. Our implementation, code, and benchmarks are (or will be) available i n the Checked C Github repository.
Michael W. Hicks is a Professor in the Computer Science Department at the University of Maryland, where he has been since 2002, and the CTO of Correct Computation, Inc. (CCI), a role he has held since late 2018.
He is a leading member of the software security community, having carried out diverse and award-winning research published in top venues covering computer security, programming languages, systems, and software engineering. He was the first Director of the University of Maryland's Cybersecurity Center (MC2), and was the elected Chair of ACM SIGPLAN, the Special Interest Group on Programming Languages; he now edits its blog, PL Perspectives, at https://blog.sigplan.org. Over his career, he has published more than 125 peer-reviewed scientific articles, and his research has twice won the NSA’s Best Scientific Cybersecurity Paper competition; he is the only two-time winner. A key recent focus at UMD and CCI has been to explore how to make legacy software written in low-level languages, like C, more secure. He has been collaborating on the Checked C project, carrying on a nearly 20-year arc of work that started with the Cyclone safe programming language, which itself was a strong influence on Rust, a next-generation language. He has also currently looking at synergies between cryptography and programming languages; techniques for better random (fuzz) testing and probabilistic reasoning; and high-assurance tools and languages for quantum computing.