We present work performed at GrammaTech on semi-automated migration of legacy C++ code to Rust. A widely used systems programming language, C++ offers a historically rich set of language features, including low-level memory manipulation capabilities inherited from C. While such features can give rise to efficient programs, memory managed freely by the programmer often results in hard-to-debug program crashes and vulnerabilities.
The Rust programming language regulates memory manipulation by the programmer. It targets a sweet spot between eliminating the potential for memory corruption and retaining the usability of the language. A key concept is that of ownership: each piece of heap-allocated memory is managed by exactly one variable, whose lifetime determines the validity of that memory. Rust’s memory safety guarantees make automated garbage collection unnecessary, paving the way for safe and efficient systems programming of the future. The question remains how to deal with legacy C++ code in active use. Manually rewriting all such code into Rust is not practical.
To address this apparent impasse, GrammaTech is developing an open-source tool called CRAM for the semi-automated migration of C++ code bases into Rust. The goal is to generate Rust code that is not only functionally equivalent to the C++ source, but also human-readable and in fact Rust-idiomatic, enabling further development by human programmers. This combination of goals makes language migration a particularly hard problem. To ensure feasibility, we assume well-designed C++ code as input, where form reflects—not disguises—intent. We solicit assistance from the user in cases where the extraction of intent from the C++ code is ambiguous. Finally, we are developing methods to assure the programmer of the correctness of the migration, ranging from migrated test cases to automated equivalence proofs for code fragments.
CRAM divides the migration process into two stages. The first stage remains entirely in C++ and refactors the code so as to enforce Rust’s memory usage rules in C++. For example, CRAM breaks up nests of mutable (non-const) aliases to the same memory location in C++, preparing the code for migration into a language where such nests are forbidden. These refactorings implement our vision that, while C++ does not enforce safe programming, it does not prohibit it either. Experienced C++ programmers will avoid reference nests. Rust elevates this recommendation to a rule, which we now enforce on the C++ side. Pre-migration code refactoring has significant additional advantages:
During the second stage, migration libraries developed as part of the CRAM project map formal descriptions of programmer intent lifted from C++ code segments to human-readable, idiomatic Rust code. We envision a programmatic but language-agnostic notion of intent. For example, CRAM recognizes loops iterating over data containers, how many iterators they use, whether they iterate forward or backward, etc. Such intent representations can easily be retargeted idiomatically to a language like Rust.
CRAM currently handles a subset of (well-designed) C++, in a largely automated fashion. In experiments, both the refactorings of the C++ code and the migration into Rust have resulted in efficiency improvements of the code. Notably, one Rust program obtained via manual migration by an expert exhibited about the same performance as the CRAM-generated Rust program, although the two programs were rather different.
Thomas Wahl is a Principal Researcher at GrammaTech, Inc., which he joined in 2021, after a long career in academia. Thomas served as a faculty member at Northeastern University in Boston from 2011. He moved to Boston from Oxford/UK, where he worked as a Research Officer in the Computing Laboratory (now Department of Computer Science). Prior to Oxford, Thomas held a postdoctoral position at the Swiss Federal Institute of Technology (ETH) in Zurich. He obtained a PhD degree in Computer Science from the University of Texas at Austin in 2007.
Thomas' interests are in the automated analysis of software, especially using mathematically rigorous model-theoretic or proof-theoretic techniques (AKA Formal Methods), in order to identify critical program errors or vulnerabilities.
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-22-C0025. The views, opinions, and/or findings expressed in this document are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
DISTRIBUTION STATEMENT A. Approved for Public Release. Distribution Unlimited.