Verified Data Structures for Trusted Autonomy: A Compilation Approach


Presented as part of the 2017 HCSS conference.

ABSTRACT

Problem statement.

Verification of data structures utilized in autonomous systems presents a significant challenge. Autonomy algorithms, such as route planning, inference, and object recognition, are based on complex datatypes, such as directed graphs and algebraic data types. Proof techniques for these data structures exist, but are oriented to unbounded, functional data types. Functional data structure implementations are not typically efficient in either space or time. Autonomous systems, on the other hand, generally limit the space and time allocations for any given function, and require that algorithms deliver results within a bounded time or suffer a watchdog timeout. Furthermore, high-assurance design rules, such as those mandated by RTCA DO-178C Level A, frown on dynamic memory allocation, preferring simple array-based data structure implementations.

In order to provide efficient implementations of high-level data structures used in autonomous systems, with the high assurance needed for accreditation, we have been developing verification techniques that support the "natural" functional proof style, yet apply to more efficient data structure implementations. Applications discussed include path planning on high-level graphs. The toolchain features code generation to mainstream programming languages, as well as GPU-based and hardware-based realizations.

The bridge between functional and imperative is hard to traverse.

Our past experience with the formal verification of array-based implementation of algebraic data types indicates that these proofs are extremely difficult, whether one takes a theorem proving, model checking, or symbolic execution approach.  The former approach, while yielding a proof for arbitrary array size, easily gets bogged down in supplementary details (even when using list/array "bridging" constructs such as ACL2 single-threaded objects), whereas the latter two approaches suffer from lack of scalability, leading to timeouts except for very small array sizes -- and unsurprisingly, most practical algorithms operate routinely on hundreds to thousands of data elements.  Therefore, we need a different approach.

Our research approach.

We have taken a verifying compiler approach to this problem.

Algorithms are expressed in a system-level language called DASL, employing a mostly-functional style (we have also experimented with subsets of modern programming languages, such as Swift and Rust), with additional annotations indicating maximum array sizes, etc. While DASL provides many features found in modern functional languages, its design has been primarily influenced by our "middle-end" reasoning goals. Thus our focus has been on realizing a proof-of-concept proof-producing toolchain, but not, for example, on providing the ultimate in functional language expressiveness.
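DASL syntax itself is not reproduced in this abstract. As a loose illustration only (written in C with hypothetical names of our own choosing, not DASL code or toolchain output), the combination of a functional, copying update style with a declared maximum size might be pictured as:

```c
#include <assert.h>

#define MAX_SIZE 8  /* stands in for a DASL maximum-array-size annotation */

typedef struct {
    int len;
    int elems[MAX_SIZE];
} seq;

/* Functional (copying) append: the argument is passed by value, so the
   caller's seq is left untouched and a fresh value is returned,
   mirroring the mostly-functional source style. */
static seq seq_append(seq s, int x) {
    assert(s.len < MAX_SIZE);  /* the declared bound is never exceeded */
    s.elems[s.len] = x;        /* mutates only the local copy */
    s.len += 1;
    return s;
}
```

The by-value struct copy is what makes the update "functional" here; an in-place variant would be observationally equivalent only if the old value is never consulted again, which is exactly the sort of source-level stricture discussed below.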

Toolchain overview.

An Xtext/Eclipse front-end parses the DASL input, producing an Abstract Syntax Tree (AST). The compiler middle-end, implemented in Higher-Order Logic (specifically, HOL4), compiles the AST for the functional algebraic datatype, by a series of rewrites in the logic, into a linearized form requiring no heap allocation or deallocation (NB: allocation/deallocation may be added for systems that need it). In the end, the verified compiler produces proofs that compiled data structure operations on the array-based form are equivalent to the same operations on the high-level functional form. Certain source-level strictures must be obeyed to enable these proofs; for example, the compiler cannot prove that in-place updates are equivalent to functional (copying) updates if old copies of the functional data structure are allowed to persist (similar to the restrictions placed on ACL2 stobj users). Nevertheless, experience has shown that these restrictions do not significantly limit our ability to specify algebraic data types and compile them to efficient, array-based implementations.
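As a hedged sketch of the kind of linearization involved (our own C illustration with hypothetical names, not actual compiler output), a heap-allocated cons-list can be realized as a statically sized pool of cells addressed by integer indices, so that no allocator is needed at runtime:

```c
#include <assert.h>

#define POOL_SIZE 1024  /* bound would come from source annotations */
#define NIL (-1)        /* index encoding of the empty list */

/* Statically allocated cell pool: heads, tails (as cell indices),
   and a bump pointer. No malloc/free anywhere. */
static int hd[POOL_SIZE];
static int tl[POOL_SIZE];
static int next_free = 0;

/* cons x xs: claim the next pool cell and return its index. */
static int cons(int x, int xs) {
    assert(next_free < POOL_SIZE);
    int i = next_free++;
    hd[i] = x;
    tl[i] = xs;
    return i;
}

/* Sum the list rooted at index i. The verification obligation is that
   such operations on the array form are equivalent to the same
   operations on the high-level functional list. */
static int list_sum(int i) {
    int s = 0;
    while (i != NIL) {
        s += hd[i];
        i = tl[i];
    }
    return s;
}
```

Because the representation is a pair of fixed arrays plus an index, it maps directly onto both conventional machine code and hardware realizations.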

The compiler output is source text for a choice of conventional programming languages, such as Ada, Java, VHDL, C or ML (we have done some preliminary experimentation with GPU languages such as CUDA and OpenCL, but do not support them yet).  This source is then further compiled to machine code or an FPGA bit file using conventional tools. Since we compile to an array-based implementation, supporting hardware is particularly straightforward (as opposed to the general case, in which we would have to provide a hardware version of a memory allocator and other related runtime functions).  This approach also yields extremely speedy software and hardware-based implementations of functionally-specified algorithms.  Furthermore, if the toolchain user selects ML code generation, we are able to offer a complete verified toolchain and verified execution environment down to the machine code level, using technologies from the CakeML project.  CakeML provides mechanisms to convert HOL4 logic formulas to a subset of Standard ML, as well as a verified compiler from that ML subset to x86 machine code.  We thus can readily perform a verified translation of a DASL program to x86 machine code, and execute that machine code on x86-based hardware.

Presentation.

We will present examples of DASL code, as well as the operation of our toolchain on those examples, producing high-speed implementations of common autonomy data structures with full proofs of correctness.

--

Dr. David S. Hardin has made contributions in the areas of formal methods, computer architecture for High Assurance systems, as well as real-time and embedded Java. He is currently a Principal Research Engineer in the Advanced Technology Center at Rockwell Collins, where he has served as Principal Investigator for several U.S. DoD Research and Development programs. He is the editor of the book Design and Verification of Microprocessor Systems for High-Assurance Applications (Springer 2010), and a co-author of The Real-Time Specification for Java, as well as the Java 2 Micro Edition Connected Device Configuration/Foundation Profile standards. He is author or co-author of more than 40 peer-reviewed publications, and is a co-inventor on eleven U.S. patents. In 1999, Dr. Hardin co-founded aJile Systems, a startup company focused on real-time and embedded Java technology, and served as aJile's Chief Technical Officer from 1999 to 2003. Dr. Hardin was selected as a Rockwell Engineer of the Year for 1997 for his contributions to the development of the world's first Java microprocessor. His academic career includes BSEE and MSEE degrees from the University of Kentucky, and a Ph.D. in Electrical and Computer Engineering from Kansas State University, in 1989. Dr. Hardin is a proud native of the Commonwealth of Kentucky, and is a Kentucky Colonel.

Dr. Konrad Slind has been a member of the Rockwell Collins Advanced Technology Center since 2009. He previously worked at Cambridge University Computer Lab (post-doc), and University of Utah School of Computing (faculty).  His research focuses on the theory and application of formal methods. He has published in a variety of areas: implementation of higher order logic theorem provers, verifying compilation, specification of multiprocessor memory models, verification of cryptographic block ciphers, verification of functional programs, and introduction of formal methods into certification processes.

Recent work has explored formal approaches to fast verified network guards. Slind's current research is on runtime systems for graph-based algorithms intended to execute in autonomous vehicles.
