Static Analysis of Programmatically Generated Network Software: Challenges and Synergies
Authors: Jonathan Myers1, Raymond McDowell1, Christopher Rouff1, Doug Williams2, Daniel Bennett3
1JHU/APL, 2Leidos, 3Shavano Systems
Abstract
Code generated programmatically should be better-suited to analysis and verification than hand-written code. Instead, we find that existing static analysis tools are ill-suited to working with programmatically-generated code; paradigms used by the code generators may be misunderstood by static analyzers, leading to numerous false positives. More importantly, the types of code errors introduced through the use of programmatically-generated code may not be identified by current static analysis tools, leading to false negatives. To address this, we have developed a code generation system in tandem with development of an analysis system. The generated code is created with static analysis in mind, and the analyzers are tuned to the code generation system. This allows rigorous analysis of generated code and high confidence in its quality.
We demonstrate that rigorous analysis of programmatically-generated code is possible through our work on AGNES: Automatic Generation of Network Essential Software. In the AGNES system, a network protocol implementation is generated in the C language through a custom-made code generator. As network software is a common target of malicious attack, we seek to generate C code which is demonstrably free of common weaknesses taken from the CWE database. We present an initial set of 13 CWEs, the strategies used to ensure their absence in our generated code, and our experience extending the Clang Static Analyzer, and using KLEE and Frama-C to perform verification.
By carefully choosing and restricting the coding patterns used in code generation and tuning them to our software analysis system, we generate code which is designed for rigorous analysis. We examine the trade-offs between complexity of verification and complexity of code generation.
We demonstrate the use of programmatically generated code contracts as a means for the communication of intent between the code generator and the code verification system. We find that many properties of C code are difficult to infer from the source code alone; code contracts provide a powerful tool for informing analysis, and greatly simplify the process of verification. When writing C code by hand, the addition of code contracts can be burdensome, but programmatically-generated code contracts can be added into our system with minimal additional complexity.
We demonstrate that rigorous validation of programmatically-generated source code is be possible. Through concurrent and integrated development of code generation and static analysis systems, it is possible to achieve deep synergy leading to extremely high-quality analysis and high confidence in the generated code.