Continuous Integration and Formal Methods with Muse, Affix, and 5C
Presented as part of the 2021 HCSS conference.
ABSTRACT
Software systems now pervade most aspects of modern life, and as such their security and reliability are increasingly important: penetration or failure of a buggy component could precipitate the downfall of an entire system. Static program analysis, a kind of formal method, is becoming increasingly effective at identifying bugs in source code, and this success has spurred an interest in greater use of static analysis during development. While some analyzers can be run during code authoring (e.g., as part of an IDE), a “deep” static analysis, which looks closely at a program’s semantics and not just surface-level patterns, is too slow to be used interactively. Work at Facebook found that deploying such analyzers in a “batch mode” to run overnight was ineffective, with bug reports essentially ignored by developers. They found that integrating static analysis i nto a continuous integration (CI) process worked far better. In such a CI process, a developer checks i n their code for review, and the static analyzer flags issues for the developer to consider, alongside i ssues flagged by human reviewers. Facebook found that with such a process, 70% of flagged errors were fixed, whereas 0% were fixed when analysis was run in a batch mode, outside of the standard CI workflow. While Facebook’s experience shows that integrating static analysis tools into a CI-based code review system works well, static analysis is not the only valuable automated reasoning (i.e., formal methods) tool. We might like to also integrate annotation inference tools, which propose changes to the underlying source code. Doing so is different from leaving the code alone and submitting comments for review. We might also like to employ binary analysis tools to reason about binary-only libraries, to assist source code analyzers so they can better reason about how a project’s source code uses those libraries. What might be the best ways to integrate these sorts of tools into the development workflow, alongside static analysis? In this presentation, we report on work to explore integration of tools such as these into the CI-based development process. In particular, we consider how to integrate annotation-inference and binary-analysis tools into the GitHub pull request system using the Muse platform to manage interactions with the user. We describe new APIs that allow a dialog between the developer and the tool that: 1) is stateful and context-aware, 2) allows the developer to stay entirely within the GitHub UI, 3) lowers requirements of the developer’s tool knowledge to account for the fact that most developers will not be formal methods experts and will be only sporadically interacting with these tools. As two case studies we present ongoing work to integrate Muse with (1) 5C, an annotation inference tool that automates conversion of C code to Microsoft’s Checked C extension; and (2) Affix, a model generation tool for binary code. These integrations provide access to advanced formal methods-based capabilities without requiring developers to directly install, configure, or maintain the tools and without intrusive changes to developers’ workflows. Doing so expands the set of developers that can benefit from these technologies and expands the set of interactions for using the tools effectively.
Dr. Stephen Magill was the CEO and co-founder of MuseDev, and is now VP of Product Innovation at Sonatype. He has spent his career developing tools to help developers identify errors, gauge code quality, and detect security issues. Stephen's research has focused on automated program analysis and has spanned the spectrum from analyses for proving security of low-level code to privacy analyses for high-level query languages. Stephen has led multiple large-scale research initiatives including DARPA projects on privacy, security, and code quality. He also served as research lead for the 2020 and 2021 State of the Software Supply Chain reports. Dr. Magill earned his Ph.D. in CS from Carnegie Mellon University, and his BS from the University of Tulsa. He is a member of the University of Tulsa Industry Advisory Board and has served on numerous program committees and funding panels.