Grading Open Source Software Development Practices
ABSTRACT
One of the most critical decisions a developer makes is the choice of which open source libraries to include in their application. Each library brings important functionality, but also imposes a maintenance burden: 85% of the vulnerabilities in open source software projects originate in the project’s dependencies rather than in the project code itself. As such, choosing libraries that prioritize security can make a substantial difference in the number of security issues a project has to deal with. OpenSSF’s Security Scorecard project attempts to bring transparency to security practices so that consumers of open source can make informed decisions about library choice. An open source project’s Scorecard measures which of 17 security best practices the project implements and includes an aggregate score that summarizes this information. In this work, we examine the Scorecard system and show that while the aggregate Scorecard score does not have a statistically significant relationship with vulnerability history, we can train a machine learning model to predict vulnerability status from the individual Scorecard checks with 85% accuracy. Adding a measure of how quickly a project updates its dependencies increases accuracy to 92%. These machine-learning-based models can then be used to derive a new metric that tracks vulnerability status much more closely than the Scorecard score does, giving developers an easy-to-consume assessment of the likelihood that a given project will be a source of security issues. The strong connection with vulnerability status makes this metric useful for decisions about which library to use for a given purpose (e.g., “should we use Jackson or gson for JSON parsing?”).
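To make the modeling setup concrete, the sketch below shows one way such a classifier could be trained on per-project Scorecard check results. The input file, column names, and the choice of a random-forest model are illustrative assumptions, not the actual pipeline behind the reported results.

```python
# Illustrative sketch only: the data file, column names, and model choice
# are assumptions, not the pipeline used in the study.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset: one row per project, one column per Scorecard check
# (e.g. Code-Review, Pinned-Dependencies, ...), plus a binary label marking
# whether the project has known vulnerabilities.
df = pd.read_csv("scorecard_checks.csv")
X = df.drop(columns=["project", "has_vulnerability"])
y = df["has_vulnerability"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print(f"Held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```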
Additionally, we can use these ML models to determine which Scorecard checks have the greatest impact on security. Not surprisingly, code review emerges as the most important of the included security practices. Code review has long been recognized by developers as a high-value practice, but to our knowledge this is the largest-scale study to quantitatively demonstrate the impact of code review on security (our dataset for this work includes over 35k open source projects). Other high-impact practices include pinning dependencies, keeping binaries out of the repository, and enabling branch protection.
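As one illustration of how per-check impact can be estimated, the snippet below ranks checks by the feature importances of the hypothetical model from the previous sketch; the attribution method used in the actual study may differ.

```python
# Continues the previous sketch: rank Scorecard checks by the model's
# feature importances, one possible proxy for per-check security impact.
import pandas as pd

importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(5))
```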
In this talk, we will present details of the datasets and ML models behind this work and discuss open problems in the area of software quality assessment.
Author
Dr. Stephen Magill was the CEO and co-founder of MuseDev, and is now VP of Product Innovation at Sonatype. He has spent his career developing tools to help developers identify errors, gauge code quality, and detect security issues. Stephen has published extensively on the topics of program analysis, privacy, and machine learning and has led multiple large-scale research initiatives, including DARPA projects on privacy, security, and code quality. He has also served as a research lead for the Sonatype State of the Software Supply Chain report since 2019. Dr. Magill earned his Ph.D. in Computer Science from Carnegie Mellon University and his B.S. from the University of Tulsa.