Can We Use Software Bug Reports to Identify Vulnerability Discovery Strategies?

Daily horror stories about software vulnerabilities necessitate an understanding of how vulnerabilities are discovered. Identifying data sources that can be leveraged to understand how vulnerabilities are discovered could help cybersecurity researchers characterize the exploitation of vulnerabilities. The goal of the paper is to help cybersecurity researchers characterize vulnerabilities by conducting an empirical study of software bug reports. We apply qualitative analysis to 729, 908, and 5336 open source software (OSS) bug reports collected from Gentoo, LibreOffice, and Mozilla, respectively, to investigate whether bug reports include vulnerability discovery strategies, i.e., sequences of computation and/or cognitive activities that an attacker performs to discover vulnerabilities, where the vulnerability is indexed by a credible source, such as the National Vulnerability Database (NVD).

We evaluate two approaches, namely a text feature-based approach and a regular expression-based approach, to automatically identify bug reports that include vulnerability discovery strategies. We observe that the Gentoo, LibreOffice, and Mozilla bug reports include vulnerability discovery strategies. Using text feature-based prediction models, we observe the highest prediction performance for the Mozilla dataset, with a recall of 0.78. Using the regular expression-based approach, we observe a recall of 0.83 for the same dataset. Findings from our paper provide the groundwork for cybersecurity researchers to use OSS bug reports as a data source for advancing the science of vulnerabilities.
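
To make the regular expression-based idea concrete, the following is a minimal sketch in Python and not the paper's implementation. The patterns in STRATEGY_PATTERNS, the helper names mentions_strategy and recall, and the three sample reports are all hypothetical and chosen only for illustration; the paper's actual expressions and data are not reproduced here.

```python
import re

# Hypothetical patterns: illustrative guesses at phrases that might signal a
# described discovery strategy. They are NOT the expressions used in the paper.
STRATEGY_PATTERNS = [
    re.compile(r"\bsteps to reproduce\b", re.IGNORECASE),
    re.compile(r"\bproof[- ]of[- ]concept\b", re.IGNORECASE),
    re.compile(r"\bfuzz(?:ed|ing|er)?\b", re.IGNORECASE),
    re.compile(r"\bcrafted (?:file|input|request|packet)\b", re.IGNORECASE),
]


def mentions_strategy(report_text):
    """Return True if any pattern matches the bug report text."""
    return any(p.search(report_text) for p in STRATEGY_PATTERNS)


def recall(predicted, actual):
    """Recall = true positives / (true positives + false negatives)."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if a and not p)
    if true_pos + false_neg == 0:
        return 0.0
    return true_pos / (true_pos + false_neg)


# Made-up reports with made-up manual labels (True = includes a strategy).
reports = [
    ("Fuzzing the PNG decoder with a crafted file triggers a heap overflow.", True),
    ("Attacker sends an oversized header and the parser crashes.", True),
    ("Update the package to the latest upstream release.", False),
]

predictions = [mentions_strategy(text) for text, _ in reports]
labels = [label for _, label in reports]
print(recall(predictions, labels))
```

In this toy data the second report describes a strategy in words that none of the patterns cover, so it counts as a false negative and lowers recall. This is the kind of miss a recall figure such as the 0.83 reported above would account for.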

Raunak Shakya is a graduate student at the Department of Computer Science, Tennessee Technological University, currently working towards an MS in Computer Science. He graduated with a Bachelor's degree in Engineering from the Institute of Engineering, Tribhuvan University, Nepal, in 2013. He then worked as a software developer for more than three and a half years at various software companies based in his hometown of Kathmandu, Nepal. His areas of interest include (but are not limited to) software engineering, machine learning, data analysis, web application development, and business and finance.

License: CC-2.5
Submitted by Akond Rahman on