Sequential Graph Neural Networks for Source Code Vulnerability Identification
ABSTRACT
Vulnerability identification is a task of high importance for cyber security. It helps locate and fix vulnerable functions in large applications. However, the task is challenging owing to the absence of reliable, adequately managed datasets and learning models. Existing solutions typically rely on human expertise to annotate datasets or specify features, which is prone to error. In addition, existing learning models have a high rate of false positives. To bridge this gap, in this paper, we present a properly curated C/C++ source code vulnerability dataset, denoted CVEFunctionGraphEmbeddings (CVEFGE), to aid in developing models. CVEFGE is automatically crawled from the CVE database, which contains authentic and publicly disclosed source code vulnerabilities. We also propose a learning framework based on graph neural networks, denoted SEquential Graph Neural Network (SEGNN), for learning a large number of semantic representations of code. SEGNN consists of a sequential learning module, graph convolution, pooling, and fully connected layers. Our evaluations on two datasets against four baseline methods in a graph classification setting demonstrate state-of-the-art results.
BIO
Anwar Said is a postdoctoral research scholar at the Institute for Software Integrated Systems, Department of Computer Science, Vanderbilt University. He received his Ph.D. from the Department of Computer Science at the Information Technology University, Lahore, Pakistan, in 2022. He obtained his MPhil degree in Computer Science from Quaid-i-Azam University in 2016 and his MSc degree with distinction from the University of Swat, Pakistan. His research is in graph machine learning (GML), an emerging field with extensive applications in various domains, including recommendation, forecasting, drug discovery and development, and optimization. In particular, he works on the design of GML approaches to enhance their performance and applies them to various real-world problems. He has developed a number of GML approaches and used them to solve problems including circuit design completion, link prediction in Ethereum data, and fraud detection in social networks. In addition, he works in data science, graph theory, and network science. More recently, he has expanded his research to include GML applications in electronic design automation and graph transformers.