Fine Tuning Large Language Model for Secure Code Generation
Author
Abstract

AI pair programmers, such as GitHub's Copilot, have shown great success in automatic code generation. However, such large language model-based code generation techniques risk introducing security vulnerabilities into codebases. In this work, we explore the direction of fine-tuning large language models to generate more secure code. We use real-world vulnerability fixes as our fine-tuning dataset and craft a C/C++ code-generation scenario dataset for evaluating and comparing the pre-trained and fine-tuned models. Our experiments on GPT-J show that the fine-tuned GPT-J achieved non-vulnerable code generation ratios of 70.4% for C and 64.5% for C++, a 10% increase for C and a slight increase for C++ compared with the pre-trained large language model.
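The abstract does not specify implementation details. As a rough illustration only, the sketch below shows how a causal language model such as GPT-J could be fine-tuned on vulnerability-fix code using the Hugging Face Trainer API; the file name vulnerability_fixes.jsonl, the single "text" field, and all hyperparameters are assumptions for illustration, not the paper's actual setup.

```python
# Illustrative sketch: fine-tune GPT-J on vulnerability-fix code (assumed data layout).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_NAME = "EleutherAI/gpt-j-6B"  # pre-trained GPT-J checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # GPT-J defines no pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Hypothetical JSON-lines file: one record per fixed (patched) code sample,
# stored in a "text" field.
dataset = load_dataset("json", data_files="vulnerability_fixes.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gptj-secure-code",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=1e-5,
        fp16=True,
    ),
    train_dataset=tokenized,
    # mlm=False gives standard causal-LM (next-token) training labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```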

Year of Publication
2024
Date Published
April
URL
https://ieeexplore.ieee.org/document/10599549
DOI
10.1145/3650105.3652299