Fine Tuning Large Language Model for Secure Code Generation
Author
Abstract
AI pair programmers, such as GitHub's Copilot, have shown great success in automatic code generation. However, such large language model-based code generation techniques risk introducing security vulnerabilities into codebases. In this work, we explore fine-tuning large language models to generate more secure code. We use real-world vulnerability fixes as our fine-tuning dataset and craft a code-generation scenario dataset (C/C++) for evaluating and comparing the pre-trained and fine-tuned models. Our experiments on GPT-J show that the fine-tuned model achieved non-vulnerable code generation ratios of 70.4% for C and 64.5% for C++, a 10% increase for C and a slight increase for C++ compared with the pre-trained large language model.
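The approach described in the abstract centers on fine-tuning GPT-J on real-world vulnerability fixes with a standard causal language-modeling objective. The sketch below illustrates what such a fine-tuning run could look like using the Hugging Face Transformers and Datasets libraries; the file name `vuln_fixes.jsonl`, its `text` field, and all hyperparameters are assumptions for illustration, not the authors' released pipeline.

```python
# Minimal sketch (assumed setup, not the paper's code): fine-tune GPT-J on
# vulnerability-fix snippets with a causal language-modeling objective.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical dataset: each JSON record holds the post-fix (secure) version
# of a real-world vulnerability patch under a "text" field.
dataset = load_dataset("json", data_files="vuln_fixes.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gptj-secure-code",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        fp16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    # mlm=False selects the causal (next-token) LM collation used for GPT-J.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```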
Year of Publication
2024
Date Published
April 2024
URL
https://ieeexplore.ieee.org/document/10599549
DOI
10.1145/3650105.3652299