LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems

ABSTRACT

The rapid evolution of cyber threats necessitates innovative solutions for detecting and analyzing malicious activity. Honeypots, which are decoy systems designed to lure and interact with attackers, have emerged as a critical component in cybersecurity. In this paper, we present a novel approach to creating realistic and interactive honeypot systems using Large Language Models (LLMs). By fine-tuning a pre-trained open-source language model on a diverse dataset of attacker-generated commands and responses, we developed a honeypot capable of sophisticated engagement with attackers. Our methodology involved several key steps: data collection and processing, prompt engineering, model selection, and supervised fine-tuning to optimize the model's performance. Evaluation through similarity metrics and live deployment demonstrated that our approach effectively generates accurate and informative responses. The results highlight the potential of LLMs to revolutionize honeypot technology, providing cybersecurity professionals with a powerful tool to detect and analyze malicious activity, thereby enhancing overall security infrastructure.

BIO

Hakan T. Otal is a PhD student in Information Science at the University at Albany, SUNY, majoring in Data Analytics with a minor in Cybersecurity. His research focuses on quantifying and assessing the risks of Artificial Intelligence and Generative AI systems. He has published papers at many conferences on topics including GenAI safety, LLM-based honeypot systems, and AI applications in various domains. Alongside his doctoral research, he has industry experience developing AI and software tools for cyber risk engines and the cyber insurance market.

Submitted by Katie Dey on