HE-LoRA: Privacy-Preserving Network Intrusion Detection via Homomorphic Encryption LoRA Fine-Tuning of BERT
ABSTRACT
Advances in large language models have transformed intelligent data analysis through context-aware reasoning across diverse domains. In cybersecurity, these capabilities enable automated intrusion detection and anomaly classification from massive network traffic logs. However, fine-tuning language models on raw network data exposes critical privacy risks, as logs may contain sensitive information such as IP addresses, payload signatures, and internal communication parameters. To address these risks, we propose a privacy-preserving fine-tuning framework that integrates Low-Rank Adaptation (LoRA) with fully homomorphic encryption (HE): the model is adapted through rank-decomposed trainable matrices while sensitive network data remain encrypted throughout training. Because network flow records consist of numerical values rather than text, we introduce IDSFeatureTokenizer, a novel tokenization scheme that converts tabular network flow statistics into BERT-compatible token sequences via robust quantization. A 2-layer BERT model (hidden size 768, 6 attention heads) is pre-trained on the Pile corpus and fine-tuned on the CIC-IDS 2017 dataset containing 180,596 labeled network flows. With LoRA fine-tuning at rank 8, which updates only 1.11% of the model parameters, the plaintext model achieves 99.85% accuracy, a 99.81% detection rate, and a false alarm rate of just 0.10%, establishing a strong performance upper bound. Building on the HELLM framework [Rho et al., 2024], we then extend this setup to privacy-preserving training by performing all forward and backward passes directly on homomorphically encrypted data, so that sensitive network traffic is never exposed in plaintext at any stage of fine-tuning. Unlike the original HELLM setup, which required 8 GPUs, our implementation runs on a single NVIDIA RTX 6000 Ada GPU, substantially lowering the hardware barrier to practical deployment.
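The abstract does not specify how IDSFeatureTokenizer quantizes features. A minimal sketch of one plausible approach follows, assuming per-feature quantile binning fit on training data (which keeps heavy-tailed statistics such as byte counts from collapsing into a single bin) and one token ID per (feature, bin) pair. The class name `QuantileFeatureTokenizer`, the special-token IDs, the bin count, and the inf/NaN handling are hypothetical choices for illustration, not the authors' implementation.

```python
import bisect
import math
from typing import List, Sequence


class QuantileFeatureTokenizer:
    """Hypothetical sketch: map each numeric flow feature to a discrete bin token.

    Bin edges are per-feature quantiles fit on training data, so the
    binning is robust to heavy-tailed values and outliers.
    """

    SPECIAL = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2}

    def __init__(self, n_bins: int = 16) -> None:
        self.n_bins = n_bins
        self.edges: List[List[float]] = []

    @staticmethod
    def _clean(x: float) -> float:
        # Some CIC-IDS 2017 rate features contain inf/NaN; mapping them
        # to 0.0 is an assumption made only for this sketch.
        return 0.0 if math.isnan(x) or math.isinf(x) else x

    def fit(self, rows: Sequence[Sequence[float]]) -> "QuantileFeatureTokenizer":
        n_features = len(rows[0])
        self.edges = []
        for j in range(n_features):
            col = sorted(self._clean(r[j]) for r in rows)
            # Interior cut points at the 1/n_bins, 2/n_bins, ... empirical quantiles.
            cuts = [col[min(len(col) - 1, (k * len(col)) // self.n_bins)]
                    for k in range(1, self.n_bins)]
            self.edges.append(cuts)
        return self

    def encode(self, row: Sequence[float]) -> List[int]:
        ids = [self.SPECIAL["[CLS]"]]
        offset = len(self.SPECIAL)
        for j, x in enumerate(row):
            # bisect_right over n_bins - 1 cuts yields a bin index in [0, n_bins - 1].
            b = bisect.bisect_right(self.edges[j], self._clean(x))
            ids.append(offset + j * self.n_bins + b)  # unique ID per (feature, bin)
        ids.append(self.SPECIAL["[SEP]"])
        return ids

    @property
    def vocab_size(self) -> int:
        return len(self.SPECIAL) + len(self.edges) * self.n_bins


rows = [[float(i), float(i * 10)] for i in range(100)]
tok = QuantileFeatureTokenizer(n_bins=4).fit(rows)
print(tok.encode([99.0, 990.0]), tok.vocab_size)
```

Each flow thus becomes `[CLS]`, one bin token per feature, and `[SEP]`. Under this design, the BERT embedding table would need `vocab_size` rows (special tokens plus features × bins) instead of a WordPiece vocabulary.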
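LoRA keeps each pretrained weight matrix W frozen and learns a low-rank update ΔW = BA, scaled by α/r, so only A (r × d_in) and B (d_out × r) are trained. The dependency-free sketch below illustrates this for a single linear layer. The scaling α = 16, the Gaussian/zero initialization (following the original LoRA paper), and the names `LoRALinear` and `lora_param_count` are assumptions, not details taken from this work.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Sketch of a LoRA-adapted linear layer: y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is frozen; only A (r x d_in) and B (d_out x r) would be
    trained. B starts at zero, so the adapted layer initially equals W.
    """

    def __init__(self, weight: Matrix, r: int = 8, alpha: float = 16.0, seed: int = 0):
        self.weight = weight  # frozen pretrained weights
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)              # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # Trainable parameters added by one adapter: r*d_in (A) + d_out*r (B).
    return r * (d_in + d_out)


print(lora_param_count(768, 768, 8))
```

For one 768 × 768 projection at r = 8, `lora_param_count(768, 768, 8)` gives 12,288 trainable parameters against 589,824 frozen ones (about 2.08%). The whole-model figure of 1.11% also depends on which matrices receive adapters and on the embedding and classifier sizes, which are not specified here.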
When trained on a subset of 45,056 samples under full homomorphic encryption, the encrypted model achieves 70% classification accuracy. This reduction relative to the plaintext baseline is expected given the constraints of the encrypted setting: limited training data, polynomial approximations of activation functions, and CKKS noise accumulation. Nonetheless, the result demonstrates that meaningful intrusion detection is feasible even under strict end-to-end data confidentiality guarantees. To the best of our knowledge, this is the first demonstration of private LoRA fine-tuning for network intrusion detection on the CIC-IDS 2017 dataset, establishing a practical framework for privacy-preserving cybersecurity intelligence.
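CKKS evaluates only additions and multiplications on ciphertexts, so nonlinear activations must be replaced by polynomials that are accurate on a bounded input range. As an illustration, the following plaintext sketch builds a degree-15 Chebyshev interpolant of GELU (BERT's feed-forward activation) on [-4, 4] and evaluates it with the Clenshaw recurrence. The interval, the degree, and the choice of Chebyshev interpolation are assumptions; HELLM's actual approximations may differ, and no encryption is performed here.

```python
import math
from typing import Callable, List


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def chebyshev_fit(f: Callable[[float], float], lo: float, hi: float, n: int) -> List[float]:
    """Coefficients c_0..c_{n-1} of the degree-(n-1) Chebyshev interpolant of f on [lo, hi]."""
    mid, half = 0.5 * (hi + lo), 0.5 * (hi - lo)
    # Sample f at the n Chebyshev nodes, mapped from [-1, 1] to [lo, hi].
    samples = [f(mid + half * math.cos(math.pi * (k + 0.5) / n)) for k in range(n)]
    return [
        (2.0 / n) * sum(samples[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        for j in range(n)
    ]


def chebyshev_eval(coeffs: List[float], lo: float, hi: float, x: float) -> float:
    """Clenshaw recurrence: apart from plaintext constants, only additions and
    multiplications on x, i.e. operations CKKS can evaluate on ciphertexts."""
    t = (2.0 * x - lo - hi) / (hi - lo)  # affine map [lo, hi] -> [-1, 1]
    b1, b2 = 0.0, 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + 0.5 * coeffs[0]


coeffs = chebyshev_fit(gelu, -4.0, 4.0, 16)
grid = [-4.0 + 8.0 * i / 400 for i in range(401)]
print(max(abs(chebyshev_eval(coeffs, -4.0, 4.0, x) - gelu(x)) for x in grid))
```

In this configuration the maximum error on [-4, 4] is below 0.01, but the polynomial diverges quickly outside the fitted interval, so inputs to the approximated layer must stay within that range. In an actual CKKS circuit, raising the degree to improve accuracy also consumes more multiplicative depth, which is part of the accuracy trade-off described above.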
Raushan Kumar Pandit is a second-year PhD student in Computer Science and Engineering at the University of North Texas.