WiP: COG-IMMUNE: Toward a Cognitive Immune System for Large Language Models
ABSTRACT
Tool-using and agentic systems built around large language models can be attacked over time through long-context manipulation, indirect prompt injection, and unsafe tool use. This work presents COG-IMMUNE, an immune-inspired security architecture that treats LLM safety as a closed-loop assurance problem. The framework combines continuous sensing over execution-time signals, graded containment policies that restrict actions as risk rises, and compact evidence objects that provide tamper-evident auditability over constrained communication links. The paper defines a threat model for interactive deployments, introduces measurable security metrics including time-to-detect, time-to-contain, and unsafe-action rate, and outlines an evaluation plan spanning jailbreak, indirect prompt injection, and tool-use attack scenarios. The approach is motivated by autonomy settings with delayed oversight, including safety-critical and space systems, where secure operation depends on both safe fallback behavior and verifiable telemetry.
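To make the closed loop concrete, the minimal sketch below illustrates the three mechanisms the abstract names: continuous risk sensing, graded containment, and tamper-evident evidence objects. All identifiers, thresholds, and tier definitions here (Tier, EvidenceObject, containment_tier, run_loop) are illustrative assumptions, not COG-IMMUNE's actual interfaces, which this work-in-progress abstract does not specify.

# Illustrative sketch of the sense -> contain -> record loop described in the
# abstract. Names, thresholds, and tiers are placeholder assumptions.
import hashlib
import json
import time
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    """Graded containment: higher tiers permit fewer actions."""
    NOMINAL = 0      # full tool access
    RESTRICTED = 1   # read-only tools, no external side effects
    QUARANTINE = 2   # safe fallback only; defer to delayed oversight


@dataclass
class EvidenceObject:
    """Compact, hash-chained record: altering any entry breaks the chain."""
    timestamp: float
    risk_score: float
    tier: Tier
    prev_hash: str
    digest: str = field(init=False)

    def __post_init__(self) -> None:
        payload = json.dumps(
            [self.timestamp, self.risk_score, int(self.tier), self.prev_hash]
        )
        self.digest = hashlib.sha256(payload.encode()).hexdigest()


def containment_tier(risk_score: float) -> Tier:
    """Map a continuous risk signal to a graded containment tier
    (thresholds are placeholders for illustration)."""
    if risk_score < 0.3:
        return Tier.NOMINAL
    if risk_score < 0.7:
        return Tier.RESTRICTED
    return Tier.QUARANTINE


def run_loop(risk_signals: list[float]) -> list[EvidenceObject]:
    """Sense, contain, and record once per step; return the audit chain."""
    chain: list[EvidenceObject] = []
    prev = "0" * 64  # genesis hash
    for score in risk_signals:
        tier = containment_tier(score)
        evidence = EvidenceObject(time.time(), score, tier, prev)
        prev = evidence.digest
        chain.append(evidence)
    return chain


if __name__ == "__main__":
    # Simulated execution-time risk scores, e.g. from an injection detector.
    for ev in run_loop([0.1, 0.2, 0.8, 0.9, 0.4]):
        print(f"risk={ev.risk_score:.2f} tier={ev.tier.name} hash={ev.digest[:8]}")

Chaining each evidence object to its predecessor's hash keeps records compact while making after-the-fact tampering detectable, the property the abstract motivates for constrained communication links; metrics such as time-to-detect and time-to-contain could then be computed from the timestamps in such a chain.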
Sylvester Kaczmarek conducted PhD research at Imperial College London in secure, adaptive, and interpretable anomaly detection and runtime assurance for autonomous systems. His work focuses on trustworthy AI, agentic security, and assurance mechanisms for safety-critical robotics and space applications.