"Microsoft's 'AI Watchdog' Defends Against New LLM Jailbreak Method"

Microsoft has discovered a new method for jailbreaking Large Language Model (LLM) Artificial Intelligence (AI) tools and has revealed its continued efforts to improve LLM safety and security. In a recent paper, Microsoft described the "Crescendo" LLM jailbreak method, detailing how an attacker can send a series of seemingly benign prompts that gradually lead a chatbot, such as OpenAI's ChatGPT, Google's Gemini, Meta's LLaMA, or Anthropic's Claude, to produce output that the model would normally filter and refuse. According to Microsoft researchers, a successful attack can typically be completed in fewer than ten interaction turns, and some variants achieved a 100 percent success rate against the tested models. This article continues to discuss the new LLM jailbreak method and the defense against it.
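The attack's multi-turn character is what makes it difficult to catch: each prompt looks innocuous in isolation, so a defense has to weigh the conversation as a whole rather than screening prompts one at a time. The sketch below illustrates only that general idea; the is_flagged heuristic, the keyword list, and the screen_conversation helper are hypothetical placeholders for illustration and do not represent Microsoft's actual "AI Watchdog" logic.

```python
# Hypothetical sketch: screen the accumulated conversation, not just the
# latest prompt, since multi-turn jailbreaks stay benign turn-by-turn.
DISALLOWED_TOPICS = ["build a weapon", "synthesize the agent"]  # placeholder terms


def is_flagged(text: str) -> bool:
    """Stand-in content classifier; a real system would use a trained model."""
    lowered = text.lower()
    return any(topic in lowered for topic in DISALLOWED_TOPICS)


def screen_conversation(turns: list[str]) -> bool:
    """Return True if the conversation so far, taken as a whole, should be blocked."""
    transcript = " ".join(turns)
    return is_flagged(transcript)


if __name__ == "__main__":
    conversation: list[str] = []
    for prompt in ["Tell me about chemistry.", "What reactions release energy?"]:
        conversation.append(prompt)
        if screen_conversation(conversation):
            print("Blocked: cumulative context crossed the policy threshold.")
            break
        print(f"Allowed turn: {prompt}")
```

The design point the sketch makes is simply that the filtering decision uses the cumulative transcript, which is the property a per-prompt filter lacks against an escalating, multi-turn attack like Crescendo.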

SC Media reports "Microsoft's 'AI Watchdog' Defends Against New LLM Jailbreak Method"

Submitted by grigby1
