"The Hacking of ChatGPT Is Just Getting Started"

Alex Polyakov, CEO of the security company Adversa, only needed a couple of hours to break GPT-4. In March, when OpenAI released the latest version of its text-generating Artificial Intelligence (AI)-driven chatbot, Polyakov started entering prompts into the chatbot designed to circumvent OpenAI's safety systems. He eventually had GPT-4 making inappropriate remarks, writing phishing emails, and supporting violence. Polyakov is among a handful of security researchers, technologists, and computer scientists who are devising jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems. The jailbreaking process seeks to create prompts that enable the chatbots to bypass restrictions on producing hateful content or writing about illegal acts. Prompt injection attacks can covertly insert malicious data or instructions into AI models. The attacks are a form of hacking that exploits system vulnerabilities with carefully crafted and refined sentences rather than code. Although the attack types are primarily used to circumvent content filters, security researchers warn that the rush to deploy generative AI systems increases the risk of cybercriminals stealing data and wreaking havoc on the web. Polyakov has developed a "universal" jailbreak that is effective against multiple Large Language Models (LLMs), such as GPT-4, Microsoft's Bing chat system, Google's Bard, and Anthropic's Claude. This article continues to discuss security researchers' work on jailbreaking LLMs to demonstrate the avoidance of safety rules.

Wired reports "The Hacking of ChatGPT Is Just Getting Started"

Submitted by Anonymous on Fri, 04/14/2023 - 09:25