"Researchers Automated Jailbreaking of LLMs With Other LLMs"

Artificial Intelligence (AI) security researchers from Robust Intelligence and Yale University have developed a Machine Learning (ML) method that quickly and automatically jailbreaks Large Language Models (LLMs). According to the Robust Intelligence researchers, the Tree of Attacks with Pruning (TAP) method can induce sophisticated models such as GPT-4 and Llama-2 to generate hundreds of harmful responses to a user query within minutes. Their findings suggest that this vulnerability is widespread across LLM technology, and the researchers do not see an obvious fix. This article continues to discuss the TAP technique.

Help Net Security reports "Researchers Automated Jailbreaking of LLMs With Other LLMs"

Submitted by Gregory Rigby on Thu, 12/07/2023 - 11:11