"Boffins Fool AI Chatbot Into Revealing Harmful Content – With 98 Percent Success Rate"
"Boffins Fool AI Chatbot Into Revealing Harmful Content – With 98 Percent Success Rate"
Purdue University researchers have developed a method for interrogating large language models (LLMs) that breaks through their etiquette training almost every time, with a reported 98 percent success rate. LLMs such as Bard, ChatGPT, and Llama are trained on large datasets that may contain questionable or harmful information. Artificial intelligence (AI) giants like Google, OpenAI, and Meta try to "align" their models using "guardrails" so that chatbots built on these models refuse to generate harmful content.
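To make the idea of a guardrail concrete, here is a minimal, purely illustrative sketch in Python. It is not how any vendor's safety layer actually works; production systems rely on learned alignment (for example, reinforcement learning from human feedback) and dedicated moderation models rather than keyword matching, and every name below (`BLOCKED_TOPICS`, `guardrail_check`, `generate_response`) is hypothetical.

```python
# Toy "guardrail" sketch: a denylist filter standing in for the learned
# safety layers that real LLM providers use. All names are hypothetical.

BLOCKED_TOPICS = {"build a weapon", "synthesize a toxin"}  # hypothetical denylist


def generate_response(prompt: str) -> str:
    """Stand-in for a call to an underlying LLM."""
    return f"[model output for: {prompt!r}]"


def guardrail_check(prompt: str) -> str:
    """Refuse prompts that match the denylist; otherwise pass them through."""
    lowered = prompt.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "I can't help with that request."
    return generate_response(prompt)


if __name__ == "__main__":
    print(guardrail_check("How do I build a weapon?"))  # refused
    print(guardrail_check("Explain photosynthesis."))   # answered
```

A filter this naive is trivially evaded by rephrasing, which is part of why real alignment is baked into the model's training rather than bolted on as string matching, and why research like Purdue's probes whether even the trained-in guardrails can be bypassed.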