"Can AI Really Be Protected from Text-Based Attacks?"
Microsoft's Bing Chat, an Artificial Intelligence (AI)-powered chatbot co-developed with OpenAI, was not available for long before users devised ways to break it. By providing carefully crafted inputs, users got it to declare love, threaten harm, and more. The question is whether AI can ever be protected from such malicious prompts. The behavior was triggered by malicious prompt engineering, in which an AI such as Bing Chat, which relies on text-based instructions, or prompts, to perform tasks, is deceived by adversarial prompts. Bing Chat was not designed to write neo-Nazi propaganda, but because it was trained on enormous volumes of material from the Internet, it is prone to sliding into undesirable patterns. Adam Hyland, a Ph.D. student in the Human Centered Design and Engineering program at the University of Washington, compared prompt engineering to a privilege escalation attack. In a privilege escalation attack, a hacker gains access to resources, such as memory, that are normally off-limits to them, because an audit did not capture all possible exploits. According to Hyland, the behavior of Large Language Models (LLMs) such as Bing Chat is not well understood. The interaction being exploited is the LLM's response to text input. The models are designed to continue text sequences: an LLM such as Bing Chat or ChatGPT generates the expected response based on the data provided by the designer and the user's prompt string. Some of the prompts resemble social engineering hacks, as if one were attempting to mislead a human into divulging their secrets. This article continues to discuss the protection of AI from text-based attacks.
TechCrunch reports "Can AI Really Be Protected from Text-Based Attacks?"