"LLMs Open to Manipulation Using Doctored Images, Audio"

According to Cornell University researchers, attackers could manipulate the responses that Large Language Models (LLMs) behind Artificial Intelligence (AI) chatbots such as ChatGPT give to user prompts by hiding malicious instructions in strategically placed images and audio clips online. Adversaries could use such "indirect prompt injection" attacks to redirect users to malicious URLs, collect users' personal information, deliver payloads, and perform other malicious actions. As LLMs become more multimodal, able to respond contextually to inputs that combine text, audio, images, and even video, such attacks may become a significant issue. The researchers developed an attack that injects instructions into multimodal LLMs through images and sounds, causing the model to output attacker-specified text and instructions. This article continues to discuss the possible manipulation of LLMs using hidden instructions in images and audio.
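
The general idea behind this class of attack is to optimize a small adversarial perturbation of an image (or audio clip) so that, when a multimodal model processes the doctored input, it decodes attacker-chosen text. The sketch below illustrates only that general idea under stated assumptions: the ToyMultimodalModel, vocabulary size, target token sequence, and perturbation budget are illustrative stand-ins, not the researchers' actual models, targets, or code.

```python
# Minimal sketch of adversarial-perturbation-based instruction injection.
# Everything here (the toy model, vocabulary size, target tokens) is a
# hypothetical stand-in for illustration; it is not the researchers' setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 1000                                   # hypothetical vocabulary size
TARGET_TOKENS = torch.tensor([17, 42, 256, 7])      # stand-in for attacker-chosen text


class ToyMultimodalModel(nn.Module):
    """Tiny differentiable stand-in for an image-conditioned language model."""

    def __init__(self, seq_len: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.decoder = nn.Linear(8, seq_len * VOCAB_SIZE)
        self.seq_len = seq_len

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(image)                       # (B, 8)
        logits = self.decoder(feats)                      # (B, seq_len * V)
        return logits.view(-1, self.seq_len, VOCAB_SIZE)  # (B, seq_len, V)


def craft_adversarial_image(model, image, target_tokens, eps=8 / 255, steps=200):
    """Attempt to find a bounded perturbation that pushes the model toward
    emitting target_tokens when it describes the perturbed image."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=1e-2)
    for _ in range(steps):
        logits = model(torch.clamp(image + delta, 0.0, 1.0))
        # Cross-entropy between per-position token logits and the target sequence.
        loss = F.cross_entropy(
            logits.view(-1, VOCAB_SIZE),
            target_tokens.repeat(image.shape[0]),
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Keep the perturbation visually small (L-infinity budget).
        with torch.no_grad():
            delta.clamp_(-eps, eps)
    return torch.clamp(image + delta, 0.0, 1.0).detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyMultimodalModel(seq_len=len(TARGET_TOKENS))
    clean_image = torch.rand(1, 3, 32, 32)                # placeholder "image"
    adv_image = craft_adversarial_image(model, clean_image, TARGET_TOKENS)
    predicted = model(adv_image).argmax(dim=-1)
    print("target:   ", TARGET_TOKENS.tolist())
    print("predicted:", predicted[0].tolist())
```

In this toy setting the optimization may or may not reach the exact target within the small perturbation budget; the point is only to show the optimization loop, with the gradient flowing from the attacker-specified output back into the input pixels.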

Dark Reading reports "LLMs Open to Manipulation Using Doctored Images, Audio"

Submitted by grigby1