"Diffusion Models Can Be Contaminated with Backdoors, Study Finds"
Over the past year, interest has grown in generative Artificial Intelligence (AI) - deep learning models that can generate text, images, and other forms of content. As with every technological breakthrough, however, generative AI poses new security threats. In a new study, researchers from IBM, Taiwan's National Tsing Hua University, and the Chinese University of Hong Kong demonstrate that malicious actors can implant backdoors in diffusion models with minimal resources. Diffusion is the Machine Learning (ML) architecture used by DALL-E 2 and by open-source text-to-image models such as Stable Diffusion. The attack, dubbed BadDiffusion, illustrates the broader security problems of generative AI, which is increasingly being integrated into various applications. In a BadDiffusion attack, an adversary modifies the training data and diffusion processes to make the model sensitive to a hidden trigger. When the trained model is presented with the trigger pattern, it generates the output the attacker intended. For example, an attacker can use the backdoor to evade any content restrictions that developers have implemented for diffusion models. This article continues to discuss the BadDiffusion attack and the researchers' exploration of various methods to detect and remove backdoors from diffusion models.
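To make the idea of a hidden trigger concrete, here is a minimal toy sketch of trigger-based data poisoning in general. It is not the BadDiffusion procedure from the study, which also alters the diffusion process itself. The helper names (`stamp_trigger`, `poison_dataset`), the 10% poison rate, and the tiny 4x4 "images" are all illustrative assumptions.

```python
import random


def stamp_trigger(image, trigger, top=0, left=0):
    """Return a copy of `image` with the `trigger` patch pasted at (top, left)."""
    stamped = [row[:] for row in image]  # copy so the clean image is untouched
    for r, trigger_row in enumerate(trigger):
        for c, value in enumerate(trigger_row):
            stamped[top + r][left + c] = value
    return stamped


def poison_dataset(images, trigger, target, poison_rate=0.1, seed=0):
    """Build (input, desired_output) training pairs.

    Clean samples map to themselves. A `poison_rate` fraction of samples
    gets the trigger stamped on and is paired with the attacker's target.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_poison = int(len(images) * poison_rate)
    poisoned_idx = set(rng.sample(range(len(images)), n_poison))
    pairs = []
    for i, image in enumerate(images):
        if i in poisoned_idx:
            pairs.append((stamp_trigger(image, trigger), target))
        else:
            pairs.append((image, image))
    return pairs, poisoned_idx


# Toy data: twenty all-black 4x4 images, a 2x2 white trigger, a gray target.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(20)]
trigger = [[1.0, 1.0], [1.0, 1.0]]
target = [[0.5] * 4 for _ in range(4)]

pairs, poisoned_idx = poison_dataset(images, trigger, target, poison_rate=0.1)
```

In this sketch only the samples whose indices are in `poisoned_idx` differ from the clean data, which suggests why a backdoor of this kind can be hard to notice: the model sees ordinary pairs for most of its training set.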
VB reports "Diffusion Models Can Be Contaminated with Backdoors, Study Finds"