"With AI Watermarking, Creators Strike Back: Backdoor Attacks Regulate Unauthorized Uses of Copyrighted or Restricted Data"
Artificial intelligence (AI) models are trained on massive data sets, but using those data sets for training can violate the rights of the data owners, and proving that a model was trained on a data set without authorization is difficult. In a new study published in IEEE Transactions on Information Forensics and Security, researchers present a method for protecting data sets from unauthorized use by embedding digital watermarks within them. The technique could give data owners greater control over who can train AI models with their data. Restricting access to a data set, for example with encryption, is the simplest way to protect it, but doing so also makes the data harder for authorized users to access. According to the study's lead author, Yiming Li, the researchers instead focused on determining whether a given AI model was trained on a particular data set. Models found to have been trained on a data set without permission can then be flagged by the data owner for follow-up. Li stated that the technique is applicable to a wide variety of machine learning (ML) problems, although the study focuses on classification models, including image classification. This article continues to discuss the new method aimed at protecting data sets from unauthorized use by embedding digital watermarks into them.