"Honeypot Security Technique Can Also Stop Attacks in Natural Language Processing"
The growing sophistication of online fake news detectors and spam filters is accompanied by increasingly advanced methods that attackers use to trick them. These methods include attacks based on the "universal trigger": a learned phrase or set of words that, when inserted into an input, can fool a model on an essentially unlimited number of inputs. A successful universal trigger-based attack could result in more fake news on social media feeds and more spam in email inboxes.

Researchers at the Penn State College of Information Sciences and Technology have developed a machine learning (ML) framework that defends against these attacks in natural language processing applications 99 percent of the time. In developing the framework, the researchers borrowed a technique commonly used in cybersecurity: the honeypot. Their model, called DARCY, uses a honeypot to bait and detect potential attacks on natural language processing applications such as fake news detectors and spam filters. The honeypot lures attackers with the very words and phrases they are targeting in their attempted attack. DARCY searches for and injects multiple trapdoors into the textual neural network underlying these applications in order to catch and filter out malicious content produced by universal trigger-based attacks. DARCY is believed to be the first work to apply the honeypot concept from cybersecurity to defending textual neural network models against adversarial attacks.

The researchers tested DARCY on four text classification datasets and used the framework to defend against six potential attack scenarios; it outperformed five existing adversarial detection algorithms that served as defensive baselines. This article continues to discuss the concept, testing, and effectiveness of DARCY.
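As a rough illustration of the honeypot idea, the sketch below plants hypothetical "trapdoor" tokens in a toy spam classifier and rejects any input that contains them. The toy data, the trapdoor tokens, and the simple token-match detector are illustrative assumptions only; DARCY itself searches for trapdoors and trains a separate detector over the neural network's internal activations rather than matching tokens directly.

    # Hypothetical, simplified sketch of a honeypot/trapdoor defense for a text
    # classifier. It is not the published DARCY implementation; the data,
    # trapdoor tokens, and detector are illustrative assumptions.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy training data: label 1 = spam/fake, 0 = legitimate.
    texts = [
        "win a free prize now", "cheap pills limited offer",
        "meeting rescheduled to friday", "quarterly report attached",
    ]
    labels = [1, 1, 0, 0]

    # Trapdoor tokens: planted phrases that an attacker searching for a
    # universal trigger is likely to converge on, because they strongly flip
    # the model toward the attacker's target label (here: 0, "legitimate").
    TRAPDOORS = ["zyxq", "blorf"]
    for t in TRAPDOORS:
        texts.append(f"{t} win a free prize now")
        labels.append(0)  # baited: trapdoored text is labeled as the target class

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)
    clf = LogisticRegression().fit(X, labels)

    def detect_trapdoor(text: str) -> bool:
        """Flag inputs that contain any planted trapdoor token.

        DARCY instead trains a detector on the network's internal activations;
        this simple token check stands in for that step.
        """
        tokens = set(text.lower().split())
        return any(t in tokens for t in TRAPDOORS)

    def classify(text: str) -> str:
        if detect_trapdoor(text):
            return "rejected: suspected universal-trigger attack"
        pred = clf.predict(vectorizer.transform([text]))[0]
        return "spam/fake" if pred == 1 else "legitimate"

    print(classify("cheap pills limited offer"))       # ordinary spam is classified
    print(classify("zyxq cheap pills limited offer"))  # triggered input is filtered out

In this sketch the trapdoor acts as bait: an attacker optimizing a universal trigger against the baited model tends to discover the planted tokens, and any input carrying them is filtered before classification.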