On Trojans in Refined Language Models

George Kesidis, David J. Miller, and Jayaram Raghuram
Anomalee Inc., State College, PA & Penn State University, University Park, PA

A Trojan in a language model can be inserted when the model is refined for a particular application such as determining the sentiment of product reviews. In this paper, we clarify and empirically explore variations of the data-poisoning threat model. We then empirically assess two simple defenses each for a different defense scenario. Finally, we provide a brief survey of related attacks and defenses.

George Kesidis received his MS (1990, neural networks and stochastic optimization) and PhD (1992, performance evaluation and networking) in EECS from UC Berkeley. Following eight years as a professor of ECE at the University of Waterloo, he has been a professor of EE and CSE at the Pennsylvania State University since 2000. In the past, his research has been supported by DARPA, DHS, ONR, AFOSR and over a dozen NSF grants, and eight research gifts from Cisco. His current research interests include cloud computing, caching, and secure and robust ML/AI with applications. In 2012, he co-founded a start-up in the AI/ML area.

License: CC-3.0

Submitted by Amy Karns on Fri, 09/06/2024 - 09:42