Learning a Privacy Incidents Database

ABSTRACT: A repository of privacy incidents is essential for understanding the attributes of products and policies that lead to privacy incidents. We describe our vision for a novel privacy incidents database and our progress toward building a prototype. Key challenges in building such a database include bootstrapping and sustainability. We propose a semi-automated framework that can recognize privacy incidents and related information from various online sources such as news, blogs, and social media. The crux of our framework is an incident classifier that determines whether a piece of natural-language text is related to a privacy incident. We curate a dataset of 1,324 news articles, of which 543 are about one or more privacy incidents. We train the incident classifier on this dataset, considering a variety of feature engineering, feature selection, and classification techniques. We find that our incident classifier yields an F1 measure of 93.1%, which is about 12% higher than the keyword search-based baselines we adopt.
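To make the comparison in the abstract concrete, the following is a minimal sketch of how a keyword search-based baseline could be scored with the F1 measure. It is not the authors' implementation: the keyword list, the function names `keyword_baseline` and `f1_score`, and the four toy headlines are all assumptions made for illustration, and the abstract does not say which keywords or classifiers were actually used.

```python
from typing import Iterable, Sequence

# Hypothetical keyword list; the abstract does not state the baseline's keywords.
KEYWORDS = ("privacy", "data breach", "leak", "surveillance")


def keyword_baseline(text: str, keywords: Iterable[str] = KEYWORDS) -> bool:
    """Flag a text as incident-related if it contains any keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up headlines with hand-assigned labels (True = privacy incident).
articles = [
    ("Retailer confirms data breach exposing customer records", True),
    ("New privacy policy explained for app users", False),
    ("Hospital staff accessed patient files without permission", True),
    ("City council approves park renovation budget", False),
]

labels = [label for _, label in articles]
preds = [keyword_baseline(text) for text, _ in articles]
print(f1_score(labels, preds))
```

On this toy data the baseline makes both kinds of error: it flags an article that mentions privacy without reporting an incident, and it misses an incident described without any listed keyword. Errors of these two kinds are what a trained classifier would need to reduce to improve on a keyword baseline.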

Pradeep Murukannaiah is an Assistant Professor in Software Engineering at Rochester Institute of Technology. Pradeep received a PhD in Computer Science from NC State. His research interests include software engineering, data analytics, and usable privacy and security. Pradeep is a member of the ACM and IEEE.

Chinmaya Dabral is a PhD student in Computer Science at NC State. His research interests lie in the application of machine learning to privacy-related problems.

Karthik Sheshadri is a PhD student in Computer Science at NC State. His research interests lie in the application of machine learning and natural language processing to problems in privacy and social computing.

Esha Sharma is pursuing her PhD in Computer Science at NC State. Her research interests include privacy and data analytics.

Jessica Staddon is an Associate Professor of Computer Science and Director of Privacy at NC State. Before joining NCSU in August of 2015, she was a research scientist and manager at Google, an area manager at Xerox PARC, and a research scientist at Bell Labs and RSA Labs. Her interests include usable security and privacy tools, trends in privacy-related attitudes, and methods for measuring and predicting privacy-related behaviors, attitudes, and risks. She serves regularly on the program committees of ACM- and IEEE-sponsored security/privacy conferences. She is on the editorial boards of the IEEE Security and Privacy Magazine, the Journal of Computer Security, and the International Journal of Information and Computer Security, and on the privacy advisory board of DARPA’s Information Processing Techniques Office (IPTO). Jessica holds a PhD in Mathematics from U. C. Berkeley.

License: CC-2.5
Submitted by Jessica Staddon on