Privacy Policy Analysis with Sentence Classification
Author
Abstract

Privacy policies inform users of the data practices and access protocols employed by organizations and their digital counterparts. Research has shown that users often find these policies lengthy and difficult to read and comprehend. However, it is critical for people to be aware of the data access practices that organizations employ. Hence, much research has focused on automatically extracting privacy-specific artifacts from policies, predominantly by using natural language classification tools. These tools, however, are designed primarily to classify paragraphs or segments of a policy. In this paper, we report on research in which we identify the gap in classifying policies at the segment level and provide an alternate definition of segment classification based on sentence classification. To this end, we train and evaluate sentence classifiers for privacy policies using BERT and XLNet. Our approach improves on the prediction quality of existing models and hence surpasses the current baselines for classification models, without requiring additional parameter or model tuning. Using our sentence classifiers, we also study topical structures in the privacy policies of the Alexa top 5000 websites, in order to identify and quantify the diffusion of information pertaining to privacy-specific topics within a policy.

Year of Publication
2022
Date Published
August
Publisher
IEEE
Conference Location
Fredericton, NB, Canada
ISBN Number
978-1-66547-398-9
URL
https://ieeexplore.ieee.org/document/9851977/
DOI
10.1109/PST55820.2022.9851977