Spotlight on Lablet Research #18 - Scalable Privacy Analysis
Project: Scalable Privacy Analysis
Lablet: International Computer Science Institute (ICSI)
ICSI researchers led by Principal Investigator (PI) Serge Egelman and Co-PI Narseo Vallina-Rodriguez have constructed a toolchain that automatically performs dynamic analysis on mobile apps to monitor what sensitive personal information the apps attempt to access and to whom they then transmit it. This allows the researchers to perform large-scale studies of privacy behaviors across the mobile app ecosystem, as well as to devise new methods of protecting user privacy.
Governments and private organizations codify expectations of privacy into enforceable policy. These policies have taken such forms as legislation, contracts, and best practices, among others. Common to these rules are definitions of what constitutes private information, and which uses of that information are appropriate or inappropriate. Additionally, policies might place restrictions on what pieces of data may be collected, for what purposes they may be used, how long that data may be retained for yet-unspecified future applications, and under which circumstances (if any) disclosure and dissemination to other parties are permitted.
Different motivations drive different policies. Some procedures and restrictions are meant to maintain strategic advantages for holders of sensitive information. The United States government, for instance, routinely classifies information based on the amount of harm to national interests its disclosure would bring. Other policies on data usage seek to protect vulnerable populations by establishing rules limiting how information from those individuals is collected and used: the Family Educational Rights and Privacy Act (FERPA) requires appropriate consent before an individual's educational records are disclosed; the Health Insurance Portability and Accountability Act (HIPAA) regulates the use of Protected Health Information (PHI) by defining what is considered PHI and how individual patients should be de-identified in records prior to aggregation for research purposes; and the Children's Online Privacy Protection Act (COPPA) prohibits online services from collecting personal information (e.g., contact information and audio/visual recordings) from users under 13 years of age without verifiable parental consent.
The problem is that the constraints on data usage stated in policies (be they stated privacy practices, regulations, or laws) cannot easily be compared against the technologies that they govern. To that end, ICSI researchers propose a framework to automatically compare policy against practice. Broadly, this involves identifying the relevant data usage policies and practices in a given domain, then measuring the real-world exchanges of data restricted by those rules. The results will then be used to measure and predict the harms to the data's subjects and holders in the event of unauthorized usage. In doing so, researchers will be able to infer which specific protected pieces of information, individual prohibited operations on that data, and aggregations thereof pose the highest risks relative to other items covered by the policy. This will shed light on the relationship between the unwanted collection of data, its usage and dissemination, and the resulting negative consequences. Researchers are currently building a taxonomy of the ways in which apps attempt to detect whether they are being monitored (specifically, whether they are running on jailbroken or rooted devices).
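To make the idea of comparing policy against practice concrete, the following is a minimal illustrative sketch in Python, not the researchers' actual toolchain. It assumes that dynamic analysis has already produced a list of observed data flows, and that a policy can be reduced to rules naming restricted data types and the destinations allowed to receive them. The names Flow, Rule, and find_violations, along with all app names, hostnames, and data types, are hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One observed transmission: which app sent which data type to which host."""
    app: str
    data_type: str
    destination: str


@dataclass(frozen=True)
class Rule:
    """A hypothetical policy rule: restricted data types and the hosts allowed to receive them."""
    name: str
    restricted_types: frozenset
    allowed_destinations: frozenset


def find_violations(flows, rules):
    """Return (rule name, flow) pairs where a restricted data type
    reached a destination outside the rule's allow-list."""
    violations = []
    for flow in flows:
        for rule in rules:
            if (flow.data_type in rule.restricted_types
                    and flow.destination not in rule.allowed_destinations):
                violations.append((rule.name, flow))
    return violations


# Made-up example data, not real measurements.
flows = [
    Flow("example.kids.game", "location", "ads.example.net"),
    Flow("example.kids.game", "location", "api.example-game.com"),
    Flow("example.kids.game", "app_version", "ads.example.net"),
]
rules = [
    Rule("no-third-party-location",
         frozenset({"location", "contact_info"}),
         frozenset({"api.example-game.com"})),
]

violations = find_violations(flows, rules)
for rule_name, flow in violations:
    print(f"{rule_name}: {flow.app} sent {flow.data_type} to {flow.destination}")
```

In this toy data, only the location flow to the advertising host is flagged: the same data type sent to the app's own backend is on the allow-list, and the app version is not a restricted type. A real analysis would also need to account for consent, obfuscated or encrypted payloads, and the legal nuance of each policy, none of which this sketch attempts.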
The PIs have given numerous talks and media interviews about this project, specifically on how apps track users. In 2020, for example, PI Egelman was interviewed by publications including the Washington Post, Consumer Reports, and CNET.
Additional details on the project can be found here.