One major shortcoming of the current "notice and consent" privacy framework is that the constraints on data usage stated in policies (be they stated privacy practices, regulations, or laws) cannot easily be compared against the technologies they govern. To that end, we are developing a framework to automatically compare policy against practice. Broadly, this involves identifying the relevant data usage policies and practices in a given domain, then measuring the real-world exchanges of data restricted by those rules. We will then use these measurements to quantify and predict the harms to the data's subjects and holders in the event of its unauthorized usage. In doing so, we will be able to infer which specific items covered by a policy (protected pieces of information, individual prohibited operations on that data, and aggregations thereof) pose the highest risks relative to the others. This will shed light on the relationship between the unwanted collection of data, its usage and dissemination, and the resulting negative consequences.
We have built infrastructure into the Android operating system by heavily instrumenting its permission-checking APIs and adding network-monitoring functionality. This allows us to observe when an application accesses protected data (e.g., PII and persistent identifiers) and what it does with it. Unlike static analysis techniques, which only detect the potential for certain behaviors (e.g., data exfiltration), executing applications with our instrumentation yields real-time observations of actual privacy violations. The drawback, however, is that applications need to be executed, and broad code coverage is desired. To date, we have demonstrated that many privacy violations are detectable when application user interfaces are “fuzzed” with random input. However, many open research questions remain about how to achieve better code coverage, and thereby detect a wider range of privacy-related events, in a scalable manner. Toward that end, we plan to virtualize our privacy testbed and integrate crowd-sourcing. In doing so, we will develop new methods for performing privacy experiments that are repeatable, rigorous, and generalizable. The results of these experiments can then be used to implement data-driven privacy controls, address gaps in regulation, and enforce existing regulations.
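The core analysis step described above, joining permission-check events against network traffic to flag actual exfiltration of protected data, can be illustrated with a minimal sketch. This is not our actual instrumentation, which lives inside the Android OS; the event types, field names, and `find_violations` function here are hypothetical stand-ins for the kind of logs the instrumented permission APIs and network monitor produce.

```python
# Hypothetical sketch: correlating logged accesses to protected data with
# observed network transmissions from the same app. All names and the
# event format are illustrative assumptions, not the real instrumentation.
from dataclasses import dataclass

@dataclass
class AccessEvent:
    app: str        # package name of the app that triggered a permission check
    data_type: str  # kind of protected data read, e.g. "IMEI" or "location"
    value: str      # the protected value that was actually read

@dataclass
class NetworkEvent:
    app: str          # package name of the app that sent the traffic
    destination: str  # remote host the traffic was sent to
    payload: str      # observed (decrypted/decoded) request contents

def find_violations(accesses, transmissions):
    """Flag transmissions whose payload contains a protected value
    previously read by the same app."""
    violations = []
    for tx in transmissions:
        for acc in accesses:
            if acc.app == tx.app and acc.value in tx.payload:
                violations.append((acc.app, acc.data_type, tx.destination))
    return violations

accesses = [AccessEvent("com.example.app", "IMEI", "356938035643809")]
transmissions = [NetworkEvent("com.example.app", "tracker.example.com",
                              "id=356938035643809&os=android")]
print(find_violations(accesses, transmissions))
# → [('com.example.app', 'IMEI', 'tracker.example.com')]
```

In practice this matching must also handle hashed or otherwise encoded identifiers, which is one reason dynamic observation of the actual values read is more informative than static detection of a potential data flow.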