This research aims to help administrators of virtualized computing infrastructures make services more resilient to security attacks. It applies machine learning to reduce both the security and the functionality risks of software patching: patched and unpatched software is continually monitored to discover vulnerabilities, and the appropriate security updates are triggered.
According to Nissenbaum’s theory of contextual integrity (CI), protecting privacy means ensuring that personal information flows appropriately; it does not mean that no information flows (e.g., confidentiality) or that it flows only if the information subject allows it (e.g., control). A flow is appropriate if it conforms to legitimate, contextual informational norms. Contextual informational norms prescribe information flows in terms of five parameters: three actor roles (sender, subject, recipient), the information type, and the transmission principle. Actors and information types range over the respective contextual ontologies. Transmission principles (a term introduced by the theory) range over the conditions or constraints under which information flows, for example, confidentially, mandated by law, with notice, with consent, or in accordance with the subject’s preference. The theory holds that our privacy expectations are a product of informational norms: people judge particular information flows as respecting or violating privacy according to whether, to a first approximation, those flows conform to contextual informational norms. If a flow conforms, we say contextual integrity has been preserved.
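As a rough illustration (not part of the theory itself), the five parameters of a contextual norm can be represented as a simple record, with a flow judged appropriate when it matches some legitimate norm; the healthcare-context norm below is hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InformationFlow:
    """The five parameters of contextual integrity: three actor roles,
    an information type, and a transmission principle."""
    sender: str
    subject: str
    recipient: str
    information_type: str
    transmission_principle: str

def conforms(flow, norms):
    """To a first approximation, a flow preserves contextual integrity
    if it matches some legitimate contextual informational norm."""
    return flow in norms

# Hypothetical healthcare-context norm: test results flow from physician
# to the patient, confidentially.
norm = InformationFlow("physician", "patient", "patient",
                       "test results", "confidentially")
# A flow with a different recipient and transmission principle violates it.
flow = InformationFlow("physician", "patient", "advertiser",
                       "test results", "for profit")
print(conforms(norm, [norm]))  # True: conforming flow
print(conforms(flow, [norm]))  # False: recipient and principle deviate
```

The point of the sketch is that all five parameters, not just the data type, determine whether a flow is appropriate.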
The theory has been recognized in policy arenas, formalized, used to guide empirical social science research, and applied to shape system development. Yet, despite resolving many longstanding privacy puzzles and showing promise in practical realms, its direct application to pressing needs of design and policy has proven challenging. One challenge is that the theory requires knowledge of data flows, which systems may not be able to provide in practice, particularly once data leaves a device. Bridging theory and practice, in this case grounding scientific research and design practice in the theory of CI, is not only tractable; with sufficient effort devoted to operationalizing the relevant concepts, it could enhance our methodological toolkit for studying individuals’ understandings and valuations of privacy in relation to data-intensive technologies, and yield principles to guide design.
In our view, capturing people’s complex attitudes toward privacy, including expectations and preferences in situ, will require methodological innovation and new techniques that apply the theory of contextual integrity. These methodologies and techniques must accommodate the five independent parameters of contextual norms, scale to the diverse contexts in which privacy decision-making takes place, and be sensitive not only to the variety of preferences and expectations within respective contexts but also to the distinction between preferences and expectations. What we learn about privacy attitudes through such methods should serve the discovery and identification of contextual informational norms, and should yield results rigorous enough to serve as a foundation for the design of effective privacy interfaces. The first outcome informs public policy and law about what people generally expect and what is generally viewed as objectionable; the second informs designers not only about mechanisms that help people make informed decisions, but also about what substantive constraints on flow should or could be implemented in design. Instead of ubiquitous “notice and choice” regimes, the project will aim to identify situations where clear norms, for example those identified through careful study, can be embedded in technology (systems, applications, platforms) as constraints on flow; where no such norms emerge, variations may be selected according to user preferences. Thus, this project will yield a set of practical, usable, and scalable technologies and tools that can be applied to both existing and future technologies, thereby providing a scientific basis for future privacy research.
Serge Egelman is the Research Director of the Usable Security and Privacy group at the International Computer Science Institute (ICSI), an independent research institute affiliated with the University of California, Berkeley. He is also Chief Scientist and co-founder of AppCensus, Inc., which is commercializing his research by performing on-demand privacy analysis of mobile apps for compliance purposes. He conducts research to help people make more informed online privacy and security decisions, and is generally interested in consumer protection. This has included improvements to web browser security warnings, authentication on social networking websites, and, most recently, privacy on mobile devices. Seven of his research publications have received awards at the ACM CHI conference, the top venue for human-computer interaction research; his research on privacy on mobile platforms has received the Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies, the USENIX Security Distinguished Paper Award, and privacy research awards from two different European data protection authorities, CNIL and AEPD. His research has been cited in numerous lawsuits and regulatory actions, as well as featured in the New York Times, Washington Post, Wall Street Journal, Wired, CNET, NBC, and CBS. He received his PhD from Carnegie Mellon University and has previously performed research at Xerox PARC, Microsoft, and NIST.
Privacy governance for Big Data is challenging—data may be rich enough to allow the inference of private information that has been removed, redacted, or minimized. We must protect against both malicious and accidental inference, both by data analysts and by automated systems. To do this, we are extending existing methods for controlling the inference risks of common analysis tools (drawn from literature on the related problem of nondiscriminatory data analysis). We are coupling these methods with auditing tools such as verifiably integral audit logs. Robust audit logs hold analysts accountable for their investigations, reducing exposure to malicious sensitive inference. Further, tools for controlling information flow and leakage in analytical models eliminate many types of accidental sensitive inference. Together, the analytical guarantees of inference-sensitive data mining technologies and the record-keeping functions of logs can create a methodology for truly accountable systems for holding and processing private data.
This project will deliver a data governance methodology enabling more expressive policies that rely on accountability to enable exploration of the privacy consequences of Big Data analysis. This methodology will combine known techniques from computer science, including verification using formal methods, with principles from the study of privacy-by-design and with accountability mechanisms. Systems subject to this methodology will generate evidence both before they operate (in the form of guarantees about when and how data can be analyzed and disclosed, e.g., "differential privacy in this database implies that no analysis can determine the presence or absence of a single row") and as they operate (in the form of audit log entries that describe how the system's capabilities were actually used, e.g., "the marketing application submitted a query Q which returned a set of rows R", recorded in a way that demonstrates these audit materials could not have been altered or forged). While the techniques here are not novel, they have not previously been synthesized into an actionable methodology for practical data governance, especially as it applies to Big Data. Information gleaned by examining this evidence can be used to inform the development of traditional-style data access policies that support full personal accountability for data access.
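As a sketch of the audit-log side of this evidence, one standard construction for tamper-evident logs is a hash chain; the specific design below is an assumption for illustration, not the project's chosen mechanism:

```python
import hashlib
import json

def append_entry(log, entry):
    """Append an entry whose hash covers both its own content and the
    previous entry's hash, chaining the whole log together."""
    prev = log[-1]["hash"] if log else "0" * 64
    payload = prev + json.dumps(entry, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    log.append({"entry": entry, "prev": prev, "hash": digest})

def verify(log):
    """Recompute every hash in order; any altered entry breaks the chain."""
    prev = "0" * 64
    for rec in log:
        payload = prev + json.dumps(rec["entry"], sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != digest:
            return False
        prev = digest
    return True

log = []
append_entry(log, {"app": "marketing", "query": "Q", "rows": "R"})
append_entry(log, {"app": "analytics", "query": "Q2", "rows": "R2"})
print(verify(log))                    # True: chain intact
log[0]["entry"]["rows"] = "ALL"       # after-the-fact tampering...
print(verify(log))                    # False: ...is detected
```

Making the log verifiable in this way is what lets audit evidence hold analysts accountable rather than merely describing their queries.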
To demonstrate that the resulting methodology is actionable, this project will also produce a set of generalizable design patterns applying the methodology to real data analysis scenarios drawn from interviews with practitioners in industry and government. These patterns will then inform practical use of the methodology. Together, the new methodology and design patterns will provide the start of a new generation of data governance approaches that function properly in the era of Big Data, machine learning, and automated decision-making. This project will relate the emerging science of privacy to data analysis practice, governance, and compliance, and show how to make the newest technologies relevant, actionable, and useful.
Methods, approaches, and tools that identify the correct conceptualization of privacy early in the design and engineering process are important. For example, the Department of Homeland Security analyzed early whole-body imaging technology for airport security through a Privacy Impact Assessment focused on the collection of personally identifiable information (PII), finding that the images of persons’ bodies were not detailed enough to constitute PII and therefore would not pose a privacy problem. Nevertheless, many citizens, policymakers, and organizations subsequently voiced strong privacy objections: a conception of privacy centered on the collection of PII did not cover the types of privacy concerns raised by stakeholders, leading to expensive redesigns to address the correct concepts of privacy (such as having the system display an outline of a generic person rather than an image of the specific person being scanned). In this project, we will investigate the tools, methods, and approaches currently used by engineers and designers to identify and address privacy risks and harms.
To help address gaps and shortcomings that we find in current tools and approaches, we are adapting design research techniques, traditionally used to help designers and engineers explore and define problem spaces in grounded, inductive, and generative ways, to specifically address privacy. This builds on a tradition of research termed "values in design," which seeks to identify values and create systems that better recognize and address them. Design methods, including card activities, design scenarios, design workbooks, and design probes, can be used by engineers or designers of systems, and/or with other stakeholders of systems (such as end-users). These methods help foster discussion of values, chart the problem space of values, and are grounded in specific contexts or systems. They can be deployed during early ideation stages of a design process, during or after the design process as an analytical tool, or as part of training and education. We suggest that design approaches can help explore and define the problem space of privacy and identify and define privacy risks (including, but also going beyond, unauthorized use of data), leveraging the contextual integrity framework.
As part of this project, we are creating, testing, validating, and deploying a set of privacy-focused tools and approaches that can be used to help train engineers and designers to identify, define and analyze the privacy risks that need to be considered when designing a system, as part of privacy engineering.
This project considers models for secure collaboration and contracts in a decentralized environment among parties that have not established trust. A significant example of this is blockchain programming, on platforms such as Ethereum and Hyperledger. There are many documented defects in secure collaboration mechanisms, and some have been exploited to steal money. Our approach builds two kinds of models into Obsidian, a new programming language for smart contracts, to address these defects: typestate models to mitigate re-entrancy-related vulnerabilities, and linear types to model and statically detect an important class of errors involving money and other transferable resources.
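The two mechanisms can be illustrated with a dynamic sketch (Obsidian enforces them statically at compile time; the auction and coin classes below are hypothetical examples, and Python can only check the rules at run time):

```python
class LinearCoin:
    """Illustrates linearity: a resource may be consumed exactly once.
    A linear type system would reject the double-spend at compile time."""
    def __init__(self, value):
        self.value = value
        self._spent = False

    def spend(self):
        if self._spent:
            raise RuntimeError("coin already consumed (linearity violation)")
        self._spent = True
        return self.value

class Auction:
    """Illustrates typestate: bid() is legal only in the OPEN state, so a
    re-entrant call after close() is rejected rather than silently run."""
    def __init__(self):
        self.state = "OPEN"
        self.balance = 0

    def bid(self, coin):
        if self.state != "OPEN":
            raise RuntimeError("bid() not available in state " + self.state)
        self.balance += coin.spend()

    def close(self):
        self.state = "CLOSED"

a = Auction()
c = LinearCoin(10)
a.bid(c)
a.close()
# a.bid(LinearCoin(5))  # would raise: wrong typestate
# c.spend()             # would raise: coin already consumed
```

The security benefit of doing this in the type system rather than at run time is that whole classes of money-losing bugs become compile errors instead of exploitable behavior.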
The project research will include both technical and usability assessment of these two ideas. The technical assessment addresses the feasibility of sound and composable static analyses to support these two semantic innovations. The usability assessment focuses on the ability of programmers to use Obsidian effectively to write secure programs with little training. A combined assessment will examine whether programmers are more likely to write correct, safe code in Obsidian than in Solidity, with comparable or improved productivity.
Jonathan Aldrich is an Associate Professor in the School of Computer Science. Working at the intersection of programming languages and software engineering, his research develops better ways of expressing and enforcing software design within source code, typically through language design and type systems, and explores how the way we express software affects our ability to engineer software at scale. A particular theme of much of his work is improving software quality and programmer productivity through better ways to express structural and behavioral aspects of software design within source code. Aldrich has contributed to object-oriented typestate verification, modular reasoning techniques for aspects and stateful programs, and new object-oriented language models. For his work specifying and verifying architecture, he received a 2006 NSF CAREER award and the 2007 Dahl-Nygaard Junior Prize. Currently, Aldrich is excited to be working on the design of Wyvern, a new modularly extensible programming language.
Effective response to security attacks often requires a combination of both automated and human-mediated actions. Currently we lack adequate methods to reason about such human-system coordination, including ways to determine when to allocate tasks to each party and how to gain assurance that automated mechanisms are appropriately aligned with organizational needs and policies. In this project, we develop a model-based approach to (a) reason about when and how systems and humans should cooperate with each other, (b) improve human understanding and trust in automated behavior through self-explanation, and (c) provide mechanisms for humans to correct a system’s automated behavior when it is inappropriate. We will explore the effectiveness of the techniques in the context of coordinated system-human approaches for mitigating advanced persistent threats (APTs).
Building on prior work that we have carried out in this area, we will show how probabilistic models and model checkers can be used both to synthesize complex plans that involve a combination of human and automated actions and to provide human-understandable explanations of mitigation plans proposed or carried out by the system. Critically, these models capture an explicit value system (in a multi-dimensional utility space) that forms the basis for determining courses of action. Because the value system is explicit, we believe it will be possible to provide a rational explanation of the principles that led to a given system plan. Moreover, our approach will allow the user to make corrective actions to that value system (and hence, future decisions) when it is misaligned. This will be done without the user needing to know the mathematical form of the revised utility reward function.
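A toy illustration of the explicit-value-system idea (the plans, utility dimensions, and weights below are invented for illustration, not taken from the project's models): each candidate plan scores on several utility dimensions, the value system is an explicit weight vector, and correcting a weight changes future decisions without the user touching the utility function's mathematical form.

```python
# Hypothetical APT-mitigation plans scored on three utility dimensions.
PLANS = {
    "quarantine-host": {"security": 0.9, "availability": 0.2, "cost": 0.5},
    "rate-limit":      {"security": 0.5, "availability": 0.8, "cost": 0.9},
}

def utility(scores, weights):
    """Multi-dimensional utility collapsed by an explicit weight vector;
    the weights are the inspectable, correctable value system."""
    return sum(weights[d] * scores[d] for d in weights)

def best_plan(weights):
    return max(PLANS, key=lambda p: utility(PLANS[p], weights))

weights = {"security": 1.0, "availability": 0.3, "cost": 0.2}
print(best_plan(weights))        # security-heavy values favor quarantine

weights["availability"] = 1.5    # the user corrects the value system...
print(best_plan(weights))        # ...and the decision shifts to rate-limit
```

Because the decision reduces to a weighted comparison, the system can explain a plan choice by reporting which dimensions dominated it.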
David Garlan is a Professor in the School of Computer Science at Carnegie Mellon University. His research interests include:
- software architecture
- self-adaptive systems
- formal methods
- cyber-physical systems
Dr. Garlan is a member of the Institute for Software Research and Computer Science Department in the School of Computer Science.
He received his Ph.D. from Carnegie Mellon in 1987 and worked as a software architect in industry between 1987 and 1990. He is recognized as one of the founders of the field of software architecture and, in particular, of the formal representation and analysis of architectural designs. He is a co-author of two books on software architecture: "Software Architecture: Perspectives on an Emerging Discipline" and "Documenting Software Architectures: Views and Beyond." In 2005 he received a Stevens Award Citation for “fundamental contributions to the development and understanding of software architecture as a discipline in software engineering.” In 2011 he received the Outstanding Research award from ACM SIGSOFT for “significant and lasting software engineering research contributions through the development and promotion of software architecture.” In 2016 he received the Allen Newell Award for Research Excellence. In 2017 he received the IEEE TCSE Distinguished Education Award and the Nancy Mead Award for Excellence in Software Engineering Education. He is a Fellow of the IEEE and ACM.
Machine-learning algorithms, especially classifiers, are becoming prevalent in safety- and security-critical applications. The susceptibility of some types of classifiers to evasion by adversarial input data has been explored in domains such as spam filtering, but the rapid growth in the adoption of machine learning across application domains amplifies the extent and severity of this vulnerability landscape. We propose to (1) develop predictive metrics that characterize the degree to which a neural-network-based image classifier used in domains such as face recognition (say, for surveillance or authentication) can be evaded through attacks that are both practically realizable and inconspicuous, and (2) develop methods that make these classifiers, and the applications that incorporate them, robust to such interference. We will examine how to manipulate images to fool classifiers in various ways, and how to do so in ways that escape the suspicion of even human onlookers. Armed with this understanding of the weaknesses of popular classifiers and their modes of use, we will develop explanations of model behavior to help identify the presence of a likely attack, and generalize these explanations to harden models against future attacks.
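To make the evasion idea concrete on the simplest possible model (the project targets neural-network image classifiers; the linear scorer, weights, and inputs here are invented for illustration): the fast-gradient-sign style of attack nudges each feature a small amount in the direction that lowers the classifier's score, flipping the decision while keeping every per-feature change small enough to be inconspicuous.

```python
def score(w, b, x):
    """Linear decision score f(x) = w.x + b; positive means 'accept'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def evade(w, b, x, eps):
    """Perturb each feature by eps against the sign of its weight; for a
    linear model, the gradient of the score w.r.t. x is exactly w."""
    return [xi - eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]

w, b = [0.6, -0.4, 0.8], -0.2
x = [0.5, 0.1, 0.4]              # originally classified positive
print(score(w, b, x))            # 0.38 > 0
x_adv = evade(w, b, x, eps=0.3)  # each feature moves by at most 0.3
print(score(w, b, x_adv))        # -0.16 < 0: decision flipped
```

For deep networks the same idea uses the sign of the backpropagated gradient rather than the sign of a weight vector, and the resulting image perturbations can be imperceptible to humans.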
Lujo Bauer is an Associate Professor in the Electrical and Computer Engineering Department and in the Institute for Software Research at Carnegie Mellon University. He received his B.S. in Computer Science from Yale University in 1997 and his Ph.D., also in Computer Science, from Princeton University in 2003.
Dr. Bauer's research interests span many areas of computer security and privacy, and include building usable access-control systems with sound theoretical underpinnings, developing languages and systems for run-time enforcement of security policies on programs, and generally narrowing the gap between a formal model and a practical, usable system. His recent work focuses on developing tools and guidance to help users stay safer online and in examining how advances in machine learning can lead to a more secure future.
Dr. Bauer served as the program chair for the flagship computer security conferences of the IEEE (S&P 2015) and the Internet Society (NDSS 2014) and is an associate editor of ACM Transactions on Information and System Security.
Critical infrastructure is increasingly composed of distributed, inter-dependent components and information that is vulnerable to sophisticated, multi-stage cyber-attacks. These attacks are difficult to understand as isolated incidents, and thus, to improve understanding and response, organizations must rapidly share high-quality threat, vulnerability, and exploit-related cyber-security information. However, pervasive and ubiquitous computing has blurred the boundary between work-related and personal data. This includes both the use of workplace computers for personal purposes and the increase in publicly available employee information that can be used to gain unauthorized access to systems through attacks targeted at employees.
To address this challenge, we envision a two-part solution that includes: (1) the capability to assign information category tags to data “in transit” and “at rest” using an ontology that describes what information is personal and non-personal; and (2) a scoring algorithm that computes the “privacy risk” of a combination of assigned tags.
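A minimal sketch of the scoring side (the tag names, per-tag weights, and the superadditive bonus for co-occurring personal tags are illustrative assumptions, not the project's ontology or algorithm):

```python
# Hypothetical per-tag base risks drawn from an information-category ontology.
TAG_RISK = {
    "name": 0.3, "employee-id": 0.4, "home-address": 0.6,
    "work-schedule": 0.2, "system-log": 0.1,
}
PERSONAL = {"name", "employee-id", "home-address"}

def privacy_risk(tags):
    """Score a combination of assigned tags. Co-occurring personal tags
    enable re-identification, so combinations score more than the sum of
    their parts (a pairwise bonus, as an illustrative assumption)."""
    base = sum(TAG_RISK.get(t, 0.0) for t in tags)
    personal = len(PERSONAL & set(tags))
    bonus = 0.2 * personal * (personal - 1) / 2
    return base + bonus

print(privacy_risk({"system-log"}))            # 0.1: low risk alone
print(privacy_risk({"name", "home-address"}))  # 1.1: more than 0.3 + 0.6
```

A score like this could gate whether a record is safe to include in shared threat information or must first be redacted.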
Dr. Breaux is the Director of the CMU Requirements Engineering Lab, where his research program investigates how to specify and design software that complies with policy and law in a trustworthy, reliable manner. His work has historically concerned the empirical extraction of legal requirements from policies and law; more recently he has studied how to use formal specifications to reason about privacy policy compliance, how to measure and reason over ambiguous and vague policies, and how security and privacy experts and novices estimate the risk of system designs.