Synopses of Presentations: Quarterly Lablet meeting NCSU 1-2 Feb 2017
The winter 2016 quarterly Science of Security (SoS) Lablet meeting was held at North Carolina State University on February 1 and 2, 2017. In addition to speakers from the Lablets and NSA, corporate speakers provided insights into the problems of privacy and security. Presentations of current research and interim findings stimulated thought and discussion. A synopsis of each invited talk and Lablet presentation is provided here.
Invited Speakers:
Tomas Vagoun (NITRD) "Federal Privacy R&D Priorities"
The modern definition of privacy expands the old definition, a right to be left alone, to new concerns about large-scale data collection, analysis, and algorithmic decision-making; privacy concerns now center on the effects of authorized processing of personally identifiable information (PII). Current federal priorities for privacy research include multidisciplinary approaches to privacy research and solutions; understanding and measuring privacy desires and impacts; developing system design methods that incorporate privacy desires, requirements, and controls; increasing transparency in data collection, sharing, use, and retention; assuring that information flows and uses are consistent with privacy rules; developing approaches for remediation and recovery; and reducing the privacy risks of analytical algorithms. An NSF survey identified the last two topics as the areas with the largest research gaps.
David Hoffman (Intel) "It Takes Data to Protect Data"
This talk addressed the relationship between privacy and security. Hoffman argued that security and privacy are neither tradeoffs nor a zero-sum game; rather, the two should be kept in balance, so that increasing one also strengthens the other. Risks are changing radically: new technologies allow a small group to inflict extreme harm on a large number of people using drones, germs, robots, and hackers, and the threat has become asymmetric. “Good cybersecurity is good for privacy,” he concluded.
David Marcos (NSA) "Researching the Science of Privacy"
In his view, the Science of Privacy is a principled and methodological approach to privacy risk that addresses the following research challenge questions: Can a mathematical method be developed to evaluate privacy risk? How can a privacy accountability framework be built for Big Data? Can current advances in engineering, such as digital rights management, differential privacy, homomorphic encryption, and secure multi-party computation, be applied? How can the effectiveness of current privacy frameworks and associated controls be evaluated?
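To make one of those engineering advances concrete, here is a minimal sketch of the Laplace mechanism, a basic building block of differential privacy (the dataset, query, and parameter values are illustrative assumptions, not from the talk):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return an epsilon-differentially-private estimate of true_value
    by adding Laplace noise with scale sensitivity / epsilon."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: privately release a count. A counting query changes by at
# most 1 when one person's record is added or removed, so sensitivity = 1.
ages = np.array([34, 45, 29, 61, 52, 38, 47])
true_count = int(np.sum(ages > 40))   # 4
private_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5)
print(f"true: {true_count}, private: {private_count:.2f}")
```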
Jennifer Cowley (CERT) "Why Can't I Put Down My Phone? The Paradox of Computing in Modern Work Environments"
This study suggests that technology has unintended effects: deep technical work seems to be an impossibility in a computing work environment, which leads to work dissatisfaction. There are longstanding antecedents, including underlying digital addictions. We now observe increases in the quantity of workplace distractions and in workplace surveillance, which further erode the capacity to work effectively; we are creating our own workplace threats through the environments we work in. Prior research indicates that in-person meetings are better than video conferencing for planning and information exchange (DeMeyer, 1991; Galegher & Kraut, 1994); that virtual meetings hamper virtual teams' trust, cohesion, and job satisfaction, though team training can remedy this; and that face-to-face teams show stronger social relations and team satisfaction (Warkentin et al., 1997), although in some studies this was not the case and the effect was often offset by heterogeneous gender composition (dePillis & Furumo, 2006).
Lablet Presentations:
Bill Scherlis (CMU) "Safety and Control for AI-based Systems"
He described a conference CMU hosted in the summer of 2016 on "Safety and Control for AI-based Systems." Artificial intelligence is now embedded in critical infrastructure and has a major impact on security. We need to be able to make assurance judgments about AI systems so that they can become reliable and trustworthy. AI safety is multidimensional and must be addressed in the mission context.
Giulia Fanti (UIUC) "Anonymity in the Bitcoin P2P Network"
This work proves a theorem that the maximum-likelihood probabilities of detection for diffusion and trickle are asymptotically identical in the node degree d; that is, diffusion does not have significantly better anonymity properties than trickle. The goal is to design a distributed flooding protocol that minimizes the maximum precision and recall achievable by a computationally unbounded adversary. Her preliminary conclusions are that Bitcoin's P2P anonymity is poor, that moving from trickle to diffusion did not help, and that Dandelion may be a lightweight solution for certain classes of adversaries.
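As an illustration of this kind of analysis (a simplified sketch, not Fanti's model or code), the simulation below spreads a transaction from a random source over a random d-regular graph with exponential link delays and estimates the precision of a naive "first-spy" adversary; the graph size, degree, spy fraction, and single-delay-per-edge model are all assumptions, and the networkx library is required.

```python
import random
import networkx as nx

def first_spy_precision(n=200, d=4, spy_frac=0.1, trials=200, seed=1):
    """Estimate a first-spy adversary's precision against diffusion.

    A message spreads from a random honest source over a random
    d-regular graph, with an i.i.d. exponential delay per link (a
    simplification: real diffusion draws a fresh delay per direction).
    The adversary guesses that the node which first relayed the
    message to the earliest-reporting spy is the source.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g = nx.random_regular_graph(d, n, seed=rng.randrange(10**9))
        for u, v in g.edges:
            g[u][v]["delay"] = rng.expovariate(1.0)
        spies = set(rng.sample(list(g.nodes), int(spy_frac * n)))
        source = rng.choice([v for v in g.nodes if v not in spies])
        times, paths = nx.single_source_dijkstra(g, source, weight="delay")
        # Restrict to reachable spies (d-regular graphs of this size
        # are connected with overwhelming probability anyway).
        first_spy = min((s for s in spies if s in times), key=times.__getitem__)
        guess = paths[first_spy][-2]   # the node that relayed to that spy
        hits += (guess == source)
    return hits / trials

print(f"estimated first-spy precision: {first_spy_precision():.3f}")
```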
Dave Roberts (NCSU) "A Control-theoretic View of AI for Security"
This talk is about securing systems by making careful manipulations that produce predictable and measurable changes in user behavior. Using a game approach, he says, “things that we do on computers serve as a window to the mind.” Games are a way of describing the use of AI for security proofs and of contextualizing the technical challenges to realizing those proofs. He concludes that one fundamental task in AI for security is to develop control-theoretic methods that enable systems to use analytics to reason about how users complete tasks, identify evidence of departures from normal behavior, and use control feedback to influence task conditions.
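A minimal sketch of one piece of such a feedback loop, assuming a single task-completion metric and a simple exponentially weighted baseline (the metric, thresholds, and data here are illustrative, not from the talk):

```python
import numpy as np

def monitor_user(metrics, alpha=0.1, threshold=3.0):
    """Flag departures from a user's normal task behavior.

    Maintains an exponentially weighted moving average and variance
    of a task metric (e.g., seconds per task) and flags observations
    more than `threshold` standard deviations from the baseline.
    """
    mean, var = metrics[0], 1.0
    alerts = []
    for t, x in enumerate(metrics[1:], start=1):
        std = max(var ** 0.5, 1e-6)
        if abs(x - mean) > threshold * std:
            alerts.append((t, x))   # evidence of a departure from normal
        # The baseline is updated after the check, so an alert fires
        # before the anomalous value shifts the baseline.
        mean = (1 - alpha) * mean + alpha * x
        var = (1 - alpha) * var + alpha * (x - mean) ** 2
    return alerts

rng = np.random.default_rng(0)
times = rng.normal(10.0, 1.0, 200)   # typical task-completion times
times[150] = 25.0                    # an abrupt behavioral change
print(monitor_user(list(times)))     # flags t=150 (plus rare noise spikes)
```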
Jessica Staddon (NCSU) "Privacy Incidents, Privacy News and News about Incidents"
This work involves building a database of privacy incidents; the talk covered the database's vision and status, vetting of its definition and data, partial automation of its maintenance, and some trends in sentiment, entities, and keywords. Security has many incident databases, and data breach incidents are typically both security and privacy incidents, so those databases can be leveraged; however, many areas of privacy are not represented in them. Feature engineering addresses the text processing: part-of-speech (PoS) tagging to retain “content” words (nouns, verbs, adjectives, and adverbs); lemmatization; unigrams and bigrams; dimension reduction, shrinking the feature space via mutual information to the top k keywords for a variety of values of k; and training Naïve Bayes, SVM, and Random Forest classifiers.
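A compact sketch of such a pipeline using nltk and scikit-learn (the example documents, labels, and value of k are illustrative assumptions; the talk's actual corpus and settings were not given):

```python
import nltk
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# May first require nltk.download() of the tokenizer, tagger, and
# WordNet data ("punkt", "averaged_perceptron_tagger", "wordnet").
CONTENT_TAGS = ("NN", "VB", "JJ", "RB")   # nouns, verbs, adjectives, adverbs
lemmatizer = nltk.stem.WordNetLemmatizer()

def content_words(text):
    """Keep only lemmatized content words, per the talk's pipeline."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text.lower()))
    return " ".join(lemmatizer.lemmatize(w) for w, tag in tagged
                    if tag.startswith(CONTENT_TAGS))

docs = [
    "Hackers stole millions of customer records from the retailer.",
    "The hospital accidentally mailed patient records to the wrong address.",
    "The company announced a new phone with a larger screen.",
    "Regulators fined the bank for sharing customer data without consent.",
]
labels = [1, 1, 0, 1]   # 1 = privacy incident, 0 = unrelated news

pipeline = Pipeline([
    ("ngrams", CountVectorizer(preprocessor=content_words, ngram_range=(1, 2))),
    ("topk", SelectKBest(mutual_info_classif, k=10)),   # top-k by mutual information
    ("clf", MultinomialNB()),   # SVM or Random Forest are drop-in alternatives
])
pipeline.fit(docs, labels)
print(pipeline.predict(["Leaked database exposes user records"]))
```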
Nitin Vaidya (UIUC) "Security and Privacy in Machine Learning"
This work looks at how to optimize distributed machine learning systems to maximize accuracy. One key challenge is the need to filter bad information and to define “outliers” appropriately. He concludes that achieving privacy and security in machine learning is non-trivial and that, while there has been some promising progress, there is plenty to keep researchers busy for a while.
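One standard way to filter such outliers (a sketch of a known technique, not necessarily Vaidya's method) is coordinate-wise trimmed-mean aggregation, which tolerates up to f faulty workers:

```python
import numpy as np

def trimmed_mean(gradients, f):
    """Aggregate workers' gradient vectors, discarding the f largest
    and f smallest values in each coordinate before averaging; with
    more than 2f workers, f faulty reports cannot drag the result
    arbitrarily far from the honest values."""
    g = np.sort(np.asarray(gradients), axis=0)   # sort each coordinate
    return g[f:len(g) - f].mean(axis=0)

# Nine honest workers report similar gradients; two faulty workers
# report extreme values intended to stall or hijack training.
rng = np.random.default_rng(0)
honest = [np.array([1.0, -2.0]) + rng.normal(0, 0.1, 2) for _ in range(9)]
faulty = [np.array([100.0, 100.0]), np.array([-100.0, 50.0])]
print(trimmed_mean(honest + faulty, f=2))   # close to [1.0, -2.0]
```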
Özgür Kafali (NCSU) "How Good is a Security Policy against Breaches?"
This work asks how to formalize security policies and breaches so as to bring out their mutual correspondence, what the commonalities and differences between concepts in security policies and breach descriptions are, and how gaps between the two can be identified. The analysis examines breaches reported to the U.S. Department of Health and Human Services (HHS) that led to disclosure of patient records in violation of HIPAA. Applying the Semaver method to 1,577 breaches reported by HHS, he classified hacking and theft as malicious misuses and loss, unauthorized disclosure, and improper disposal as accidental misuses. The method accounts for 68% of accidental misuses but only 13% of malicious misuses; overall, 44% of the breaches were accidental misuses and 56% malicious misuses.
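To illustrate the flavor of comparing formalized policy norms with breach descriptions (a toy stand-in, not the actual Semaver implementation; the norm schema, attributes, and examples are invented for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Norm:
    """A simplified norm: a modality governing who does what to what."""
    modality: str   # e.g., "prohibition", "authorization", "commitment"
    actor: str
    action: str
    target: str

def similarity(policy_norm, breach_norm):
    """Fraction of attributes on which two norms agree; a crude
    stand-in for Semaver's semantic similarity scoring."""
    pairs = zip(
        (policy_norm.modality, policy_norm.actor,
         policy_norm.action, policy_norm.target),
        (breach_norm.modality, breach_norm.actor,
         breach_norm.action, breach_norm.target),
    )
    return sum(a == b for a, b in pairs) / 4

policy = Norm("prohibition", "employee", "disclose", "patient-record")
breach = Norm("prohibition", "employee", "disclose", "unencrypted-laptop")
print(similarity(policy, breach))   # 0.75: partial coverage of the breach
```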
Travis Breaux (CMU) "Discovering a Natural Language Semantics for Privacy"
This project applies formal methods to privacy policy information from gaming, health, news, shopping, and telecom sites, using a semantic approach. Its conclusions are that Hearst patterns can capture general categories but are difficult to encode and yield roughly a 17% false-positive rate. Future work in this area should improve the search for non-lexical hypernyms and meronyms, correlate information type with data purpose, and score information types by privacy risk in context.
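For readers unfamiliar with Hearst patterns, the sketch below illustrates the general idea (the patterns and example sentence are illustrative, not drawn from the project's corpus):

```python
import re

# Classic Hearst patterns: "X such as Y" and "X including Y" imply
# Y is a kind of X; "Y and other X" implies the same with the order
# reversed. Each group matches one or two words.
HEARST_PATTERNS = [
    re.compile(r"(\w+(?: \w+)?) such as (\w+(?: \w+)?)"),
    re.compile(r"(\w+(?: \w+)?) including (\w+(?: \w+)?)"),
    re.compile(r"(\w+(?: \w+)?) and other (\w+(?: \w+)?)"),
]

def extract_hypernyms(sentence):
    """Return (hypernym, hyponym) pairs matched by any pattern."""
    pairs = []
    for pattern in HEARST_PATTERNS:
        for m in pattern.finditer(sentence):
            if "and other" in pattern.pattern:
                pairs.append((m.group(2), m.group(1)))  # order reversed
            else:
                pairs.append((m.group(1), m.group(2)))
    return pairs

text = "We collect personal information such as email addresses."
print(extract_hypernyms(text))
# [('personal information', 'email addresses')]
```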