Publications | Science of Security Virtual Organization

Sensitivity Support in Data Privacy Algorithms

Personal data privacy is a great concern by governments across the world as citizens generate huge amount of data continuously and industries using this for betterment of user centric services. There must be a reasonable balance between data privacy and utility of data. Differential privacy is a promise by data collector to the customer’s personal privacy. Centralised Differential Privacy (CDP) is performing output perturbation of user’s data by applying required privacy budget. This promises the inclusion or exclusion of individual’s data in data set not going to create significant change for a statistical query output and it offers -Differential privacy guarantee. CDP is holding a strong belief on trusted data collector and applying global sensitivity of the data. Local Differential Privacy (LDP) helps user to locally perturb his data and there by guaranteeing privacy even with untrusted data collector. Many differential privacy algorithms handles parameters like privacy budget, sensitivity and data utility in different ways and mostly trying to keep trade-off between privacy and utility of data. This paper evaluates differential privacy algorithms in regard to the privacy support it offers according to the sensitivity of the data. Generalized application of privacy budget is found ineffective in comparison to the sensitivity based usage of privacy budget.

Authored by Geocey Shejy, Pallavi Chavan

DP-BEGAN: A Generative Model of Differential Privacy Algorithm

In recent years, differential privacy has gradually become a standard definition in the field of data privacy protection. Differential privacy does not need to make assumptions about the prior knowledge of privacy adversaries, so it has a more stringent effect than existing privacy protection models and definitions. This good feature has been used by researchers to solve the in-depth learning problem restricted by the problem of privacy and security, making an important breakthrough, and promoting its further large-scale application. Combining differential privacy with BEGAN, we propose the DP-BEGAN framework. The differential privacy is realized by adding carefully designed noise to the gradient of Gan model training, so as to ensure that Gan can generate unlimited synthetic data that conforms to the statistical characteristics of source data and does not disclose privacy. At the same time, it is compared with the existing methods on public datasets. The results show that under a certain privacy budget, this method can generate higher quality privacy protection data more efficiently, which can be used in a variety of data analysis tasks. The privacy loss is independent of the amount of synthetic data, so it can be applied to large datasets.

Authored by Er-Mei Shi, Jia-Xi Liu, Yuan-Ming Ji, Liang Chang

Differential Privacy under Incalculable Sensitivity

Differential privacy mechanisms have been proposed to guarantee the privacy of individuals in various types of statistical information. When constructing a probabilistic mechanism to satisfy differential privacy, it is necessary to consider the impact of an arbitrary record on its statistics, i.e., sensitivity, but there are situations where sensitivity is difficult to derive. In this paper, we first summarize the situations in which it is difficult to derive sensitivity in general, and then propose a definition equivalent to the conventional definition of differential privacy to deal with them. This definition considers neighboring datasets as in the conventional definition. Therefore, known differential privacy mechanisms can be applied. Next, as an example of the difficulty in deriving sensitivity, we focus on the t-test, a basic tool in statistical analysis, and show that a concrete differential privacy mechanism can be constructed in practice. Our proposed definition can be treated in the same way as the conventional differential privacy definition, and can be applied to cases where it is difficult to derive sensitivity.

Authored by Tomoaki Mimoto, Masayuki Hashimoto, Hiroyuki Yokoyama, Toru Nakamura, Takamasa Isohara, Ryosuke Kojima, Aki Hasegawa, Yasushi Okuno

Differential Privacy High-dimensional Data Publishing Method Based on Bayesian Network

Ensuring high data availability while realizing privacy protection is a research hotspot in the field of privacy-preserving data publishing. In view of the instability of data availability in the existing differential privacy high-dimensional data publishing methods based on Bayesian networks, this paper proposes an improved MEPrivBayes privacy-preserving data publishing method, which is mainly improved from two aspects. Firstly, in view of the structural instability caused by the random selection of Bayesian first nodes, this paper proposes a method of first node selection and Bayesian network construction based on the Maximum Information Coefficient Matrix. Then, this paper proposes a privacy budget elastic allocation algorithm: on the basis of pre-setting differential privacy budget coefficients for all branch nodes and all leaf nodes in Bayesian network, the influence of branch nodes on their child nodes and the average correlation degree between leaf nodes and all other nodes are calculated, then get a privacy budget strategy. The SVM multi-classifier is constructed with privacy preserving data as training data set, and the original data set is used as input to evaluate the prediction accuracy in this paper. The experimental results show that the MEPrivBayes method proposed in this paper has higher data availability than the classical PrivBayes method. Especially when the privacy budget is small (noise is large), the availability of the data published by MEPrivBayes decreases less.

Authored by Xiaotian Lu, Chunhui Piao, Jianghe Han

RDP-WGAN: Image Data Privacy Protection Based on Rényi Differential Privacy

In recent years, artificial intelligence technology based on image data has been widely used in various industries. Rational analysis and mining of image data can not only promote the development of the technology field but also become a new engine to drive economic development. However, the privacy leakage problem has become more and more serious. To solve the privacy leakage problem of image data, this paper proposes the RDP-WGAN privacy protection framework, which deploys the Rényi differential privacy (RDP) protection techniques in the training process of generative adversarial networks to obtain a generative model with differential privacy. This generative model is used to generate an unlimited number of synthetic datasets to complete various data analysis tasks instead of sensitive datasets. Experimental results demonstrate that the RDP-WGAN privacy protection framework provides privacy protection for sensitive image datasets while ensuring the usefulness of the synthetic datasets.

Authored by Xuebin Ma, Ren Yang, Maobo Zheng

Improved differential privacy K-means clustering algorithm for privacy budget allocation

In the differential privacy clustering algorithm, the added random noise causes the clustering centroids to be shifted, which affects the usability of the clustering results. To address this problem, we design a differential privacy K-means clustering algorithm based on an adaptive allocation of privacy budget to the clustering effect: Adaptive Differential Privacy K-means (ADPK-means). The method is based on the evaluation results generated at the end of each iteration in the clustering algorithm. First, it dynamically evaluates the effect of the clustered sets at the end of each iteration by measuring the separation and tightness between the clustered sets. Then, the evaluation results are introduced into the process of privacy budget allocation by weighting the traditional privacy budget allocation. Finally, different privacy budgets are assigned to different sets of clusters in the iteration to achieve the purpose of adaptively adding perturbation noise to each set. In this paper, both theoretical and experimental results are analyzed, and the results show that the algorithm satisfies e-differential privacy and achieves better results in terms of the availability of clustering results for the three standard datasets.

Authored by Liquan Han, Yushan Xie, Di Fan, Jinyuan Liu

Differential Privacy Protection Algorithm Based on Zero Trust Architecture for Industrial Internet

The Zero Trust Architecture is an important part of the industrial Internet security protection standard. When analyzing industrial data for enterprise-level or industry-level applications, differential privacy (DP) is an important technology for protecting user privacy. However, the centralized and local DP used widely nowadays are only applicable to the networks with fixed trust relationship and cannot cope with the dynamic security boundaries in Zero Trust Architecture. In this paper, we design a differential privacy scheme that can be applied to Zero Trust Architecture. It has a consistent privacy representation and the same noise mechanism in centralized and local DP scenarios, and can balance the strength of privacy protection and the flexibility of privacy mechanisms. We verify the algorithm in the experiment, that using maximum expectation estimation method it is able to obtain equal or even better result of the utility with the same level of security as traditional methods.

Authored by Yuning Song, Liping Ding, Xuehua Liu, Mo Du

Differential Privacy Techniques for Healthcare Data

This paper analyzes techniques to enable differential privacy by adding Laplace noise to healthcare data. First, as healthcare data contain natural constraints for data to take only integral values, we show that drawing only integral values does not provide differential privacy. In contrast, rounding randomly drawn values to the nearest integer provides differential privacy. Second, when a variable is constructed using two other variables, noise must be added to only one of them. Third, if the constructed variable is a fraction, then noise must be added to its constituent private variables, and not to the fraction directly. Fourth, the accuracy of analytics following noise addition increases with the privacy budget, ϵ, and the variance of the independent variable. Finally, the accuracy of analytics following noise addition increases disproportionately with an increase in the privacy budget when the variance of the independent variable is greater. Using actual healthcare data, we provide evidence supporting the two predictions on the accuracy of data analytics. Crucially, to enable accuracy of data analytics with differential privacy, we derive a relationship to extract the slope parameter in the original dataset using the slope parameter in the noisy dataset.

Authored by Rishabh Subramanian

Privacy-Preserving Cloud Data Model based on Differential Approach

With the variety of cloud services, the cloud service provider delivers the machine learning service, which is used in many applications, including risk assessment, product recommen-dation, and image recognition. The cloud service provider initiates a protocol for the classification service to enable the data owners to request an evaluation of their data. The owners may not entirely rely on the cloud environment as the third parties manage it. However, protecting data privacy while sharing it is a significant challenge. A novel privacy-preserving model is proposed, which is based on differential privacy and machine learning approaches. The proposed model allows the various data owners for storage, sharing, and utilization in the cloud environment. The experiments are conducted on Blood transfusion service center, Phoneme, and Wilt datasets to lay down the proposed model's efficiency in accuracy, precision, recall, and Fl-score terms. The results exhibit that the proposed model specifies high accuracy, precision, recall, and Fl-score up to 97.72%, 98.04%, 97.72%, and 98.80%, respectively.

Authored by Rishabh Gupta, Ashutosh Singh

Localized Differential Location Privacy Protection Scheme in Mobile Environment

When users request location services, they are easy to expose their privacy information, and the scheme of using a third-party server for location privacy protection has high requirements for the credibility of the server. To solve these problems, a localized differential privacy protection scheme in mobile environment is proposed, which uses Markov chain model to generate probability transition matrix, and adds Laplace noise to construct a location confusion function that meets differential privacy, Conduct location confusion on the client, construct and upload anonymous areas. Through the analysis of simulation experiments, the scheme can solve the problem of untrusted third-party server, and has high efficiency while ensuring the high availability of the generated anonymous area.

Authored by Liu Kai, Wang Jingjing, Hu Yanjing

Text Analytics for Security

Authored by Tao Xie, William Enck

Understanding and Accounting for Human Behavior

Since computers are machines, it's tempting to think of computer security as purely a technical problem. However, computing systems are created, used, and maintained by humans, and exist to serve the goals of human and institutional stakeholders. Consequently, effectively addressing the security problem requires understanding this human dimension. In this tutorial, we discuss this challenge and survey principal research approaches to it.

Authored by Jim Blythe, Sean Smith

Modelling User Availability in Workflow Resiliency Analysis

Workflows capture complex operational processes and include security constraints limiting which users can perform which tasks. An improper security policy may prevent cer- tain tasks being assigned and may force a policy violation. Deciding whether a valid user-task assignment exists for a given policy is known to be extremely complex, especially when considering user unavailability (known as the resiliency problem). Therefore tools are required that allow automatic evaluation of workflow resiliency. Modelling well defined workflows is fairly straightforward, however user availabil- ity can be modelled in multiple ways for the same workflow. Correct choice of model is a complex yet necessary concern as it has a major impact on the calculated resiliency. We de- scribe a number of user availability models and their encod- ing in the model checker PRISM, used to evaluate resiliency. We also show how model choice can affect resiliency computation in terms of its value, memory and CPU time.

Authored by John Mace, Charles Morisset, Aad Van Moorsel

Detecting Monitor Compromise using Evidential Reasoning

Stealthy attackers often disable or tamper with system monitors to hide their tracks and evade detection. In this poster, we present a data-driven technique to detect such monitor compromise using evidential reasoning. Leveraging the fact that hiding from multiple, redundant monitors is difficult for an attacker, to identify potential monitor compromise, we combine alerts from different sets of monitors by using Dempster-Shafer theory, and compare the results to find outliers. We describe our ongoing work in this area.

Authored by Uttam Thakore, Ahmed Fawaz, William Sanders

On the Tradeoff Between Privacy and Utility in Collaborative Intrusion Detection Systems-A Game Theoretical Approach

Intrusion Detection Systems (IDSs) are crucial security mechanisms widely deployed for critical network protection. However, conventional IDSs become incompetent due to the rapid growth in network size and the sophistication of large scale attacks. To mitigate this problem, Collaborative IDSs (CIDSs) have been proposed in literature. In CIDSs, a number of IDSs exchange their intrusion alerts and other relevant data so as to achieve better intrusion detection performance. Nevertheless, the required information exchange may result in privacy leakage, especially when these IDSs belong to different self-interested organizations. In order to obtain a quantitative understanding of the fundamental tradeoff between the intrusion detection accuracy and the organizations' privacy, a repeated two-layer single-leader multi-follower game is proposed in this work. Based on our game-theoretic analysis, we are able to derive the expected behaviors of both the attacker and the IDSs and obtain the utility-privacy tradeoff curve. In addition, the existence of Nash equilibrium (NE) is proved and an asynchronous dynamic update algorithm is proposed to compute the optimal collaboration strategies of IDSs. Finally, simulation results are shown to validate the analysis.

Authored by Richeng Jin, Xiaofan He, Huaiyu Dai

Towards Privacy-Preserving Mobile Apps: A Balancing Act

Authored by Dengfeng Li, Wing Lam, Wei Yang, Zhengkai Wu, Xusheng Xiao, Tao Xie

Learning Factor Graphs for Preempting Multi-Stage Attacks in Cloud Infrastructure

Authored by Phuong Cao, Alex Withers, Zbigniew Kalbarczyk, Ravishankar Iyer

Factors for Differentiating Human from Automated Attacks

Authored by Kelly Greeling, Alex Withers, Masooda Bashir

FARM: Finding the Appropriate level of Realism for Modeling

Authored by Jim Blythe, Sean Smith, Ross Koppel, Christopher Novak, Vijay Kothari

Multi-Agent System for Detecting False Data Injection Attacks Against the Power Grid

Authored by Esther Amullen, Hui Lin, Zbigniew Kalbarczyk, Lee Keel