Cyber threats can cause severe damage to computing infrastructure and systems as well as data breaches that make sensitive data vulnerable to attackers and adversaries. It is therefore imperative to discover those threats and stop them before bad actors penetrating into the information systems.Threats hunting algorithms based on machine learning have shown great advantage over classical methods. Reinforcement learning models are getting more accurate for identifying not only signature-based but also behavior-based threats. Quantum mechanics brings a new dimension in improving classification speed with exponential advantage. The accuracy of the AI/ML algorithms could be affected by many factors, from algorithm, data, to prejudicial, or even intentional. As a result, AI/ML applications need to be non-biased and trustworthy.In this research, we developed a machine learning-based cyber threat detection and assessment tool. It uses two-stage (both unsupervised and supervised learning) analyzing method on 822,226 log data recorded from a web server on AWS cloud. The results show the algorithm has the ability to identify the threats with high confidence.
Authored by Shuangbao Wang, Md Arafin, Onyema Osuagwu, Ketchiozo Wandji
Cyber-Physical System (CPS) represents systems that join both hardware and software components to perform real-time services. Maintaining the system's reliability is critical to the continuous delivery of these services. However, the CPS running environment is full of uncertainties and can easily lead to performance degradation. As a result, the need for a recovery technique is highly needed to achieve resilience in the system, with keeping in mind that this technique should be as green as possible. This early doctorate proposal, suggests a game theory solution to achieve resilience and green in CPS. Game theory has been known for its fast performance in decision-making, helping the system to choose what maximizes its payoffs. The proposed game model is described over a real-life collaborative artificial intelligence system (CAIS), that involves robots with humans to achieve a common goal. It shows how the expected results of the system will achieve the resilience of CAIS with minimized CO2 footprint.
Authored by Diaeddin Rimawi
Deep learning models rely on single word features and location features of text to achieve good results in text relation extraction tasks. However, previous studies have failed to make full use of semantic information contained in sentence dependency syntax trees, and data sparseness and noise propagation still affect classification models. The BERT(Bidirectional Encoder Representations from Transformers) pretrained language model provides a better representation of natural language processing tasks. And entity enhancement methods have been proved to be effective in relation extraction tasks. Therefore, this paper proposes a combination of the shortest dependency path and entity-enhanced BERT pre-training language model for model construction to reduce the impact of noise terms on the classification model and obtain more semantically expressive feature representation. The algorithm is tested on SemEval-2010 Task 8 English relation extraction dataset, and the F1 value of the final experiment can reach 0. 881.
Authored by Zeyu Sun, Chi Zhang
Steady advancement in Artificial Intelligence (AI) development over recent years has caused AI systems to become more readily adopted across industry and military use-cases globally. As powerful as these algorithms are, there are still gaping questions regarding their security and reliability. Beyond adversarial machine learning, software supply chain vulnerabilities and model backdoor injection exploits are emerging as potential threats to the physical safety of AI reliant CPS such as autonomous vehicles. In this work in progress paper, we introduce the concept of AI supply chain vulnerabilities with a provided proof of concept autonomous exploitation framework. We investigate the viability of algorithm backdoors and software third party library dependencies for applicability into modern AI attack kill chains. We leverage an autonomous vehicle case study for demonstrating the applicability of our offensive methodologies within a realistic AI CPS operating environment.
Authored by Daniel Williams, Chelece Clark, Rachel McGahan, Bradley Potteiger, Daniel Cohen, Patrick Musau
Data secure deletion operation in storage media is an important function of data security management. The internal physical properties of SSDs are different from hard disks, and data secure deletion of disks can not apply to SSDs directly. Copyback operation is used to improve the data migration performance of SSDs but is rarely used due to error accumulation issue. We propose a data securely deletion algorithm based on copyback operation, which improves the efficiency of data secure deletion without affecting the reliability of data. First, this paper proves that the data secure delete operation takes a long time on the channel bus, increasing the I/O overhead, and reducing the performance of the SSDs. Secondly, this paper designs an efficient data deletion algorithm, which can process read requests quickly. The experimental results show that the proposed algorithm can reduce the response time of read requests by 21% and the response time of delete requests by 18.7% over the existing algorithm.
Authored by Rongzhen Zhu, Yuchen Wang, Pengpeng Bai, Zhiming Liang, Weiguo Wu, Lei Tang
At present, cloud service providers control the direct management rights of cloud data, and cloud data cannot be effectively and assured deleted, which may easily lead to security problems such as data residue and user privacy leakage. This paper analyzes the related research work of cloud data assured deletion in recent years from three aspects: encryption key deletion, multi-replica association deletion, and verifiable deletion. The advantages and disadvantages of various deletion schemes are analysed in detail, and finally the prospect of future research on assured deletion of cloud data is given.
Authored by Bin Li, Yu Fu, Kun Wang
Missing values are an unavoidable problem for classification tasks of machine learning in medical data. With the rapid development of the medical system, large scale medical data is increasing. Missing values increase the difficulty of mining hidden but useful information in these medical datasets. Deletion and imputation methods are the most popular methods for dealing with missing values. Existing studies ignored to compare and discuss the deletion and imputation methods of missing values under the row missing rate and the total missing rate. Meanwhile, they rarely used experiment data sets that are mixed-type and large scale. In this work, medical data sets of various sizes and mixed-type are used. At the same time, performance differences of deletion and imputation methods are compared under the MCAR (Missing Completely At Random) mechanism in the baseline task using LR (Linear Regression) and SVM (Support Vector Machine) classifiers for classification with the same row and total missing rates. Experimental results show that under the MCAR missing mechanism, the performance of two types of processing methods is related to the size of datasets and missing rates. As the increasing of missing rate, the performance of two types for processing missing values decreases, but the deletion method decreases faster, and the imputation methods based on machine learning have more stable and better classification performance on average. In addition, small data sets are easily affected by processing methods of missing values.
Authored by Lijuan Ren, Tao Wang, Aicha Seklouli, Haiqing Zhang, Abdelaziz Bouras
With the rapid development of artificial intelligence, video target tracking is widely used in the fields of intelligent video surveillance, intelligent transportation, intelligent human-computer interaction and intelligent medical diagnosis. Deep learning has achieved remarkable results in the field of computer vision. The development of deep learning not only breaks through many problems that are difficult to be solved by traditional algorithms, improves the computer's cognitive level of images and videos, but also promotes the progress of related technologies in the field of computer vision. This paper combines the deep learning algorithm and target tracking algorithm to carry out relevant experiments on basketball motion detection video, hoping that the experimental results can be helpful to basketball motion detection video target tracking.
Authored by Tieniu Xia
The growing number of cybersecurity incidents and the always increasing complexity of cybersecurity attacks is forcing the industry and the research community to develop robust and effective methods to detect and respond to network attacks. Many tools are either built upon a large number of rules and signatures which only large third-party vendors can afford to create and maintain, or are based on complex artificial intelligence engines which, in most cases, still require personalization and fine-tuning using costly service contracts offered by the vendors.This paper introduces an open-source network traffic monitoring system based on the concept of cyberscore, a numerical value that represents how a network activity is considered relevant for spotting cybersecurity-related events. We describe how this technique has been applied in real-life networks and present the result of this evaluation.
Authored by Luca Deri, Alfredo Cardigliano
In recent years, differential privacy has gradually become a standard definition in the field of data privacy protection. Differential privacy does not need to make assumptions about the prior knowledge of privacy adversaries, so it has a more stringent effect than existing privacy protection models and definitions. This good feature has been used by researchers to solve the in-depth learning problem restricted by the problem of privacy and security, making an important breakthrough, and promoting its further large-scale application. Combining differential privacy with BEGAN, we propose the DP-BEGAN framework. The differential privacy is realized by adding carefully designed noise to the gradient of Gan model training, so as to ensure that Gan can generate unlimited synthetic data that conforms to the statistical characteristics of source data and does not disclose privacy. At the same time, it is compared with the existing methods on public datasets. The results show that under a certain privacy budget, this method can generate higher quality privacy protection data more efficiently, which can be used in a variety of data analysis tasks. The privacy loss is independent of the amount of synthetic data, so it can be applied to large datasets.
Authored by Er-Mei Shi, Jia-Xi Liu, Yuan-Ming Ji, Liang Chang
Ensuring high data availability while realizing privacy protection is a research hotspot in the field of privacy-preserving data publishing. In view of the instability of data availability in the existing differential privacy high-dimensional data publishing methods based on Bayesian networks, this paper proposes an improved MEPrivBayes privacy-preserving data publishing method, which is mainly improved from two aspects. Firstly, in view of the structural instability caused by the random selection of Bayesian first nodes, this paper proposes a method of first node selection and Bayesian network construction based on the Maximum Information Coefficient Matrix. Then, this paper proposes a privacy budget elastic allocation algorithm: on the basis of pre-setting differential privacy budget coefficients for all branch nodes and all leaf nodes in Bayesian network, the influence of branch nodes on their child nodes and the average correlation degree between leaf nodes and all other nodes are calculated, then get a privacy budget strategy. The SVM multi-classifier is constructed with privacy preserving data as training data set, and the original data set is used as input to evaluate the prediction accuracy in this paper. The experimental results show that the MEPrivBayes method proposed in this paper has higher data availability than the classical PrivBayes method. Especially when the privacy budget is small (noise is large), the availability of the data published by MEPrivBayes decreases less.
Authored by Xiaotian Lu, Chunhui Piao, Jianghe Han
In recent years, artificial intelligence technology based on image data has been widely used in various industries. Rational analysis and mining of image data can not only promote the development of the technology field but also become a new engine to drive economic development. However, the privacy leakage problem has become more and more serious. To solve the privacy leakage problem of image data, this paper proposes the RDP-WGAN privacy protection framework, which deploys the Rényi differential privacy (RDP) protection techniques in the training process of generative adversarial networks to obtain a generative model with differential privacy. This generative model is used to generate an unlimited number of synthetic datasets to complete various data analysis tasks instead of sensitive datasets. Experimental results demonstrate that the RDP-WGAN privacy protection framework provides privacy protection for sensitive image datasets while ensuring the usefulness of the synthetic datasets.
Authored by Xuebin Ma, Ren Yang, Maobo Zheng
In the differential privacy clustering algorithm, the added random noise causes the clustering centroids to be shifted, which affects the usability of the clustering results. To address this problem, we design a differential privacy K-means clustering algorithm based on an adaptive allocation of privacy budget to the clustering effect: Adaptive Differential Privacy K-means (ADPK-means). The method is based on the evaluation results generated at the end of each iteration in the clustering algorithm. First, it dynamically evaluates the effect of the clustered sets at the end of each iteration by measuring the separation and tightness between the clustered sets. Then, the evaluation results are introduced into the process of privacy budget allocation by weighting the traditional privacy budget allocation. Finally, different privacy budgets are assigned to different sets of clusters in the iteration to achieve the purpose of adaptively adding perturbation noise to each set. In this paper, both theoretical and experimental results are analyzed, and the results show that the algorithm satisfies e-differential privacy and achieves better results in terms of the availability of clustering results for the three standard datasets.
Authored by Liquan Han, Yushan Xie, Di Fan, Jinyuan Liu
When users request location services, they are easy to expose their privacy information, and the scheme of using a third-party server for location privacy protection has high requirements for the credibility of the server. To solve these problems, a localized differential privacy protection scheme in mobile environment is proposed, which uses Markov chain model to generate probability transition matrix, and adds Laplace noise to construct a location confusion function that meets differential privacy, Conduct location confusion on the client, construct and upload anonymous areas. Through the analysis of simulation experiments, the scheme can solve the problem of untrusted third-party server, and has high efficiency while ensuring the high availability of the generated anonymous area.
Authored by Liu Kai, Wang Jingjing, Hu Yanjing