Malware Analysis and Graph Theory - The growing popularity of intelligent terminals has made malware an increasingly serious threat. Among the many features that can be extracted from an application, the call graph accurately expresses the application's behavior. The rapid development of graph neural networks in recent years offers a new approach to malware analysis that uses call graphs as features; however, problems such as low accuracy remain. This paper builds a large-scale data set of more than 40,000 samples, selects the class call graph extracted from each application as the feature, and combines graph embedding with a deep neural network to detect malware. The experimental results show that the proposed detection model achieves an accuracy of 97.7\%, a precision of 96.6\%, a recall of 96.8\%, and an F1-score of 96.4\%, outperforming existing detection models based on Markov chains and on graph embedding.
Authored by Rui Wang, Jun Zheng, Zhiwei Shi, Yu Tan
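The pipeline in the abstract above (call-graph feature, graph embedding, deep neural network classifier) can be illustrated with a heavily simplified sketch. The embedding below is a hand-crafted structural summary rather than the paper's learned graph embedding, and the data, graph sizes, and model dimensions are placeholder assumptions.

```python
# Sketch only: crude structural embedding of a call graph + small neural classifier.
import networkx as nx
import numpy as np
from sklearn.neural_network import MLPClassifier

def embed_call_graph(g: nx.DiGraph) -> np.ndarray:
    """Collapse a call graph into a crude fixed-length structural feature vector."""
    degrees = np.array([d for _, d in g.degree()], dtype=float)
    return np.array([g.number_of_nodes(), g.number_of_edges(),
                     degrees.mean(), degrees.max(), nx.density(g)])

# placeholder graphs and labels standing in for the 40,000-sample data set
graphs = [nx.gnp_random_graph(30, 0.1, directed=True) for _ in range(100)]
labels = np.random.randint(0, 2, size=100)          # 1 = malware, 0 = benign

X = np.vstack([embed_call_graph(g) for g in graphs])
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```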
Machine Learning - Estimating obesity levels is an important topic in the medical field, since it can provide useful guidance for people who want to lose weight or keep fit. This article tries to find a model that can predict obesity and give people information on how to avoid becoming overweight. Specifically, the article applies dimensionality reduction to the data set to simplify the data and uses Principal Component Analysis (PCA) to identify the most decisive feature of obesity. It also applies machine learning methods such as Support Vector Machine (SVM) and Decision Tree to predict obesity and to find its major causes, and additionally uses an Artificial Neural Network (ANN), whose stronger feature-extraction ability suits this prediction task. The article finds that family history of obesity is the most decisive feature, possibly because obesity is strongly affected by genes or by the family's eating habits. Both the ANN and the Decision Tree achieve prediction accuracy higher than 90\%.
Authored by Zhenghao He
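A minimal sketch of the workflow described above, assuming a generic tabular data set: PCA for dimensionality reduction and feature screening, followed by SVM and Decision Tree classifiers. The placeholder X and y below are not the study's obesity data, and the component counts are arbitrary.

```python
# Sketch only: PCA + SVM / Decision Tree on placeholder data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(500, 16)                  # placeholder features
y = np.random.randint(0, 2, size=500)        # placeholder obesity labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("SVM", SVC()), ("DecisionTree", DecisionTreeClassifier())]:
    pipe = make_pipeline(StandardScaler(), PCA(n_components=5), model)
    pipe.fit(X_tr, y_tr)
    print(name, "accuracy:", pipe.score(X_te, y_te))

# PCA component loadings hint at which original features dominate the variance
pca = PCA(n_components=1).fit(StandardScaler().fit_transform(X))
print("most influential feature index:", np.abs(pca.components_[0]).argmax())
```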
Internet-scale Computing Security - The intelligent networked vehicle market is expanding rapidly, and network security issues follow with it. A Situational Awareness (SA) system can detect, identify, and respond to security risks from a global perspective. Given the discrete and weakly correlated nature of perceptual data, this paper uses a Fly Optimization Algorithm (FOA) with dynamic adjustment of the optimization step size to improve convergence speed, and optimizes a Probabilistic Neural Network (PNN)-based model for extracting security situation elements of the Internet of Vehicles (IoV) to improve the accuracy of element extraction. Comparative experiments verify that the algorithm converges quickly and offers high precision and good stability.
Authored by Xuan Chen, Fei Li
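The step-size-adaptive optimization loop described above might look roughly like the following sketch. It minimizes a toy objective and is only an assumed illustration; in the paper, such a loop would tune the PNN-based situation-element extraction model rather than this placeholder function.

```python
# Sketch only: fly-optimization-style search with a dynamically shrinking step size.
import numpy as np

def objective(x: np.ndarray) -> float:
    return float(np.sum(x ** 2))              # toy fitness to minimize (assumption)

rng = np.random.default_rng(0)
swarm_center = rng.uniform(-5, 5, size=2)
n_flies, n_iters = 30, 100
step0, step_min = 2.0, 0.05

for it in range(n_iters):
    # dynamic step size: decay linearly from step0 to step_min over the run
    step = step0 - (step0 - step_min) * it / (n_iters - 1)
    flies = swarm_center + rng.uniform(-step, step, size=(n_flies, 2))
    fitness = np.array([objective(f) for f in flies])
    best = flies[fitness.argmin()]
    if objective(best) < objective(swarm_center):
        swarm_center = best                    # swarm converges on the best fly

print("best solution:", swarm_center, "objective:", objective(swarm_center))
```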
Internet of Vehicles Security - The intelligent networked vehicle market is expanding rapidly, and network security issues follow with it. A Situational Awareness (SA) system can detect, identify, and respond to security risks from a global perspective. Given the discrete and weakly correlated nature of perceptual data, this paper uses a Fly Optimization Algorithm (FOA) with dynamic adjustment of the optimization step size to improve convergence speed, and optimizes a Probabilistic Neural Network (PNN)-based model for extracting security situation elements of the Internet of Vehicles (IoV) to improve the accuracy of element extraction. Comparative experiments verify that the algorithm converges quickly and offers high precision and good stability.
Authored by Xuan Chen, Fei Li
Intelligent Data and Security - Tourism is one of the main sources of income in Australia, and the number of tourists affects airlines, hotels, and other stakeholders. Predicting tourist arrivals allows full preparations to be made for welcoming them. This paper selects Queensland tourism data as the intelligent data, visualizes it, builds a seasonal ARIMA model to identify its characteristics, and produces forecasts. To further improve prediction accuracy, a 10-layer Back Propagation neural network model is built on the Queensland tourism data. The experiments show that this network performs well on the prediction task in this paper.
Authored by Luoyifan Zhong
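A minimal sketch of the seasonal ARIMA step, assuming a monthly arrivals series; the synthetic data and the (p, d, q)(P, D, Q, s) orders below are placeholders, not the values selected in the paper.

```python
# Sketch only: fit a seasonal ARIMA model to a synthetic monthly series and forecast.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
months = pd.date_range("2010-01-01", periods=120, freq="MS")
seasonal = 10_000 * np.sin(2 * np.pi * np.arange(120) / 12)
arrivals = pd.Series(50_000 + 100 * np.arange(120) + seasonal
                     + rng.normal(0, 2_000, 120), index=months)

# orders are illustrative assumptions, not those chosen in the paper
model = SARIMAX(arrivals, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
fit = model.fit(disp=False)
print(fit.forecast(steps=12))                 # next 12 months of arrivals
```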
Information Forensics - Frame deletion forensics has been a major area of video forensics in recent years. Current deep neural network-based methods outperform earlier traditional detection methods. Recently, researchers have used residual features as network input to detect frame deletion and have achieved promising results. We propose an Improved Residual Feature (IReF) derived from an analysis of how residual features expose frame-deletion traces. IReF denoises and enhances the residual features to preserve the main motion features and edge information, making it easier for the network to identify tampered features, while sparse noise reduction lowers the storage requirement. Experiments show that with a 2D convolutional neural network, IReF increases accuracy by 3.81\% over residual features and reduces the storage space requirement by 78\%. With a 3D convolutional neural network that takes video clips as input, IReF increases accuracy by 5.63\% and inference efficiency by 18\%.
Authored by Huang Gong, Feng Hui, Bai Dan
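A rough, assumed illustration of the residual-feature idea (not the authors' IReF): compute inter-frame residuals, zero out weak responses to denoise, and rescale the survivors so motion and edge information dominate. The threshold and clip below are placeholders.

```python
# Sketch only: sparse, denoised inter-frame residual features on a placeholder clip.
import numpy as np

def residual_features(frames: np.ndarray, threshold: float = 10.0) -> np.ndarray:
    """frames: (T, H, W) grayscale video; returns (T-1, H, W) sparse residuals."""
    residuals = np.abs(np.diff(frames.astype(np.float32), axis=0))
    residuals[residuals < threshold] = 0.0     # denoise: zero out weak responses
    peak = residuals.max() or 1.0
    return residuals * (255.0 / peak)          # enhance: rescale to full range

video = np.random.randint(0, 256, size=(16, 64, 64))   # placeholder clip
feats = residual_features(video)
print("non-zero ratio:", (feats > 0).mean())            # sparsity after denoising
```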
A reliable database of Indicators of Compromise (IoCs) is a cornerstone of almost every malware detection system. Building the database and keeping it up-to-date is a lengthy and often manual process in which each IoC must be reviewed and labeled by an analyst. In this paper, we focus on an automatic way of identifying IoCs that is intended to save analysts' time and scale to the volume of network data. We leverage the relations of each IoC to other entities on the Internet to build a heterogeneous graph, formulate a classification task on this graph, and apply graph neural networks (GNNs) to identify malicious domains. Our experiments show that the presented approach provides promising results on identifying high-risk malicious domains as well as on classifying legitimate domains.
Authored by Stepan Dvorak, Pavel Prochazka, Lukas Bajer
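A minimal sketch of per-node classification with graph convolution, assuming a tiny homogeneous graph; the authors' heterogeneous-graph model is more elaborate, and the edge list, features, and layer sizes below are placeholder assumptions.

```python
# Sketch only: two rounds of graph-convolution propagation over a small graph,
# producing a per-node maliciousness score.
import torch
import torch.nn as nn

def normalized_adjacency(edge_list, n_nodes):
    a = torch.eye(n_nodes)                       # self-loops
    for i, j in edge_list:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt           # symmetrically normalized A_hat

class TinyGCN(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, 1)

    def forward(self, a_hat, x):
        h = torch.relu(a_hat @ self.w1(x))        # first propagation
        return torch.sigmoid(a_hat @ self.w2(h))  # per-node maliciousness score

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # placeholder IoC relations
x = torch.randn(4, 8)                             # placeholder node features
a_hat = normalized_adjacency(edges, n_nodes=4)
print(TinyGCN(8, 16)(a_hat, x).squeeze())
```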
With the ever increasing threat of malware, extensive research effort has been put into applying deep learning to malware classification tasks. Graph Neural Networks (GNNs) that process malware as Control Flow Graphs (CFGs) have shown great promise for malware classification. However, these models are viewed as black boxes, which makes it hard to validate them and to identify malicious patterns. To that end, we propose CFGExplainer, a deep learning based model for interpreting GNN-oriented malware classification results. CFGExplainer identifies the subgraph of the malware CFG that contributes most to the classification and provides insight into the importance of the nodes (i.e., basic blocks) within it. To the best of our knowledge, CFGExplainer is the first work to explain GNN-based malware classification. We compared CFGExplainer against three explainers, namely GNNExplainer, SubgraphX, and PGExplainer, and showed that CFGExplainer identifies equisized top subgraphs with higher classification accuracy than the other three models.
Authored by Jerome Herath, Priti Wakodikar, Ping Yang, Guanhua Yan
With the dramatic increase in malicious software, the sophistication and innovation of malware have increased over the years. In particular, dynamic analysis based on deep neural networks has shown high accuracy in malware detection. However, most existing methods employ only the raw API sequence feature, which cannot accurately reflect the actual behavior of malicious programs in detail; the relationships between API calls are critical for detecting suspicious behavior. Therefore, this paper proposes a malware detection method based on a graph neural network. We first connect the API sequences executed by different processes to build a directed process graph. Then, we apply BERT to encode the API sequences of each process into node embeddings, which capture the semantic execution information within each process. Finally, we employ a GCN to mine deep semantic information from the directed process graph and node embeddings. In addition to presenting the design, we have implemented and evaluated our method on a dataset of 10,000 malware and 10,000 benign software samples. The results show that the precision and recall of our detection model reach 97.84\% and 97.83\%, verifying the effectiveness of the proposed method.
Authored by Zhenquan Ding, Hui Xu, Yonghe Guo, Longchuan Yan, Lei Cui, Zhiyu Hao
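A sketch of the node-embedding step, assuming each process's API-call sequence is serialized as text; the specific BERT checkpoint and mean pooling are assumptions, not necessarily the authors' choices. The resulting vectors would then feed a GCN over the directed process graph, as in the abstract.

```python
# Sketch only: BERT-encode placeholder API sequences into one embedding per process.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

process_api_sequences = [
    "OpenProcess VirtualAllocEx WriteProcessMemory CreateRemoteThread",
    "CreateFile ReadFile CloseHandle",
]  # placeholder API sequences, one string per process

with torch.no_grad():
    batch = tokenizer(process_api_sequences, padding=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state              # (procs, tokens, 768)
    mask = batch["attention_mask"].unsqueeze(-1)
    node_embeddings = (hidden * mask).sum(1) / mask.sum(1)   # mean pooling

print(node_embeddings.shape)    # one 768-dim embedding per process node
```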
This paper provides an end-to-end solution to defend against known microarchitectural attacks such as speculative execution attacks, fault-injection attacks, covert and side channel attacks, and unknown or evasive versions of these attacks. Current defenses are attack specific and can have unacceptably high performance overhead. We propose an approach that reduces the overhead of state-of-the-art defenses by over 95%, by applying defenses only when attacks are detected. Many currently proposed mitigations are not practical for deployment; for example, InvisiSpec has 27% overhead and Fencing has 74% overhead while protecting against only Spectre attacks, and other mitigations carry similar performance penalties. We reduce the overhead for InvisiSpec to 1.26% and for Fencing to 3.45%, offering performance and security not only against Spectre attacks but also against other known transient attacks, including the dangerous class of LVI and Rowhammer attacks, as well as a large set of future evasive and zero-day attacks. Critical to our approach is an accurate detector that is not fooled by evasive attacks and that can generalize to novel zero-day attacks. We use a novel generative framework, Evasion Vaccination (EVAX), for training ML models and engineering new security-centric performance counters. EVAX significantly increases sensitivity to detect and classify attacks in time for mitigation to be deployed, with low false positives (4 FPs per 1M instructions in our experiments). Such performance enables efficient and timely mitigations, allowing the processor to automatically switch between performance and security as needed.
Authored by Samira Ajorpaz, Daniel Moghimi, Jeffrey Collins, Gilles Pokam, Nael Abu-Ghazaleh, Dean Tullsen
XAI with natural language processing aims to produce human-readable explanations as evidence for AI decision-making, which addresses explainability and transparency. However, from an HCI perspective, current approaches focus only on delivering a single explanation, which fails to account for the diversity of human thoughts and experiences in language. This paper addresses this gap by proposing a generative XAI framework, INTERACTION (explain aNd predicT thEn queRy with contextuAl CondiTional varIational autO-eNcoder). Our novel framework presents explanations in two steps: (step one) Explanation and Label Prediction; and (step two) Diverse Evidence Generation. We conduct intensive experiments with the Transformer architecture on a benchmark dataset, e-SNLI [1]. Our method achieves competitive or better performance against state-of-the-art baseline models on explanation generation (up to 4.7% gain in BLEU) and prediction (up to 4.4% gain in accuracy) in step one; it can also generate multiple diverse explanations in step two.
Authored by Jialin Yu, Alexandra Cristea, Anoushka Harit, Zhongtian Sun, Olanrewaju Aduragba, Lei Shi, Noura Moubayed
A great many devices can now communicate without the need for human intervention. The Internet of Things (IoT) is one of the fastest-growing sectors in the history of computing, with an estimated 50 billion devices sold by the end of 2020. On the one hand, IoT developments play a crucial role in enabling simple, intelligent applications that can increase quality of life. On the other hand, security concerns arise from the cross-cutting nature of these frameworks and the multidisciplinary components connected with their deployment. As a result, encryption, authentication, access control, network security, and application security measures for these devices and their inherent flaws cannot be fully implemented, and existing security measures should be upgraded to ensure that the ML environment is sufficiently protected. Machine learning (ML) has advanced tremendously in the last few years, evolving from a research-center curiosity into a practical instrument in several critical applications.
Authored by Amit Pandey, Assefa Genale, Vijaykumar Janga, Barani Sundaram, Desalegn Awoke, P. Karthika
Transformers are key equipment in the power system, and their stable operation is very important to power system security. In practical applications, as technology progresses, transformer performance becomes ever more important, yet faults still occur from time to time, and traditional manual fault diagnosis consumes a great deal of time and energy. The rapid development of artificial intelligence technology provides a new research direction for detecting and treating transformer faults promptly and accurately. This paper proposes a transformer fault diagnosis method using an artificial neural network. The neural network performs off-line learning and training on operation state data from normal and fault states; by adjusting the relationships between neuron nodes, layer-by-layer learning establishes the mapping between fault characteristics and fault location. Finally, the reasoning process from fault features to fault location realizes intelligent fault diagnosis.
Authored by Li Feng, Ye Bo
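A minimal sketch of the diagnosis step, assuming fault features are fixed-length numeric vectors; the placeholder data, class count, and layer sizes are assumptions rather than the paper's configuration.

```python
# Sketch only: small feed-forward network mapping fault features to a fault-location class.
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(300, 6)               # placeholder fault-feature vectors
y = np.random.randint(0, 4, size=300)    # placeholder fault-location classes

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```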
Recent works have empirically shown that neural network interpretability is susceptible to malicious manipulation. However, existing attacks against Interpretable Deep Learning Systems (IDLSes) all focus on the white-box setting, which is impractical in real-world scenarios. In this paper, we make the first attempt to attack IDLSes in the decision-based black-box setting. We propose a new framework called Dual Black-box Adversarial Attack (DBAA), which can generate adversarial examples that are misclassified as the target class yet have interpretations very similar to those of their benign counterparts. We conduct comprehensive experiments on different combinations of classifiers and interpreters to illustrate the effectiveness of DBAA. Empirical results show that in all cases, DBAA achieves high attack success rates and Intersection over Union (IoU) scores.
Authored by Yike Zhan, Baolin Zheng, Qian Wang, Ningping Mou, Binqing Guo, Qi Li, Chao Shen, Cong Wang
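For reference, the Intersection over Union (IoU) score mentioned above can be computed between binarized interpretation maps as in this small sketch (illustrative only; the random masks are placeholders, not real saliency maps).

```python
# Sketch only: IoU between two boolean saliency masks.
import numpy as np

def interpretation_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """IoU of two boolean saliency masks of the same shape."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection / union) if union else 1.0

benign_map = np.random.rand(224, 224) > 0.7       # placeholder saliency masks
adversarial_map = np.random.rand(224, 224) > 0.7
print("IoU:", interpretation_iou(benign_map, adversarial_map))
```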
The wide application of deep learning techniques is boosting the regulation of deep learning models, especially deep neural networks (DNNs), as commercial products. A necessary prerequisite for such regulation is identifying the owner of a deep neural network, which is usually done through a watermark. Current DNN watermarking schemes, particularly white-box ones, are uniformly fragile against a family of functionality equivalence attacks, especially neuron permutation, which can effortlessly invalidate the ownership proof and escape copyright regulation. To enhance the robustness of white-box DNN watermarking schemes, this paper presents a procedure that aligns neurons into the same order as when the watermark was embedded, so the watermark can be correctly recognized. This neuron alignment process significantly facilitates the functionality of established deep neural network watermarking schemes.
Authored by Fang-Qi Li, Shi-Lin Wang, Yun Zhu
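A toy, assumed illustration of neuron alignment (not the paper's scheme): undo a hidden-neuron permutation by sorting neurons with a permutation-invariant signature and applying the same reordering to the adjacent layer, so a white-box watermark could be read in a canonical order.

```python
# Sketch only: re-align permuted hidden neurons of a two-layer network.
import numpy as np

rng = np.random.default_rng(0)
w_in = rng.normal(size=(64, 32))        # layer 1 weights: 64 hidden x 32 inputs
w_out = rng.normal(size=(10, 64))       # layer 2 weights: 10 outputs x 64 hidden

def canonical_order(w_in_matrix):
    """Order hidden neurons by the norm of their incoming weight vectors."""
    return np.argsort(np.linalg.norm(w_in_matrix, axis=1))

# attacker permutes hidden neurons (a functionality-equivalent transformation)
perm = rng.permutation(64)
w_in_attacked, w_out_attacked = w_in[perm], w_out[:, perm]

# defender re-aligns both layers with the same canonical ordering
order = canonical_order(w_in_attacked)
w_in_aligned, w_out_aligned = w_in_attacked[order], w_out_attacked[:, order]

reference = canonical_order(w_in)
assert np.allclose(w_in_aligned, w_in[reference])   # alignment removes the permutation
```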
In the computer field, cybersecurity has always been a focus of attention, and detecting malware effectively is one of the key problems and difficulties in network security research. Traditional malware detection schemes can mainly be divided into two categories: database matching and machine learning methods. With the rise of deep learning, more and more deep learning methods are applied to malware detection, since deeper semantic features can be extracted by deep neural networks. The main tasks of this paper are as follows: (1) use machine learning methods and a one-dimensional convolutional neural network to detect malware; (2) propose a detection method that combines machine learning and deep learning. For (1), the machine learning method LGBM obtains an accuracy rate of 67.16%, and the one-dimensional CNN obtains an accuracy rate of 72.47%. In (2), LGBM is used to screen features by importance and a one-dimensional convolutional neural network is then applied, which further improves the detection result to an accuracy rate of 78.64%.
Authored by Da Huo, Xiaoyong Li, Linghui Li, Yali Gao, Ximing Li, Jie Yuan
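A sketch of the combined pipeline under assumed placeholder data: LightGBM ranks features by importance, the top-k features are kept, and a small one-dimensional CNN classifies the reduced vectors. The feature counts, k, and layer sizes are illustrative assumptions.

```python
# Sketch only: LightGBM feature screening followed by a tiny 1D CNN classifier.
import lightgbm as lgb
import numpy as np
import torch
import torch.nn as nn

X = np.random.rand(1000, 128).astype(np.float32)   # placeholder malware features
y = np.random.randint(0, 2, size=1000)

# step 1: feature screening by LightGBM importance
booster = lgb.LGBMClassifier(n_estimators=50).fit(X, y)
top_k = np.argsort(booster.feature_importances_)[-32:]   # keep 32 best features
X_sel = X[:, top_k]

# step 2: one-dimensional CNN over the selected feature vector
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(), nn.Linear(16, 2),
)
logits = model(torch.from_numpy(X_sel).unsqueeze(1))     # (batch, 1, 32) input
print(logits.shape)                                       # (1000, 2) class scores
```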
The major aim of this study is to predict the type of crime that will occur, based on crime hotspots detected from the given crime data with engineered spatial features. The crime dataset is filtered to two crime categories: crime against society and crime against person. Crime hotspots are detected using the Novel Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm, with the number of clusters optimized using the silhouette score. The sample data consists of 501 crime incidents. Future crime types for a given location are predicted using the Support Vector Machine (SVM) and Convolutional Neural Network (CNN) algorithms (N=5). The accuracy of crime prediction is 94.01% with the Support Vector Machine classification algorithm and 79.98% with the Convolutional Neural Network algorithm, with a significance p-value of 0.033. The Support Vector Machine algorithm is significantly more accurate than the Convolutional Neural Network (CNN) for predicting the type of crime.
Authored by T. Sravani, M.Raja Suguna
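A minimal sketch of the workflow, with random placeholder coordinates instead of the 501 real incidents: HDBSCAN detects hotspots, the silhouette score evaluates the clustering, and an SVM predicts the crime category from spatial features plus the hotspot label. The parameters are assumptions.

```python
# Sketch only: HDBSCAN hotspots + silhouette score + SVM crime-type prediction.
import hdbscan
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(501, 2))              # placeholder coordinates
crime_type = rng.integers(0, 2, size=501)               # 0 = society, 1 = person

clusterer = hdbscan.HDBSCAN(min_cluster_size=10).fit(coords)
labels = clusterer.labels_                               # -1 marks noise points
clustered = labels != -1
if clustered.sum() and len(set(labels[clustered])) > 1:
    print("silhouette:", silhouette_score(coords[clustered], labels[clustered]))

features = np.column_stack([coords, labels])             # spatial + hotspot id
svm = SVC().fit(features, crime_type)
print("training accuracy:", svm.score(features, crime_type))
```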
Unmanned Aerial Vehicles (UAVs) are drawing enormous attention in both commercial and military applications to facilitate dynamic wireless communications and deliver seamless connectivity due to their flexible deployment, inherent line-of-sight (LOS) air-to-ground (A2G) channels, and high mobility. These advantages, however, render UAV-enabled wireless communication systems susceptible to eavesdropping attempts. Hence, there is a strong need to protect the wireless channel through which most UAV-enabled applications share data with each other. Error correction techniques such as Low Density Parity Check (LDPC) and polar codes provide safe and reliable data transmission by exploiting the physical layer but require high transmission power, and the security gap achieved by these error-correction techniques must be reduced to improve the security level. In this paper, we present deep learning (DL) enabled punctured LDPC codes to provide secure and reliable transmission of data for UAVs through the Additive White Gaussian Noise (AWGN) channel, irrespective of the computational power and channel state information (CSI) of the eavesdropper. Numerical result analysis shows that the proposed scheme effectively reduces the Bit Error Rate (BER) at Bob compared to Eve, and a Signal to Noise Ratio (SNR) per bit of 3.5 dB is achieved at the maximum threshold value of BER. Also, the security gap is reduced by 47.22% compared to conventional LDPC codes.
Authored by Himanshu Sharma, Neeraj Kumar, Raj Tekchandani, Nazeeruddin Mohammad
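For context, BER-versus-SNR measurements of the kind reported above can be estimated by Monte Carlo simulation. The sketch below simulates uncoded BPSK over an AWGN channel; it does not implement the paper's punctured LDPC scheme, and the SNR values are arbitrary.

```python
# Sketch only: Monte Carlo BER estimate for uncoded BPSK over AWGN.
import numpy as np

def bpsk_awgn_ber(eb_n0_db: float, n_bits: int = 1_000_000) -> float:
    rng = np.random.default_rng(0)
    bits = rng.integers(0, 2, n_bits)
    symbols = 1.0 - 2.0 * bits                     # 0 -> +1, 1 -> -1
    noise_std = np.sqrt(1.0 / (2.0 * 10 ** (eb_n0_db / 10.0)))
    received = symbols + rng.normal(0.0, noise_std, n_bits)
    decoded = (received < 0).astype(int)
    return float(np.mean(decoded != bits))

for snr_db in (0, 2, 3.5, 6):
    print(f"Eb/N0 = {snr_db} dB -> BER = {bpsk_awgn_ber(snr_db):.5f}")
```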
We introduce AdaMix, an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data. While pre-training language models on large public datasets has enabled strong differential privacy (DP) guarantees with minor loss of accuracy, a similar practice yields punishing trade-offs in vision tasks. A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset. AdaMix incorporates few-shot training, or cross-modal zero-shot learning, on public data prior to private fine-tuning to improve the trade-off. AdaMix reduces the error increase relative to the non-private upper bound from the baseline's 167–311%, on average across 6 datasets, to 68–92%, depending on the privacy level selected by the user. AdaMix tackles a trade-off arising in visual classification, whereby the most privacy-sensitive data, corresponding to isolated points in representation space, are also critical for high classification accuracy. In addition, AdaMix comes with strong theoretical privacy guarantees and convergence analysis.
Authored by Aditya Golatkar, Alessandro Achille, Yu-Xiang Wang, Aaron Roth, Michael Kearns, Stefano Soatto
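AdaMix builds on differentially private training; as background, a generic DP-SGD step (per-example gradient clipping plus calibrated Gaussian noise) is sketched below. This is not AdaMix itself, and the model, batch, and hyperparameters are placeholder assumptions.

```python
# Sketch only: one DP-SGD update with per-example gradient clipping and Gaussian noise.
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One DP-SGD update over a batch, clipping each example's gradient."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xs, ys):                       # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (total_norm + 1e-12))
        for s, g in zip(summed, grads):
            s += g * scale                         # clipped, accumulated gradient
    with torch.no_grad():
        for p, s in zip(model.parameters(), summed):
            noise = torch.randn_like(s) * noise_mult * clip_norm
            p -= lr * (s + noise) / len(xs)        # noisy averaged update

model = torch.nn.Linear(8, 2)
xs, ys = torch.randn(16, 8), torch.randint(0, 2, (16,))
dp_sgd_step(model, torch.nn.CrossEntropyLoss(), xs, ys)
```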
Modern consumer electronic devices often provide intelligence services with deep neural networks. We have started migrating the computing locations of intelligence services from cloud servers (traditional AI systems) to the corresponding devices (on-device AI systems). On-device AI systems generally have the advantages of preserving privacy, removing network latency, and saving cloud costs. With the emergence of on-device AI systems having relatively low computing power, the inconsistent and varying hardware resources and capabilities pose difficulties. The authors' affiliation has started applying a stream pipeline framework, NNStreamer, to on-device AI systems, saving development costs and hardware resources and improving performance. We want to expand the types of devices and applications using on-device AI service products from both the affiliation and second/third parties. We also want to make each AI service atomic, re-deployable, and shareable among connected devices of arbitrary vendors, which, as always, introduces yet another requirement. The new requirement of “among-device AI” includes connectivity between AI pipelines so that they may share computing resources and hardware capabilities across a wide range of devices regardless of vendors and manufacturers. We propose extensions of the stream pipeline framework, NNStreamer, for on-device AI so that NNStreamer may provide among-device AI capability. This work is a Linux Foundation (LF AI & Data) open source project accepting contributions from the general public.
Authored by MyungJoo Ham, Sangjung Woo, Jaeyun Jung, Wook Song, Gichan Jang, Yongjoo Ahn, Hyoungjoo Ahn
Existing defense strategies against adversarial attacks (AAs) on AI/ML are primarily focused on examining the input data streams using a wide variety of filtering techniques. For instance, input filters are used to remove noisy, misleading, and out-of-class inputs along with a variety of attacks on learning systems. However, a single filter may not be able to detect all types of AAs. To address this issue, we propose a robust, transferable, distribution-independent, and cross-domain framework for selecting Adaptive Filter Ensembles (AFEs) to minimize the impact of data poisoning on learning systems. The optimal filter ensembles are determined through a Multi-Objective Bi-Level Programming Problem (MOBLPP) that provides a subset of diverse filter sequences, each exhibiting fair detection accuracy. The proposed AFE framework is trained to model the pristine data distribution to identify corrupted inputs, and it converges to the optimal AFE without vanishing gradients or mode collapse irrespective of the input data distribution. Preliminary experiments show that the proposed defense outperforms existing defenses in terms of robustness and accuracy.
Authored by Arunava Roy, Dipankar Dasgupta
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a particular trigger. Several works attempt to detect whether a given DNN has been injected with a specific trigger during training. In a parallel line of research, the lottery ticket hypothesis reveals the existence of sparse sub-networks that can reach competitive performance as the dense network after independent training. Connecting these two dots, we investigate the problem of Trojan DNN detection from the brand new lens of sparsity, even when no clean training data is available. Our crucial observation is that Trojan features are significantly more stable to network pruning than benign features. Leveraging this, we propose a novel Trojan network detection regime: first locating a “winning Trojan lottery ticket” which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated sub-network. Extensive experiments on various datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet, with different network architectures, i.e., VGG-16, ResNet-18, ResNet-20s, and DenseNet-100, demonstrate the effectiveness of our proposal. Codes are available at https://github.com/VITA-Group/Backdoor-LTH.
Authored by Tianlong Chen, Zhenyu Zhang, Yihua Zhang, Shiyu Chang, Sijia Liu, Zhangyang Wang
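The pruning-stability observation above can be probed with a simple, assumed experiment: prune a network by weight magnitude at increasing sparsity and compare accuracy on clean inputs with accuracy on triggered inputs. The untrained model and the synthetic "trigger" below are placeholders, not the paper's setup.

```python
# Sketch only: compare clean vs. triggered accuracy under increasing L1 magnitude pruning.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

def accuracy(m, x, y):
    return (m(x).argmax(dim=1) == y).float().mean().item()

# placeholder clean and trigger-stamped batches
clean_x, clean_y = torch.randn(256, 32), torch.randint(0, 10, (256,))
trigger_x, trigger_y = clean_x + 3.0, torch.zeros(256, dtype=torch.long)

for amount in (0.5, 0.9, 0.99):
    pruned = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    pruned.load_state_dict(model.state_dict())
    for layer in pruned:
        if isinstance(layer, nn.Linear):
            prune.l1_unstructured(layer, name="weight", amount=amount)
    print(amount, accuracy(pruned, clean_x, clean_y),
          accuracy(pruned, trigger_x, trigger_y))
```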
Aiming at preventing information security risks in the protection and control of smart substations, a multi-level security defense method for substations based on data aggregation and a convolutional neural network (CNN) is proposed. Firstly, the intelligent electronic device (IED) uses "digital certificate + digital signature" for the first level of identity authentication and a UKey identification code for the second level of physical identity authentication. Secondly, the device group of the monitoring layer judges whether a data report has been tampered with during transmission according to the registration stage and its own ID information, and the device group aggregates the data using the credential information. Finally, convolution decomposition and depthwise separable convolution are combined, and a time factor is introduced to control the degree of data fusion and the number of input channels of the network, so that the network model can learn from the original data and the fused data at the same time. Simulation results show that the proposed method effectively saves communication overhead, ensures reliable message transmission under normal and abnormal operation, and effectively improves the security defense capability of the smart substation.
Authored by Dong Liu, Yingwei Zhu, Haoliang Du, Lixiang Ruan
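A minimal sketch of a depthwise separable convolution block, the kind of decomposition referred to above; the channel sizes and input shape are placeholder assumptions, not the paper's network.

```python
# Sketch only: depthwise separable convolution = per-channel conv + 1x1 pointwise conv.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # depthwise: one filter per input channel (groups = in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch)
        # pointwise: 1x1 convolution mixes the channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(4, 16, 32, 32)                    # placeholder measurement maps
block = DepthwiseSeparableConv(16, 32)
print(block(x).shape)                             # torch.Size([4, 32, 32, 32])
```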