Publications | Science of Security Virtual Organization

Towards XAI in the SOC – a user centric study of explainable alerts with SHAP and LIME

Many studies of the adoption of machine learning (ML) in Security Operation Centres (SOCs) have pointed to a lack of transparency and explanation – and thus trust – as a barrier to ML adoption, and have suggested eXplainable Artificial Intelligence (XAI) as a possible solution. However, there is a lack of studies addressing to which degree XAI indeed helps SOC analysts. Focusing on two XAI-techniques, SHAP and LIME, we have interviewed several SOC analysts to understand how XAI can be used and adapted to explain ML-generated alerts. The results show that XAI can provide valuable insights for the analyst by highlighting features and information deemed important for a given alert. As far as we are aware, we are the first to conduct such a user study of XAI usage in a SOC and this short paper provides our initial findings.

Authored by Håkon Eriksson, Gudmund Grov

Objection!: Identifying Misclassified Malicious Activities with XAI

Many studies have been conducted to detect various malicious activities in cyberspace using classifiers built by machine learning. However, it is natural for any classifier to make mistakes, and hence, human verification is necessary. One method to address this issue is eXplainable AI (XAI), which provides a reason for the classification result. However, when the number of classification results to be verified is large, it is not realistic to check the output of the XAI for all cases. In addition, it is sometimes difficult to interpret the output of XAI. In this study, we propose a machine learning model called classification verifier that verifies the classification results by using the output of XAI as a feature and raises objections when there is doubt about the reliability of the classification results. The results of experiments on malicious website detection and malware detection show that the proposed classification verifier can efficiently identify misclassified malicious activities.

Authored by Koji Fujita, Toshiki Shibahara, Daiki Chiba, Mitsuaki Akiyama, Masato Uchida

Potential Information Security Risks in The Implementation of AI - Based Systems

At present, technological solutions based on artificial intelligence (AI) are being accelerated in various sectors of the economy and social relations in the world. Practice shows that fast-developing information technologies, as a rule, carry new, previously unidentified threats to information security (IS). It is quite obvious that identification of vulnerabilities, threats and risks of AI technologies requires consideration of each technology separately or in some aggregate in cases of their joint use in application solutions. Of the wide range of AI technologies, data preparation, DevOps, Machine Learning (ML) algorithms, cloud technologies, microprocessors and public services (including Marketplaces) have received the most attention. Due to the high importance and impact on most AI solutions, this paper will focus on the key AI assets, the attacks and risks that arise when implementing AI-based systems, and the issue of building secure AI.

Authored by P. Lozhnikov, S. Zhumazhanova

Characterization of MPC-based Private Inference for Transformer-based Models

In this work, we provide an in-depth characterization study of the performance overhead for running Transformer models with secure multi-party computation (MPC). MPC is a cryptographic framework for protecting both the model and input data privacy in the presence of untrusted compute nodes. Our characterization study shows that Transformers introduce several performance challenges for MPC-based private machine learning inference. First, Transformers rely extensively on “softmax” functions. While softmax functions are relatively cheap in a non-private execution, softmax dominates the MPC inference runtime, consuming up to 50\% of the total inference runtime. Further investigation shows that computing the maximum, needed for providing numerical stability to softmax, is a key culprit for the increase in latency. Second, MPC relies on approximating non-linear functions that are part of the softmax computations, and the narrow dynamic ranges make optimizing softmax while maintaining accuracy quite difficult. Finally, unlike CNNs, Transformer-based NLP models use large embedding tables to convert input words into embedding vectors. Accesses to these embedding tables can disclose inputs; hence, additional obfuscation for embedding access patterns is required for guaranteeing the input privacy. One approach to hide address accesses is to convert an embedding table lookup into a matrix multiplication. However, this naive approach increases MPC inference runtime significantly. We then apply tensor-train (TT) decomposition, a lossy compression technique for representing embedding tables, and evaluate its performance on embedding lookups. We show the trade-off between performance improvements and the corresponding impact on model accuracy using detailed experiments.

Authored by Yongqin Wang, Edward Suh, Wenjie Xiong, Benjamin Lefaudeux, Brian Knott, Murali Annavaram, Hsien-Hsin Lee

Potential Information Security Risks in The Implementation of AI - Based Systems

Authored by P. Lozhnikov, S. Zhumazhanova

Internet of Things and Cloud Computing Involvement Microsoft Azure Platform

The rapid advancement of cloud technology has resulted in the emergence of many cloud service providers. Microsoft Azure is one among them to provide a flexible cloud computing platform that can scale business to exceptional heights. It offers extensive cloud services and is compatible with a wide range of developer tools, databases, and operating systems. In this paper, a detailed analysis of Microsoft Azure in the cloud computing era is performed. For this reason, the three significant Azure services, namely, the Azure AI (Artificial Intelligence) and Machine Learning (ML) Service, Azure Analytics Service and Internet of Things (IoT) are investigated. The paper briefs on the Azure Cognitive Search and Face Service under AI and ML service and explores this service s architecture and security measures. The proposed study also surveys the Data Lake and Data factory Services under Azure Analytics Service. Subsequently, an overview of Azure IoT service, mainly IoT Hub and IoT Central, is discussed. Along with Microsoft Azure, other providers in the market are Google Compute Engine and Amazon Web Service. The paper compares and contrasts each cloud service provider based on their computing capability.

Authored by Sreyes K, Anushka K, Dona Davis, N. Jayapandian

A New Approach for Machine Learning Security Risk Assessment – Work in Progress

We propose a new security risk assessment approach for Machine Learning-based AI systems (ML systems). The assessment of security risks of ML systems requires expertise in ML security. So, ML system developers, who may not know much about ML security, cannot assess the security risks of their systems. By using our approach, a ML system developers can easily assess the security risks of the ML system. In performing the assessment, the ML system developer only has to answer the yes/no questions about the specification of the ML system. In our trial, we confirmed that our approach works correctly. CCS CONCEPTS • Security and privacy; • Computing methodologies → Artificial intelligence; Machine learning;

Authored by Jun Yajima, Maki Inui, Takanori Oikawa, Fumiyoshi Kasahara, Ikuya Morikawa, Nobukazu Yoshioka

Clean-label Backdoor Attack on Machine Learning-based Malware Detection Models and Countermeasures

In recent years, machine learning technology has been extensively utilized, leading to increased attention to the security of AI systems. In the field of image recognition, an attack technique called clean-label backdoor attack has been widely studied, and it is more difficult to detect than general backdoor attacks because data labels do not change when tampering with poisoning data during model training. However, there remains a lack of research on malware detection systems. Some of the current work is under the white-box assumption that requires knowledge of machine learning-based models which can be advantageous for attackers. In this study, we focus on clean-label backdoor attacks in malware detection systems and propose a new clean-label backdoor attack under the black-box assumption that does not require knowledge of machine learning-based models, which is riskier. The experimental evaluation of the proposed attack method shows that the attack success rate is up to 80.50\% when the poisoning rate is 14.00\%, demonstrating the effectiveness of the proposed attack method. In addition, we experimentally evaluated the effectiveness of the dimensionality reduction techniques in preventing clean-label backdoor attacks, and showed that it can reduce the attack success rate by 76.00\%.

Authored by Wanjia Zheng, Kazumasa Omote

Exploring the Effect of Adversarial Attacks on Deep Learning Architectures for X-Ray Data

As artificial intelligent models continue to grow in their capacity and sophistication, they are often trusted with very sensitive information. In the sub-field of adversarial machine learning, developments are geared solely towards finding reliable methods to systematically erode the ability of artificial intelligent systems to perform as intended. These techniques can cause serious breaches of security, interruptions to major systems, and irreversible damage to consumers. Our research evaluates the effects of various white box adversarial machine learning attacks on popular computer vision deep learning models leveraging a public X-ray dataset from the National Institutes of Health (NIH). We make use of several experiments to gauge the feasibility of developing deep learning models that are robust to adversarial machine learning attacks by taking into account different defense strategies, such as adversarial training, to observe how adversarial attacks evolve over time. Our research details how a variety white box attacks effect different components of InceptionNet, DenseNet, and ResNeXt and suggest how the models can effectively defend against these attacks.

Authored by Ilyas Bankole-Hameed, Arav Parikh, Josh Harguess

A Fault and Intrusion Tolerance Framework for Containerized Environments: A Specification-Based Error Detection Approach

Container-based virtualization has gained momentum over the past few years thanks to its lightweight nature and support for agility. However, its appealing features come at the price of a reduced isolation level compared to the traditional host-based virtualization techniques, exposing workloads to various faults, such as co-residency attacks like container escape. In this work, we propose to leverage the automated management capabilities of containerized environments to derive a Fault and Intrusion Tolerance (FIT) framework based on error detection-recovery and fault treatment. Namely, we aim at deriving a specification-based error detection mechanism at the host level to systematically and formally capture security state errors indicating breaches potentially caused by malicious containers. Although the paper focuses on security side use cases, results are logically extendable to accidental faults. Our aim is to immunize the target environments against accidental and malicious faults and preserve their core dependability and security properties.

Authored by Taous Madi, Paulo Esteves-Verissimo

A Fault and Intrusion Tolerance Framework for Containerized Environments: A Specification-Based Error Detection Approach

Authored by Taous Madi, Paulo Esteves-Verissimo

Application of Intelligent Transportation System Data using Big Data Technologies

Problems such as the increase in the number of private vehicles with the population, the rise in environmental pollution, the emergence of unmet infrastructure and resource problems, and the decrease in time efficiency in cities have put local governments, cities, and countries in search of solutions. These problems faced by cities and countries are tried to be solved in the concept of smart cities and intelligent transportation by using information and communication technologies in line with the needs. While designing intelligent transportation systems (ITS), beyond traditional methods, big data should be designed in a state-of-the-art and appropriate way with the help of methods such as artificial intelligence, machine learning, and deep learning. In this study, a data-driven decision support system model was established to help the business make strategic decisions with the help of intelligent transportation data and to contribute to the elimination of public transportation problems in the city. Our study model has been established using big data technologies and business intelligence technologies: a decision support system including data sources layer, data ingestion/ collection layer, data storage and processing layer, data analytics layer, application/presentation layer, developer layer, and data management/ data security layer stages. In our study, the decision support system was modeled using ITS data supported by big data technologies, where the traditional structure could not find a solution. This paper aims to create a basis for future studies looking for solutions to the problems of integration, storage, processing, and analysis of big data and to add value to the literature that is missing within the framework of the model. We provide both the lack of literature, eliminate the lack of models before the application process of existing data sets to the business intelligence architecture and a model study before the application to be carried out by the authors.

Authored by Kutlu Sengul, Cigdem Tarhan, Vahap Tecim

Impact of Artificial Intelligence on Information Security in Business

Cyber security is a critical problem that causes data breaches, identity theft, and harm to millions of people and businesses. As technology evolves, new security threats emerge as a result of a dearth of cyber security specialists equipped with up-to-date information. It is hard for security firms to prevent cyber-attacks without the cooperation of senior professionals. However, by depending on artificial intelligence to combat cyber-attacks, the strain on specialists can be lessened. as the use of Artificial Intelligence (AI) can improve Machine Learning (ML) approaches that can mine data to detect the sources of cyberattacks or perhaps prevent them as an AI method, it enables and facilitates malware detection by utilizing data from prior cyber-attacks in a variety of methods, including behavior analysis, risk assessment, bot blocking, endpoint protection, and security task automation. However, deploying AI may present new threats, therefore cyber security experts must establish a balance between risk and benefit. While AI models can aid cybersecurity experts in making decisions and forming conclusions, they will never be able to make all cybersecurity decisions and judgments.

Authored by Safiya Alawadhi, Areej Zowayed, Hamad Abdulla, Moaiad Khder, Basel Ali

Semi-supervised Trojan Nets Classification Using Anomaly Detection Based on SCOAP Features

Recently, hardware Trojan has become a serious security concern in the integrated circuit (IC) industry. Due to the globalization of semiconductor design and fabrication processes, ICs are highly vulnerable to hardware Trojan insertion by malicious third-party vendors. Therefore, the development of effective hardware Trojan detection techniques is necessary. Testability measures have been proven to be efﬁcient features for Trojan nets classiﬁcation. However, most of the existing machine-learning-based techniques use supervised learning methods, which involve time-consuming training processes, need to deal with the class imbalance problem, and are not pragmatic in real-world situations. Furthermore, no works have explored the use of anomaly detection for hardware Trojan detection tasks. This paper proposes a semi-supervised hardware Trojan detection method at the gate level using anomaly detection. We ameliorate the existing computation of the Sandia Controllability/Observability Analysis Program (SCOAP) values by considering all types of D ﬂip-ﬂops and adopt semi-supervised anomaly detection techniques to detect Trojan nets. Finally, a novel topology-based location analysis is utilized to improve the detection performance. Testing on 17 Trust-Hub Trojan benchmarks, the proposed method achieves an overall 99.47\% true positive rate (TPR), 99.99\% true negative rate (TNR), and 99.99\% accuracy.

Authored by Pei-Yu Lo, Chi-Wei Chen, Wei-Ting Hsu, Chih-Wei Chen, Chin-Wei Tien, Sy-Yen Kuo

Hardware Trojan Detection at LUT: Where Structural Features Meet Behavioral Characteristics

This work proposes a novel hardware Trojan detection method that leverages static structural features and behavioral characteristics in ﬁeld programmable gate array (FPGA) netlists. Mapping of hardware design sources to look-up-table (LUT) networks makes these features explicit, allowing automated feature extraction and further effective Trojan detection through machine learning. Four-dimensional features are extracted for each signal and a random forest classiﬁer is trained for Trojan net classiﬁcation. Experiments using Trust-Hub benchmarks show promising Trojan detection results with accuracy, precision, and F1-measure of 99.986\%, 100\%, and 99.769\% respectively on average.

Authored by Lingjuan Wu, Xuelin Zhang, Siyi Wang, Wei Hu

Obtaining a Highly Informative Digital Image of Information Signals of Cyber-Physical Systems Using Time-Frequency Spectral Analysis and Digital Filtering

The paper presents the stages of constructing a highly informative digital image of the time-frequency representation of information signals of cyber-physical systems. Signal visualization includes the stage of displaying the signal on the frequency-time plane, the stage of two-dimensional digital filtering and the stage of extracting highly informative components of the signal image. The use of two-dimensional digital filtering allows you to select the most informative component of the image of a complex analyzed information signal. The obtained digital image of the signal of the cyber-physical system is a highly informative initial information for solving a wide range of different problems of information security systems in cyberphysical systems with the subsequent use of machine learning technologies.

Authored by Andrey Ragozin, Anastasiya Pletenkova

Development of Threat Hunting Model Using Machine Learning Algorithms for Cyber Attacks Mitigation

Threat hunting has become very popular due to the present dynamic cyber security environment. As there remains increase in attacks’ landscape, the traditional way of monitoring threats is not scalable anymore. Consequently, threat hunting modeling technique is implemented as an emergent activity using machine learning (ML) paradigms. ML predictive analytics was carried out on OSTO-CID dataset using four algorithms to develop the model. Cross validation ratio of 80:20 was used to train and test the model. Decision tree classifier (DTC) gives the best metrics results among the four ML algorithms with 99.30\% accuracy. Therefore, DTC can be used for developing threat hunting model to mitigate cyber-attacks using data mining approach.

Authored by Akinsola T., Olajubu A., Aderounmu A.

Detection of Intrusions using Support Vector Machines and Deep Neural Networks

An Intrusion detection system (IDS) plays a role in network intrusion detection through network data analysis, and high detection accuracy, precision, and recall are required to detect intrusions. Also, various techniques such as expert systems, data mining, and state transition analysis are used for network data analysis. The paper compares the detection effects of the two IDS methods using data mining. The first technique is a support vector machine (SVM), a machine learning algorithm; the second is a deep neural network (DNN), one of the artificial neural network models. The accuracy, precision, and recall were calculated and compared using NSL-KDD training and validation data, which is widely used in intrusion detection to compare the detection effects of the two techniques. DNN shows slightly higher accuracy than the SVM model. The risk of recognizing an actual intrusion as normal data is much greater than the risk of considering normal data as an intrusion, so DNN proves to be much more effective in intrusion detection than SVM.

Authored by N Patel, B Mehtre, Rajeev Wankar

Cyber Threat Intelligence and Machine Learning

Cyber Threat Intelligence has been demonstrated to be an effective element of defensive security and cyber protection with examples dating back to the founding of the Financial Sector Information Sharing and Analysis Center (FS ISAC) in 1998. Automated methods are needed today in order to stay current with the magnitude of attacks across the globe. Threat information must be actionable, current and credibly validated if they are to be ingested into computer operated defense systems. False positives degrade the value of the system. This paper outlines some of the progress made in applying artificial intelligence techniques as well as the challenges associated with utilizing machine learning to refine the flow of threat intelligence. A variety of methods have been developed to create learning models that can be integrated with firewalls, rules and heuristics. In addition more work is needed to effectively support the limited number of expert human hours available to evaluate the prioritized threat landscape flagged as malicious in a (Security Operations Center) SOC environment.

Authored by Jon Haass

Low-rank Defenses Against Adversarial Attacks in Recommender Systems

Recommender systems are powerful tools which touch on numerous aspects of everyday life, from shopping to consuming content, and beyond. However, as other machine learning models, recommender system models are vulnerable to adversarial attacks and their performance could drop significantly with a slight modification of the input data. Most of the studies in the area of adversarial machine learning are focused on the image and vision domain. There are very few work that study adversarial attacks on recommender systems and even fewer work that study ways to make the recommender systems robust and reliable. In this study, we explore two stateof-the-art adversarial attack methods proposed by Tang et al. [1] and Christakopoulou et al. [2] and we report our proposed defenses and experimental evaluations against these attacks. In particular, we observe that low-rank reconstructions and/or transformation of the attacked data has a significant alleviating effect on the attack, and we present extensive experimental evidence to demonstrate the effectiveness of this approach. We also show that a simple classifier is able to learn to detect fake users from real users and can successfully discard them from the dataset. This observation elaborates the fact that the threat model does not generate fake users that mimic the same behavior of real users and can be easily distinguished from real users’ behavior. We also examine how transforming latent factors of the matrix factorization model into a low-dimensional space impacts its performance. Furthermore, we combine fake users from both attacks to examine how our proposed defense is able to defend against multiple attacks at the same time. Local lowrank reconstruction was able to reduce the hit ratio of target items from 23.54\% to 15.69\% while the overall performance of the recommender system was preserved.

Authored by Negin Entezari, Evangelos Papalexakis

Automatic Mapping of Unstructured Cyber Threat Intelligence: An Experimental Study: (Practical Experience Report)

Proactive approaches to security, such as adversary emulation, leverage information about threat actors and their techniques (Cyber Threat Intelligence, CTI). However, most CTI still comes in unstructured forms (i.e., natural language), such as incident reports and leaked documents. To support proactive security efforts, we present an experimental study on the automatic classiﬁcation of unstructured CTI into attack techniques using machine learning (ML). We contribute with two new datasets for CTI analysis, and we evaluate several ML models, including both traditional and deep learning-based ones. We present several lessons learned about how ML can perform at this task, which classiﬁers perform best and under which conditions, which are the main causes of classiﬁcation errors, and the challenges ahead for CTI analysis.

Authored by Vittorio Orbinato, Mariarosaria Barbaraci, Roberto Natella, Domenico Cotroneo

Threat Modeling for Machine Learning-Based Network Intrusion Detection Systems

Network Intrusion Detection Systems (NIDS) monitor networking environments for suspicious events that could compromise the availability, integrity, or confidentiality of the network’s resources. To ensure NIDSs play their vital roles, it is necessary to identify how they can be attacked by adopting a viewpoint similar to the adversary to identify vulnerabilities and defenses hiatus. Accordingly, effective countermeasures can be designed to thwart any potential attacks. Machine learning (ML) approaches have been adopted widely for network anomaly detection. However, it has been found that ML models are vulnerable to adversarial attacks. In such attacks, subtle perturbations are inserted to the original inputs at inference time in order to evade the classifier detection or at training time to degrade its performance. Yet, modeling adversarial attacks and the associated threats of employing the machine learning approaches for NIDSs was not addressed. One of the growing challenges is to avoid ML-based systems’ diversity and ensure their security and trust. In this paper, we conduct threat modeling for ML-based NIDS using STRIDE and Attack Tree approaches to identify the potential threats on different levels. We model the threats that can be potentially realized by exploiting vulnerabilities in ML algorithms through a simplified structural attack tree. To provide holistic threat modeling, we apply the STRIDE method to systems’ data flow to uncover further technical threats. Our models revealed a noticing of 46 possible threats to consider. These presented models can help to understand the different ways that a ML-based NIDS can be attacked; hence, hardening measures can be developed to prevent these potential attacks from achieving their goals.

Authored by Huda Alatwi, Charles Morisset

Prior Knowledge based Advanced Persistent Threats Detection for IoT in a Realistic Benchmark

The number of Internet of Things (IoT) devices being deployed into networks is growing at a phenomenal pace, which makes IoT networks more vulnerable in the wireless medium. Advanced Persistent Threat (APT) is malicious to most of the network facilities and the available attack data for training the machine learning-based Intrusion Detection System (IDS) is limited when compared to the normal trafﬁc. Therefore, it is quite challenging to enhance the detection performance in order to mitigate the inﬂuence of APT. Therefore, Prior Knowledge Input (PKI) models are proposed and tested using the SCVIC-APT2021 dataset. To obtain prior knowledge, the proposed PKI model pre-classiﬁes the original dataset with unsupervised clustering method. Then, the obtained prior knowledge is incorporated into the supervised model to decrease training complexity and assist the supervised model in determining the optimal mapping between the raw data and true labels. The experimental ﬁndings indicate that the PKI model outperforms the supervised baseline, with the best macro average F1-score of 81.37\%, which is 10.47\% higher than the baseline.

Authored by Yu Shen, Murat Simsek, Burak Kantarci, Hussein Mouftah, Mehran Bagheri, Petar Djukic

Combating Advanced Persistent Threats for Imminent Low Earth Orbit Cognitive Communications Systems

With the proliferation of Low Earth Orbit (LEO) spacecraft constellations, comes the rise of space-based wireless cognitive communications systems (CCS) and the need to safeguard and protect data against potential hostiles to maintain widespread communications for enabling science, military and commercial services. For example, known adversaries are using advanced persistent threats (APT) or highly progressive intrusion mechanisms to target high priority wireless space communication systems. Specialized threats continue to evolve with the advent of machine learning and artificial intelligence, where computer systems inherently can identify system vulnerabilities expeditiously over naive human threat actors due to increased processing resources and unbiased pattern recognition. This paper presents a disruptive abuse case for an APT-attack on such a CCS and describes a trade-off analysis that was performed to evaluate a variety of machine learning techniques that could aid in the rapid detection and mitigation of an APT-attack. The trade results indicate that with the employment of neural networks, the CCS s resiliency would increase its operational functionality, and therefore, on-demand communication services reliability would increase. Further, modelling, simulation, and analysis (MS\&A) was achieved using the Knowledge Discovery and Data Mining (KDD) Cup 1999 data set as a means to validate a subset of the trade study results against Training Time and Number of Parameters selection criteria. Training and cross-validation learning curves were computed to model the learning performance over time to yield a reasonable conclusion about the application of neural networks.

Authored by Suzanna LaMar, Jordan Gosselin, Lisa Happel, Anura Jayasumana

An Efficient User Trust Computation Using Machine Learning Methods in Online Social Networks

Social networks are good platforms for likeminded people to exchange their views and thoughts. With the rapid growth of web applications, social networks became huge networks with million numbers of users. On the other hand, number of malicious activities by untrustworthy users also increased. Users must estimate the people trustworthiness before sharing their personal information with them. Since the social networks are huge and complex, the estimation of user trust value is not trivial task and could gain main researchers focus. Some of the mathematical methods are proposed to estimate the user trust value, but still they are lack of efficient methods to analyze user activities. In this paper “An Efficient Trust Computation Methods Using Machine Learning in Online Social Networks- TCML” is proposed. Here the twitter user activities are considered to estimate user direct trust value. The trust values of unknown users are computed through the recommendations of common friends. The available twitter data set is unlabeled data, hence unsupervised methods are used in categorization (clusters) of users and in computation of their trust value. In experiment results, silhouette score is used in assessing of cluster quality. The proposed method performance is compared with existing methods like mole and tidal where it could outperform them.

Authored by Anitha Yarava, Shoba Bindu