Publications | Science of Security Virtual Organization

Objection!: Identifying Misclassified Malicious Activities with XAI

Many studies have been conducted to detect various malicious activities in cyberspace using classifiers built by machine learning. However, it is natural for any classifier to make mistakes, and hence, human verification is necessary. One method to address this issue is eXplainable AI (XAI), which provides a reason for the classification result. However, when the number of classification results to be verified is large, it is not realistic to check the output of the XAI for all cases. In addition, it is sometimes difficult to interpret the output of XAI. In this study, we propose a machine learning model called classification verifier that verifies the classification results by using the output of XAI as a feature and raises objections when there is doubt about the reliability of the classification results. The results of experiments on malicious website detection and malware detection show that the proposed classification verifier can efficiently identify misclassified malicious activities.

Authored by Koji Fujita, Toshiki Shibahara, Daiki Chiba, Mitsuaki Akiyama, Masato Uchida

Applications of Transformer Attention Mechanisms in Information Security: Current Trends and Prospects

In this work, we present a comprehensive survey of applications of attention-based transformer architectures in information security. Our review identifies three primary areas of application: intrusion detection, anomaly detection, and malware detection. We present an overview of attention-based mechanisms and their application in each cybersecurity use case, and discuss open directions for future research in AI-enabled information security.

Authored by M. Vubangsi, Sarumi Abidemi, Olukayode Akanni, Auwalu Mubarak, Fadi Al-Turjman
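
The models covered by this survey rest on scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal single-head illustration in plain Python with no learned projections; the toy event embeddings are assumptions for illustration, not data from the survey.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V over lists of row vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings of three network events (e.g. flow records).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out, weights = scaled_dot_product_attention([[2.0, 0.0]], keys, values)
```

In an actual transformer-based detector, the queries, keys, and values would come from learned linear projections of token embeddings, and several such heads would run in parallel.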

Adaptive Intrusion Detection Systems: Class Incremental Learning for IoT Emerging Threats

In the evolving landscape of Internet of Things (IoT) security, continuous adaptation of defenses is critical. Class Incremental Learning (CIL) can provide a viable solution by enabling Machine Learning (ML) and Deep Learning (DL) models to (i) learn and adapt to new attack types (0-day attacks), (ii) retain their ability to detect known threats, and (iii) preserve computational efficiency (i.e., no full retraining). In IoT security, where novel attacks frequently emerge, CIL offers an effective tool to enhance Intrusion Detection Systems (IDS) and secure network environments. In this study, we explore how CIL approaches empower DL-based IDS in IoT networks, using the publicly available IoT-23 dataset. Our evaluation focuses on two essential aspects of an IDS: (a) attack classification and (b) misuse detection. A thorough comparison against a fully retrained IDS (i.e., one trained from scratch) is carried out. Finally, we emphasize interpreting the predictions made by incremental IDS models through eXplainable AI (XAI) tools, offering insights into potential avenues for improvement.

Authored by Francesco Cerasuolo, Giampaolo Bovenzi, Christian Marescalco, Francesco Cirillo, Domenico Ciuonzo, Antonio Pescapè
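
The core property CIL targets, adding a new attack class without full retraining, can be illustrated with a nearest-class-mean classifier, one of the simplest incremental schemes. This is not one of the CIL approaches evaluated in the paper. The class names and two-dimensional flow features below are hypothetical.

```python
import math


class NearestClassMeanIDS:
    """Hypothetical class-incremental classifier: one mean vector (prototype) per class."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def add_class_samples(self, label, samples):
        # New attack classes can be added at any time; other prototypes stay untouched.
        total = self.sums.setdefault(label, [0.0] * len(samples[0]))
        for s in samples:
            for i, v in enumerate(s):
                total[i] += v
        self.counts[label] = self.counts.get(label, 0) + len(samples)

    def prototype(self, label):
        return [v / self.counts[label] for v in self.sums[label]]

    def predict(self, x):
        # Assign x to the class whose prototype is closest in Euclidean distance.
        return min(self.sums, key=lambda lbl: math.dist(x, self.prototype(lbl)))


ids = NearestClassMeanIDS()
ids.add_class_samples("benign", [[0.0, 0.0], [0.0, 1.0]])
ids.add_class_samples("scan", [[10.0, 10.0]])
# Later, a new (hypothetical) attack family arrives; no retraining of old classes.
ids.add_class_samples("mirai", [[-10.0, 5.0]])
```

The design choice here is deliberate simplicity: because each class is summarized only by its mean, learning a new class never overwrites what was learned for the old ones, which is the forgetting problem that DL-based CIL methods must work much harder to avoid.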

Detecting Conventional and Adversarial Attacks Using Deep Learning Techniques: A Systematic Review

Significant progress has been made in developing Deep Learning (DL) models in Artificial Intelligence (AI) that can make independent decisions. However, this progress has also been accompanied by the emergence of malicious entities that aim to manipulate the outcomes these models generate. Given the growing complexity of these models, this is a concerning issue in fields such as medical image classification, autonomous vehicle systems, malware detection, and criminal justice. Recent research has highlighted the vulnerability of these classifiers to both conventional and adversarial attacks, which can skew their results in both the training and testing stages. This Systematic Literature Review (SLR) aims to analyse conventional and adversarial attacks comprehensively. It evaluates 45 works published from 2017 to 2023 to better understand adversarial attacks, including their impact, causes, and standard mitigation approaches.

Authored by Tarek Ali, Amna Eleyan, Tarek Bejaoui

Case Study: Neural Network Malware Detection Verification for Feature and Image Datasets

Malware, or software designed with harmful intent, is an ever-evolving threat that can have drastic effects on both individuals and institutions. Neural network malware classification systems are key tools for combating these threats but are vulnerable to adversarial machine learning attacks. These attacks perturb input data to cause misclassification, bypassing protective systems. Existing defenses often rely on enhancing the training process, thereby increasing the model’s robustness to these perturbations, which is quantified using verification. While training improvements are necessary, we propose focusing on the verification process used to evaluate improvements to training. As such, we present a case study that evaluates a novel verification domain that will help to ensure tangible safeguards against adversaries and provide a more reliable means of evaluating the robustness and effectiveness of anti-malware systems. To do so, we describe malware classification and two types of common malware datasets (feature and image datasets), demonstrate the certified robustness accuracy of malware classifiers using the Neural Network Verification (NNV) and Neural Network Enumeration (nnenum) tools, and outline the challenges and future considerations necessary for the improvement and refinement of the verification of malware classification. By evaluating this novel domain as a case study, we hope to increase its visibility, encourage further research and scrutiny, and ultimately enhance the resilience of digital systems against malicious attacks.

Authored by Preston Robinette, Diego Lopez, Serena Serbinowska, Kevin Leach, Taylor Johnson
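
NNV and nnenum compute reachable sets with set representations such as star sets. As a much simpler stand-in, the sketch below uses interval bound propagation (IBP), a sound but more conservative technique, to check whether a tiny fully connected ReLU classifier keeps its prediction for every input within an L-infinity ball of radius epsilon. The weights are hypothetical, and none of this is the NNV or nnenum API.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate the box [lower, upper] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A positive weight maps lower to lower; a negative weight swaps them.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def certify(x, epsilon, layers, true_class):
    """True if every input within L-inf distance epsilon of x provably keeps true_class on top."""
    lower = [v - epsilon for v in x]
    upper = [v + epsilon for v in x]
    for idx, (w, b) in enumerate(layers):
        lower, upper = affine_bounds(lower, upper, w, b)
        if idx < len(layers) - 1:  # ReLU on hidden layers only
            lower, upper = relu_bounds(lower, upper)
    # The true logit's lower bound must beat every other logit's upper bound.
    return all(lower[true_class] > upper[j] for j in range(len(lower)) if j != true_class)


def certified_robust_accuracy(samples, epsilon, layers):
    # samples: list of (feature_vector, true_label) pairs
    certified = sum(1 for x, y in samples if certify(x, epsilon, layers, y))
    return certified / len(samples)


# Hypothetical 2-input, 2-hidden, 2-output network: list of (weights, bias) per layer.
layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
          ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])]
```

Because interval bounds over-approximate the reachable outputs, a failed check means "not proven robust" rather than "not robust"; exact or tighter methods such as those in NNV and nnenum can certify cases that IBP misses.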

Cyber Automated Network Resilience Defensive Approach against Malware Images

Cyber threats have been a major issue in the cybersecurity domain. Attackers typically follow a series of attack stages known as the cyber kill chain, and each stage has its own norms and deployment constraints. For a decade, researchers have focused on detecting these attacks, but passive monitoring tools alone are no longer sufficient. As computing increasingly moves toward autonomy, this work designs an Autonomous Cyber Resilience Defense algorithm. Resilience has two aspects: response and recovery. Response requires actions to be performed to mitigate attacks; recovery involves patching flawed code or backdoor vulnerabilities. Both aspects have traditionally required human assistance in cybersecurity defense. This work aims to develop an algorithm based on Reinforcement Learning (RL) with a Convolutional Neural Network (CNN) that more closely resembles the human learning process, applied to malware images. RL learns through a reward mechanism for every performed attack: each action produces an outcome that can be classified as a positive or negative reward. To enhance its decision-making process, a Markov Decision Process (MDP) is combined with this RL approach. The impact and induction measures of RL on malware images were measured to obtain optimal results. On the Malimg malware image dataset, the automated actions were carried out successfully. The proposed work achieved 98% accuracy in classification, detection, and autonomous deployment of resilience actions.

Authored by Kainat Rizwan, Mudassar Ahmad, Muhammad Habib

Securing the Digital Fortress: Unveiling the Modern Battleground for Sustainable OSs and the Digital Threatscape

The increasing prevalence of cyber threats necessitates the exploration of cybersecurity challenges in sustainable operating systems. This research paper addresses these challenges by examining the dynamic landscape of cyber threats and the modifications required in operating systems to ensure robust security measures. Through the classification of these threats, the diverse nature of attacks faced by operating systems is revealed, highlighting the need for proactive security measures. Furthermore, the study investigates current cybersecurity solutions and prevention mechanisms employed to mitigate these threats. It also explores the modifications and challenges that operating systems must undergo in response to cybercrime, emphasizing the significance of proactive measures to address vulnerabilities exploited by cybercriminals.

Authored by Shadi bi, Samar Hendawi, Islam Altalahin, Muder Almiani, Ala Mughaid

Graph Neural Network for Malware Detection and Classification on Renewable Energy Management Platform

With the rapid development of science and technology, information security issues have attracted increasing attention. According to statistics, tens of millions of computers around the world are infected by malicious software (malware) every year, causing losses of up to several billion US dollars. Malware takes various forms, including viruses, worms, and Trojan horses, and exploits network vulnerabilities for intrusion. Most intrusion detection approaches employ behavioral analysis techniques that analyze malware threats through packet collection and filtering, feature engineering, and attribute comparison, but these approaches struggle to differentiate malicious traffic from legitimate traffic. Deep learning and graph neural networks (GNNs) can be used for malware detection and classification by learning the characteristics of malware. In this study, a GNN-based model is proposed for malware detection and classification on a renewable energy management platform. It uses a GNN to analyze Cuckoo Sandbox malware records for malware detection and classification. To evaluate the effectiveness of the GNN-based model, the CIC-AndMal2017 dataset is used to examine its accuracy, precision, recall, and ROC curve. Experimental results show that the GNN-based model achieves better results.

Authored by Hsiao-Chung Lin, Ping Wang, Wen-Hui Lin, Yu-Hsiang Lin, Jia-Hong Chen

"Benchmark: Neural Network Malware Classification"

Authored by Preston Robinette, Diego Lopez, Taylor Johnson

Heterogeneous Graph Transformer for Advanced Persistent Threat Classification in Wireless Networks

Advanced Persistent Threats (APTs) have significantly impacted organizations over extended periods with their coordinated and sophisticated cyberattacks. Signature-based tools such as antivirus software and firewalls can detect and block other types of malware, but APTs exploit zero-day vulnerabilities to generate new, undetectable malware variants. Additionally, APT adversaries engage in complex relationships and interactions among network entities such as hosts, users, and IP addresses, so effective detection requires learning these interactions from network traffic flows. However, traditional deep neural networks often fail to capture the inherent graph structure and overlook crucial contextual information in network traffic flows. To address these issues, this research models APTs as heterogeneous graphs, capturing the diverse features and complex interactions in network flows. A heterogeneous graph transformer (HGT) model is then used to accurately distinguish between benign and malicious network connections. Experimental results show that the HGT model achieves better performance, with 100% accuracy and faster learning, outperforming homogeneous graph neural network models.

Authored by Kazeem Saheed, Shagufta Henna

Android Malware Classification with Gray Wolf Optimization Algorithm and Deep Neural Network Hybrid Approach

Malware Classification - With the rapid development of technology and the increasing use of Android software, the amount of malware has also increased. This study classifies 4465 Android applications as malware or goodware based on their features. Cost is an important problem given the growing number of applications and the analyses that must be performed on each one. This study addresses this problem with a hybrid of the Gray Wolf Optimization Algorithm (GWO) and Deep Neural Networks (DNN). GWO is used both to select features and to determine the design of the DNN model. In this way, an approximate solution is proposed for the most suitable features and the most suitable model design. The GWO-DNN hybrid model created in this study achieves an F1 score of 99.74%.

Authored by Merve Güllü, Necattin Barişçi
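
For readers unfamiliar with GWO, the sketch below implements the standard continuous Gray Wolf Optimizer update: each wolf moves toward positions estimated from the three best wolves (alpha, beta, delta), while the coefficient a decays linearly from 2 toward 0 to shift from exploration to exploitation. It minimizes a toy sphere function. The binary encoding the paper uses to select features and DNN design choices is not shown, and the parameter values are assumptions.

```python
import random


def gray_wolf_optimize(objective, dim, lower, upper, n_wolves=20, n_iter=200, seed=0):
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=objective)
    for t in range(n_iter):
        # The three fittest wolves (alpha, beta, delta) lead the hunt.
        alpha, beta, delta = sorted(wolves, key=objective)[:3]
        a = 2.0 - 2.0 * t / n_iter
        next_wolves = []
        for wolf in wolves:
            position = []
            for d in range(dim):
                estimates = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    estimates.append(leader[d] - A * D)
                # Average the three estimates and keep the wolf inside the search bounds.
                position.append(min(max(sum(estimates) / 3.0, lower), upper))
            next_wolves.append(position)
        wolves = next_wolves
        candidate = min(wolves, key=objective)
        if objective(candidate) < objective(best):
            best = candidate
    return best, objective(best)


def sphere(x):
    # Toy objective with its minimum of 0 at the origin.
    return sum(v * v for v in x)


best, value = gray_wolf_optimize(sphere, dim=3, lower=-5.0, upper=5.0)
```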

Why GloVe Shows Negative Effects in Malware Classification

Malware Classification - The past decades have witnessed the development of various Machine Learning (ML) models for malware classification. Semantic representation is a crucial basis for these classifiers. This paper assesses the effect of semantic representation methods on malware classifier performance, focusing on two commonly used methods: N-gram and GloVe. We use diverse ML classifiers to conduct comparative experiments analyzing the capability of N-gram, GloVe, and image-based methods for malware classification. We also analyze in depth why GloVe can have negative effects on malware static analysis.

Authored by Bingchu Jin, Zesheng Hu, Jianhua Wang, Monong Wei, Yawei Zhao, Chao Xue
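
The N-gram representation compared in this paper can be illustrated by counting contiguous token sequences and normalizing the counts into a feature vector. In this minimal sketch the opcode tokens and the vocabulary are hypothetical; the paper's tokenization and the GloVe embeddings are not reproduced.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_vector(tokens, n, vocabulary):
    """Relative frequency of each vocabulary n-gram (0.0 if absent)."""
    counts = ngram_counts(tokens, n)
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocabulary]


# Hypothetical opcode sequence from a disassembled sample.
tokens = ["push", "mov", "call", "push", "mov"]
features = ngram_vector(tokens, 2, [("push", "mov"), ("xor", "xor")])
```

Each position in the resulting vector corresponds to one n-gram in the chosen vocabulary, so any of the ML classifiers mentioned above can consume it directly.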

Malware Family Classification via Residual Prefetch Artifacts

Malware Classification - Automated malware classification assigns unknown malware to known families. Most research in malware classification assumes that the defender has access to the malware for analysis. Unfortunately, malware can delete itself after execution. As a result, analysts are only left with digital residue, such as network logs or remnant artifacts of malware in memory or on the file system. In this paper, a novel malware classification method based on the Windows prefetch mechanism is presented and evaluated, enabling analysts to classify malware without a corresponding executable. The approach extracts features from Windows prefetch files, a file system artifact that contains historical process information such as loaded libraries and process dependencies. Results show that classification using these features with two different algorithms garnered F-Scores between 0.80 and 0.82, offering analysts a viable option for forensic analysis.

Authored by Adam Duby, Teryl Taylor, Yanyan Zhuang

Malware Detection Classification using Recurrent Neural Network

Malware Classification - Nowadays, the growing number of malicious programs is becoming a serious problem, increasing the need for automated detection and categorization of potential threats. These attacks often use malware that security vendors do not recognize, making it difficult to protect endpoints from viruses. Existing methods have been proposed to detect malware; however, as malware variants evolve, they can lead to misdiagnosis and are difficult to diagnose accurately. To address this problem, this work introduces a Recurrent Neural Network (RNN) that classifies samples as malware or benign based on features extracted using the Information Gain Absolute Feature Selection (IGAFS) technique. First, a malware detection dataset is collected from the Kaggle repository. The dataset is then preprocessed to remove null and noisy values. Next, the proposed IGAFS technique is used to select the most relevant features for malware from the preprocessed dataset. The selected features are used to train the RNN to classify samples as malware or not, with better accuracy and false rate. The experimental results show greater performance compared with previous methods.

Authored by Suresh Kumar, Umi B., Isa Mishra, Shitharth S., Diwakar Tripathi, Siva T.
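
The abstract does not detail how IGAFS differs from standard information gain, so the sketch below shows only plain information-gain ranking over discrete features, IG(Y; X) = H(Y) - sum over v of P(X = v) H(Y | X = v). The feature rows and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lbl for f, lbl in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_features(rows, labels, k):
    """Return indices of the k features with the highest information gain."""
    gains = [(information_gain([r[i] for r in rows], labels), i) for i in range(len(rows[0]))]
    return [i for _, i in sorted(gains, reverse=True)[:k]]


# Hypothetical binary features per sample, e.g. [uses_crypto_api, has_icon].
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = ["malware", "malware", "benign", "benign"]
top = rank_features(rows, labels, k=1)
```

Continuous features would first need to be discretized (for example by binning) before this ranking applies.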

Android Malware Classification by CNN-LSTM

Malware Classification - Mobile devices play a crucial role and have become an essential part of people's lives, particularly through online applications for shopping, learning, mailing, and more. Android OS has continued to lead the market ahead of other operating systems since 2012. Traditional Android malware detection methods, such as static, dynamic, or hybrid analysis or the Bayesian model, may be less accurate at detecting recent Android malware. We propose a deep learning method for Android malware detection using a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The CNN provides efficient feature extraction from data, and the additional LSTM layers improve prediction accuracy. We train and test our approach using the CICMalDroid2020 dataset. According to the test results, CNN-LSTM can provide reliable malware prediction in Android applications, with the CNN-LSTM classifier achieving an accuracy of 94%.

Authored by Shakhnaz Amenova, Cemil Turan, Dinara Zharkynbek

Malware Classification based on a Light-weight Architecture of CNN: MalShuffleNet

Malware Classification - Traditional malware detection methods have difficulty detecting massive numbers of malware variants. Malware detection based on malware visualization has proven effective for identifying unknown malware variants. To improve the accuracy and reduce the detection time of such methods, a novel malware classification method based on a lightweight CNN architecture, named MalShuffleNet, is proposed. The model is customized from ShuffleNet V2 by adjusting the fully connected layer to adapt it to malware classification. Empirical results on the Malimg dataset indicate that our model achieves 99.03% accuracy and identifies an unknown malware sample in only 5.3 milliseconds on average.

Authored by Lingfeng Qiu, Shuo Wang, Jian Wang, Yifei Wang, Wei Huang

Markov Image with Transfer Learning for Malware Detection and Classification

Malware Classification - Malware attacks are a severe problem that can cause considerable losses. To prevent malware attacks, various malware detection and classification methods have been implemented in recent years. This paper proposes a new method based on Markov images and transfer learning. An experiment was also conducted comparing the malware detection and classification performance of the proposed method with that of the grayscale method. The accuracy and loss of the proposed method are 0.973 and 0.076 for malware detection, and 0.987 and 0.062 for malware classification, respectively. The accuracy and loss of the grayscale method are 0.989 and 0.037 for detection, and 0.973 and 0.202 for classification, respectively. Although the grayscale method performs better in malware detection, the proposed method's accuracy exceeds 0.97 in both tasks. Therefore, the results show that the proposed method is suitable for malware detection and classification.

Authored by Lok Kwan
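
One common way to build a Markov image from a binary is a 256 x 256 byte-transition matrix in which cell (i, j) holds the empirical probability that byte j immediately follows byte i. The sketch below assumes that construction; the paper's exact preprocessing and its transfer-learning model are not shown.

```python
def markov_image(data: bytes):
    """256x256 matrix: cell [i][j] estimates P(next byte = j | current byte = i)."""
    counts = [[0] * 256 for _ in range(256)]
    for current, nxt in zip(data, data[1:]):
        counts[current][nxt] += 1
    image = []
    for row in counts:
        total = sum(row)
        # Rows for bytes that never appear (or only appear last) stay all zero.
        image.append([c / total if total else 0.0 for c in row])
    return image


# Hypothetical byte content standing in for a malware binary.
img = markov_image(bytes([0, 1, 0, 1, 0, 2]))
```

Unlike a grayscale byteplot, whose size depends on file length, this matrix always has the same shape; scaling each cell to the range 0 to 255 yields a fixed-size grayscale image that can be fed to a pre-trained CNN.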

Malware Classification Based on GAF Visualization of Dynamic API Call Sequences

Malware Classification - Owing to the constant updating of malware and its variants and the continuous development of malware obfuscation techniques, malware intrusions targeting Windows hosts are on the rise. Traditional static analysis methods, such as signature matching, have struggled to adapt to the detection of new malware. Therefore, a novel visual malware detection method is proposed that, for the first time, converts sequential Windows API call sequences into feature images based on the Gramian Angular Field (GAF) and trains a neural network to identify malware. The experimental results demonstrate the effectiveness of the proposed method. For binary malware classification, the GAF visualization images of the API call sequences are compared with the original sequences: after GAF visualization, the classification accuracy of the classic machine learning model MLP improves by 9.64%, and that of the deep learning model CNN improves by 4.82%. Furthermore, our experiments show that the proposed method is also feasible and effective for multi-class malware classification.

Authored by Hongmei Zhang, Xiaoqian Yun, Xiaofang Deng, Xiaoxiong Zhong
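
The Gramian Angular Field maps a numeric sequence to an image: values are rescaled to [-1, 1], converted to angles phi = arccos(x), and pixel (i, j) is cos(phi_i + phi_j), the summation variant (GASF). The sketch assumes API calls are first encoded as integer IDs through a hypothetical vocabulary; the abstract does not state the paper's exact encoding.

```python
import math


def encode_calls(calls, vocabulary):
    # Hypothetical encoding: each API name maps to its index in a fixed vocabulary.
    return [vocabulary.index(name) for name in calls]


def gramian_angular_field(series):
    """Gramian Angular Summation Field of a 1-D numeric sequence."""
    lo, hi = min(series), max(series)
    span = hi - lo
    # Min-max rescale to [-1, 1]; a constant sequence maps to all zeros.
    scaled = [(2.0 * (x - lo) / span - 1.0) if span else 0.0 for x in series]
    # Clamp against floating-point drift before taking arccos.
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


vocab = ["NtOpenFile", "NtReadFile", "NtWriteFile"]  # hypothetical vocabulary
calls = ["NtOpenFile", "NtReadFile", "NtWriteFile"]
image = gramian_angular_field(encode_calls(calls, vocab))
```

The resulting matrix is symmetric and its diagonal preserves the rescaled values (cos(2 phi_i) = 2x_i^2 - 1), which is why GAF images retain the temporal ordering of the original sequence.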

BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec

Malware Classification - Rapid digitalisation spurred by the Covid-19 pandemic has resulted in more cybercrime. Malware-as-a-service is now a booming business for cybercriminals. With the surge in malware activity, it is vital for cyber defenders to understand more about the malware samples they have at hand, as such information can greatly influence their next course of action during a breach. Recently, researchers have shown how malware family classification can be done by first converting malware binaries into grayscale images and then passing them through neural networks for classification. However, most work focuses on studying the impact of different neural network architectures on classification performance. In the last year, researchers have shown that augmenting supervised learning with self-supervised learning can improve performance. Even more recently, Data2Vec was proposed as a modality-agnostic self-supervised framework for training neural networks. In this paper, we present BinImg2Vec, a framework for training malware binary image classifiers that incorporates both self-supervised and supervised learning to produce a model that consistently outperforms one trained only via supervised learning. We also show how our framework produces outputs that facilitate explainability.

Authored by Lee Sern, Tay Keng, Chua Fu

Malware Image Classification using VGG16

Malware Classification - Methodologies used to detect malicious applications can be broadly classified into static and dynamic analysis-based approaches. Traditional signature-based methods cannot detect new variants of malware families. This work combines deep learning techniques with image-based features to classify malware. The dataset used here is the ‘Malimg’ dataset, which contains pictorial representations of well-known malware families. This paper proposes a methodology for identifying malware images and classifying them into families. The classification is based on image features, which are extracted using the pre-trained VGG16 model. The malware samples are represented as byteplot grayscale images. Features are extracted using the convolutional layers of a VGG16 deep learning network pre-trained on the ImageNet dataset. These features are used to train different classifiers, namely SVM, XGBoost, DNN, and Random Forest, to classify samples into malware families. Using 9339 samples from 25 malware families, we performed experimental evaluations and demonstrate that our approach identifies malware families with high accuracy.

Authored by K. Deepa, K. Adithyakumar, P. Vinod
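
A byteplot image treats each byte of a binary as one 8-bit grayscale pixel and wraps the byte stream into fixed-width rows. The sketch below assumes a fixed width of 256 for simplicity (in practice the width is often chosen according to file size) and omits resizing to VGG16's 224 x 224 input as well as the feature extraction itself.

```python
def byteplot(data: bytes, width: int = 256):
    """Reshape raw bytes into rows of `width` grayscale pixels (0-255); the last row is zero-padded."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def to_pgm(rows):
    """Serialize pixel rows as a binary PGM (P5) image for viewing or further processing."""
    header = f"P5\n{len(rows[0])} {len(rows)}\n255\n".encode("ascii")
    return header + bytes(pixel for row in rows for pixel in row)


# Hypothetical input: ten bytes standing in for a malware binary, wrapped at width 4.
pixels = byteplot(bytes(range(10)), width=4)
```

Writing the result as PGM is only a convenience for inspection; any image library could take the same pixel rows.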

Malware analysis and multi-label category detection issues: Ensemble-based approaches

Malware Analysis - Detecting malware and security attacks is a complex process whose details and analysis activities can vary. As part of the detection process, malware scanners try to assign a detected malware sample to one of the known malware categories (e.g., worms, spyware, viruses). However, many studies indicate problems when scanners categorize or identify a particular malware sample under more than one malware category. This paper, like several others, shows that machine learning can be used for malware detection, especially with ensemble-based prediction methods. In this paper, we evaluate several custom-built ensemble models. We focus on multi-label malware classification, since individual or classical classifiers showed low accuracy in this area. This paper shows that recent machine learning models, such as ensemble and deep learning models, can be used for malware detection with better performance than classical models. This is critical for such dynamic yet important detection systems, where challenges such as the detection of unknown or zero-day malware will continue to exist and evolve.

Authored by Izzat Alsmadi, Bilal Al-Ahmad, Mohammad Alsmadi
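
The abstract does not specify how the custom ensembles combine their members, so the sketch below assumes one common scheme for multi-label outputs: label-wise voting, where a category is kept if at least a minimum number of base models predict it (a simple majority by default). The base-model outputs are hypothetical.

```python
from collections import Counter


def ensemble_multilabel(predictions, min_votes=None):
    """predictions: one set of predicted categories per base model.

    A category is kept if at least `min_votes` models predict it
    (default: a strict majority of the models).
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    votes = Counter(label for labels in predictions for label in set(labels))
    return {label for label, n in votes.items() if n >= min_votes}


# Hypothetical outputs of three base classifiers for one sample.
preds = [{"worm", "trojan"}, {"worm"}, {"worm", "spyware"}]
combined = ensemble_multilabel(preds)
```

Lowering `min_votes` trades precision for recall: with `min_votes=1` the ensemble returns the union of all predicted categories.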

GNN-Based Malicious Network Entities Identification In Large-Scale Network Data

Malware Analysis and Graph Theory - A reliable database of Indicators of Compromise (IoCs) is a cornerstone of almost every malware detection system. Building the database and keeping it up to date is a lengthy and often manual process in which each IoC must be reviewed and labeled by an analyst. In this paper, we focus on an automatic way of identifying IoCs that is intended to save analysts' time and scale to the volume of network data. We leverage the relations of each IoC to other entities on the Internet to build a heterogeneous graph. We formulate a classification task on this graph and apply graph neural networks (GNNs) to identify malicious domains. Our experiments show that the presented approach provides promising results in identifying high-risk malware domains as well as in classifying legitimate domains.

Authored by Stepan Dvorak, Pavel Prochazka, Lukas Bajer

CFGExplainer: Explaining Graph Neural Network-Based Malware Classification from Control Flow Graphs

Malware Analysis and Graph Theory - With the ever-increasing threat of malware, extensive research effort has been put into applying Deep Learning to malware classification tasks. Graph Neural Networks (GNNs) that process malware as Control Flow Graphs (CFGs) have shown great promise for malware classification. However, these models are viewed as black boxes, which makes it hard to validate them and identify malicious patterns. To that end, we propose CFGExplainer, a deep learning based model for interpreting GNN-oriented malware classification results. CFGExplainer identifies a subgraph of the malware CFG that contributes most towards classification and provides insight into the importance of the nodes (i.e., basic blocks) within it. To the best of our knowledge, CFGExplainer is the first work that explains GNN-based malware classification. We compared CFGExplainer against three explainers, namely GNNExplainer, SubgraphX, and PGExplainer, and showed that CFGExplainer is able to identify top equisized subgraphs with higher classification accuracy than the other three explainers.

Authored by Jerome Herath, Priti Wakodikar, Ping Yang, Guanhua Yan

Representation Learning with Function Call Graph Transformations for Malware Open Set Recognition

Malware Analysis and Graph Theory - The open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. As new or unknown malware families emerge regularly, it is difficult to collect samples covering all classes for training ML systems. An advanced malware classification system should classify the known classes correctly while remaining sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations of function call graph (FCG) based malware representations to facilitate the pretext task. We also present a statistical thresholding approach to find the optimal threshold for the unknown class. Moreover, the experimental results indicate that our proposed pre-training process can improve performance across different downstream loss functions for the OSR problem.

Authored by Jingyun Jia, Philip Chan
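
The abstract does not give the exact statistical rule, so the sketch below assumes a simple one: set the threshold to the mean minus k standard deviations of the top-class scores observed on known-family validation samples, and label any test sample whose top score falls below it as unknown. The value of k, the family names, and the scores are hypothetical.

```python
import statistics


def fit_unknown_threshold(known_top_scores, k=2.0):
    """Hypothetical rule: mean minus k sample standard deviations of known-class top scores."""
    return statistics.mean(known_top_scores) - k * statistics.stdev(known_top_scores)


def open_set_predict(class_scores, threshold):
    """Return the top-scoring known family, or "unknown" if its score is below the threshold."""
    best = max(class_scores, key=class_scores.get)
    return best if class_scores[best] >= threshold else "unknown"


# Hypothetical top-class scores from a validation split of known families.
threshold = fit_unknown_threshold([0.90, 0.80, 0.85, 0.95, 0.90])
label = open_set_predict({"zbot": 0.92, "ramnit": 0.05}, threshold)
```

A larger k admits more borderline samples as known families; a smaller k flags more of them as unknown, so k would normally be tuned on held-out data.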

Detection of Botnets in IoT Networks using Graph Theory and Machine Learning

Malware Analysis and Graph Theory - The Internet of Things (IoT) is proving to be a boon in granting Internet access to everyday objects and devices. Sensors, programs, and other technologies interact and exchange information with other devices and systems over the Internet. Even today, IoT devices suffer from major security threats that expose them to many dangers and malware, one of which is IoT botnets. Botnets serve as a vector for attacks and have become one of the most significant dangers on the Internet. They act against organizations and carry out cybercrimes: they are used to produce spam, DDoS attacks, and click fraud, and to steal confidential data. IoT devices pose challenges unlike those of common malware on PCs and Android devices, because IoT devices have heterogeneous processor architectures. Many studies use static or dynamic analysis to detect and classify botnets on IoT devices. Most researchers have not addressed the multi-architecture issue, and their analyses consume considerable computing resources. Therefore, this approach classifies IoT botnets using PSI-Graphs, which effectively addresses the problem of encryption in IoT botnet detection, tackles the multi-architecture problem, and reduces computation time. It proposes a new methodology for describing and recognizing botnets using graph-based Machine Learning techniques and Exploratory Data Analysis to analyze the data and determine how separable it is, so that bots can be recognized at an earlier stage and IoT devices can be protected from attack.

Authored by Putsa Pranav, Sachin Verma, Sahana Shenoy, S. Saravanan