Malware Classification - With the rapid development of technology and the increase in the use of Android software, the number of malware has also increased. This study presents a classification as malware/goodware with the features of 4465 Android applications. Cost is an important problem for the increasing number of applications and the analyzes to be made on each application. This study focused on this problem with the hybrid use of Gray Wolf Optimization Algorithm (GWO) and Deep Neural Networks (DNN). With the use of GWO, both feature selection and the features of the model to be created with DNN are determined. In this way, an approximate solution proposal is presented for the most suitable features and the most suitable model design. The model, which was created with the use of GWO-DNN hybrid in this study, offers an F1 score of 99.74%.
Authored by Merve Güllü, Necattin Barişçi
Malware Classification - The past decades witness the development of various Machine Learning (ML) models for malware classification. Semantic representation is a crucial basis for these classifiers. This paper aims to assess the effect of semantic representation methods on malware classifier performance. Two commonly-used semantic representation methods including N-gram and GloVe. We utilize diverse ML classifiers to conduct comparative experiments to analyze the capability of N-gram, GloVe and image-based methods for malware classification. We also analyze deeply the reason why the GloVe can produce negative effects on malware static analysis.
Authored by Bingchu Jin, Zesheng Hu, Jianhua Wang, Monong Wei, Yawei Zhao, Chao Xue
Malware Classification - Automated malware classification assigns unknown malware to known families. Most research in malware classification assumes that the defender has access to the malware for analysis. Unfortunately, malware can delete itself after execution. As a result, analysts are only left with digital residue, such as network logs or remnant artifacts of malware in memory or on the file system. In this paper, a novel malware classification method based on the Windows prefetch mechanism is presented and evaluated, enabling analysts to classify malware without a corresponding executable. The approach extracts features from Windows prefetch files, a file system artifact that contains historical process information such as loaded libraries and process dependencies. Results show that classification using these features with two different algorithms garnered F-Scores between 0.80 and 0.82, offering analysts a viable option for forensic analysis.
Authored by Adam Duby, Teryl Taylor, Yanyan Zhuang
Malware Classification - Nowadays, increasing numbers of malicious programs are becoming a serious problem, which increases the need for automated detection and categorization of potential threats. These attacks often use undetected malware that is not recognized by the security vendor, making it difficult to protect the endpoints from viruses. Existing methods have been proposed to detect malware. However, as malware variations develop, they can lead to misdiagnosis and are difficult to diagnose accurately. To address this problem, in this work introduces a Recurrent Neural Network (RNN) to identify the malware or benign based on extract features using Information Gain Absolute Feature Selection (IGAFS) technique. First, Malware detection dataset is collected from kaggle repository. Then the proposed pre-process the dataset for removing null and noisy values to prepare the dataset. Next, the proposed Information Gain Absolute Feature Selection (IGAFS) technique is used to select most relevant features for malware from the pre-processed dataset. Selected features are trained into Recurrent Neural Network (RNN) method to classify as malware or not with better accuracy and false rate. The experimental result provides greater performance compared with previous methods.
Authored by Suresh Kumar, Umi B., Isa Mishra, Shitharth S., Diwakar Tripathi, Siva T.
Malware Classification - Mobile devices play a crucial role and have become an essential part of people's life particularly with online applications such as shopping, learning, mailing, etc. Android OS has continued to drive the market for other operating systems since 2012. Traditional Android malware detection methods, such as static, dynamic, hybrid analysis, or the Bayesian model, may show less accuracy to detect recent Android malware. We propose a deep learning method for Android malware detection using Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM). CNN provides efficient feature extraction from data and the use of additional LSTM layers improves prediction accuracy. According to the test results, CNN-LSTM can provide reliable malware prediction in Android applications. We train and test our approach using the CICMalDroid2020 dataset. The test results show that the CNN-LSTM classifier exceeds with an accuracy of 94%.
Authored by Shakhnaz Amenova, Cemil Turan, Dinara Zharkynbek
Malware Classification - Traditional methods of malware detection have difficulty in detecting massive malware variants. Malware detection based on malware visualization has been proved an effective method for identifying unknown malware variants. In order to improve the accuracy and reduce the detection time of above methods, a novel method for malware classification in a light-weight CNN architecture named MalshuffleNet is proposed. The model is customized based on ShuffleNet V2 by adjusting the numbers of the fully connected layer for adopting to malware classification. Empirical results on Malimg dataset indicate that our model achieves 99.03% in accuracy, and identify an unknown malware only taking 5.3 milliseconds on average.
Authored by Lingfeng Qiu, Shuo Wang, Jian Wang, Yifei Wang, Wei Huang
Malware Classification - Malware attack is a severe problem that can cause a considerable loss. To prevent the malware attack, different malware detection and classification method have been implemented in recent years. This paper proposed a new method based on Markov image and transfer learning on machine learning. Also, an experience comparing the performance on malware detection and classification between the proposed and grayscale methods was done. The accuracy and loss of malware detection and classification by using the proposed method are 0.973 and 0.076, 0.987 and 0.062 respectively. The accuracy and loss of malware detection and classification using the grayscale method are 0.989 and 0.037, 0.973 and 0.202 respectively. Although the grayscale method has done better in malware detection, the proposed method's accuracy is over 0.97. Therefore, the result shows that the proposed method are suitable for malware detection and classification.
Authored by Lok Kwan
Malware Classification - Due to the constant updates of malware and its variants and the continuous development of malware obfuscation techniques. Malware intrusions targeting Windows hosts are also on the rise. Traditional static analysis methods such as signature matching mechanisms have been difficult to adapt to the detection of new malware. Therefore, a novel visual detection method of malware is proposed for first-time to convert the Windows API call sequence with sequential nature into feature images based on the Gramian Angular Field (GAF) idea, and train a neural network to identify malware. The experimental results demonstrate the effectiveness of our proposed method. For the binary classification of malware, the GAF visualization image of the API call sequence is compared with its original sequence. After GAF visualization, the classification accuracy of the classic machine learning model MLP is improved by 9.64%, and the classification accuracy of the deep learning model CNN is improved by 4.82%. Furthermore, our experiments show that the proposed method is also feasible and effective for the multi-class classification of malware.
Authored by Hongmei Zhang, Xiaoqian Yun, Xiaofang Deng, Xiaoxiong Zhong
Malware Classification - Rapid digitalisation spurred by the Covid-19 pandemic has resulted in more cyber crime. Malware-as-a-service is now a booming business for cyber criminals. With the surge in malware activities, it is vital for cyber defenders to understand more about the malware samples they have at hand as such information can greatly influence their next course of actions during a breach. Recently, researchers have shown how malware family classification can be done by first converting malware binaries into grayscale images and then passing them through neural networks for classification. However, most work focus on studying the impact of different neural network architectures on classification performance. In the last year, researchers have shown that augmenting supervised learning with self-supervised learning can improve performance. Even more recently, Data2Vec was proposed as a modality agnostic self-supervised framework to train neural networks. In this paper, we present BinImg2Vec, a framework of training malware binary image classifiers that incorporates both self-supervised learning and supervised learning to produce a model that consistently outperforms one trained only via supervised learning. We also show how our framework produces outputs that facilitate explanability.
Authored by Lee Sern, Tay Keng, Chua Fu
Malware Classification - Methodologies used for the detection of malicious applications can be broadly classified into static and dynamic analysis based approaches. With traditional signature-based methods, new variants of malware families cannot be detected. A combination of deep learning techniques along with image-based features is used in this work to classify malware. The data set used here is the ‘Malimg’ dataset, which contains a pictorial representation of well-known malware families. This paper proposes a methodology for identifying malware images and classifying them into various families. The classification is based on image features. The features are extracted using the pre-trained model namely VGG16. The samples of malware are depicted as byteplot grayscale images. Features are extracted employing the convolutional layer of a VGG16 deep learning network, which uses ImageNet dataset for the pre-training step. The features are used to train different classifiers which employ SVM, XGBoost, DNN and Random Forest for the classification task into different malware families. Using 9339 samples from 25 different malware families, we performed experimental evaluations and demonstrate that our approach is effective in identifying malware families with high accuracy.
Authored by K. Deepa, K. Adithyakumar, P. Vinod
Malware Analysis - The rapid development of network information technology, individual’s information networks security has become a very critical issue in our daily life. Therefore, it is necessary to study the malware propagation model system. In this paper, the traditional integer order malware propagation model system is extended to the field of fractional-order. Then we analyze the asymptotic stability of the fractional-order malware propagation model system when the equilibrium point is the origin and the time delay is 0. Next, the asymptotic stability and bifurcation analysis of the fractional-order malware propagation model system when the equilibrium point is the origin and the time delay is not 0 are carried out. Moreover, we study the asymptotic stability of the fractional-order malware propagation model system with an interior equilibrium point. In the end, so as to verify our theoretical results, many numerical simulations are provided.
Authored by Zhe Zhang, Yaonan Wang, Jing Zhang, Xu Xiao
Malware Analysis - Detection of malware and security attacks is a complex process that can vary in its details and analysis activities. As part of the detection process, malware scanners try to categorize a malware once it is detected under one of the known malware categories (e.g. worms, spywares, viruses, etc.). However, many studies and researches indicate problems with scanners categorizing or identifying a particular malware under more than one malware category. This paper, and several others, show that machine learning can be used for malware detection especially with ensemble base prediction methods. In this paper, we evaluated several custom-built ensemble models. We focused on multi-label malware classification as individual or classical classifiers showed low accuracy in such territory.This paper showed that recent machine models such as ensemble and deep learning can be used for malware detection with better performance in comparison with classical models. This is very critical in such a dynamic and yet important detection systems where challenges such as the detection of unknown or zero-day malware will continue to exist and evolve.
Authored by Izzat Alsmadi, Bilal Al-Ahmad, Mohammad Alsmadi
Malware Analysis - Android malware is continuously evolving at an alarming rate due to the growing vulnerabilities. This demands more effective malware detection methods. This paper presents DynaMalDroid, a dynamic analysis-based framework to detect malicious applications in the Android platform. The proposed framework contains three modules: dynamic analysis, feature engineering, and detection. We utilized the well-known CICMalDroid2020 dataset, and the system calls of apps are extracted through dynamic analysis. We trained our proposed model to recognize malware by selecting features obtained through the feature engineering module. Further, with these selected features, the detection module applies different Machine Learning classifiers like Random Forest, Decision Tree, Logistic Regression, Support Vector Machine, Naïve-Bayes, K-Nearest Neighbour, and AdaBoost, to recognize whether an application is malicious or not. The experiments have shown that several classifiers have demonstrated excellent performance and have an accuracy of up to 99\%. The models with Support Vector Machine and AdaBoost classifiers have provided better detection accuracy of 99.3\% and 99.5\%, respectively.
Authored by Hashida Manzil, Manohar S
Malware Analysis - Malware attacks in the cyber world continue to increase despite the efforts of Malware analysts to combat this problem. Recently, Malware samples have been presented as binary sequences and assembly codes. However, most researchers focus only on the raw Malware sequence in their proposed solutions, ignoring that the assembly codes may contain important details that enable rapid Malware detection. In this work, we leveraged the capabilities of deep autoencoders to investigate the presence of feature disparities in the assembly and raw binary Malware samples. First, we treated the task as outliers to investigate whether the autoencoder would identify and justify features as samples from the same family. Second, we added noise to all samples and used Deep Autoencoder to reconstruct the original samples by denoising. Experiments with the Microsoft Malware dataset showed that the byte samples features differed from the assembly code samples.
Authored by Muhammed Abdullah, Yongbin Yu, Jingye Cai, Yakubu Imrana, Nartey Tettey, Daniel Addo, Kwabena Sarpong, Bless Lord Y. Agbley, Benjamin Appiah
Malware Analysis - The rising use of smartphones each year is matched by the development of the smartphone s operating system, Android. Due to the immense popularity of the Android operating system, many unauthorized users (in this case, the attackers) wish to exploit this vulnerability to get sensitive data from every Android user. The flubot malware assault, which happened in 2021 and targeted Android devices practically globally, is one of the attacks on Android smartphones. It was known at the time that the flubot virus stole information, particularly from banking applications installed on the victim s device. To prevent this from happening again, we research the signature and behavior of flubot malware. In this study, a hybrid analysis will be conducted on three samples of flubot malware that are available on the open-source Hatching Triage platform. Using the Android Virtual Device (AVD) as the primary environment for malware installation, the analysis was conducted with the Android Debug Bridge (ADB) and Burpsuite as supporting tools for dynamic analysis. During the static analysis, the Mobile Security Framework (MobSF) and the Bytecode Viewer were used to examine the source code of the three malware samples. Analysis of the flubot virus revealed that it extracts or drops dex files on the victim s device, where the file is the primary malware. The Flubot virus will clone the messaging application or Short Message Service (SMS) on the default device. Additionally, we discovered a form of flubot malware that operates as a Domain Generation Algorithm (DGA) and communicates with its Command and Control (C\&C) server.
Authored by Hanifah Salsabila, Syafira Mardhiyah, Raden Hadiprakoso
Malware Analysis - The effective security system improvement from malware attacks on the Android operating system should be updated and improved. Effective malware detection increases the level of data security and high protection for the users. Malicious software or malware typically finds a means to circumvent the security procedure, even when the user is unaware whether the application can act as malware. The effectiveness of obfuscated android malware detection is evaluated by collecting static analysis data from a data set. The experiment assesses the risk level of which malware dataset using the hash value of the malware and records malware behavior. A set of hash SHA256 malware samples has been obtained from an internet dataset and will be analyzed using static analysis to record malware behavior and evaluate which risk level of the malware. According to the results, most of the algorithms provide the same total score because of the multiple crime inside the malware application.
Authored by Teddy Mantoro, Muhammad Fahriza, Muhammad Bhakti
Malware Analysis - Malwares are designed to cause harm to the machine without the user s knowledge. Malwares belonging to different families infect the system in its own unique way causing damage which could be irreversible and hence there is a need to detect and analyse the malwares. Manual analysis of all types of malwares is not a practical approach due to the huge effort involved and hence Automated Malware Analysis is resorted to so that the burden on humans can be decreased and the process is made robust. A lot of Automated Malware Analysis tools are present right now both offline and online but the problem arises as to which tool to select while analysing a suspicious binary. A comparative analysis of three most widely used automated tools has been done with different malware class samples. These tools are Cuckoo Sandbox, Any. Run and Intezer Analyze. In order to check the efficacy of the tool in both online and offline analysis, Cuckoo Sandbox was configured for offline use, and Any. Run and Intezer Analyze were configured for online analysis. Individual tools analyse each malware sample and after analysis is completed, a comparative chart is prepared to determine which tool is good at finding registry changes, processes created, files created, network connections, etc by the malicious binary. The findings conclude that Intezer Analyze tool recognizes file changes better than others but otherwise Cuckoo Sandbox and Any. Run tools are better in determining other functionalities.
Authored by Preeti, Animesh Agrawal
Malware Analysis - The static and dynamic malware analysis are used by industrialists and academics to understand malware capabilities and threat level. The antimalware industries calculate malware threat levels using different techniques which involve human involvement and a large number of resources and analysts. As malware complexity, velocity and volume increase, it becomes impossible to allocate so many resources. Due to this reason, it is projected that the number of malware apps will continue to rise, and that more devices will be targeted in order to commit various sorts of cybercrime. It is therefore necessary to develop techniques that can calculate the damage or threat posed by malware automatically as soon as it is identified. In this way, early warnings about zero-day (unknown) malware can assist in allocating resources for carrying out a close analysis of it as soon as it is identified. In this paper, a fuzzy modelling approach is described for calculating the potential risk of malicious programs through static malware analysis.
Authored by Meghna Dhalaria, Ekta Gandotra
Malware Analysis - Any software that runs malicious payloads on victims’ computers is referred to as malware. It is an increasing threat that costs people, businesses, and organizations a lot of money. Attacks on security have developed significantly in recent years. Malware may infiltrate both offline and online media, like: chat, SMS, and spam (email, or social media), because it has a built-in defensive mechanism and may conceal itself from antivirus software or even corrupt it. As a result, there is an urgent need to detect and prevent malware before it damages critical assets around the world. In fact, there are lots of different techniques and tools used to combat versus malware. In this paper, the malware samples were analyzing in the Virtual Box environment using in-depth analysis based on reverse engineering using advanced static malware analysis techniques. The results Obtained from malware analysis which represent a set of valuable information, all anti-malware and anti-virus program companies need for in order to update their products.
Authored by Maher Ismael, Karam Thanoon
Malware Analysis - This document addresses the issue of the actual security level of PDF documents. Two types of detection approaches are utilized to detect dangerous elements within malware: static analysis and dynamic analysis. Analyzing malware binaries to identify dangerous strings, as well as reverse-engineering is included in static analysis for t1he malware to disassemble it. On the other hand, dynamic analysis monitors malware activities by running them in a safe environment, such as a virtual machine. Each method has its own set of strengths and weaknesses, and it is usually best to employ both methods while analyzing malware. Malware detection could be simplified without sacrificing accuracy by reducing the number of malicious traits. This may allow the researcher to devote more time to analysis. Our worry is that there is no obvious need to identify malware with numerous functionalities when it isn t necessary. We will solve this problem by developing a system that will identify if the given file is infected with malware or not.
Authored by Md Khalil, Vivek, Kumar Anand, Antarlina Paul, Rahul Grover
Information Reuse and Security - New malware increasingly adopts novel fileless techniques to evade detection from antivirus programs. Process injection is one of the most popular fileless attack techniques. This technique makes malware more stealthy by writing malicious code into memory space and reusing the name and port of the host process. It is difficult for traditional security software to detect and intercept process injections due to the stealthiness of its behavior. We propose a novel framework called ProcGuard for detecting process injection behaviors. This framework collects sensitive function call information of typical process injection. Then we perform a fine-grained analysis of process injection behavior based on the function call chain characteristics of the program, and we also use the improved RCNN network to enhance API analysis on the tampered memory segments. We combine API analysis with deep learning to determine whether a process injection attack has been executed. We collect a large number of malicious samples with process injection behavior and construct a dataset for evaluating the effectiveness of ProcGuard. The experimental results demonstrate that it achieves an accuracy of 81.58\% with a lower false-positive rate compared to other systems. In addition, we also evaluate the detection time and runtime performance loss metrics of ProcGuard, both of which are improved compared to previous detection tools.
Authored by Juan Wang, Chenjun Ma, Ziang Li, Huanyu Yuan, Jie Wang
Malware Analysis and Graph Theory - A reliable database of Indicators of Compromise (IoC’s) is a cornerstone of almost every malware detection system. Building the database and keeping it up-to-date is a lengthy and often manual process where each IoC should be manually reviewed and labeled by an analyst. In this paper, we focus on an automatic way of identifying IoC’s intended to save analysts’ time and scale to the volume of network data. We leverage relations of each IoC to other entities on the internet to build a heterogeneous graph. We formulate a classification task on this graph and apply graph neural networks (GNNs) in order to identify malicious domains. Our experiments show that the presented approach provides promising results on the task of identifying high-risk malware as well as legitimate domains classification.
Authored by Stepan Dvorak, Pavel Prochazka, Lukas Bajer
Malware Analysis and Graph Theory - The rapidly increasing malware threats must be coped with new effective malware detection methodologies. Current malware threats are not limited to daily personal transactions but dowelled deeply within large enterprises and organizations. This paper introduces a new methodology for detecting and discriminating malicious versus normal applications. In this paper, we employed Ant-colony optimization to generate two behavioural graphs that characterize the difference in the execution behavior between malware and normal applications. Our proposed approach relied on the API call sequence generated when an application is executed. We used the API calls as one of the most widely used malware dynamic analysis features. Our proposed method showed distinctive behavioral differences between malicious and non-malicious applications. Our experimental results showed a comparative performance compared to other machine learning methods. Therefore, we can employ our method as an efficient technique in capturing malicious applications.
Authored by Eslam Amer, Adham Samir, Hazem Mostafa, Amer Mohamed, Mohamed Amin
Malware Analysis and Graph Theory - This paper provides an in-depth analysis of Android malware that bypassed the strictest defenses of the Google Play application store and penetrated the official Android market between January 2016 and July 2021. We systematically identified 1,238 such malicious applications, grouped them into 134 families, and manually analyzed one application from 105 distinct families. During our manual analysis, we identified malicious payloads the applications execute, conditions guarding execution of the payloads, hiding techniques applications employ to evade detection by the user, and other implementation-level properties relevant for automated malware detection. As most applications in our dataset contain multiple payloads, each triggered via its own complex activation logic, we also contribute a graph-based representation showing activation paths for all application payloads in form of a control- and data-flow graph. Furthermore, we discuss the capabilities of existing malware detection tools, put them in context of the properties observed in the analyzed malware, and identify gaps and future research directions. We believe that our detailed analysis of the recent, evasive malware will be of interest to researchers and practitioners and will help further improve malware detection tools.
Authored by Michael Cao, Khaled Ahmed, Julia Rubin
Malware Analysis and Graph Theory - With the ever increasing threat of malware, extensive research effort has been put on applying Deep Learning for malware classification tasks. Graph Neural Networks (GNNs) that process malware as Control Flow Graphs (CFGs) have shown great promise for malware classification. However, these models are viewed as black-boxes, which makes it hard to validate and identify malicious patterns. To that end, we propose CFG-Explainer, a deep learning based model for interpreting GNN-oriented malware classification results. CFGExplainer identifies a subgraph of the malware CFG that contributes most towards classification and provides insight into importance of the nodes (i.e., basic blocks) within it. To the best of our knowledge, CFGExplainer is the first work that explains GNN-based mal-ware classification. We compared CFGExplainer against three explainers, namely GNNExplainer, SubgraphX and PGExplainer, and showed that CFGExplainer is able to identify top equisized subgraphs with higher classification accuracy than the other three models.
Authored by Jerome Herath, Priti Wakodikar, Ping Yang, Guanhua Yan