Publications | Science of Security Virtual Organization

Evaluating Opcodes for Detection of Obfuscated Android Malware

Obfuscation refers to changing the structure of code in a way that original semantics can be hidden. These techniques are often used by application developers for code hardening but it has been found that obfuscation techniques are widely used by malware developers in order to hide the work flow and semantics of malicious code. Class Encryption, Code Re-Ordering, Junk Code insertion and Control Flow modifications are Code Obfuscation techniques. In these techniques, code of the application is changed. These techniques change the signature of the application and also affect the systems that use sequence of instructions in order to detect maliciousness of an application. In this paper an ’Opcode sequence’ based detection system is designed and tested against obfuscated samples. It has been found that the system works efficiently for the detection of non obfuscated samples but the performance is effected significantly against obfuscated samples. The study tests different code obfuscation schemes and reports the effect of each on sequential opcode based analytic system.

Authored by Saneeha Khalid, Faisal Hussain

Designing a Framework of an Integrated Network and Security Operation Center: A Convergence Approach

Cyber-security incidents have grown significantly in modern networks, far more diverse and highly destructive and disruptive. According to the 2021 Cyber Security Statistics Report [1], cybercrime is up 600% during this COVID pandemic, the top attacks are but are not confined to (a) sophisticated phishing emails, (b) account and DNS hijacking, (c) targeted attacks using stealth and air gap malware, (d) distributed denial of services (DDoS), (e) SQL injection. Additionally, 95% of cyber-security breaches result from human error, according to Cybint Report [2]. The average time to identify a breach is 207 days as per Ponemon Institute and IBM, 2022 Cost of Data Breach Report [3]. However, various preventative controls based on cyber-security risk estimation and awareness results decrease most incidents, but not all. Further, any incident detection delay and passive actions to cyber-security incidents put the organizational assets at risk. Therefore, the cyber-security incident management system has become a vital part of the organizational strategy. Thus, the authors propose a framework to converge a "Security Operation Center" (SOC) and a "Network Operations Center" (NOC) in an "Integrated Network Security Operation Center" (INSOC), to overcome cyber-threat detection and mitigation inefficiencies in the near-real-time scenario. We applied the People, Process, Technology, Governance and Compliance (PPTGC) approach to develop the INSOC conceptual framework, according to the requirements we formulated for its operation [4], [5]. The article briefly describes the INSOC conceptual framework and its usefulness, including the central area of the PPTGC approach while designing the framework.

Authored by Deepesh Shahjee, Nilesh Ware

Design and Implementation of System for URL Signature Construction and Impact Assessment

The attacker’s server plays an important role in sending attack orders and receiving stolen information, particularly in the more recent cyberattacks. Under these circumstances, it is important to use network-based signatures to block malicious communications in order to reduce the damage. However, in addition to blocking malicious communications, signatures are also required not to block benign communications during normal business operations. Therefore, the generation of signatures requires a high level of understanding of the business, and highly depends on individual skills. In addition, in actual operation, it is necessary to test whether the generated signatures do not interfere with benign communications, which results in high operational costs. In this paper, we propose SIGMA, a system that automatically generates signatures to block malicious communication without interfering with benign communication and then automatically evaluates the impact of the signatures. SIGMA automatically extracts the common parts of malware communication destinations by clustering them and generates multiple candidate signatures. After that, SIGMA automatically calculates the impact on normal communication based on business logs, etc., and presents the final signature to the analyst, which has the highest blockability of malicious communication and non-blockability of normal communication. Our objectives with this system are to reduce the human factor in generating the signatures, reduce the cost of the impact evaluation, and support the decision of whether to apply the signatures. In the preliminary evaluation, we showed that SIGMA can automatically generate a set of signatures that detect 100% of suspicious URLs with an over-detection rate of just 0.87%, using the results of 14,238 malware analyses and actual business logs. This result suggests that the cost for generation of signatures and the evaluation of their impact on business operations can be suppressed, which used to be a time-consuming and human-intensive process.

Authored by Shota Fujii, Nobutaka Kawaguchi, Shoya Kojima, Tomoya Suzuki, Toshihiro Yamauchi

NBP-MS: Malware Signature Generation Based on Network Behavior Profiling

With the proliferation of malware, the detection and classification of malware have been hot topics in the academic and industrial circles of cyber security, and the generation of malware signatures is one of the important research directions. In this paper, we propose NBP-MS, a method of signature generation that is based on network traffic generated by malware. Specifically, we utilize the network traffic generated by malware to perform fine-grained profiling of its network behaviors first, and then cluster all the profiles to generate network behavior signatures to classify malware, providing support for subsequent analysis and defense.

Authored by Zhixin Shi, Xiangyu Wang, Pengcheng Liu

Automation Detection of Malware and Stenographical Content using Machine Learning

In recent times, the occurrence of malware attacks are increasing at an unprecedented rate. Particularly, the image-based malware attacks are spreading worldwide and many people get harmful malware-based images through the technique called steganography. In the existing system, only open malware and files from the internet can be identified. However, the image-based malware cannot be identified and detected. As a result, so many phishers make use of this technique and exploit the target. Social media platforms would be totally harmful to the users. To avoid these difficulties, Machine learning can be implemented to find the steganographic malware images (contents). The proposed methodology performs an automatic detection of malware and steganographic content by using Machine Learning. Steganography is used to hide messages from apparently innocuous media (e.g., images), and steganalysis is the approach used for detecting this malware. This research work proposes a machine learning (ML) approach to perform steganalysis. In the existing system, only open malware and files from the internet are identified but in the recent times many people get harmful malware-based images through the technique called steganography. Social media platforms would be totally harmful to the users. To avoid these difficulties, the proposed Machine learning has been developed to appropriately detect the steganographic malware images (contents). Father, the steganalysis method using machine learning has been developed for performing logistic classification. By using this, the users can avoid sharing the malware images in social media platforms like WhatsApp, Facebook without downloading it. It can be also used in all the photo-sharing sites such as google photos.

Authored by Henry Samuel, Santhanam Kumar, R. Aishwarya, G. Mathivanan

CR-Spectre: Defense-Aware ROP Injected Code-Reuse Based Dynamic Spectre

Side-channel attacks have been a constant threat to computing systems. In recent times, vulnerabilities in the architecture were discovered and exploited to mount and execute a state-of-the-art attack such as Spectre. The Spectre attack exploits a vulnerability in the Intel-based processors to leak confidential data through the covert channel. There exist some defenses to mitigate the Spectre attack. Among multiple defenses, hardware-assisted attack/intrusion detection (HID) systems have received overwhelming response due to its low overhead and efficient attack detection. The HID systems deploy machine learning (ML) classifiers to perform anomaly detection to determine whether the system is under attack. For this purpose, a performance monitoring tool profiles the applications to record hardware performance counters (HPC), utilized for anomaly detection. Previous HID systems assume that the Spectre is executed as a standalone application. In contrast, we propose an attack that dynamically generates variations in the injected code to evade detection. The attack is injected into a benign application. In this manner, the attack conceals itself as a benign application and gen-erates perturbations to avoid detection. For the attack injection, we exploit a return-oriented programming (ROP)-based code-injection technique that reuses the code, called gadgets, present in the exploited victim's (host) memory to execute the attack, which, in our case, is the CR-Spectre attack to steal sensitive data from a target victim (target) application. Our work focuses on proposing a dynamic attack that can evade HID detection by injecting perturbations, and its dynamically generated variations thereof, under the cloak of a benign application. We evaluate the proposed attack on the MiBench suite as the host. From our experiments, the HID performance degrades from 90% to 16%, indicating our Spectre-CR attack avoids detection successfully.

Authored by Abhijitt Dhavlle, Setareh Rafatirad, Houman Homayoun, Sai Dinakarrao

Dynamic malicious code detection technology based on deep learning

In this paper, the malicious code is run in the sandbox in a safe and controllable environment, the API sequence is deduplicated by the idea of the longest common subsequence, and the CNN and Bi-LSTM are integrated to process and analyze the API sequence. Compared with the method, the method using deep learning can have higher accuracy and work efficiency.

Authored by Lizhuo Wei, Fengkai Xu, Ni Zhang, Wei Yan, Chuchu Chai

Countermeasure Against Anti-Sandbox Technology Based on Activity Recognition

In order to prevent malicious environment, more and more applications use anti-sandbox technology to detect the running environment. Malware often uses this technology against analysis, which brings great difficulties to the analysis of applications. Research on anti-sandbox countermeasure technology based on application virtualization can solve such problems, but there is no good solution for sensor simulation. In order to prevent detection, most detection systems can only use real device sensors, which brings great hidden dangers to users’ privacy. Aiming at this problem, this paper proposes and implements a sensor anti-sandbox countermeasure technology for Android system. This technology uses the CNN-LSTM model to identify the activity of the real machine sensor data, and according to the recognition results, the real machine sensor data is classified and stored, and then an automatic data simulation algorithm is designed according to the stored data, and finally the simulation data is sent back by using the Hook technology for the application under test. The experimental results show that the method can effectively simulate the data characteristics of the acceleration sensor and prevent the triggering of anti-sandbox behaviors.

Authored by Jin Yang, Yunqing Liu

Threat Detection and Response in Linux Endpoints

We demonstrate an in-house built Endpoint Detection and Response (EDR) for linux systems using open-sourced tools like Osquery and Elastic. The advantage of building an in-house EDR tools against using commercial EDR tools provides both the knowledge and the technical capability to detect and investigate security incidents. We discuss the architecture of the tools and advantages it offers. Specifically, in our method all the endpoint logs are collected at a common server which we leverage to perform correlation between events happening on different endpoints and automatically detect threats like pivoting and lateral movements. We discuss various attacks that can be detected by our tool.

Authored by Shubham Agarwal, Arjun Sable, Devesh Sawant, Sunil Kahalekar, Manjesh Hanawal

A Survey on Mobile Malware Detection Methods using Machine Learning

The prevalence of mobile devices (smartphones) along with the availability of high-speed internet access world-wide resulted in a wide variety of mobile applications that carry a large amount of confidential information. Although popular mobile operating systems such as iOS and Android constantly increase their defenses methods, data shows that the number of intrusions and attacks using mobile applications is rising continuously. Experts use techniques to detect malware before the malicious application gets installed, during the runtime or by the network traffic analysis. In this paper, we first present the information about different categories of mobile malware and threats; then, we classify the recent research methods on mobile malware traffic detection.

Authored by Mina Kambar, Armin Esmaeilzadeh, Yoohwan Kim, Kazem Taghva

Ransomware Classification and Detection With Machine Learning Algorithms

Malicious attacks, malware, and ransomware families pose critical security issues to cybersecurity, and it may cause catastrophic damages to computer systems, data centers, web, and mobile applications across various industries and businesses. Traditional anti-ransomware systems struggle to fight against newly created sophisticated attacks. Therefore, state-of-the-art techniques like traditional and neural network-based architectures can be immensely utilized in the development of innovative ransomware solutions. In this paper, we present a feature selection-based framework with adopting different machine learning algorithms including neural network-based architectures to classify the security level for ransomware detection and prevention. We applied multiple machine learning algorithms: Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), Logistic Regression (LR) as well as Neural Network (NN)-based classifiers on a selected number of features for ransomware classification. We performed all the experiments on one ransomware dataset to evaluate our proposed framework. The experimental results demonstrate that RF classifiers outperform other methods in terms of accuracy, F -beta, and precision scores.

Authored by Mohammad Masum, Md Faruk, Hossain Shahriar, Kai Qian, Dan Lo, Muhaiminul Adnan

Improving the Prediction Accuracy with Feature Selection for Ransomware Detection

This paper presents the machine learning algorithm to detect whether an executable binary is benign or ransomware. The ransomware cybercriminals have targeted our infrastructure, businesses, and everywhere which has directly affected our national security and daily life. Tackling the ransomware threats more effectively is a big challenge. We applied a machine-learning model to classify and identify the security level for a given suspected malware for ransomware detection and prevention. We use the feature selection data preprocessing to improve the prediction accuracy of the model.

Authored by Chulan Gao, Hossain Shahriar, Dan Lo, Yong Shi, Kai Qian

Detection of Malware in UHF RFID User Memory Bank using Random Forest Classifier on Signal Strength Data in the Frequency Domain

A method of detecting UHF RFID tags with SQL in-jection virus code written in its user memory bank is explored. A spectrum analyzer took signal strength readings in the frequency spectrum while an RFID reader was reading the tag. The strength of the signal transmitted by the RFID tag in the UHF range, more specifically within the 902–908 MHz sub-band, was used as data to train a Random Forest model for Malware detection. Feature reduction is accomplished by dividing the observed spectrum into 15 ranges with a bandwidth of 344 kHz each and detecting the number of maxima in each range. The malware-infested tag could be detected more than 80% of the time. The frequency ranges contributing most in this detection method were the low (903.451-903.795 MHz, 902.418-902.762 MHz) and high (907.238-907.582 MHz) bands in the observed spectrum.

Authored by Shah Hasnaeen, Andrew Chrysler

An Approach for P2P Based Botnet Detection Using Machine Learning

The internet has developed and transformed the world dramatically in recent years, which has resulted in several cyberattacks. Cybersecurity is one of society’s most serious challenge, costing millions of dollars every year. The research presented here will look into this area, focusing on malware that can establish botnets, and in particular, detecting connections made by infected workstations connecting with the attacker’s machine. In recent years, the frequency of network security incidents has risen dramatically. Botnets have previously been widely used by attackers to carry out a variety of malicious activities, such as compromising machines to monitor their activities by installing a keylogger or sniffing traffic, launching Distributed Denial of Service (DDOS) attacks, stealing the identity of the machine or credentials, and even exfiltrating data from the user’s computer. Botnet detection is still a work in progress because no one approach exists that can detect a botnet’s whole ecosystem. A detailed analysis of a botnet, discuss numerous parameter’s result of detection methods related to botnet attacks, as well as existing work of botnet identification in field of machine learning are discuss here. This paper focuses on the comparative analysis of various classifier based on design of botnet detection technique which are able to detect P2P botnet using machine learning classifier.

Authored by Priyanka Tikekar, Swati Sherekar, Vilas Thakre

Analyzing Machine Learning-based Feature Selection for Botnet Detection

In this cyber era, the number of cybercrime problems grows significantly, impacting network communication security. Some factors have been identified, such as malware. It is a malicious code attack that is harmful. On the other hand, a botnet can exploit malware to threaten whole computer networks. Therefore, it needs to be handled appropriately. Several botnet activity detection models have been developed using a classification approach in previous studies. However, it has not been analyzed about selecting features to be used in the learning process of the classification algorithm. In fact, the number and selection of features implemented can affect the detection accuracy of the classification algorithm. This paper proposes an analysis technique for determining the number and selection of features developed based on previous research. It aims to obtain the analysis of using features. The experiment has been conducted using several classification algorithms, namely Decision tree, k-NN, Naïve Bayes, Random Forest, and Support Vector Machine (SVM). The results show that taking a certain number of features increases the detection accuracy. Compared with previous studies, the results obtained show that the average detection accuracy of 98.34% using four features has the highest value from the previous study, 97.46% using 11 features. These results indicate that the selection of the correct number and features affects the performance of the botnet detection model.

Authored by Winda Safitri, Tohari Ahmad, Dandy Hostiadi

Android Malware Detection Based on Heterogeneous Information Network with Cross-Layer Features

As a mature and open mobile operating system, Android runs on many IoT devices, which has led to Android-based IoT devices have become a hotbed of malware. Existing static detection methods for malware using artificial intelligence algorithms focus only on the java code layer when extracting API features, however there is a lot of malicious behavior involving native layer code. Thus, to make up for the neglect of the native code layer, we propose a heterogeneous information network-based Android malware detection method with cross-layer features. We first translate the semantic information of apps and API calls into the form of meta-paths, and construct the adjacency of apps based on API calls, then combine information from different meta-paths using multi-core learning. We implemented our method on the dataset from VirusShare and AndroZoo, and the experimental results show that the accuracy of our method is 93.4%, which is at least 2% higher than other related methods using heterogeneous information networks for malware detection.

Authored by Ren Xixuan, Zhao Lirui, Wang Kai, Xue Zhixing, Hou Anran, Shao Qiao

Cyber Automated Network Resilience Defensive Approach against Malware Images

Cyber threats have been a major issue in the cyber security domain. Every hacker follows a series of cyber-attack stages known as cyber kill chain stages. Each stage has its norms and limitations to be deployed. For a decade, researchers have focused on detecting these attacks. Merely watcher tools are not optimal solutions anymore. Everything is becoming autonomous in the computer science field. This leads to the idea of an Autonomous Cyber Resilience Defense algorithm design in this work. Resilience has two aspects: Response and Recovery. Response requires some actions to be performed to mitigate attacks. Recovery is patching the flawed code or back door vulnerability. Both aspects were performed by human assistance in the cybersecurity defense field. This work aims to develop an algorithm based on Reinforcement Learning (RL) with a Convoluted Neural Network (CNN), far nearer to the human learning process for malware images. RL learns through a reward mechanism against every performed attack. Every action has some kind of output that can be classified into positive or negative rewards. To enhance its thinking process Markov Decision Process (MDP) will be mitigated with this RL approach. RL impact and induction measures for malware images were measured and performed to get optimal results. Based on the Malimg Image malware, dataset successful automation actions are received. The proposed work has shown 98% accuracy in the classification, detection, and autonomous resilience actions deployment.

Authored by Kainat Rizwan, Mudassar Ahmad, Muhammad Habib

Data Sanitization Approach to Mitigate Clean-Label Attacks Against Malware Detection Systems

Machine learning (ML) models are increasingly being used in the development of Malware Detection Systems. Existing research in this area primarily focuses on developing new architectures and feature representation techniques to improve the accuracy of the model. However, recent studies have shown that existing state-of-the art techniques are vulnerable to adversarial machine learning (AML) attacks. Among those, data poisoning attacks have been identified as a top concern for ML practitioners. A recent study on clean-label poisoning attacks in which an adversary intentionally crafts training samples in order for the model to learn a backdoor watermark was shown to degrade the performance of state-of-the-art classifiers. Defenses against such poisoning attacks have been largely under-explored. We investigate a recently proposed clean-label poisoning attack and leverage an ensemble-based Nested Training technique to remove most of the poisoned samples from a poisoned training dataset. Our technique leverages the relatively large sensitivity of poisoned samples to feature noise that disproportionately affects the accuracy of a backdoored model. In particular, we show that for two state-of-the art architectures trained on the EMBER dataset affected by the clean-label attack, the Nested Training approach improves the accuracy of backdoor malware samples from 3.42% to 93.2%. We also show that samples produced by the clean-label attack often successfully evade malware classification even when the classifier is not poisoned during training. However, even in such scenarios, our Nested Training technique can mitigate the effect of such clean-label-based evasion attacks by recovering the model's accuracy of malware detection from 3.57% to 93.2%.

Authored by Samson Ho, Achyut Reddy, Sridhar Venkatesan, Rauf Izmailov, Ritu Chadha, Alina Oprea