Publications | Science of Security Virtual Organization

Adversarial Audio Detection Method Based on Transformer

Speech recognition technology has been applied to all aspects of our daily life, but it faces many security issues. One of the major threats is the adversarial audio examples, which may tamper the recognition results of the acoustic speech recognition system (ASR). In this paper, we propose an adversarial detection framework to detect adversarial audio examples. The method is based on the transformer self-attention mechanism. Spectrogram features are extracted from the audio and divided into patches. Position information are embedded and then fed into transformer encoder. Experimental results show that the method achieves good performance with the detection accuracy of above 96.5% under the white-box attacks and blackbox attacks, and noisy circumstances. Even when detecting adversarial examples generated by the unknown attacks, it also achieves satisfactory results.

Authored by Yunchen Li, Da Luo

A Step Towards Improvement in Classical Honeypot Security System

Data security is a vast term that doesn’t have any limits, but there are a certain amount of tools and techniques that could help in gaining security. Honeypot is among one of the tools that are designated and designed to protect the security of a network but in a very dissimilar manner. It is a system that is designed and developed to be compromised and exploited. Honeypots are meant to lure the invaders, but due to advancements in computing systems parallelly, the intruding technologies are also attaining their gigantic influence. In this research work, an approach involving apache-spark (a Big Data Technique) would be introduced in order to use it with the Honeypot System. This work includes an extensive study based on several research papers, through which elaborated experiment-based result has been expressed on the best known open-source honeypot systems. The preeminent possible method of using The Honeypot with apache spark in the sequential channel would also be proposed with the help of a framework diagram.

Authored by Akshay Mudgal, Shaveta Bhatia

The Analysis and Development of an XAI Process on Feature Contribution Explanation

Explainable Artificial Intelligence (XAI) research focuses on effective explanation techniques to understand and build AI models with trust, reliability, safety, and fairness. Feature importance explanation summarizes feature contributions for end-users to make model decisions. However, XAI methods may produce varied summaries that lead to further analysis to evaluate the consistency across multiple XAI methods on the same model and data set. This paper defines metrics to measure the consistency of feature contribution explanation summaries under feature importance order and saliency map. Driven by these consistency metrics, we develop an XAI process oriented on the XAI criterion of feature importance, which performs a systematical selection of XAI techniques and evaluation of explanation consistency. We demonstrate the process development involving twelve XAI methods on three topics, including a search ranking system, code vulnerability detection and image classification. Our contribution is a practical and systematic process with defined consistency metrics to produce rigorous feature contribution explanations.

Authored by Jun Huang, Zerui Wang, Ding Li, Yan Liu

Blockchain Technology and its Impact on Future of Internet of Things (IoT) and Cyber Security

Due to Bitcoin's innovative block structure, it is both immutable and decentralized, making it a valuable tool or instrument for changing current financial systems. However, the appealing features of Bitcoin have also drawn the attention of cybercriminals. The Bitcoin scripting system allows users to include up to 80 bytes of arbitrary data in Bitcoin transactions, making it possible to store illegal information in the blockchain. This makes Bitcoin a powerful tool for obfuscating information and using it as the command-and-control infrastructure for blockchain-based botnets. On the other hand, Blockchain offers an intriguing solution for IoT security. Blockchain provides strong protection against data tampering, locks Internet of Things devices, and enables the shutdown of compromised devices within an IoT network. Thus, blockchain could be used both to attack and defend IoT networks and communications.

Authored by Aditya Vikram, Sumit Kumar, Mohana

Implementation of Machine Learning for CAPTCHAs Authentication Using Facial Recognition

Web-based technologies are evolving day by day and becoming more interactive and secure. Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is one of the security features that help detect automated bots on the Web. Earlier captcha was complex designed text-based, but some optical recognition-based algorithms can be used to crack it. That is why now the captcha system is image-based. But after the arrival of strong image recognition algorithms, image-based captchas can also be cracked nowadays. In this paper, we propose a new captcha system that can be used to differentiate real humans and bots on the Web. We use advanced deep layers with pre-trained machine learning models for captchas authentication using a facial recognition system.

Authored by Rupendra Raavi, Mansour Alqarni, Patrick Hung

An Approach for P2P Based Botnet Detection Using Machine Learning

The internet has developed and transformed the world dramatically in recent years, which has resulted in several cyberattacks. Cybersecurity is one of society’s most serious challenge, costing millions of dollars every year. The research presented here will look into this area, focusing on malware that can establish botnets, and in particular, detecting connections made by infected workstations connecting with the attacker’s machine. In recent years, the frequency of network security incidents has risen dramatically. Botnets have previously been widely used by attackers to carry out a variety of malicious activities, such as compromising machines to monitor their activities by installing a keylogger or sniffing traffic, launching Distributed Denial of Service (DDOS) attacks, stealing the identity of the machine or credentials, and even exfiltrating data from the user’s computer. Botnet detection is still a work in progress because no one approach exists that can detect a botnet’s whole ecosystem. A detailed analysis of a botnet, discuss numerous parameter’s result of detection methods related to botnet attacks, as well as existing work of botnet identification in field of machine learning are discuss here. This paper focuses on the comparative analysis of various classifier based on design of botnet detection technique which are able to detect P2P botnet using machine learning classifier.

Authored by Priyanka Tikekar, Swati Sherekar, Vilas Thakre

Botnet Detection Based on Machine Learning

A botnet is a new type of attack method developed and integrated on the basis of traditional malicious code such as network worms and backdoor tools, and it is extremely threatening. This course combines deep learning and neural network methods in machine learning methods to detect and classify the existence of botnets. This sample does not rely on any prior features, the final multi-class classification accuracy rate is higher than 98.7%, the effect is significant.

Authored by Xiaoran Yang, Zhen Guo, Zetian Mai

A Mechine Learning Approach for Botnet Detection Using LightGBM

The botnet-based network assault are one of the most serious security threats overlay the Internet this day. Although significant progress has been made in this region of research in recent years, it is still an ongoing and challenging topic to virtually direction the threat of botnets due to their continuous evolution, increasing complexity and stealth, and the difficulties in detection and defense caused by the limitations of network and system architectures. In this paper, we propose a novel and efficient botnet detection method, and the results of the detection method are validated with the CTU-13 dataset.

Authored by Dehao Gong, Yunqing Liu

Botnet Detection via Machine Learning Techniques

The botnet is a serious network security threat that can cause servers crash, so how to detect the behavior of Botnet has already become an important part of the research of network security. DNS(Domain Name System) request is the first step for most of the mainframe computers controlled by Botnet to communicate with the C&C(command; control) server. The detection of DNS request domain names is an important way for mainframe computers controlled by Botnet. However, the detection method based on fixed rules is hard to take effect for botnet based on DGA(Domain Generation Algorithm) because malicious domain names keep evolving and derive many different generation methods. Contrasted with the traditional methods, the method based on machine learning is a better way to detect it by learning and modeling the DGA. This paper presents a method based on the Naive Bayes model, the XGBoost model, the SVM(Support Vector Machine) model, and the MLP(Multi-Layer Perceptron) model, and tests it with real data sets collected from DGA, Alexa, and Secrepo. The experimental results show the precision score, the recall score, and the F1 score for each model.

Authored by Haofan Wang

Analyzing Machine Learning-based Feature Selection for Botnet Detection

In this cyber era, the number of cybercrime problems grows significantly, impacting network communication security. Some factors have been identified, such as malware. It is a malicious code attack that is harmful. On the other hand, a botnet can exploit malware to threaten whole computer networks. Therefore, it needs to be handled appropriately. Several botnet activity detection models have been developed using a classification approach in previous studies. However, it has not been analyzed about selecting features to be used in the learning process of the classification algorithm. In fact, the number and selection of features implemented can affect the detection accuracy of the classification algorithm. This paper proposes an analysis technique for determining the number and selection of features developed based on previous research. It aims to obtain the analysis of using features. The experiment has been conducted using several classification algorithms, namely Decision tree, k-NN, Naïve Bayes, Random Forest, and Support Vector Machine (SVM). The results show that taking a certain number of features increases the detection accuracy. Compared with previous studies, the results obtained show that the average detection accuracy of 98.34% using four features has the highest value from the previous study, 97.46% using 11 features. These results indicate that the selection of the correct number and features affects the performance of the botnet detection model.

Authored by Winda Safitri, Tohari Ahmad, Dandy Hostiadi

Research on Trust Measurement of Terminal Equipment Based on Device Fingerprint

Nowadays, network information security is of great concern, and the measurement of the trustworthiness of terminal devices is of great significance to the security of the entire network. The measurement method of terminal device security trust still has the problems of high complexity, lack of universality. In this paper, the device fingerprint library of device access network terminal devices is first established through the device fingerprint mixed collection method; Secondly, the software and hardware features of the device fingerprint are used to increase the uniqueness of the device identification, and the multi- dimensional standard metric is used to measure the trustworthiness of the terminal device; Finally, Block chain technology is used to store the fingerprint and standard model of network access terminal equipment on the chain. To improve the security level of network access devices, a device access method considering the trust of terminal devices from multiple perspectives is implemented.

Authored by Jiaqi Peng, Ke Yang, Jiaxing Xuan, Da Li, Lei Fan

Machine Learning to Identify Bitcoin Mining by Web Browsers

In the recent development of the online cryptocurrency mining platform, Coinhive, numerous websites have employed “Cryptojacking.” They may need the unauthorized use of CPU resources to mine cryptocurrency and replace advertising income. Web cryptojacking technologies are the most recent attack in information security. Security teams have suggested blocking Cryptojacking scripts by using a blacklist as a strategy. However, the updating procedure of the static blacklist has not been able to promptly safeguard consumers because of the sharp rise in “Cryptojacking kidnapping”. Therefore, we propose a Cryptojacking identification technique based on analyzing the user's computer resources to combat the assault technology known as “Cryptojacking kidnapping.” Machine learning techniques are used to monitor changes in computer resources such as CPU changes. The experiment results indicate that this method is more accurate than the blacklist system and, in contrast to the blacklist system, manually updates the blacklist regularly. The misuse of online Cryptojacking programs and the unlawful hijacking of users' machines for Cryptojacking are becoming worse. In the future, information security undoubtedly addresses the issue of how to prevent Cryptojacking and abduction. The result of this study helps to save individuals from unintentionally becoming miners.

Authored by Min-Hao Wu, Jian-Hung Huang, Jian-Xin Chen, Hao-Jyun Wang, Chen-Yu Chiu

Defense Against Spectrum Sensing Data Falsification Attack in Cognitive Radio Networks using Machine Learning

Cognitive radio (CR) networks are an emerging and promising technology to improve the utilization of vacant bands. In CR networks, security is a very noteworthy domain. Two threatening attacks are primary user emulation (PUE) and spectrum sensing data falsification (SSDF). A PUE attacker mimics the primary user signals to deceive the legitimate secondary users. The SSDF attacker falsifies its observations to misguide the fusion center to make a wrong decision about the status of the primary user. In this paper, we propose a scheme based on clustering the secondary users to counter SSDF attacks. Our focus is on detecting and classifying each cluster as reliable or unreliable. We introduce two different methods using an artificial neural network (ANN) for both methods and five more classifiers such as support vector machine (SVM), random forest (RF), K-nearest neighbors (KNN), logistic regression (LR), and decision tree (DR) for the second one to achieve this goal. Moreover, we consider deterministic and stochastic scenarios with white Gaussian noise (WGN) for attack strategy. Results demonstrate that our method outperforms a recently suggested scheme.

Authored by Nazanin Parhizgar, Ali Jamshidi, Peyman Setoodeh

Sequential Statistical Analysis-Based Method for Attacks Detection in Cognitive Radio Networks

This Cognitive radio networks are vulnerable to specific intrusions due to the unique cognitive characteristics of these networks. This DoS attacks are known as the Primary User Emulation Attack and the Spectrum Sensing Data Falsification. If the intruder behavior is not statistically identical to the behavior of the primary users, intrusion detection techniques based on observing the energy of the received signals can be used. Both machine learning-based intrusion detection and sequential statistical analysis can be effectively applied. However, in some cases, statistical sequential analysis has some advantages in dealing with such challenges. This paper discusses aspects of using statistical sequential analysis methods to detect attacks in Cognitive radio networks.

Authored by Vladimir Shakhov

Operator Partitioning and Parallel Scheduling Optimization for Deep Learning Compiler

TVM(tensor virtual machine) as a deep learning compiler which supports the conversion of machine learning models into TVM IR(intermediate representation) and to optimise the generation of high-performance machine code for various hardware platforms. While the traditional approach is to parallelise the cyclic transformations of operators, in this paper we partition the implementation of the operators in the deep learning compiler TVM with parallel scheduling to derive a faster running time solution for the operators. An optimisation algorithm for partitioning and parallel scheduling is designed for the deep learning compiler TVM, where operators such as two-dimensional convolutions are partitioned into multiple smaller implementations and several partitioned operators are run in parallel scheduling to derive the best operator partitioning and parallel scheduling decisions by means of performance estimation. To evaluate the effectiveness of the algorithm, multiple examples of the two-dimensional convolution operator, the average pooling operator, the maximum pooling operator, and the ReLU activation operator with different input sizes were tested on the CPU platform, and the performance of these operators was experimentally shown to be improved and the operators were run speedily.

Authored by Zhiyu Li, Xiang Zhou, Wenbin Weng

DIComP: Lightweight Data-Driven Inference of Binary Compiler Provenance with High Accuracy

Binary analysis is pervasively utilized to assess software security and test vulnerabilities without accessing source codes. The analysis validity is heavily influenced by the inferring ability of information related to the code compilation. Among the compilation information, compiler type and optimization level, as the key factors determining how binaries look like, are still difficult to be inferred efficiently with existing tools. In this paper, we conduct a thorough empirical study on the binary's appearance under various compilation settings and propose a lightweight binary analysis tool based on the simplest machine learning method, called DIComP to infer the compiler and optimization level via most relevant features according to the observation. Our comprehensive evaluations demonstrate that DIComP can fully recognize the compiler provenance, and it is effective in inferring the optimization levels with up to 90% accuracy. Also, it is efficient to infer thousands of binaries at a millisecond level with our lightweight machine learning model (1MB).

Authored by Ligeng Chen, Zhongling He, Hao Wu, Fengyuan Xu, Yi Qian, Bing Mao

Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation

Derivatives are key to numerous science, engineering, and machine learning applications. While existing tools generate derivatives of programs in a single language, modern parallel applications combine a set of frameworks and languages to leverage available performance and function in an evolving hardware landscape. We propose a scheme for differentiating arbitrary DAG-based parallelism that preserves scalability and efficiency, implemented into the LLVM-based Enzyme automatic differentiation framework. By integrating with a full-fledged compiler backend, Enzyme can differentiate numerous parallel frameworks and directly control code generation. Combined with its ability to differentiate any LLVM-based language, this flexibility permits Enzyme to leverage the compiler tool chain for parallel and differentiation-specitic optimizations. We differentiate nine distinct versions of the LULESH and miniBUDE applications, written in different programming languages (C++, Julia) and parallel frameworks (OpenMP, MPI, RAJA, Julia tasks, MPI.jl), demonstrating similar scalability to the original program. On benchmarks with 64 threads or nodes, we find a differentiation overhead of 3.4–6.8× on C++ and 5.4–12.5× on Julia.

Authored by William Moses, Sri Narayanan, Ludger Paehler, Valentin Churavy, Michel Schanen, Jan Hückelheim, Johannes Doerfert, Paul Hovland

Cybers Security Analysis and Measurement Tools Using Machine Learning Approach

Artificial intelligence (AI) and machine learning (ML) have been used in transforming our environment and the way people think, behave, and make decisions during the last few decades [1]. In the last two decades everyone connected to the Internet either an enterprise or individuals has become concerned about the security of his/their computational resources. Cybersecurity is responsible for protecting hardware and software resources from cyber attacks e.g. viruses, malware, intrusion, eavesdropping. Cyber attacks either come from black hackers or cyber warfare units. Artificial intelligence (AI) and machine learning (ML) have played an important role in developing efficient cyber security tools. This paper presents Latest Cyber Security Tools Based on Machine Learning which are: Windows defender ATP, DarckTrace, Cisco Network Analytic, IBM QRader, StringSifter, Sophos intercept X, SIME, NPL, and Symantec Targeted Attack Analytic.

Authored by Taher Ghazal, Mohammad Hasan, Raed Zitar, Nidal Al-Dmour, Waleed Al-Sit, Shayla Islam

Diverse Approaches Have Been Presented To Mitigate SQL Injection Attack, But It Is Still Alive: A Review

A huge amount of stored and transferred data is expanding rapidly. Therefore, managing and securing the big volume of diverse applications should have a high priority. However, Structured Query Language Injection Attack (SQLIA) is one of the most common dangerous threats in the world. Therefore, a large number of approaches and models have been presented to mitigate, detect or prevent SQL injection attack but it is still alive. Most of old and current models are created based on static, dynamic, hybrid or machine learning techniques. However, SQL injection attack still represents the highest risk in the trend of web application security risks based on several recent studies in 2021. In this paper, we present a review of the latest research dealing with SQL injection attack and its types, and demonstrating several types of most recent and current techniques, models and approaches which are used in mitigating, detecting or preventing this type of dangerous attack. Then, we explain the weaknesses and highlight the critical points missing in these techniques. As a result, we still need more efforts to make a real, novel and comprehensive solution to be able to cover all kinds of malicious SQL commands. At the end, we provide significant guidelines to follow in order to mitigate such kind of attack, and we strongly believe that these tips will help developers, decision makers, researchers and even governments to innovate solutions in the future research to stop SQLIA.

Authored by Mohammad Qbea'h, Saed Alrabaee, Mohammad Alshraideh, Khair Sabri

Evaluating Synthetic Datasets for Training Machine Learning Models to Detect Malicious Commands

Electrical substations in power grid act as the critical interface points for the transmission and distribution networks. Over the years, digital technology has been integrated into the substations for remote control and automation. As a result, substations are more prone to cyber attacks and exposed to digital vulnerabilities. One of the notable cyber attack vectors is the malicious command injection, which can lead to shutting down of substations and subsequently power outages as demonstrated in Ukraine Power Plant Attack in 2015. Prevailing measures based on cyber rules (e.g., firewalls and intrusion detection systems) are often inadequate to detect advanced and stealthy attacks that use legitimate-looking measurements or control messages to cause physical damage. Additionally, defenses that use physics-based approaches (e.g., power flow simulation, state estimation, etc.) to detect malicious commands suffer from high latency. Machine learning serves as a potential solution in detecting command injection attacks with high accuracy and low latency. However, sufficient datasets are not readily available to train and evaluate the machine learning models. In this paper, focusing on this particular challenge, we discuss various approaches for the generation of synthetic data that can be used to train the machine learning models. Further, we evaluate the models trained with the synthetic data against attack datasets that simulates malicious commands injections with different levels of sophistication. Our findings show that synthetic data generated with some level of power grid domain knowledge helps train robust machine learning models against different types of attacks.

Authored by Jia Teo, Sean Gunawan, Partha Biswas, Daisuke Mashima

Detection of web attacks using machine learning based URL classification techniques

For a long time, online attacks were regarded to pose a severe threat to web - based applications, websites, and clients. It can bypass authentication methods, steal sensitive information from datasets and clients, and also gain ultimate authority of servers. A variety of ways for safeguarding online apps have been developed and used to deal the website risks. Based on the studies about the intersection of cybersecurity and machine learning, countermeasures for identifying typical web assaults have recently been presented (ML). In order to establish a better understanding on this essential topic, it is necessary to study ML methodologies, feature extraction techniques, evaluate datasets, and performance metrics utilised in a systematic manner. In this paper, we go through web security flaws like SQLi, XSS, malicious URLs, phishing attacks, path traversal, and CMDi in detail. We also go through the existing security methods for detecting these threats using machine learning approaches for URL classification. Finally, we discuss potential research opportunities for ML and DL-based techniques in this category, based on a thorough examination of existing solutions in the literature.

Authored by Aditi Saxena, Akarshi Arora, Saumya Saxena, Ashwni Kumar

Network-Based Machine Learning Detection of Covert Channel Attacks on Cyber-Physical Systems

Most of the recent high-profile attacks targeting cyber-physical systems (CPS) started with lengthy reconnaissance periods that enabled attackers to gain in-depth understanding of the victim’s environment. To simulate these stealthy attacks, several covert channel tools have been published and proven effective in their ability to blend into existing CPS communication streams and have the capability for data exfiltration and command injection.In this paper, we report a novel machine learning feature engineering and data processing pipeline for the detection of covert channel attacks on CPS systems with real-time detection throughput. The system also operates at the network layer without requiring physical system domain-specific state modeling, such as voltage levels in a power generation system. We not only demonstrate the effectiveness of using TCP payload entropy as engineered features and the technique of grouping information into network flows, but also pitch the proposed detector against scenarios employing advanced evasion tactics, and still achieve above 99% detection performance.

Authored by Hongwei Li, Danai Chasaki

An LSTM-based Intent Detector for Conversational Recommender Systems

With the rapid development of artificial intelligence (AI), many companies are moving towards automating their services using automated conversational agents. Dialogue-based conversational recommender agents, in particular, have gained much attention recently. The successful development of such systems in the case of natural language input is conditioned by the ability to understand the users’ utterances. Predicting the users’ intents allows the system to adjust its dialogue strategy and gradually upgrade its preference profile. Nevertheless, little work has investigated this problem so far. This paper proposes an LSTM-based Neural Network model and compares its performance to seven baseline Machine Learning (ML) classifiers. Experiments on a new publicly available dataset revealed The superiority of the LSTM model with 95% Accuracy and 94% F1-score on the full dataset despite the relatively small dataset size (9300 messages and 17 intents) and label imbalance.

Authored by Mourad Jbene, Smail Tigani, Rachid Saadane, Abdellah Chehri

Quality Assurance of Generative Dialog Models in an Evolving Conversational Agent Used for Swedish Language Practice

Due to the migration megatrend, efficient and effective second-language acquisition is vital. One proposed solution involves AI-enabled conversational agents for person-centered interactive language practice. We present results from ongoing action research targeting quality assurance of proprietary generative dialog models trained for virtual job interviews. The action team elicited a set of 38 requirements for which we designed corresponding automated test cases for 15 of particular interest to the evolving solution. Our results show that six of the test case designs can detect meaningful differences between candidate models. While quality assurance of natural language processing applications is complex, we provide initial steps toward an automated framework for machine learning model selection in the context of an evolving conversational agent. Future work will focus on model selection in an MLOps setting.

Authored by Markus Borg, Johan Bengtsson, Harald Österling, Alexander Hagelborn, Isabella Gagner, Piotr Tomaszewski

AI-Enabled Conversational Agents in Service of Mild Cognitive Impairment Patients

Over the past two decades, several forms of non-intrusive technology have been deployed in cooperation with medical specialists in order to aid patients diagnosed with some form of mental, cognitive or psychological condition. Along with the availability and accessibility to applications offered by mobile devices, as well as the advancements in the field of Artificial Intelligence applications and Natural Language Processing, Conversational Agents have been developed with the objective of aiding medical specialists detecting those conditions in their early stages and monitoring their symptoms and effects on the cognitive state of the patient, as well as supporting the patient in their effort of mitigating those symptoms. Coupled with the recent advances in the the scientific field of machine and deep learning, we aim to explore the grade of applicability of such technologies into cognitive health support Conversational Agents, and their impact on the acceptability of such applications bytheir end users. Therefore, we conduct a systematic literature review, following a transparent and thorough process in order to search and analyze the bibliography of the past five years, focused on the implementation of Conversational Agents, supported by Artificial Intelligence technologies and in service of patients diagnosed with Mild Cognitive Impairment and its variants.

Authored by Ioannis Kostis, Konstantinos Karamitsios, Konstantinos Kotrotsios, Magda Tsolaki, Anthoula Tsolaki