Publications | Science of Security Virtual Organization

Cryptocurrency Prediction and Analysis between Supervised and Unsupervised Learning with XAI

The stock market is a topic of interest to all sorts of people. It is a place where prices change drastically, so something needs to be done to help the people risking their money on it. The public's opinions are crucial for the stock market. Sentiment is a powerful force that changes constantly and has a significant impact. It is reflected on social media platforms, where almost the entire country is active, as well as in the daily news. Many projects have addressed stock prediction, but because sentiment plays a big part in the stock market, predicting prices without it would lead to inefficient predictions; hence sentiment analysis is very important for stock market price prediction. To predict stock market prices, we combine sentiment analysis from various sources, including news and Twitter. Results are evaluated for two different cryptocurrencies: Ethereum and Solana. Random Forest achieved the best RMSE of 13.434 and MAE of 11.919 for Ethereum. Support Vector Machine achieved the best RMSE of 2.48 and MAE of 1.78 for Solana.

Authored by Arayan Gupta, Durgesh Vyas, Pranav Nale, Harsh Jain, Sashikala Mishra, Ranjeet Bidwe, Bhushan Zope, Amar Buchade
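
The abstract above reports its results as RMSE and MAE. As a reminder of what those two error measures compute, here is a minimal sketch in plain Python. The price values are made up for illustration and are not taken from the paper; the function names `rmse` and `mae` are likewise just illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: squares each miss, so large errors dominate."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: the average size of a miss, in price units."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical closing prices and model predictions (illustrative only).
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [98.0, 103.0, 104.0, 105.0]

print(f"RMSE = {rmse(actual, predicted):.3f}")
print(f"MAE  = {mae(actual, predicted):.3f}")
```

RMSE is never smaller than MAE on the same data, so a large gap between the two usually signals a few big misses rather than uniformly small ones.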

XAI for Communication Networks

Explainable AI (XAI) is a topic of intense activity in the research community today. However, for AI models deployed in the critical infrastructure of communications networks, explainability alone is not enough to earn the trust of network operations teams comprising human experts with many decades of collective experience. In the present work we discuss some use cases in communications networks and state some of the additional properties, including accountability, that XAI models would have to satisfy before they can be widely deployed. In particular, we advocate for a human-in-the-loop approach to train and validate XAI models. Additionally, we discuss the use cases of XAI models around improving data preprocessing and data augmentation techniques, and refining data labeling rules for producing consistently labeled network datasets.

Authored by Sayandev Mukherjee, Jason Rupe, Jingjie Zhu

Real-Time Zero-Day Intrusion Detection System for Automotive Controller Area Network on FPGAs

Increasing automation in vehicles enabled by increased connectivity to the outside world has exposed vulnerabilities in previously siloed automotive networks like controller area networks (CAN). Attributes of CAN such as broadcast-based communication among electronic control units (ECUs) that lowered deployment costs are now being exploited to carry out active injection attacks like denial of service (DoS), fuzzing, and spoofing attacks. Research literature has proposed multiple supervised machine learning models deployed as intrusion detection systems (IDSs) to detect such malicious activity; however, these are largely limited to identifying previously known attack vectors. With the ever-increasing complexity of active injection attacks, detecting zero-day (novel) attacks in these networks in real-time (to prevent propagation) becomes a problem of particular interest. This paper presents an unsupervised-learning-based convolutional autoencoder architecture for detecting zero-day attacks, which is trained only on benign (attack-free) CAN messages. We quantise the model using Vitis-AI tools from AMD/Xilinx targeting a resource-constrained Zynq Ultrascale platform as our IDS-ECU system for integration. The proposed model successfully achieves equal or higher classification accuracy (> 99.5%) on unseen DoS, fuzzing, and spoofing attacks from a publicly available attack dataset when compared to the state-of-the-art unsupervised learning-based IDSs. Additionally, by cleverly overlapping IDS operation on a window of CAN messages with the reception, the model is able to meet line-rate detection (0.43 ms per window) of high-speed CAN, which when coupled with the low energy consumption per inference, makes this architecture ideally suited for detecting zero-day attacks on critical CAN networks.

Authored by Shashwat Khandelwal, Shanker Shreejith
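
Several abstracts on this page rely on an autoencoder trained only on benign traffic, flagging inputs it reconstructs poorly. The sketch below illustrates only the final decision step, under loud assumptions: the autoencoder itself is omitted, the reconstruction-error values are invented, and the "mean plus k standard deviations" rule is a common heuristic chosen for illustration, not the thresholding method of any paper listed here.

```python
import statistics

def fit_threshold(benign_errors, k=3.0):
    """Derive a cut-off from reconstruction errors on benign data only.

    k is a hypothetical sensitivity knob: a larger k means fewer false alarms
    but more missed attacks.
    """
    mean = statistics.mean(benign_errors)
    spread = statistics.pstdev(benign_errors)
    return mean + k * spread

def is_anomalous(error, threshold):
    """A window is flagged when the model reconstructs it worse than the cut-off."""
    return error > threshold

# Invented reconstruction errors for attack-free message windows.
benign_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]
threshold = fit_threshold(benign_errors)

for error in (0.012, 0.25):
    print(error, "anomalous" if is_anomalous(error, threshold) else "benign")
```

Because the threshold is fitted on benign data alone, no attack samples are needed at training time, which is what makes this family of detectors applicable to previously unseen attacks.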

Advancing Network Survivability and Reliability: Integrating XAI-Enhanced Autoencoders and LDA for Effective Detection of Unknown Attacks

This study presents a novel approach for fortifying network security systems, crucial for ensuring network reliability and survivability against evolving cyber threats. Our approach integrates Explainable Artificial Intelligence (XAI) with an ensemble of autoencoders and Linear Discriminant Analysis (LDA) to create a robust framework for detecting both known and elusive zero-day attacks. We refer to this integrated method as AE-LDA. Our method stands out in its ability to effectively detect both known and previously unidentified network intrusions. By employing XAI for feature selection, we ensure improved interpretability and precision in identifying key patterns indicative of network anomalies. The autoencoder ensemble, trained on benign data, is adept at recognising a broad spectrum of network behaviours, thereby significantly enhancing the detection of zero-day attacks. Simultaneously, LDA aids in the identification of known threats, ensuring a comprehensive coverage of potential network vulnerabilities. This hybrid model demonstrates superior performance in anomaly detection accuracy and complexity management. Our results highlight a substantial advancement in network intrusion detection capabilities, showcasing an effective strategy for bolstering network reliability and resilience against a diverse range of cyber threats.

Authored by Fatemeh Stodt, Fabrice Theoleyre, Christoph Reich

Adaptive Security Management Model for Networks

Adaptive security is an approach in cybersecurity that analyzes events and behaviors to protect a network. This study provides details about the different algorithms being used to secure networks. These approaches are driven by a small quantity of labeled data and a massive amount of unlabeled data. In this context, contemporary semi-supervised learning strategies base their operations on the assumption that the distributions of labeled and unlabeled data are comparable. This assumption has a substantial influence on how well these strategies perform overall. If unlabeled data contain information that does not belong to a particular category, the efficiency of the system will deteriorate.

Authored by Lakshmana Maguluri, Jemi P, Rahini Sudha, K.P. Aishwarya, Jayanthi S, Narendra Bohra

Federated Learning for Zero-Day Attack Detection in 5G and Beyond V2X Networks

Deploying Connected and Automated Vehicles (CAVs) on top of 5G and Beyond networks (5GB) makes them vulnerable to increasing vectors of security and privacy attacks. In this context, a wide range of advanced machine/deep learning-based solutions have been designed to accurately detect security attacks. Specifically, supervised learning techniques have been widely applied to train attack detection models. However, the main limitation of such solutions is their inability to detect attacks different from those seen during the training phase, or new attacks, also called zero-day attacks. Moreover, training the detection model requires significant data collection and labeling, which increases the communication overhead and raises privacy concerns. To address the aforementioned limits, we propose in this paper a novel detection mechanism that leverages the ability of the deep auto-encoder method to detect attacks relying only on the benign network traffic pattern. Using federated learning, the proposed intrusion detection system can be trained with large and diverse benign network traffic, while preserving the CAVs’ privacy and minimizing the communication overhead. The in-depth experiment on a recent network traffic dataset shows that the proposed system achieved a high detection rate while minimizing the false positive rate and the detection delay.

Authored by Abdelaziz Korba, Abdelwahab Boualouache, Bouziane Brik, Rabah Rahal, Yacine Ghamri-Doudane, Sidi Senouci

Semi-supervised Trojan Nets Classification Using Anomaly Detection Based on SCOAP Features

Recently, hardware Trojans have become a serious security concern in the integrated circuit (IC) industry. Due to the globalization of semiconductor design and fabrication processes, ICs are highly vulnerable to hardware Trojan insertion by malicious third-party vendors. Therefore, the development of effective hardware Trojan detection techniques is necessary. Testability measures have been proven to be efficient features for Trojan nets classification. However, most of the existing machine-learning-based techniques use supervised learning methods, which involve time-consuming training processes, need to deal with the class imbalance problem, and are not pragmatic in real-world situations. Furthermore, no works have explored the use of anomaly detection for hardware Trojan detection tasks. This paper proposes a semi-supervised hardware Trojan detection method at the gate level using anomaly detection. We improve the existing computation of the Sandia Controllability/Observability Analysis Program (SCOAP) values by considering all types of D flip-flops and adopt semi-supervised anomaly detection techniques to detect Trojan nets. Finally, a novel topology-based location analysis is utilized to improve the detection performance. Testing on 17 Trust-Hub Trojan benchmarks, the proposed method achieves an overall 99.47% true positive rate (TPR), 99.99% true negative rate (TNR), and 99.99% accuracy.

Authored by Pei-Yu Lo, Chi-Wei Chen, Wei-Ting Hsu, Chih-Wei Chen, Chin-Wei Tien, Sy-Yen Kuo

Threat Recognition Through Victim And Assailant's Pose And Used Threat Object By Applying YOLOv5s Algorithm

This study aimed to recognize threats by recognizing the assailant pose, the victim pose, and the threat object used by the assailant in one frame in a threat emergency situation, using a 2D camera and applying the YOLOv5s algorithm. The system's ability to correctly identify threats depends heavily on the training and labeling in YOLOv5s. Thus, the bounding boxes were carefully assigned, and the labels were arranged properly. Through the application of the YOLOv5s algorithm, supervised learning was implemented. Threats were identified by recognizing three variables in one frame: victim pose, assailant pose, and threat object. YOLOv5s was able to localize the poses and objects and avoid misclassification by setting appropriate Intersection over Union (IoU) and confidence thresholds. Using a truth table, YOLOv5s was able to identify threats by eliminating combinations that were not threats. As a result, the system was able to recognize each of the assailant poses, victim poses, and threat objects in one frame, and obtained an overall reliability of 98.125%.

Authored by Shaina Languido, Erika Entredicho, Kimbierly Borromeo, Ma. Manaois, Karl Villanueva, Engr. Tolentino
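
The abstract above mentions Intersection over Union (IoU) and confidence thresholds without defining them. The following is a minimal sketch of the IoU computation for axis-aligned boxes. The `(x1, y1, x2, y2)` box layout, the helper `accept_detection`, and both default threshold values are assumptions made for illustration; the abstract does not state the values the authors used.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def accept_detection(pred_box, truth_box, confidence,
                     iou_threshold=0.5, conf_threshold=0.25):
    """Hypothetical acceptance rule: confident enough AND well localized."""
    return confidence >= conf_threshold and iou(pred_box, truth_box) >= iou_threshold

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))                    # partial overlap
print(accept_detection((0, 0, 2, 2), (0, 0, 2, 2), 0.9))  # perfect overlap
```

Raising either threshold trades recall for precision: fewer detections are accepted, but those that remain are better localized or more confident.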

Prior Knowledge based Advanced Persistent Threats Detection for IoT in a Realistic Benchmark

The number of Internet of Things (IoT) devices being deployed into networks is growing at a phenomenal pace, which makes IoT networks more vulnerable in the wireless medium. Advanced Persistent Threat (APT) is malicious to most network facilities, and the available attack data for training a machine learning-based Intrusion Detection System (IDS) is limited when compared to normal traffic. Therefore, it is quite challenging to enhance the detection performance in order to mitigate the influence of APT. To address this, Prior Knowledge Input (PKI) models are proposed and tested using the SCVIC-APT2021 dataset. To obtain prior knowledge, the proposed PKI model pre-classifies the original dataset with an unsupervised clustering method. Then, the obtained prior knowledge is incorporated into the supervised model to decrease training complexity and assist the supervised model in determining the optimal mapping between the raw data and true labels. The experimental findings indicate that the PKI model outperforms the supervised baseline, with the best macro average F1-score of 81.37%, which is 10.47% higher than the baseline.

Authored by Yu Shen, Murat Simsek, Burak Kantarci, Hussein Mouftah, Mehran Bagheri, Petar Djukic
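
The abstract above reports a macro average F1-score, which matters when attack traffic is rare relative to normal traffic. The sketch below shows how the macro average is computed, using invented labels (`"normal"`, `"apt"`) and invented predictions; it does not reproduce the PKI model or the SCVIC-APT2021 data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)

# Invented, imbalanced example: 8 normal flows, 2 attack flows, one attack missed.
y_true = ["normal"] * 8 + ["apt"] * 2
y_pred = ["normal"] * 8 + ["apt", "normal"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case accuracy looks high because the majority class dominates, while the macro F1 is pulled down by the missed attack, which is why the metric suits imbalanced intrusion data.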

Transferring multiple text styles using CycleGAN with supervised style latent space

Neural Style Transfer - Text style transfer is a relevant task, contributing to theoretical and practical advancement in several areas, especially when working with non-parallel data. The concept behind non-parallel style transfer is to change a specific dimension of the sentence while retaining the overall context. Previous work used adversarial learning to perform such a task. Although it was not initially created to work with textual data, it proved very effective. Most of the previous work has focused on developing algorithms capable of transferring between binary styles, with limited generalization capabilities and limited applications. This work proposes a framework capable of working with multiple styles and improving content retention (BLEU) after a transfer. The proposed framework combines supervised learning of latent spaces and their separation within the architecture. The results suggest that the proposed framework improves content retention in multi-style scenarios while maintaining accuracy comparable to the state of the art.

Authored by Lorenzo Vecchi, Eliane Maffezzolli, Emerson Paraiso

Analysis of the Optimized KNN Algorithm for the Data Security of DR Service

Nearest Neighbor Search - The data of large-scale distributed demand-side IoT devices are gradually being migrated to the cloud. This cloud deployment mode makes it convenient for IoT devices to participate in the interaction between supply and demand, and at the same time exposes various vulnerabilities of IoT devices to the Internet, where they can be easily accessed and manipulated by hackers to launch large-scale DDoS attacks. As an easy-to-understand supervised learning classification algorithm, KNN can obtain accurate classification results without many tuning parameters, and has achieved many research results in the field of DDoS detection. However, in the face of high-dimensional data, this method has a high computational cost and is not practical. To address this disadvantage, this chapter explores the potential of the classical KNN algorithm in data storage structure, K-nearest neighbor search, and hyperparameter optimization, and proposes an improved KNN algorithm for DDoS attack detection on demand-side IoT devices.

Authored by Kun Shi, Songsong Chen, Dezhi Li, Ke Tian, Meiling Feng
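
To make the cost problem mentioned in the abstract above concrete, here is a sketch of the classical brute-force KNN baseline, not the chapter's improved algorithm. Every query computes a distance to every stored sample, so the work grows with both dataset size and feature dimension. The two features and all sample values are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Brute force: one Euclidean distance per stored sample, then a full sort.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features: (requests per second, distinct source addresses), rescaled.
train_points = [(1.0, 1.0), (1.5, 0.5), (0.5, 1.2), (9.0, 9.5), (10.0, 8.5), (9.5, 10.0)]
train_labels = ["benign", "benign", "benign", "ddos", "ddos", "ddos"]

print(knn_predict(train_points, train_labels, (9.0, 9.0)))
print(knn_predict(train_points, train_labels, (1.0, 0.8)))
```

Spatial index structures and a tuned choice of `k` are the usual ways to cut this cost, which matches the three directions the abstract names: storage structure, neighbor search, and hyperparameter optimization.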

Automatic classification of OER for metadata quality assessment

Metadata Discovery Problem - Open Educational Resources (OER) are educational materials that are available in different repositories such as Merlot, SkillsCommons, MIT OpenCourseWare, etc. The quality of metadata facilitates the search and discovery tasks of educational resources. This work evaluates the metadata quality of 4142 OER from SkillsCommons. We applied supervised machine learning algorithms (Support Vector Machine and Random Forest Classifier) for automatic classification of two metadata fields: description and material type. Based on our data and model, the performance of a first classification effort is reported, with an accuracy of 70%.

Authored by Veronica Segarra-Faggioni, Audrey Romero-Pelaez

BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec

Malware Classification - Rapid digitalisation spurred by the Covid-19 pandemic has resulted in more cyber crime. Malware-as-a-service is now a booming business for cyber criminals. With the surge in malware activities, it is vital for cyber defenders to understand more about the malware samples they have at hand, as such information can greatly influence their next course of action during a breach. Recently, researchers have shown how malware family classification can be done by first converting malware binaries into grayscale images and then passing them through neural networks for classification. However, most work focuses on studying the impact of different neural network architectures on classification performance. In the last year, researchers have shown that augmenting supervised learning with self-supervised learning can improve performance. Even more recently, Data2Vec was proposed as a modality-agnostic self-supervised framework to train neural networks. In this paper, we present BinImg2Vec, a framework for training malware binary image classifiers that incorporates both self-supervised learning and supervised learning to produce a model that consistently outperforms one trained only via supervised learning. We also show how our framework produces outputs that facilitate explainability.

Authored by Lee Sern, Tay Keng, Chua Fu

Representation Learning with Function Call Graph Transformations for Malware Open Set Recognition

Malware Analysis and Graph Theory - The open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. As new/unknown malware families occur regularly, it is difficult to exhaust samples that cover all the classes for the training process in ML systems. An advanced malware classification system should classify the known classes correctly while remaining sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations for the function call graph (FCG) based malware representations to facilitate the pretext task. We also present a statistical thresholding approach to find the optimal threshold for the unknown class. Moreover, the experimental results indicate that our proposed pre-training process can improve performance across different downstream loss functions for the OSR problem.

Authored by Jingyun Jia, Philip Chan

A Framework to Detect the Malicious Insider Threat in Cloud Environment using Supervised Learning Methods

Insider Threat - A malicious insider is an especially serious threat to an organization. It is necessary to detect the malicious insider because of the huge impact on the organization. Malicious insider threats occur less often than other threats but are quite destructive. The major focus of this paper is therefore to detect the malicious insider threat in an organization. The traditional insider threat detection algorithm is not suitable for real-time insider threat detection. A supervised learning-based anomaly detection technique is used to classify, predict, and detect malicious and non-malicious activity based on the highest level of anomaly score. In this paper, a framework is proposed to detect the malicious insider threat using supervised learning-based anomaly detection. It detects malicious insider activity using a One-Class Support Vector Machine (OCSVM). The experimental results show that the proposed framework using OCSVM performs well and detects the malicious insider, who obtains a much higher anomaly score than a normal user.

Authored by G. Padmavathi, D. Shanmugapriya, S. Asha

SHIL: Self-Supervised Hybrid Learning for Security Attack Detection in Containerized Applications

Container security has received much research attention recently. Previous work has proposed to apply various machine learning techniques to detect security attacks in containerized applications. On one hand, supervised machine learning schemes require sufficient labelled training data to achieve good attack detection accuracy. On the other hand, unsupervised machine learning methods are more practical by avoiding training data labelling requirements, but they often suffer from high false alarm rates. In this paper, we present SHIL, a self-supervised hybrid learning solution, which combines unsupervised and supervised learning methods to achieve high accuracy without requiring any manual data labelling. We have implemented a prototype of SHIL and conducted experiments over 41 real world security attacks in 28 commonly used server applications. Our experimental results show that SHIL can reduce false alarms by 39-91% compared to existing supervised or unsupervised machine learning schemes while achieving a higher or similar detection rate.

Authored by Yuhang Lin, Olufogorehan Tunde-Onadele, Xiaohui Gu, Jingzhu He, Hugo Latapie

An Innovative Method in Improving the accuracy in Intrusion detection by comparing Random Forest over Support Vector Machine

This work aims to improve the accuracy of intrusion detection by comparing machine learning classifiers, namely Random Forest (RF) and Support Vector Machine (SVM). Two groups of supervised machine learning algorithms are compared: the Random Forest calculation (N=20) and the Support Vector Machine calculation (N=20), with a G-power value of 0.8. Random Forest (99.3198%) has higher accuracy than SVM (9S.56l5%); an independent t-test was carried out (p = 0.507) and shows that the difference is statistically insignificant (p > 0.05) at a confidence level of 95%. Conclusion: The comparative examination indicates that Random Forest is more accurate than the Support Vector Machine for identifying intruders.

Authored by Marri Kumar, K. Malathi

Comparative Study of Machine Learning Techniques for Intrusion Detection Systems

As part of today’s technical world, we are connected through a vast network. The more we depend on these modern technologies, the more we need security. A network security system must be reliable enough to monitor the whole network of an organization so that unauthorized users or intruders are not able to breach our security. Firewalls secure the internal network from unauthorized outsiders, but attacks are still possible: according to a survey, 60% of attacks were internal to the network. The internal system therefore needs the same high level of security as the external one. Understanding the value of security measures that are accurate, efficient, and fast, we focus on implementing and comparing an improved intrusion detection system. A comprehensive literature review found that some feature selection techniques with standard scaling, combined with machine learning techniques, can give better results than existing ML techniques alone. In this survey paper, the Uni-variate Feature selection method is used to select 14 essential features out of 41, which are used in the comparative analysis. We implemented and compared both binary-class and multi-class classification-based Intrusion Detection Systems (IDS) for two Supervised Machine Learning Techniques: Support Vector Machine and Classification and Regression Techniques.

Authored by Pushpa Singh, Parul Tomar, Madhumita Kathuria

Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers

Graph-based Semi-Supervised Learning (GSSL) is a practical solution to learn from a limited amount of labelled data together with a vast amount of unlabelled data. However, due to their reliance on the known labels to infer the unknown labels, these algorithms are sensitive to data quality. It is therefore essential to study the potential threats related to the labelled data, more specifically, label poisoning. In this paper, we propose a novel data poisoning method which efficiently approximates the result of label inference to identify the inputs which, if poisoned, would produce the highest number of incorrectly inferred labels. We extensively evaluate our approach on three classification problems under 24 different experimental settings each. Compared to the state of the art, our influence-driven attack produces an average increase of error rate 50% higher, while being faster by multiple orders of magnitude. Moreover, our method can inform engineers of inputs that deserve investigation (relabelling them) before training the learning model. We show that relabelling one-third of the poisoned inputs (selected based on their influence) reduces the poisoning effect by 50%. ACM Reference Format: Adriano Franci, Maxime Cordy, Martin Gubri, Mike Papadakis, and Yves Le Traon. 2022. Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers. In 1st Conference on AI Engineering - Software Engineering for AI (CAIN’22), May 16–24, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3522664.3528606

Authored by Adriano Franci, Maxime Cordy, Martin Gubri, Mike Papadakis, Yves Le Traon

Using Supervised Learning to Assign New Consumers to Demand Response Programs According to the Context

Active consumers have now been empowered thanks to the smart grid concept. To avoid fossil fuels, the demand side must provide flexibility through Demand Response events. However, selecting the proper participants for an event can be complex due to response uncertainty. The authors design a Contextual Consumer Rate to identify the trustworthy participants according to previous performances. In the present case study, the authors address the problem of new players with no information, comparing two different methods to predict their rate. The authors also test consumer privacy by evaluating the dataset with and without information that could lead to participant identification. The results show that, for the proposed methodology, private information does not have a high impact on attributing a rate.

Authored by Cátia Silva, Pedro Faria, Zita Vale

ATVSA: Vehicle Driver Profiling for Situational Awareness

Increasing connectivity and automation in vehicles leads to a greater potential attack surface. Such vulnerabilities within vehicles can also be used for auto-theft, increasing the potential for attackers to disable anti-theft mechanisms implemented by vehicle manufacturers. We utilize patterns derived from Controller Area Network (CAN) bus traffic to verify driver “behavior”, as a basis to prevent vehicle theft. Our proposed model uses semi-supervised learning that continuously profiles a driver, using features extracted from CAN bus traffic. We have selected 15 key features and obtained an accuracy of 99% using a dataset comprising a total of 51 features across 10 different drivers. We use a number of data analysis algorithms, such as J48, Random Forest, JRip and clustering, using 94K records. Our results show that J48 is the best performing algorithm in terms of training and testing (1.95 seconds and 0.44 seconds recorded, respectively). We also analyze the effect of using a sliding window on algorithm performance, altering the size of the window to identify the impact on prediction accuracy.

Authored by Rashid Khan, Neetesh Saxena, Omer Rana, Prosanta Gope

A Novel Approach Exploiting Machine Learning to Detect SQLi Attacks

The increasing use of Information Technology applications in the distributed environment is increasing security exploits. Information about vulnerabilities is also available on the open web in an unstructured format that developers can take advantage of to fix vulnerabilities in their IT applications. SQL injection (SQLi) attacks are frequently launched with the objective of exfiltrating data, typically by targeting the back-end servers of organisations to compromise their customer databases. There have been a number of high-profile attacks against large enterprises in recent years. With the ever-increasing growth of online trading, SQLi attacks can be expected to remain one of the leading routes for cyber-attacks in the future, as indicated by findings reported in OWASP. Various machine learning and deep learning algorithms have been applied to detect and prevent these attacks. However, such preventive attempts have not limited the incidence of cyber-attacks and the resulting compromised databases, as reported by the CVE repository. In this paper, the potential of data mining approaches is pursued in order to enhance the efficacy of SQL injection safeguarding measures by reducing the false-positive rates in SQLi detection. The proposed approach uses CountVectorizer to extract features and then applies various supervised machine-learning models to automate the classification of SQLi. The model that returns the highest accuracy is chosen among the available models. A new model, PALOSDM (Performance analysis and Iterative optimisation of the SQLI Detection Model), has also been created to reduce the false-positive and false-negative rates. The detection accuracy has also been improved significantly, from a baseline of 94% up to 99%.

Authored by Ahmed Ashlam, Atta Badii, Frederic Stahl
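
The abstract above extracts features with CountVectorizer, that is, by turning each query into a vector of token counts. The sketch below is a plain-Python stand-in for that idea, not scikit-learn's implementation: the tokenizer here deliberately keeps punctuation as separate tokens (an assumption made so that characters such as `=` and `-` remain visible), and the two queries are invented examples.

```python
import re
from collections import Counter

def tokenize(query):
    """Lower-case, then split into word tokens and single punctuation characters."""
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", query.lower())

def fit_vocabulary(queries):
    """Map every token seen in the training queries to a fixed column index."""
    tokens = sorted({token for query in queries for token in tokenize(query)})
    return {token: index for index, token in enumerate(tokens)}

def transform(query, vocabulary):
    """Bag-of-words vector: one count per vocabulary token, unknown tokens ignored."""
    counts = Counter(tokenize(query))
    return [counts.get(token, 0) for token in vocabulary]

# Invented examples: one ordinary query, one with a classic tautology payload.
queries = [
    "SELECT name FROM users WHERE id = 5",
    "SELECT name FROM users WHERE id = 5 OR 1=1 --",
]
vocabulary = fit_vocabulary(queries)
for query in queries:
    print(transform(query, vocabulary), query)
```

The resulting count vectors are what a supervised classifier would then be trained on; the injected query differs from the ordinary one only in a handful of columns, such as `or`, `=`, and `-`.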

NoSQL Injection Detection Using Supervised Text Classification

For a long time, SQL injection has been considered one of the most serious security threats. NoSQL databases are becoming increasingly popular as big data and cloud computing technologies progress. NoSQL injection attacks are designed to take advantage of applications that employ NoSQL databases. NoSQL injections can be particularly harmful because they allow unrestricted code execution. In this paper we use supervised learning and natural language processing to construct a model to detect NoSQL injections. Our model is designed to work with MongoDB, CouchDB, CassandraDB, and Couchbase queries. Our model has achieved an F1 score of 0.95 as established by 10-fold cross validation.

Authored by Sivakami Praveen, Alysha Dcouth, A Mahesh
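
The F1 score in the abstract above is established by 10-fold cross validation. As a minimal sketch of how such a split works, the generator below partitions sample indices into k folds so that each sample is held out exactly once. It assumes contiguous, unshuffled folds for simplicity; the abstract does not say whether the authors shuffled or stratified their data, and the sample count of 25 is arbitrary.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_indices = indices[start:start + size]
        train_indices = indices[:start] + indices[start + size:]
        yield train_indices, test_indices
        start += size

for fold, (train_idx, test_idx) in enumerate(k_fold_indices(25, k=10), start=1):
    print(f"fold {fold}: train on {len(train_idx)} samples, test on {test_idx}")
```

A model is trained and scored once per fold, and the reported metric is typically the average of the k scores, which makes the estimate less dependent on any single train/test split.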