Metadata Discovery Problem - Open Educational Resources (OER) are educational materials that are available in different repositories such as Merlot, SkillsCommons, MIT OpenCourseWare, etc. The quality of metadata facilitates the search and discovery of educational resources. This work evaluates the metadata quality of 4142 OER from SkillsCommons. We applied supervised machine learning algorithms (Support Vector Machine and Random Forest Classifier) for automatic classification of two metadata fields: description and material type. Based on our data and model, the performance of a first classification effort is reported, with an accuracy of 70%.
Authored by Veronica Segarra-Faggioni, Audrey Romero-Pelaez
Measurement and Metrics Testing - This paper belongs to a sequence of manuscripts that discuss generic and easy-to-apply security metrics for Strong PUFs. These metrics cannot and shall not fully replace in-depth machine learning (ML) studies in the security assessment of Strong PUF candidates. But they can complement the latter, serve in initial PUF complexity analyses, and are much easier and more efficient to apply: They do not require detailed knowledge of various ML methods, substantial computation times, or the availability of an internal parametric model of the studied PUF. Our metrics can also be standardized particularly easily. This avoids the sometimes inconclusive or contradictory findings of existing ML-based security tests, which may result from the usage of different or non-optimized ML algorithms and hyperparameters, differing hardware resources, or varying numbers of challenge-response pairs in the training phase.
Authored by Fynn Kappelhoff, Rasmus Rasche, Debdeep Mukhopadhyay, Ulrich Rührmair
MANET Security - Recently, the mobile ad hoc network (MANET) has enjoyed a great reputation thanks to its advantages, such as high performance, no expensive infrastructure to install, use of unlicensed frequency spectrum, and fast distribution of information around the transmitter. But the topology of MANETs makes them a target for several kinds of attack. Although authentication and encryption techniques can provide some protection, especially by minimizing the number of intrusions, such cryptographic techniques do not work effectively in the case of unseen or unknown attacks. In this case, a machine learning approach can successfully detect unfamiliar intrusive behavior. Security methodologies in MANETs mainly focus on eliminating malicious attacks and misbehaving nodes and on providing secure routing.
Authored by Wafa Bouassaba, Abdellah Nabou, Mohammed Ouzzif
Malware Classification - The past decades have witnessed the development of various Machine Learning (ML) models for malware classification. Semantic representation is a crucial basis for these classifiers. This paper aims to assess the effect of semantic representation methods on malware classifier performance. Two commonly used semantic representation methods are N-gram and GloVe. We utilize diverse ML classifiers to conduct comparative experiments that analyze the capability of N-gram, GloVe and image-based methods for malware classification. We also analyze in depth why GloVe can produce negative effects on malware static analysis.
Authored by Bingchu Jin, Zesheng Hu, Jianhua Wang, Monong Wei, Yawei Zhao, Chao Xue
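As a rough illustration of the N-gram representation mentioned in the abstract above, the following minimal sketch counts contiguous n-grams over a token sequence. The function name `ngram_counts` and the opcode trace are hypothetical and invented for illustration; the abstract does not say which tokens (bytes, opcodes, API names) the authors use, so this should not be read as their pipeline.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count contiguous n-grams in a token sequence."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of length n over the sequence; each window is one n-gram.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical opcode trace, invented for illustration only.
opcodes = ["push", "mov", "call", "push", "mov", "ret"]
bigrams = ngram_counts(opcodes, 2)
```

The resulting counts can be turned into a fixed-length feature vector by choosing a vocabulary of n-grams and reading off each count, which is one common way to feed such features to a classifier.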
Malware Classification - Malware attacks are a severe problem that can cause considerable loss. To prevent malware attacks, different malware detection and classification methods have been implemented in recent years. This paper proposes a new method based on Markov images and transfer learning in machine learning. An experiment was also conducted to compare the performance of the proposed method and the grayscale method on malware detection and classification. The accuracy and loss of malware detection and classification using the proposed method are 0.973 and 0.076, and 0.987 and 0.062, respectively. The accuracy and loss of malware detection and classification using the grayscale method are 0.989 and 0.037, and 0.973 and 0.202, respectively. Although the grayscale method performs better in malware detection, the proposed method's accuracy is over 0.97. Therefore, the results show that the proposed method is suitable for malware detection and classification.
Authored by Lok Kwan
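The abstract above does not define its Markov image. One common construction, assumed here, treats a binary as a byte stream and builds a 256x256 matrix of byte-to-byte transition probabilities, which can then be rendered as an image. The sketch below follows that assumption; `markov_image` and the sample bytes are hypothetical.

```python
from typing import List


def markov_image(data: bytes) -> List[List[float]]:
    """Return a 256x256 matrix where entry [a][b] estimates
    P(next byte = b | current byte = a) from the given byte stream."""
    counts = [[0] * 256 for _ in range(256)]
    # Count each adjacent byte pair (current, next).
    for current, nxt in zip(data, data[1:]):
        counts[current][nxt] += 1
    image = []
    for row in counts:
        total = sum(row)
        # Rows for bytes that never occur as "current" stay all zero.
        image.append([c / total if total else 0.0 for c in row])
    return image


# Tiny invented byte stream, for illustration only.
sample = bytes([0, 1, 0, 1, 0, 2])
img = markov_image(sample)
```

In this toy stream, byte 0 is followed by 1 twice and by 2 once, so row 0 holds 2/3 in column 1 and 1/3 in column 2.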
Malware Classification - Due to the constant updates of malware and its variants and the continuous development of malware obfuscation techniques, malware intrusions targeting Windows hosts are on the rise. Traditional static analysis methods such as signature matching mechanisms have had difficulty adapting to the detection of new malware. Therefore, a novel visual malware detection method is proposed that, for the first time, converts the sequential Windows API call sequence into feature images based on the Gramian Angular Field (GAF) idea and trains a neural network to identify malware. The experimental results demonstrate the effectiveness of our proposed method. For the binary classification of malware, the GAF visualization image of the API call sequence is compared with its original sequence. After GAF visualization, the classification accuracy of the classic machine learning model MLP is improved by 9.64%, and the classification accuracy of the deep learning model CNN is improved by 4.82%. Furthermore, our experiments show that the proposed method is also feasible and effective for the multi-class classification of malware.
Authored by Hongmei Zhang, Xiaoqian Yun, Xiaofang Deng, Xiaoxiong Zhong
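To make the Gramian Angular Field idea in the abstract above concrete, here is a minimal sketch of the summation variant (GASF): the series is rescaled to [-1, 1], each value x is mapped to an angle phi = arccos(x), and the image entry at (i, j) is cos(phi_i + phi_j). The abstract does not state which GAF variant or which numeric encoding of API calls the authors use, so the integer call IDs and the function name `gramian_angular_field` are assumptions.

```python
import math
from typing import List, Sequence


def gramian_angular_field(series: Sequence[float]) -> List[List[float]]:
    """Gramian Angular Summation Field: entry [i][j] is cos(phi_i + phi_j).

    Expects a non-empty numeric sequence.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries no contrast; map every value to 0.
        scaled = [0.0] * len(series)
    else:
        # Min-max rescale into [-1, 1] so that arccos is defined.
        scaled = [2 * (x - lo) / (hi - lo) - 1 for x in series]
    # Clamp to guard against rounding slightly outside [-1, 1].
    phis = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    return [[math.cos(a + b) for b in phis] for a in phis]


# Hypothetical integer-encoded API call IDs, invented for illustration.
api_ids = [3, 7, 7, 1, 5]
gaf = gramian_angular_field(api_ids)
```

The output is a symmetric n x n matrix for a sequence of length n, which is what allows an image classifier such as a CNN to be applied to sequential data.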
MANET Attack Detection - Mobile ad hoc networks, also known as MANETs or wireless ad hoc networks, are networks that usually have a routable networking environment on top of a link-layer ad hoc network. They consist of a set of mobile nodes connected wirelessly in a self-configured, self-healing network without a fixed infrastructure. MANETs have been predominantly used in military or emergency situations; however, their use outside these realms is now being considered for possible public adoption in light of recent global events such as the pandemic and newly emerging infectious diseases. These events gave rise to new challenges, one of which was the considerable strain placed on mainstream ISPs. Although a significant amount of research has been conducted on MANET security through various means, such as the development of intrusion detection systems and attack classification and prediction systems, concerns about MANET security and risks persist. Additionally, recent research trends within the field show key disparities in studies related to MANET risk profiles. This paper provides an overview of existing studies on MANETs and briefly introduces a new method of determining the initial risk profile of a MANET using probabilistic machine learning techniques. It explores new probability-based approaches to further supplement the existing impact-based methodologies for assessing risk within MANETs.
Authored by Hosein Michael, Aqui Jedidiah
MANET Attack Detection - Recently, the mobile ad hoc network (MANET) has enjoyed a great reputation thanks to its advantages, such as high performance, no expensive infrastructure to install, use of unlicensed frequency spectrum, and fast distribution of information around the transmitter. But the topology of MANETs makes them a target for several kinds of attack. Although authentication and encryption techniques can provide some protection, especially by minimizing the number of intrusions, such cryptographic techniques do not work effectively in the case of unseen or unknown attacks. In this case, a machine learning approach can successfully detect unfamiliar intrusive behavior. Security methodologies in MANETs mainly focus on eliminating malicious attacks and misbehaving nodes and on providing secure routing. In this paper we present the most recent works that propose or apply the concept of Machine Learning (ML) to secure the MANET environment.
Authored by Wafa Bouassaba, Abdellah Nabou, Mohammed Ouzzif
MANET Attack Prevention - Wireless ad hoc networks are characterized by dynamic topology and high node mobility. Network attacks on wireless ad hoc networks can significantly degrade performance metrics such as the packet delivery ratio from the source to the destination node, overhead, throughput, etc. The article presents an experimental study of a machine-learning-based intrusion detection system prototype for mobile ad hoc networks. The experiment is carried out in a MANET segment of 50 nodes, and the detection and prevention of DDoS and cooperative blackhole attacks are investigated. The dependence of features on the type of network traffic and the dependence of performance metrics on the speed of mobile nodes in the network are also investigated. The experimental studies show the effectiveness of the intrusion detection system prototype on simulated data.
Authored by Leonid Legashev, Luybov Grishina
Malware Analysis - Detection of malware and security attacks is a complex process that can vary in its details and analysis activities. As part of the detection process, malware scanners try to categorize a piece of malware, once it is detected, under one of the known malware categories (e.g. worms, spyware, viruses, etc.). However, many studies indicate problems with scanners categorizing or identifying a particular piece of malware under more than one malware category. This paper, and several others, show that machine learning can be used for malware detection, especially with ensemble-based prediction methods. In this paper, we evaluated several custom-built ensemble models. We focused on multi-label malware classification, as individual or classical classifiers showed low accuracy in this area. This paper showed that recent machine learning models such as ensemble and deep learning models can be used for malware detection with better performance than classical models. This is critical in such dynamic yet important detection systems, where challenges such as the detection of unknown or zero-day malware will continue to exist and evolve.
Authored by Izzat Alsmadi, Bilal Al-Ahmad, Mohammad Alsmadi
Malware Analysis - Android malware is continuously evolving at an alarming rate due to the growing vulnerabilities. This demands more effective malware detection methods. This paper presents DynaMalDroid, a dynamic analysis-based framework to detect malicious applications in the Android platform. The proposed framework contains three modules: dynamic analysis, feature engineering, and detection. We utilized the well-known CICMalDroid2020 dataset, and the system calls of apps are extracted through dynamic analysis. We trained our proposed model to recognize malware by selecting features obtained through the feature engineering module. Further, with these selected features, the detection module applies different Machine Learning classifiers like Random Forest, Decision Tree, Logistic Regression, Support Vector Machine, Naïve-Bayes, K-Nearest Neighbour, and AdaBoost, to recognize whether an application is malicious or not. The experiments have shown that several classifiers have demonstrated excellent performance and have an accuracy of up to 99%. The models with Support Vector Machine and AdaBoost classifiers have provided better detection accuracy of 99.3% and 99.5%, respectively.
Authored by Hashida Manzil, Manohar S
Information Reuse and Security - Common Vulnerabilities and Exposures (CVE) databases contain information about vulnerabilities of software products and source code. If individual elements of CVE descriptions can be extracted and structured, then the data can be used to search and analyze CVE descriptions. Herein we propose a method to label each element in CVE descriptions by applying Named Entity Recognition (NER). For NER, we used BERT, a transformer-based natural language processing model. Using NER with machine learning can label information from CVE descriptions even if there are some distortions in the data. An experiment involving manually prepared label information for 1000 CVE descriptions shows that the labeling accuracy of the proposed method is about 0.81 for precision and about 0.89 for recall. In addition, we devise a way to train the data by dividing it into labels. Our proposed method can be used to label each element automatically from CVE descriptions.
Authored by Kensuke Sumoto, Kenta Kanakogi, Hironori Washizaki, Naohiko Tsuda, Nobukazu Yoshioka, Yoshiaki Fukazawa, Hideyuki Kanuka
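The precision and recall figures quoted in the abstract above can be read as follows: precision is the share of predicted labels that are correct, and recall is the share of true labels that are recovered. The sketch below computes both at the token level under stated assumptions. The label names (`VENDOR`, `PRODUCT`, `VERSION`, `O`) and the function `precision_recall` are hypothetical, and the abstract does not say whether the authors score tokens or whole entities.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[str], pred: Sequence[str],
                     outside: str = "O") -> Tuple[float, float]:
    """Token-level precision and recall, ignoring the 'outside' label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # True positives: a non-outside prediction that matches the gold label.
    tp = sum(1 for g, p in zip(gold, pred) if p != outside and p == g)
    predicted = sum(1 for p in pred if p != outside)
    actual = sum(1 for g in gold if g != outside)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Invented labels for a five-token description, for illustration only.
gold = ["VENDOR", "PRODUCT", "O", "VERSION", "O"]
pred = ["VENDOR", "PRODUCT", "VERSION", "VERSION", "O"]
p, r = precision_recall(gold, pred)
```

Here three of the four predicted labels are correct and all three true labels are found, so precision is lower than recall, the same direction as the 0.81 and 0.89 reported in the abstract.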
Information Reuse and Security - The successive approximation register analog-to-digital converter (SAR ADC) is widely adopted in Internet of Things (IoT) systems due to its simple structure and high energy efficiency. Unfortunately, a SAR ADC dissipates varied and unique power features when it converts different input signals, leading to severe vulnerability to power side-channel attack (PSA). The adversary can accurately derive the input signal merely by measuring the power information from the analog supply pin (AVDD), digital supply pin (DVDD), and/or reference pin (Ref), which is fed to trained machine learning models. This paper first presents a detailed mathematical analysis of PSA on SAR ADCs, concluding that the power information from AVDD is the most vulnerable to PSA compared with the other supply pins. Then, an LSB-reused protection technique is proposed, which utilizes the characteristic of the LSB from the SAR ADC itself to protect against PSA. Lastly, this technique is verified in a 12-bit 5 MS/s secure SAR ADC implemented in 65nm technology. By using the current waveform from AVDD, the adopted convolutional neural network (CNN) algorithms can achieve >99% prediction accuracy from LSB to MSB in the SAR ADC without protection. With the proposed protection, the bit-wise accuracy drops to around 50%.
Authored by Lele Fang, Jiahao Liu, Yan Zhu, Chi-Hang Chan, Rui Martins
Intrusion Intolerance - Container-based virtualization has gained momentum over the past few years thanks to its lightweight nature and support for agility. However, its appealing features come at the price of a reduced isolation level compared to the traditional host-based virtualization techniques, exposing workloads to various faults, such as co-residency attacks like container escape. In this work, we propose to leverage the automated management capabilities of containerized environments to derive a Fault and Intrusion Tolerance (FIT) framework based on error detection-recovery and fault treatment. Namely, we aim at deriving a specification-based error detection mechanism at the host level to systematically and formally capture security state errors indicating breaches potentially caused by malicious containers. Although the paper focuses on security side use cases, results are logically extendable to accidental faults. Our aim is to immunize the target environments against accidental and malicious faults and preserve their core dependability and security properties.
Authored by Taous Madi, Paulo Esteves-Verissimo
Malware Analysis and Graph Theory - The rapidly increasing malware threats must be countered with new, effective malware detection methodologies. Current malware threats are not limited to daily personal transactions but dwell deep within large enterprises and organizations. This paper introduces a new methodology for detecting and discriminating malicious versus normal applications. In this paper, we employed Ant-colony optimization to generate two behavioural graphs that characterize the difference in execution behavior between malware and normal applications. Our proposed approach relies on the API call sequence generated when an application is executed. We used API calls because they are one of the most widely used malware dynamic analysis features. Our proposed method showed distinctive behavioral differences between malicious and non-malicious applications. Our experimental results showed performance comparable to other machine learning methods. Therefore, we can employ our method as an efficient technique for capturing malicious applications.
Authored by Eslam Amer, Adham Samir, Hazem Mostafa, Amer Mohamed, Mohamed Amin
Malware Analysis and Graph Theory - The open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. As new/unknown malware families occur regularly, it is difficult to collect samples that cover all the classes for the training process in ML systems. An advanced malware classification system should classify the known classes correctly while remaining sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations for the function call graph (FCG) based malware representations to facilitate the pretext task. Also, we present a statistical thresholding approach to find the optimal threshold for the unknown class. Moreover, the experimental results indicate that our proposed pre-training process can improve performance across different downstream loss functions for the OSR problem.
Authored by Jingyun Jia, Philip Chan
Malware Analysis and Graph Theory - The Internet of things (IoT) is proving to be a boon in granting internet access to regularly used objects and devices. Sensors, programs, and other innovations interact and trade information with different gadgets and frameworks over the web. Even in modern times, IoT gadgets suffer from primary security threats, which expose them to many dangers and malware, one among them being IoT botnets. Botnets carry out attacks by serving as a vector, and this has become one of the significant dangers on the Internet. These vectors act against associations and carry out cybercrimes. They are used to produce spam, launch DDoS attacks, commit click fraud, and steal confidential data. Unlike common malware on PCs and Android devices, IoT gadgets bring various challenges because they have heterogeneous processor architectures. Numerous studies use static or dynamic analysis for detection and classification of botnets on IoT gadgets. Most researchers have not addressed the multi-architecture issue, and their analyses use a lot of computing resources. Therefore, this approach attempts to classify botnets in IoT by using PSI-Graphs, which effectively addresses the problem of encryption in IoT botnet detection, tackles the multi-architecture problem, and reduces computation time. It proposes another methodology for describing and recognizing botnets that uses graph-based Machine Learning techniques and Exploratory Data Analysis to analyze the data and identify how separable the data is, so that bots can be recognized at an earlier stage and IoT devices can be protected from attack.
Authored by Putsa Pranav, Sachin Verma, Sahana Shenoy, S. Saravanan
Malware Analysis and Graph Theory - Malicious cybersecurity activities have become increasingly worrisome for individuals and companies alike. While machine learning methods like Graph Neural Networks (GNNs) have proven successful on the malware detection task, their output is often difficult to understand. Explainable malware detection methods are needed to automatically identify malicious programs and present results to malware analysts in a way that is human interpretable. In this survey, we outline a number of GNN explainability methods and compare their performance on a real-world malware detection dataset. Specifically, we formulated the detection problem as a graph classification problem on the malware Control Flow Graphs (CFGs). We find that gradient-based methods outperform perturbation-based methods in terms of computational expense and performance on explainer-specific metrics (e.g., Fidelity and Sparsity). Our results provide insights into designing new GNN-based models for cyber malware detection and attribution.
Authored by Dana Warmsley, Alex Waagen, Jiejun Xu, Zhining Liu, Hanghang Tong
Machine Learning - Estimating obesity levels is an important topic in the medical field, since it can provide useful guidance for people who would like to lose weight or keep fit. The article tries to find a model that can predict obesity and provide people with information on how to avoid becoming overweight. More specifically, the article applies dimension reduction to simplify the data set and tries to identify the most decisive feature of obesity through Principal Component Analysis (PCA). The article also uses machine learning methods such as Support Vector Machine (SVM) and Decision Tree to predict obesity and to find its major cause. In addition, the article uses an Artificial Neural Network (ANN), which has a more powerful feature extraction ability, for prediction. Finally, the article finds that family history of obesity is the most decisive feature, possibly because obesity is strongly affected by genes or because the family's eating habits have a great influence. The prediction accuracy of both the ANN and the Decision Tree is higher than 90%.
Authored by Zhenghao He
Machine Learning - Fashion, the way we present ourselves, is mainly visual and has attracted great interest from computer vision researchers. It is generally used to search for fashion products in online shopping malls and to obtain descriptive information about a product. The main objective of our paper is to use deep learning (DL) and machine learning (ML) methods to correctly identify and categorize clothing images. In this work, we used ML algorithms (Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF)), DL algorithms (Convolutional Neural Network (CNN), AlexNet, GoogleNet, LeNet, LeNet5) and transfer learning using pretrained models (VGG16, MobileNet and ResNet50). We trained and tested our models online using Google Colaboratory with the TensorFlow/Keras and Scikit-Learn libraries, which support deep learning and machine learning in Python. The main metrics used in our study to evaluate the performance of the ML and DL algorithms are accuracy and the confusion matrix. The best result for the ML models is obtained with the use of ANN (88.71%), and the best result for the DL models is obtained with the GoogleNet architecture (93.75%). The results showed that the number of epochs and the depth of the network affect the quality of the results.
Authored by Bougareche Samia, Zehani Soraya, Mimi Malika
Machine Learning - Stock selection strategy design based on machine learning and multi-factor analysis is a research hotspot in the quantitative investment field. In this paper, four machine learning algorithms (support vector machine, gradient boosting regression, random forest and linear regression) are used to predict the rise and fall of stocks, taking stock fundamentals as input variables. The portfolio strategy is constructed on this basis. Finally, the stock selection strategy is further optimized. The empirical results show that the multifactor quantitative stock selection strategy has a good stock selection effect, and that yield performance under the support vector machine algorithm is the best. As the number of factors increases, there is an inverse relationship between the fitting degree and the yield under the various algorithms.
Authored by Chengzhao Zhang, Huiyue Tang
Machine Learning - An intrusion detection system (IDS) helps detect suspicious activity on a computer network. It is capable of identifying suspicious activities at both levels, i.e. locally at the system level and in transit at the network level. Because the system does not have its own dataset, it is inefficient at identifying unknown attacks. To overcome this inefficiency, we make use of ML. ML assists in analysing and categorizing attacks on diverse datasets. In this study, the efficacy of eight machine learning algorithms on KDD CUP99 is assessed. Based on our implementation and analysis, amongst the eight algorithms considered here, Support Vector Machine (SVM), Random Forest (RF) and Decision Tree (DT) have the highest testing accuracy, of which SVM has the highest.
Authored by Utkarsh Dixit, Suman Bhatia, Pramod Bhatia
Machine Learning - Sentiment Analysis (SA) is an approach for detecting subjective information such as thoughts, outlooks, reactions, and emotional state. The majority of previous SA work treats it as a text-classification problem that requires labelled input to train the model. However, obtaining a tagged dataset is difficult; most of the time, the labelling has to be done by hand. Another concern is that the lack of cross-domain portability makes it challenging to reuse the same labelled data across applications. As a result, data has to be classified manually for each domain. This research work applies sentiment analysis to evaluate an entire vaccine Twitter dataset. The work involves lexicon analysis using NLP libraries such as neattext and textblob, and multi-class classification using BERT. This work evaluates and compares the results of the machine learning algorithms.
Authored by Amarjeet Rawat, Himani Maheshwari, Manisha Khanduja, Rajiv Kumar, Minakshi Memoria, Sanjeev Kumar