Insider Threat - In recent years, data security incidents caused by insider threats in distributed file systems have attracted the attention of academia and industry. The most common way to detect insider threats is based on user profiles. Through analysis, we realize that based on existing user profiles are not efficient enough, and there are many false positives when a stable user profile has not yet been formed. In this work, we propose personalized user profiles and design an insider threat detection framework, which can intelligently detect insider threats for securing distributed file systems in real-time. To generate personalized user profiles, we come up with a time window-based clustering algorithm and a weighted kernel density estimation algorithm. Compared with non-personalized user profiles, both the Recall and Precision of insider threat detection based on personalized user profiles have been improved, resulting in their harmonic mean F1 increased to 96.52\%. Meanwhile, to reduce the false positives of insider threat detection, we put forward operation recommendations based on user similarity to predict new operations that users will produce in the future, which can reduce the false positive rate (FPR). The FPR is reduced to 1.54\% and the false positive identification rate (FPIR) is as high as 92.62\%. Furthermore, to mitigate the risks caused by inaccurate authorization for users, we present user tags based on operation content and permission. The experimental results show that our proposed framework can detect insider threats more effectively and precisely, with lower FPR and high FPIR.
Authored by Wu Xin, Qingni Shen, Ke Feng, Yutang Xia, Zhonghai Wu, Zhenghao Lin
Insider Threat - A malicious insider threat is more vulnerable to an organization. It is necessary to detect the malicious insider because of its huge impact to an organization. The occurrence of a malicious insider threat is less but quite destructive. So, the major focus of this paper is to detect the malicious insider threat in an organization. The traditional insider threat detection algorithm is not suitable for real time insider threat detection. A supervised learning-based anomaly detection technique is used to classify, predict and detect the malicious and non-malicious activity based on highest level of anomaly score. In this paper, a framework is proposed to detect the malicious insider threat using supervised learning-based anomaly detection. It is used to detect the malicious insider threat activity using One-Class Support Vector Machine (OCSVM). The experimental results shows that the proposed framework using OCSVM performs well and detects the malicious insider who obtain huge anomaly score than a normal user.
Authored by G. Padmavathi, D. Shanmugapriya, S. Asha
Information Theoretic Security - Building occupancy data helps increase energy management systems’ performance, enabling lower energy use while preserving occupant comfort. The focus of this study is employing environmental data (e.g., including but not limited to temperature, humidity, carbon dioxide (CO2), etc.) to infer occupancy information. This will be achieved by exploring the application of information theory metrics with machine learning (ML) approaches to classify occupancy levels for a given dataset. Three datasets and six distinct ML algorithms were used in a comparative study to determine the best strategy for identifying occupancy patterns. It was determined that both k-nearest neighbors (kNN) and random forest (RF) identify occupancy labels with the highest overall level of accuracy, reaching 97.99\% and 98.56\%, respectively.
Authored by Aya Sayed, Ridha Hamila, Yassine Himeur, Faycal Bensaali
Industrial Control Systems - The new paradigm of industrial development, called Industry 4.0, faces the problems of Cybersecurity, and as it has already manifested itself in Information Systems, focuses on the use of Artificial Intelligence tools. The authors of this article build on their experience with the use of the above mentioned tools to increase the resilience of Information Systems against Cyber threats, approached to the choice of an effective structure of Cyber-protection of Industrial Systems, primarily analyzing the objective differences between them and Information Systems. A number of analyzes show increased resilience of the decentralized architecture in the management of large-scale industrial processes to the centralized management architecture. These considerations provide sufficient grounds for the team of the project to give preference to the decentralized structure with flock behavior for further research and experiments. The challenges are to determine the indicators which serve to assess and compare the impacts on the controlled elements.
Authored by Roumen Trifonov, Slavcho Manolov, Georgi Tsochev, Galya Pavlova, Kamelia Raynova
Artificial intelligence creation comes into fashion and has brought unprecedented challenges to intellectual property law. In order to study the viewpoints of AI creation copyright ownership from professionals in different institutions, taking the papers of AI creation on CNKI from 2016 to 2021, we applied orthogonal design and analysis of variance method to construct the dataset. A kernel-SVM classifier with different kernel methods in addition to some shallow machine learning classifiers are selected in analyzing and predicting the copyright ownership of AI creation. Support vector machine (svm) is widely used in statistics and the performance of SVM method is closely related to the choice of the kernel function. SVM with RBF kernel surpasses the other seven kernel-SVM classifiers and five shallow classifier, although the accuracy provided by all of them was not satisfactory. Various performance metrics such as accuracy, F1-score are used to evaluate the performance of KSVM and other classifiers. The purpose of this study is to explore the overall viewpoints of AI creation copyright ownership, investigate the influence of different features on the final copyright ownership and predict the most likely viewpoint in the future. And it will encourage investors, researchers and promote intellectual property protection in China.
Authored by Xinjia Xie, Yunxiao Guo, Jiangting Yin, Shun Gai, Han Long
The rapidly increasing malware threats must be coped with new effective malware detection methodologies. Current malware threats are not limited to daily personal transactions but dowelled deeply within large enterprises and organizations. This paper introduces a new methodology for detecting and discriminating malicious versus normal applications. In this paper, we employed Ant-colony optimization to generate two behavioural graphs that characterize the difference in the execution behavior between malware and normal applications. Our proposed approach relied on the API call sequence generated when an application is executed. We used the API calls as one of the most widely used malware dynamic analysis features. Our proposed method showed distinctive behavioral differences between malicious and non-malicious applications. Our experimental results showed a comparative performance compared to other machine learning methods. Therefore, we can employ our method as an efficient technique in capturing malicious applications.
Authored by Eslam Amer, Adham Samir, Hazem Mostafa, Amer Mohamed, Mohamed Amin
Open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. As new/unknown malware families occur regularly, it is difficult to exhaust samples that cover all the classes for the training process in ML systems. An advanced malware classification system should classify the known classes correctly while sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations for the function call graph (FCG) based malware representations to facilitate the pretext task. Also, we present a statistical thresholding approach to find the optimal threshold for the unknown class. Moreover, the experiment results indicate that our proposed pre-training process can improve different performances of different downstream loss functions for the OSR problem.
Authored by Jingyun Jia, Philip Chan
The Internet of things (IoT) is proving to be a boon in granting internet access to regularly used objects and devices. Sensors, programs, and other innovations interact and trade information with different gadgets and frameworks over the web. Even in modern times, IoT gadgets experience the ill effects of primary security threats, which expose them to many dangers and malware, one among them being IoT botnets. Botnets carry out attacks by serving as a vector and this has become one of the significant dangers on the Internet. These vectors act against associations and carry out cybercrimes. They are used to produce spam, DDOS attacks, click frauds, and steal confidential data. IoT gadgets bring various challenges unlike the common malware on PCs and Android devices as IoT gadgets have heterogeneous processor architecture. Numerous researches use static or dynamic analysis for detection and classification of botnets on IoT gadgets. Most researchers haven t addressed the multi-architecture issue and they use a lot of computing resources for analyzing. Therefore, this approach attempts to classify botnets in IoT by using PSI-Graphs which effectively addresses the problem of encryption in IoT botnet detection, tackles the multi-architecture problem, and reduces computation time. It proposes another methodology for describing and recognizing botnets utilizing graph-based Machine Learning techniques and Exploratory Data Analysis to analyze the data and identify how separable the data is to recognize bots at an earlier stage so that IoT devices can be prevented from being attacked.
Authored by Putsa Pranav, Sachin Verma, Sahana Shenoy, S. Saravanan
Malicious cybersecurity activities have become increasingly worrisome for individuals and companies alike. While machine learning methods like Graph Neural Networks (GNNs) have proven successful on the malware detection task, their output is often difficult to understand. Explainable malware detection methods are needed to automatically identify malicious programs and present results to malware analysts in a way that is human interpretable. In this survey, we outline a number of GNN explainability methods and compare their performance on a real-world malware detection dataset. Specifically, we formulated the detection problem as a graph classification problem on the malware Control Flow Graphs (CFGs). We find that gradient-based methods outperform perturbation-based methods in terms of computational expense and performance on explainer-specific metrics (e.g., Fidelity and Sparsity). Our results provide insights into designing new GNN-based models for cyber malware detection and attribution.
Authored by Dana Warmsley, Alex Waagen, Jiejun Xu, Zhining Liu, Hanghang Tong
Detection of malware and security attacks is a complex process that can vary in its details and analysis activities. As part of the detection process, malware scanners try to categorize a malware once it is detected under one of the known malware categories (e.g. worms, spywares, viruses, etc.). However, many studies and researches indicate problems with scanners categorizing or identifying a particular malware under more than one malware category. This paper, and several others, show that machine learning can be used for malware detection especially with ensemble base prediction methods. In this paper, we evaluated several custom-built ensemble models. We focused on multi-label malware classification as individual or classical classifiers showed low accuracy in such territory.This paper showed that recent machine models such as ensemble and deep learning can be used for malware detection with better performance in comparison with classical models. This is very critical in such a dynamic and yet important detection systems where challenges such as the detection of unknown or zero-day malware will continue to exist and evolve.
Authored by Izzat Alsmadi, Bilal Al-Ahmad, Mohammad Alsmadi
Android malware is continuously evolving at an alarming rate due to the growing vulnerabilities. This demands more effective malware detection methods. This paper presents DynaMalDroid, a dynamic analysis-based framework to detect malicious applications in the Android platform. The proposed framework contains three modules: dynamic analysis, feature engineering, and detection. We utilized the well-known CICMalDroid2020 dataset, and the system calls of apps are extracted through dynamic analysis. We trained our proposed model to recognize malware by selecting features obtained through the feature engineering module. Further, with these selected features, the detection module applies different Machine Learning classifiers like Random Forest, Decision Tree, Logistic Regression, Support Vector Machine, Naïve-Bayes, K-Nearest Neighbour, and AdaBoost, to recognize whether an application is malicious or not. The experiments have shown that several classifiers have demonstrated excellent performance and have an accuracy of up to 99\%. The models with Support Vector Machine and AdaBoost classifiers have provided better detection accuracy of 99.3\% and 99.5\%, respectively.
Authored by Hashida Manzil, Manohar S
Estimation for obesity levels is always an important topic in medical field since it can provide useful guidance for people that would like to lose weight or keep fit. The article tries to find a model that can predict obesity and provides people with the information of how to avoid overweight. To be more specific, this article applied dimension reduction to the data set to simplify the data and tried to Figure out a most decisive feature of obesity through Principal Component Analysis (PCA) based on the data set. The article also used some machine learning methods like Support Vector Machine (SVM), Decision Tree to do prediction of obesity and wanted to find the major reason of obesity. In addition, the article uses Artificial Neural Network (ANN) to do prediction which has more powerful feature extraction ability to do this. Finally, the article found that family history of obesity is the most decisive feature, and it may because of obesity may be greatly affected by genes or the family eating diet may have great influence. And both ANN and Decision tree’s accuracy of prediction is higher than 90\%.
Authored by Zhenghao He
Fashion is the way we present ourselves which mainly focuses on vision, has attracted great interest from computer vision researchers. It is generally used to search fashion products in online shopping malls to know the descriptive information of the product. The main objectives of our paper is to use deep learning (DL) and machine learning (ML) methods to correctly identify and categorize clothing images. In this work, we used ML algorithms (support vector machines (SVM), K-Nearest Neirghbors (KNN), Decision tree (DT), Random Forest (RF)), DL algorithms (Convolutionnal Neurals Network (CNN), AlexNet, GoogleNet, LeNet, LeNet5) and the transfer learning using a pretrained models (VGG16, MobileNet and RestNet50). We trained and tested our models online using google colaboratory with Tensorflow/Keras and Scikit-Learn libraries that support deep learning and machine learning in Python. The main metric used in our study to evaluate the performance of ML and DL algorithms is the accuracy and matrix confusion. The best result for the ML models is obtained with the use of ANN (88.71\%) and for the DL models is obtained for the GoogleNet architecture (93.75\%). The results obtained showed that the number of epochs and the depth of the network have an effect in obtaining the best results.
Authored by Bougareche Samia, Zehani Soraya, Mimi Malika
In this paper, stock selection strategy design based on machine learning and multi-factor analysis is a research hotspot in quantitative investment field. Four machine learning algorithms including support vector machine, gradient lifting regression, random forest and linear regression are used to predict the rise and fall of stocks by taking stock fundamentals as input variables. The portfolio strategy is constructed on this basis. Finally, the stock selection strategy is further optimized. The empirical results show that the multifactor quantitative stock selection strategy has a good stock selection effect, and yield performance under the support vector machine algorithm is the best. With the increase of the number of factors, there is an inverse relationship between the fitting degree and the yield under various algorithms.
Authored by Chengzhao Zhang, Huiyue Tang
An IDS is a system that helps in detecting any kind of doubtful activity on a computer network. It is capable of identifying suspicious activities at both the levels i.e. locally at the system level and in transit at the network level. Since, the system does not have its own dataset as a result it is inefficient in identifying unknown attacks. In order to overcome this inefficiency, we make use of ML. ML assists in analysing and categorizing attacks on diverse datasets. In this study, the efficacy of eight machine learning algorithms based on KDD CUP99 is assessed. Based on our implementation and analysis, amongst the eight Algorithms considered here, Support Vector Machine (SVM), Random Forest (RF) and Decision Tree (DT) have the highest testing accuracy of which got SVM does have the highest accuracy
Authored by Utkarsh Dixit, Suman Bhatia, Pramod Bhatia
Sentiment Analysis (SA) is an approach for detecting subjective information such as thoughts, outlooks, reactions, and emotional state. The majority of previous SA work treats it as a text-classification problem that requires labelled input to train the model. However, obtaining a tagged dataset is difficult. We will have to do it by hand the majority of the time. Another concern is that the absence of sufficient cross-domain portability creates challenging situation to reuse same-labelled data across applications. As a result, we will have to manually classify data for each domain. This research work applies sentiment analysis to evaluate the entire vaccine twitter dataset. The work involves the lexicon analysis using NLP libraries like neattext, textblob and multi class classification using BERT. This word evaluates and compares the results of the machine learning algorithms.
Authored by Amarjeet Rawat, Himani Maheshwari, Manisha Khanduja, Rajiv Kumar, Minakshi Memoria, Sanjeev Kumar
This study develops a framework for personalized care to tackle heart disease risk using an at-home system. The machine learning models used to predict heart disease are Logistic Regression, K - Nearest Neighbor, Support Vector Machine, Naive Bayes, Decision Tree, Random Forest and XG Boost. Timely and efficient detection of heart disease plays an important role in health care. It is essential to detect cardiovascular disease (CVD) at the earliest, consult a specialist doctor before the severity of the disease and start medication. The performance of the proposed model was assessed using the Cleveland Heart Disease dataset from the UCI Machine Learning Repository. Compared to all machine learning algorithms, the Random Forest algorithm shows a better performance accuracy score of 90.16\%. The best model may evaluate patient fitness rather than routine hospital visits. The proposed work will reduce the burden on hospitals and help hospitals reach only critical patients.
Authored by Goutam Sahoo, Keerthana Kanike, Santos Das, Poonam Singh
A good ecological environment is crucial to attracting talents, cultivating talents, retaining talents and making talents fully effective. This study provides a solution to the current mainstream problem of how to deal with excellent employee turnover in advance, so as to promote the sustainable and harmonious human resources ecological environment of enterprises with a shortage of talents.This study obtains open data sets and conducts data preprocessing, model construction and model optimization, and describes a set of enterprise employee turnover prediction models based on RapidMiner workflow. The data preprocessing is completed with the help of the data statistical analysis software IBM SPSS Statistic and RapidMiner.Statistical charts, scatter plots and boxplots for analysis are generated to realize data visualization analysis. Machine learning, model application, performance vector, and cross-validation through RapidMiner s multiple operators and workflows. Model design algorithms include support vector machines, naive Bayes, decision trees, and neural networks. Comparing the performance parameters of the algorithm model from the four aspects of accuracy, precision, recall and F1-score. It is concluded that the performance of the decision tree algorithm model is the highest. The performance evaluation results confirm the effectiveness of this model in sustainable exploring of enterprise employee turnover prediction in human resource management.
Authored by Yong Shi
With the development of technology, mobile phones are an indispensable part of human life. Factors such as brand, internal memory, wifi, battery power, camera and availability of 4G are now modifying consumers decisions on buying mobile phones. But people fail to link those factors with the price of mobile phones; in this case, this paper is aimed to figure out the problem by using machine learning algorithms like Support Vector Machine, Decision Tree, K Nearest Neighbors and Naive Bayes to train the mobile phone dataset before making predictions of the price level. We used appropriate algorithms to predict smartphone prices based on accuracy, precision, recall and F1 score. This not only helps customers have a better choice on the mobile phone but also gives advice to businesses selling mobile phones that the way to set reasonable prices with the different features they offer. This idea of predicting prices level will give support to customers to choose mobile phones wisely in the future. The result illustrates that among the 4 classifiers, SVM returns to the most desirable performance with 94.8\% of accuracy, 97.3 of F1 score (without feature selection) and 95.5\% of accuracy, 97.7\% of F1 score (with feature selection).
Authored by Ningyuan Hu
Nowadays, the MOBA game is the game type with the most audiences and players around the world. Recently, the League of Legends has become an official sport as an e-sport among 37 events in the 2022 Asia Games held in Hangzhou. As the development in the e-sport, analytical skills are also involved in this field. The topic of this research is to use the machine learning approach to analyze the data of the League of Legends and make a prediction about the result of the game. In this research, the method of machine learning is applied to the dataset which records the first 10 minutes in diamond-ranked games. Several popular machine learning (AdaBoost, GradientBoost, RandomForest, ExtraTree, SVM, Naïve Bayes, KNN, LogisticRegression, and DecisionTree) are applied to test the performance by cross-validation. Then several algorithms that outperform others are selected to make a voting classifier to predict the game result. The accuracy of the voting classifier is 72.68\%.
Authored by Qiyuan Shen
In this research work, we attempted to predict the creditworthiness of smartphone users in Indonesia during the COVID-19 pandemic using machine learning. Principal Component Analysis (PCA) and Kmeans algorithms are used for the prediction of creditworthiness with the used a dataset of 1050 respondents consisting of twelve questions to smartphone users in Indonesia during the COVID-19 pandemic. The four different classification algorithms (Logistic Regression, Support Vector Machine, Decision Tree, and Naive Bayes) were tested to classify the creditworthiness of smartphone users in Indonesia. The tests carried out included testing for accuracy, precision, recall, F1-score, and Area Under Curve Receiver Operating Characteristics (AUCROC) assesment. Logistic Regression algorithm shows the perfect performances whereas Naïve Bayes (NB) shows the least. The results of this research also provide new knowledge about the influential and non-influential variables based on the twelve questions conducted to the respondents of smartphone users in Indonesia during the COVID-19 pandemic.
Authored by R Winahyu, Maman Somantri, Oky Nurhayati
This paper provides an end-to-end solution to defend against known microarchitectural attacks such as speculative execution attacks, fault-injection attacks, covert and side channel attacks, and unknown or evasive versions of these attacks. Current defenses are attack specific and can have unacceptably high performance overhead. We propose an approach that reduces the overhead of state-of-art defenses by over 95%, by applying defenses only when attacks are detected. Many current proposed mitigations are not practical for deployment; for example, InvisiSpec has 27% overhead and Fencing has 74% overhead while protecting against only Spectre attacks. Other mitigations carry similar performance penalties. We reduce the overhead for InvisiSpec to 1.26% and for Fencing to 3.45% offering performance and security for not only spectre attacks but other known transient attacks as well, including the dangerous class of LVI and Rowhammer attacks, as well as covering a large set of future evasive and zero-day attacks. Critical to our approach is an accurate detector that is not fooled by evasive attacks and that can generalize to novel zero-day attacks. We use a novel Generative framework, Evasion Vaccination (EVAX) for training ML models and engineering new security-centric performance counters. EVAX significantly increases sensitivity to detect and classify attacks in time for mitigation to be deployed with low false positives (4 FPs in every 1M instructions in our experiments). Such performance enables efficient and timely mitigations, enabling the processor to automatically switch between performance and security as needed.
Authored by Samira Ajorpaz, Daniel Moghimi, Jeffrey Collins, Gilles Pokam, Nael Abu-Ghazaleh, Dean Tullsen
Many studies have been conducted to detect various malicious activities in cyberspace using classifiers built by machine learning. However, it is natural for any classifier to make mistakes, and hence, human verification is necessary. One method to address this issue is eXplainable AI (XAI), which provides a reason for the classification result. However, when the number of classification results to be verified is large, it is not realistic to check the output of the XAI for all cases. In addition, it is sometimes difficult to interpret the output of XAI. In this study, we propose a machine learning model called classification verifier that verifies the classification results by using the output of XAI as a feature and raises objections when there is doubt about the reliability of the classification results. The results of experiments on malicious website detection and malware detection show that the proposed classification verifier can efficiently identify misclassified malicious activities.
Authored by Koji Fujita, Toshiki Shibahara, Daiki Chiba, Mitsuaki Akiyama, Masato Uchida
The rapid shift towards smart cities, particularly in the era of pandemics, necessitates the employment of e-learning, remote learning systems, and hybrid models. Building adaptive and personalized education becomes a requirement to mitigate the downsides of distant learning while maintaining high levels of achievement. Explainable artificial intelligence (XAI), machine learning (ML), and the internet of behaviour (IoB) are just a few of the technologies that are helping to shape the future of smart education in the age of smart cities through Customization and personalization. This study presents a paradigm for smart education based on the integration of XAI and IoB technologies. The research uses data acquired on students' behaviours to determine whether or not the current education systems respond appropriately to learners' requirements. Despite the existence of sophisticated education systems, they have not yet reached the degree of development that allows them to be tailored to learners' cognitive needs and support them in the absence of face-to-face instruction. The study collected data on 41 learner's behaviours in response to academic activities and assessed whether the running systems were able to capture such behaviours and respond appropriately or not; the study used evaluation methods that demonstrated that there is a change in students' academic progression concerning monitoring using IoT/IoB to enable a relative response to support their progression.
Authored by Ossama Embarak
In this paper, we present the architecture of a Smart Industry inspired platform designed for Agriculture 4.0 applications and, specifically, to optimize an ecosystem of SW and HW components for animal repelling. The platform implementation aims to obtain reliability and energy efficiency in a system aimed to detect, recognize, identify, and repel wildlife by generating specific ultrasound signals. The wireless sensor network is composed of OpenMote hardware devices coordinated on a mesh network based on the 6LoWPAN protocol, and connected to an FPGA-based board. The system, activated when an animal is detected, elaborates the data received from a video camera connected to FPGA-based hardware devices and then activates different ultrasonic jammers belonging to the OpenMotes network devices. This way, in real-time wildlife will be progressively moved away from the field to be preserved by the activation of specific ultrasonic generators. To monitor the daily behavior of the wildlife, the ecosystem is expanded using a time series database running on a Cloud platform.
Authored by Marialaura Tamburello, Giuseppe Caruso, Stefano Giordano, Davide Adami, Mike Ojo