Stock selection strategy design based on machine learning and multi-factor analysis is a research hotspot in quantitative investment. In this paper, four machine learning algorithms (support vector machine, gradient boosting regression, random forest and linear regression) are used to predict whether stocks will rise or fall, taking stock fundamentals as input variables. A portfolio strategy is constructed on this basis, and the stock selection strategy is then further optimized. The empirical results show that the multi-factor quantitative stock selection strategy selects stocks effectively, and that returns are highest under the support vector machine algorithm. As the number of factors increases, the goodness of fit and the return are inversely related under each algorithm.
Authored by Chengzhao Zhang, Huiyue Tang
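The abstract does not describe how the portfolio is derived from the model outputs. One common construction, assumed here only for illustration, ranks stocks by the classifier's predicted probability of a price rise and holds the top k in equal weight. The function names, the equal-weight rule and all figures below are hypothetical.

```python
from typing import Dict


def build_top_k_portfolio(predicted_up_prob: Dict[str, float], k: int) -> Dict[str, float]:
    """Equal-weight the k stocks the model considers most likely to rise."""
    if k <= 0 or not predicted_up_prob:
        raise ValueError("need k > 0 and at least one prediction")
    # Rank tickers by predicted probability of a price rise, highest first.
    ranked = sorted(predicted_up_prob, key=predicted_up_prob.get, reverse=True)
    chosen = ranked[:k]
    weight = 1.0 / len(chosen)
    return {ticker: weight for ticker in chosen}


def portfolio_return(weights: Dict[str, float], realized_returns: Dict[str, float]) -> float:
    """Weighted sum of the held stocks' realized returns over the holding period."""
    return sum(w * realized_returns[t] for t, w in weights.items())


probs = {"A": 0.80, "B": 0.30, "C": 0.65, "D": 0.55}  # hypothetical model outputs
weights = build_top_k_portfolio(probs, k=2)  # {"A": 0.5, "C": 0.5}
print(portfolio_return(weights, {"A": 0.04, "B": -0.02, "C": 0.01, "D": 0.0}))
```

Equal weighting keeps the sketch simple; weighting by predicted probability or rebalancing on a schedule are equally plausible variants that the abstract neither confirms nor rules out.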
An intrusion detection system (IDS) helps detect suspicious activity on a computer network. It can identify suspicious activities at two levels: locally, at the system level, and in transit, at the network level. Because the system does not have its own dataset, it is inefficient at identifying unknown attacks. To overcome this inefficiency, we make use of machine learning (ML), which helps analyse and categorize attacks on diverse datasets. In this study, the efficacy of eight machine learning algorithms is assessed on the KDD CUP99 dataset. Based on our implementation and analysis, among the eight algorithms considered, Support Vector Machine (SVM), Random Forest (RF) and Decision Tree (DT) achieve the highest testing accuracy, and SVM has the highest accuracy of all.
Authored by Utkarsh Dixit, Suman Bhatia, Pramod Bhatia
Sentiment Analysis (SA) is an approach for detecting subjective information such as thoughts, outlooks, reactions, and emotional states. Most previous SA work treats it as a text-classification problem that requires labelled input to train the model. However, obtaining a labelled dataset is difficult; most of the time, the labelling has to be done by hand. Another concern is that insufficient cross-domain portability makes it difficult to reuse the same labelled data across applications, so data must be classified manually for each domain. This research work applies sentiment analysis to the entire vaccine Twitter dataset. The work involves lexicon analysis using NLP libraries such as neattext and TextBlob, and multi-class classification using BERT. This work evaluates and compares the results of the machine learning algorithms.
Authored by Amarjeet Rawat, Himani Maheshwari, Manisha Khanduja, Rajiv Kumar, Minakshi Memoria, Sanjeev Kumar
This study develops a framework for personalized, at-home care to tackle heart disease risk. The machine learning models used to predict heart disease are Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Naive Bayes, Decision Tree, Random Forest and XGBoost. Timely and efficient detection of heart disease plays an important role in health care: it is essential to detect cardiovascular disease (CVD) as early as possible, consult a specialist before the disease becomes severe, and start medication. The performance of the proposed model was assessed on the Cleveland Heart Disease dataset from the UCI Machine Learning Repository. Of all the machine learning algorithms compared, Random Forest achieves the best accuracy, 90.16%. The best model could be used to evaluate patient fitness in place of routine hospital visits. The proposed work will reduce the burden on hospitals and help them focus on critical patients.
Authored by Goutam Sahoo, Keerthana Kanike, Santos Das, Poonam Singh
A good ecological environment is crucial to attracting, cultivating and retaining talent and to making full use of it. This study addresses a current mainstream problem, namely how to anticipate and handle the turnover of excellent employees, so as to promote a sustainable and harmonious human resources environment in enterprises facing talent shortages. The study obtains open datasets, performs data preprocessing, model construction and model optimization, and describes a set of enterprise employee turnover prediction models built as RapidMiner workflows. Data preprocessing is completed with the statistical analysis software IBM SPSS Statistics and with RapidMiner. Statistical charts, scatter plots and boxplots are generated for visual analysis of the data. Machine learning, model application, performance vectors, and cross-validation are implemented through RapidMiner's operators and workflows. The modelling algorithms include support vector machines, naive Bayes, decision trees, and neural networks, and the models are compared on four performance measures: accuracy, precision, recall and F1-score. The decision tree model performs best. The performance evaluation results confirm the effectiveness of this model for the sustainable exploration of employee turnover prediction in human resource management.
Authored by Yong Shi
With the development of technology, mobile phones have become an indispensable part of human life. Factors such as brand, internal memory, Wi-Fi, battery power, camera and 4G availability now shape consumers' decisions when buying mobile phones. However, people fail to link these factors to the price of a phone. This paper addresses the problem by training Support Vector Machine, Decision Tree, K-Nearest Neighbors and Naive Bayes classifiers on a mobile phone dataset to predict the price level. The algorithms are compared on accuracy, precision, recall and F1 score. This not only helps customers choose mobile phones better but also advises businesses selling mobile phones on how to set reasonable prices for the different features they offer. Predicting price levels in this way can help customers choose mobile phones wisely in the future. The results show that among the four classifiers, SVM performs best, with 94.8% accuracy and a 97.3% F1 score without feature selection, and 95.5% accuracy and a 97.7% F1 score with feature selection.
Authored by Ningyuan Hu
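The four evaluation measures used in the price-level study can be computed directly from true and predicted labels. The sketch below assumes the price level is a small set of integer classes and that per-class scores are macro-averaged (a simple mean over classes); the abstract does not state which averaging was used, and the labels are invented.

```python
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 over all classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    labels = sorted(set(y_true) | set(y_pred))
    prec_sum = rec_sum = f1_sum = 0.0
    for c in labels:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        prec_sum += precision
        rec_sum += recall
        f1_sum += f1
    n = len(labels)
    return {"accuracy": accuracy, "precision": prec_sum / n,
            "recall": rec_sum / n, "f1": f1_sum / n}


# Hypothetical price levels (0 = low, 2 = high) for six phones.
print(macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
```

In practice, scikit-learn's `precision_recall_fscore_support` with `average="macro"` computes the same per-class quantities.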
Nowadays, MOBA games are the game type with the most viewers and players around the world. League of Legends has recently become an official e-sport, one of 37 events at the 2022 Asian Games held in Hangzhou. With the development of e-sports, analytical skills are also being applied in this field. This research uses a machine learning approach to analyse League of Legends data and predict game outcomes. Machine learning is applied to a dataset that records the first 10 minutes of Diamond-ranked games. Several popular machine learning algorithms (AdaBoost, Gradient Boosting, Random Forest, Extra Trees, SVM, Naïve Bayes, KNN, Logistic Regression, and Decision Tree) are evaluated using cross-validation. The best-performing algorithms are then combined into a voting classifier to predict the game result. The voting classifier achieves an accuracy of 72.68%.
Authored by Qiyuan Shen
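The abstract does not say whether the voting classifier uses hard (majority) or soft (probability-averaging) voting. A minimal pure-Python sketch of hard voting, assuming each selected model has already produced class labels for the same games; the predictions are invented:

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(per_model_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote across models; each inner sequence is one model's labels."""
    if not per_model_predictions:
        raise ValueError("need at least one model")
    n_samples = len(per_model_predictions[0])
    if any(len(preds) != n_samples for preds in per_model_predictions):
        raise ValueError("every model must label the same samples")
    voted = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in per_model_predictions)
        # most_common() lists equal counts in first-seen order, so a tie
        # goes to the label proposed by the earliest-listed model.
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical labels (1 = the tracked team wins) from three selected models.
print(hard_vote([[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]]))  # [1, 0, 1, 1]
```

scikit-learn's `VotingClassifier` (with `voting="hard"` or `voting="soft"`) packages the same idea for fitted estimators, although its tie-breaking may differ from this sketch.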
In this research work, we attempt to predict the creditworthiness of smartphone users in Indonesia during the COVID-19 pandemic using machine learning. Principal Component Analysis (PCA) and K-means are used to predict creditworthiness from a dataset of 1050 respondents who answered twelve questions given to smartphone users in Indonesia during the pandemic. Four classification algorithms (Logistic Regression, Support Vector Machine, Decision Tree, and Naive Bayes) were tested for classifying the creditworthiness of these users. The evaluation covered accuracy, precision, recall, F1-score, and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). Logistic Regression shows perfect performance, whereas Naïve Bayes (NB) performs worst. The results also provide new knowledge about which variables, based on the twelve questions put to the respondents, are influential or non-influential for smartphone users in Indonesia during the COVID-19 pandemic.
Authored by R Winahyu, Maman Somantri, Oky Nurhayati
The bus factor is a metric that measures how resilient a project is to sudden engineer turnover. It states the minimal number of engineers that have to be hit by a bus for a project to be stalled. Even though the metric is often discussed in the community, few studies consider its general relevance. Moreover, existing tools for bus factor estimation focus solely on data from version control systems (VCS), even though other channels for knowledge generation and distribution exist. Through a survey of 269 engineers, we find that the bus factor is perceived as an important problem in collective development, and we determine the highest-impact channels of knowledge generation and distribution in software development teams. We also propose a multimodal bus factor estimation algorithm that uses data on code reviews and meetings together with the VCS data. We test the algorithm on 13 projects developed at JetBrains and compare its results with those of the state-of-the-art tool by Avelino et al., against ground truth collected in a survey of the engineers working on these projects. Our algorithm is slightly better than the tool by Avelino et al. at predicting both the bus factor and the key developers. Finally, we use the interviews and surveys to derive a set of best practices for addressing the bus factor issue, as well as proposals for a possible bus factor assessment tool.
Authored by Elgun Jabrayilzade, Mikhail Evtikhiev, Eray Tüzün, Vladimir Kovalenko
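The multimodal algorithm itself is not specified in the abstract. As a rough illustration only, the sketch below shows a simplified greedy estimator in the style of VCS-based tools (repeatedly "losing" the developer who covers the most files until more than half of the files have nobody left who knows them), plus a hypothetical `merge_channels` step that pools knowledge from code reviews or meetings with the VCS data. The file-to-developer mappings are assumed to be given; this is not the authors' algorithm.

```python
from typing import Dict, Set


def merge_channels(*channels: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Union the developers linked to each file across knowledge channels
    (e.g. VCS commits, code reviews, meetings); inputs are not modified."""
    merged: Dict[str, Set[str]] = {}
    for channel in channels:
        for path, devs in channel.items():
            merged.setdefault(path, set()).update(devs)
    return merged


def bus_factor(file_knowers: Dict[str, Set[str]], threshold: float = 0.5) -> int:
    """Greedily remove the developer who knows the most still-covered files until
    more than `threshold` of all files have nobody left who knows them."""
    if not file_knowers:
        return 0
    remaining = {path: set(devs) for path, devs in file_knowers.items()}
    total = len(remaining)
    removed = 0
    while sum(1 for devs in remaining.values() if not devs) / total <= threshold:
        coverage: Dict[str, int] = {}
        for devs in remaining.values():
            for dev in devs:
                coverage[dev] = coverage.get(dev, 0) + 1
        if not coverage:  # nobody left to remove
            break
        # Ties are broken alphabetically so the result is deterministic.
        top = max(sorted(coverage), key=coverage.__getitem__)
        for devs in remaining.values():
            devs.discard(top)
        removed += 1
    return removed


vcs = {"a.py": {"alice"}, "b.py": {"alice"}, "c.py": {"alice"}, "d.py": {"bob"}}
reviews = {"a.py": {"carol"}, "b.py": {"carol"}}
print(bus_factor(vcs), bus_factor(merge_channels(vcs, reviews)))  # prints: 1 2
```

In this toy example, review data reveals that a second developer also knows two of Alice's files, so the estimated bus factor rises from 1 to 2. This illustrates why channels beyond the VCS can change the answer.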
Flexibility and speed in the development of new industrial machines are essential to the success of capital goods industries. When all the components of a printed circuit board (PCB) are surface-mounted devices (SMD), the whole assembly process is automatic. However, many PCBs also require components that are not SMDs, called pin-through-hole (PTH) components, which have to be inserted manually, causing delays in the production line. This work proposes and validates a prototype work cell, based on a collaborative robot and vision systems, that inserts these components completely autonomously or semi-autonomously. Several tests were carried out to validate this work cell, showing that it is correctly implemented and that it could replace the human worker in this PCB assembly task.
Authored by Mauro Queirós, João Pereira, Valdemar Leiras, José Meireles, Jaime Fonseca, João Borges
Recent approaches have proven the effectiveness of outlier detection based on the local outlier factor (LOF) when applied to traffic flow probability distributions. However, these approaches used distance metrics based on the Bhattacharyya coefficient to calculate the similarity of probability distributions, and the limited expressiveness of that coefficient restricted their accuracy. The crucial deficiency of the Bhattacharyya distance is its inability to compare distributions with non-overlapping sample spaces over the natural numbers. Because traffic flow intensity varies greatly, non-overlapping sample spaces are common, which makes metrics based on the Bhattacharyya coefficient inappropriate. In this work, we address this issue by exploring alternative distance metrics and showing their applicability to a massive real-life traffic flow dataset from 26 vital intersections in The Hague. The results on these data, collected from 272 sensors over more than two years, show various advantages of the Earth Mover's Distance in both effectiveness and efficiency.
Authored by Erik Andersen, Marco Chiarandini, Marwan Hassani, Stefan Jänicke, Panagiotis Tampakis, Arthur Zimek
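The deficiency described in the traffic-flow abstract is easy to reproduce. For histograms over integer traffic counts with disjoint supports, the Bhattacharyya coefficient BC is zero, so the distance -ln(BC) is infinite no matter how far apart the distributions are. By contrast, the one-dimensional Earth Mover's Distance (for unit-spaced bins, the sum of absolute CDF differences) still reflects how far the probability mass has to move. The histograms below are invented, and the exact metrics used in the paper may differ.

```python
import math
from typing import Sequence


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """-ln of the Bhattacharyya coefficient; infinite when the supports do not overlap."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.inf if bc == 0 else -math.log(bc)


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """Earth Mover's Distance between two normalized histograms on the same
    unit-spaced integer bins: the sum of absolute differences of their CDFs."""
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    cdf_p = cdf_q = total = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q)
    return total


# Hypothetical vehicle-count histograms over bins 0..5.
low = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
mid = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
high = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]
print(bhattacharyya_distance(low, mid), bhattacharyya_distance(low, high))  # inf inf
print(emd_1d(low, mid), emd_1d(low, high))  # 2.0 4.0
```

Other metrics built on the same coefficient, such as the Hellinger-style sqrt(1 - BC), saturate in the same way: every disjoint pair scores 1. For sample-based data, `scipy.stats.wasserstein_distance` computes the same one-dimensional distance.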
Objective measures are ubiquitous in the formulation, design and implementation of deep space missions. Tour durations, flyby altitudes, propellant budgets, power consumption, and other metrics are essential to developing and managing NASA missions. But beyond the simple metrics of cost and workforce, it has been difficult to identify objective, quantitative measures that help evaluate choices made during the formulation or implementation phases in terms of their impact on flight operations. As part of the development of the Europa Clipper mission system, a set of operations metrics has been defined, along with the design information and software tooling needed to calculate them. We have applied these methods and metrics to help assess the impact on the flight team of the six options for the Clipper tour that are currently being vetted for selection in the fall of 2021. To generate these metrics, the Clipper MOS team first designed the set of essential processes by which flight operations will be conducted, using a standard approach and template to identify (among other aspects) timelines for each process, along with their time constraints (e.g., uplinks for sequence execution). Each of the resulting 50 processes is documented in a common format and has been concurred with by stakeholders. Process timelines were converted into generic schedules and workforce-loaded using commercial off-the-shelf (COTS) scheduling software, based on the inputs of the process authors and domain experts. Custom code was written to create an operations schedule for a specific portion of Clipper's prime mission, with instances of a given process scheduled either by specific timing rules (e.g., process X starts once per week on Thursdays) or relative to mission events (e.g., the sequence generation process begins on a Monday at least three weeks before each Europa closest approach).
Over a 5-month period, and for each of the six candidate Clipper tours, the result was a workforce-loaded schedule of more than 20,000 lines documenting all of the process-driven work effort at the level of individual roles, along with a significant portion of the level-of-effort work. Post-processing code calculated the absolute and relative number of work hours during a nominal 5-day / 40-hour work week, the work effort during 2nd and 3rd shifts, and the 1st-shift effort on weekends. The resulting schedules and shift tables were used to generate objective measures that relate to both human factors and operational risk. These measures showed that Clipper tours using 6:1 resonant (21.25-day) orbits instead of 4:1 resonant (14.17-day) orbits during the first dozen or so Europa flybys are advantageous to flight operations. A similar approach can be extended to help missions assess a number of issues and trades more objectively, including tour selection and spacecraft design for operability.
Authored by Duane Bindschadler, Nari Hwangpo, Marc Sarrel
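The abstract does not give the shift boundaries or the schedule format used by the Clipper post-processing code. The sketch below makes three assumptions. First, there are three 8-hour shifts aligned to midnight (3rd shift 00:00 to 08:00, 1st shift 08:00 to 16:00, 2nd shift 16:00 to 24:00). Second, the schedule is a list of (start time, duration in hours) pairs. Third, the bucket names are hypothetical. The code splits each task at shift boundaries and sums hours into the four categories the abstract mentions.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

SHIFT_LEN = 8  # hours; assumed shifts: 3rd 00:00-08:00, 1st 08:00-16:00, 2nd 16:00-24:00
SHIFT_NAMES = {0: "3rd", 1: "1st", 2: "2nd"}


def bucket(t: datetime) -> str:
    """Name the shift bucket containing instant t (weekend 1st shift is separate)."""
    shift = SHIFT_NAMES[t.hour // SHIFT_LEN]
    if shift == "1st":
        return "weekend_1st" if t.weekday() >= 5 else "weekday_1st"
    return shift


def shift_table(tasks: Iterable[Tuple[datetime, float]]) -> Dict[str, float]:
    """Sum work hours per bucket, splitting each task at shift boundaries."""
    totals = {"weekday_1st": 0.0, "2nd": 0.0, "3rd": 0.0, "weekend_1st": 0.0}
    for start, hours in tasks:
        t, end = start, start + timedelta(hours=hours)
        while t < end:
            block_start = t.replace(hour=(t.hour // SHIFT_LEN) * SHIFT_LEN,
                                    minute=0, second=0, microsecond=0)
            boundary = min(end, block_start + timedelta(hours=SHIFT_LEN))
            totals[bucket(t)] += (boundary - t).total_seconds() / 3600
            t = boundary
    return totals


schedule = [
    (datetime(2021, 9, 2, 14, 0), 4.0),  # Thursday 14:00-18:00
    (datetime(2021, 9, 4, 7, 0), 2.0),   # Saturday 07:00-09:00
]
table = shift_table(schedule)
print(table)
```

The relative share of effort inside the nominal work week then follows as `table["weekday_1st"] / sum(table.values())`.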
This paper provides an end-to-end solution to defend against known microarchitectural attacks, such as speculative execution attacks, fault-injection attacks, and covert and side-channel attacks, as well as unknown or evasive versions of these attacks. Current defenses are attack-specific and can have unacceptably high performance overhead. We propose an approach that reduces the overhead of state-of-the-art defenses by over 95% by applying defenses only when attacks are detected. Many currently proposed mitigations are not practical for deployment; for example, InvisiSpec has 27% overhead and Fencing has 74% overhead while protecting only against Spectre attacks. Other mitigations carry similar performance penalties. We reduce the overhead of InvisiSpec to 1.26% and of Fencing to 3.45%, offering performance and security not only against Spectre attacks but also against other known transient attacks, including the dangerous classes of LVI and Rowhammer attacks, as well as a large set of future evasive and zero-day attacks. Critical to our approach is an accurate detector that is not fooled by evasive attacks and that can generalize to novel zero-day attacks. We use a novel generative framework, Evasion Vaccination (EVAX), to train ML models and engineer new security-centric performance counters. EVAX significantly increases sensitivity, detecting and classifying attacks in time for mitigations to be deployed, with few false positives (4 FPs in every 1M instructions in our experiments). Such performance enables efficient and timely mitigation, allowing the processor to switch automatically between performance and security as needed.
Authored by Samira Ajorpaz, Daniel Moghimi, Jeffrey Collins, Gilles Pokam, Nael Abu-Ghazaleh, Dean Tullsen
The number of publications related to Explainable Artificial Intelligence (XAI) has increased rapidly over the last decade. However, the subjective nature of explainability has led to a lack of consensus on commonly used definitions of explainability, and the differing problem statements that fall under the XAI label have resulted in a lack of comparisons. This paper proposes, in broad terms, a simple comparison framework for XAI methods based on their output and on what we call practical attributes. The aim of the framework is to ensure that everything that can be held constant for the purpose of comparison is held constant, and to set aside many of the subjective elements present in XAI. An example comparison along the lines of the proposed framework is performed on local, post-hoc, model-agnostic XAI algorithms designed to measure feature importance or contribution for a queried instance. These algorithms are assessed on two criteria using synthetic datasets across a range of classifiers. The first is whether they select the features that contribute to the underlying data structure, and the second is how accurately they select the features used in a decision tree path. The first comparison showed that when the classifier was able to pick up the underlying pattern in the data, the LIME algorithm was the most accurate at selecting the ground truth features. The second test returned mixed results: in some instances the XAI algorithms accurately returned the features used to produce predictions, but this result was not consistent.
Authored by Guo Yeo, Irene Hudson, David Akman, Jeffrey Chan
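The first criterion in the XAI comparison study, whether an explainer picks out the features that truly drive the synthetic data, can be scored in several ways; the abstract does not give the exact measure. One plausible choice, assumed here, is the share of ground-truth features found among the k most important features reported for an instance, with k equal to the number of ground-truth features. Feature names and attribution values are hypothetical.

```python
from typing import Dict, Set


def top_k_recovery(attributions: Dict[str, float], ground_truth: Set[str]) -> float:
    """Fraction of ground-truth features among the k features with the largest
    absolute attribution, where k = len(ground_truth)."""
    if not ground_truth:
        raise ValueError("ground_truth must be non-empty")
    k = len(ground_truth)
    # Rank by magnitude: a strongly negative contribution still counts as important.
    ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
    return len(set(ranked[:k]) & ground_truth) / k


# Hypothetical attributions for one queried instance (e.g. from LIME or SHAP).
scores = {"x1": 0.9, "x2": -0.7, "x3": 0.05, "x4": 0.3}
print(top_k_recovery(scores, {"x1", "x2"}), top_k_recovery(scores, {"x1", "x3"}))  # 1.0 0.5
```

Averaging this score over many queried instances and classifiers gives a single number per explainer, which fits the framework's aim of holding everything else constant.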
The growing complexity of wireless networks has sparked an upsurge in the use of artificial intelligence (AI) within the telecommunication industry in recent years. In network slicing, a key component of 5G that enables network operators to lease their resources to third-party tenants, AI models may be employed in complex tasks, such as short-term resource reservation (STRR). When AI is used to make complex resource management decisions with financial and service quality implications, it is important that these decisions be understood by a human-in-the-loop. In this paper, we apply state-of-the-art techniques from the field of Explainable AI (XAI) to the problem of STRR. Using real-world data to develop an AI model for STRR, we demonstrate how our XAI methodology can be used to explain the real-time decisions of the model, to reveal trends about the model’s general behaviour, as well as aid in the diagnosis of potential faults during the model’s development. In addition, we quantitatively validate the faithfulness of the explanations across an extensive range of XAI metrics to ensure they remain trustworthy and actionable.
Authored by Pieter Barnard, Irene Macaluso, Nicola Marchetti, Luiz DaSilva
Many studies have used classifiers built by machine learning to detect various malicious activities in cyberspace. However, any classifier will inevitably make mistakes, so human verification is necessary. One way to address this issue is eXplainable AI (XAI), which provides a reason for each classification result. However, when the number of classification results to be verified is large, it is not realistic to check the XAI output for every case, and that output is sometimes difficult to interpret. In this study, we propose a machine learning model, called a classification verifier, that verifies classification results by using the output of XAI as features and raises objections when the reliability of a classification result is in doubt. Experiments on malicious website detection and malware detection show that the proposed classification verifier can efficiently identify misclassified malicious activities.
Authored by Koji Fujita, Toshiki Shibahara, Daiki Chiba, Mitsuaki Akiyama, Masato Uchida
To address the problems that may arise from the negative impact of electric vehicle (EV) charging loads on the power distribution network, it is very important to predict distribution network variability as a function of EV charging load. If appropriate facility reinforcement or system operation measures are taken based on an evaluation of the impact of EV charging load, facility failures can be prevented in advance and power quality can be kept at a certain level, enabling stable network operation. By using a load prediction model to analyse how the predicted load changes with EV load characteristics, the impact of EV integration on the distribution network can be evaluated. This paper investigates the effect of EV charging load on the voltage stability, power loss, reliability indices and economic loss of the distribution network. To this end, we transform a univariate time series of EV charging data into a multivariate time series using feature engineering techniques. Time series forecasting models are then trained on the multivariate dataset. Finally, XAI techniques such as LIME and SHAP are applied to the models to obtain feature importance analysis results.
Authored by H. Lee, H. Lim, B. Lee
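The EV-charging abstract does not list the engineered features. Lagged values, a rolling mean of recent load and the hour of day are common choices for turning a univariate load series into a multivariate one. The sketch below assumes hourly readings that start at midnight; the feature names and window sizes are illustrative only.

```python
from typing import Dict, List, Sequence


def make_features(load: Sequence[float], lags: Sequence[int] = (1, 24),
                  window: int = 3) -> List[Dict[str, float]]:
    """Convert an hourly univariate load series into multivariate rows.

    Each row holds lagged loads, a rolling mean of the `window` preceding hours
    (the current hour is excluded to avoid target leakage), the hour of day,
    and the value to forecast."""
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(load)):
        row: Dict[str, float] = {f"lag_{lag}": load[t - lag] for lag in lags}
        row[f"rolling_mean_{window}"] = sum(load[t - window:t]) / window
        row["hour_of_day"] = t % 24  # assumes the series starts at midnight
        row["target"] = load[t]
        rows.append(row)
    return rows


series = [float(h) for h in range(30)]  # hypothetical hourly kWh readings
rows = make_features(series)
print(len(rows), rows[0])
```

Each engineered column then becomes a named input that LIME or SHAP can attribute importance to, which is what makes the feature importance analysis interpretable.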
Electrical load forecasting is an essential part of the smart grid, maintaining a stable and reliable grid and supporting decisions in economic planning. With the integration of more renewable energy resources, especially solar photovoltaics (PV), and the transition to a prosumer-based grid, electrical load forecasting is expected to play a crucial role at both the regional and household levels. However, most existing forecasting methods can be considered black-box models because they rely on deep digitalization enablers such as Deep Neural Networks (DNN), which offer only limited human interpretability. This black-box character limits both the insights such models provide and their applicability. To mitigate this shortcoming, eXplainable Artificial Intelligence (XAI) is introduced as a means of making the model's behavior transparent and interpretable to humans. By utilizing XAI, experienced power market and power system professionals can be involved in developing the data-driven approach even without data science expertise. In this study, an electrical load forecasting model utilizing an XAI tool was developed and presented for a Norwegian residential building.
Authored by Eilert Henriksen, Ugur Halden, Murat Kuzlu, Umit Cali
This work proposes a unified approach to increase the explainability of predictions made by Convolutional Neural Networks (CNNs) on medical images using currently available Explainable Artificial Intelligence (XAI) techniques. The method, LISA, combines multiple techniques, namely Local Interpretable Model-Agnostic Explanations (LIME), integrated gradients, Anchors and Shapley Additive Explanations (SHAP), a Shapley-value-based approach, to explain the predictions of black-box models. This unified method increases confidence in a black-box model's decisions, so that the model can be employed in crucial applications under the supervision of human specialists. To illustrate the applicability of the XAI techniques and of the unified method (LISA) for explaining model predictions, a Chest X-ray (CXR) classification model for identifying Covid-19 patients is trained using transfer learning, with an ImageNet-based Inception V2 model as the transfer learning model.
Authored by Sudil Abeyagunasekera, Yuvin Perera, Kenneth Chamara, Udari Kaushalya, Prasanna Sumathipala, Oshada Senaweera
The rapid shift towards smart cities, particularly in an era of pandemics, necessitates e-learning, remote learning systems, and hybrid models. Building adaptive and personalized education has become a requirement for mitigating the downsides of distance learning while maintaining high levels of achievement. Explainable artificial intelligence (XAI), machine learning (ML), and the internet of behaviour (IoB) are among the technologies shaping the future of smart education in smart cities through customization and personalization. This study presents a paradigm for smart education based on the integration of XAI and IoB technologies. The research uses data acquired on students' behaviours to determine whether current education systems respond appropriately to learners' requirements. Although sophisticated education systems exist, they have not yet reached a level of development that allows them to be tailored to learners' cognitive needs and to support learners in the absence of face-to-face instruction. The study collected data on the behaviours of 41 learners in response to academic activities and assessed whether the running systems were able to capture such behaviours and respond appropriately. The evaluation methods used demonstrated a change in students' academic progression under IoT/IoB-based monitoring, enabling a relative response that supports their progression.
Authored by Ossama Embarak
XAI with natural language processing aims to produce human-readable explanations as evidence for AI decision-making, addressing explainability and transparency. However, from an HCI perspective, current approaches focus only on delivering a single explanation, which fails to account for the diversity of human thoughts and experiences in language. This paper addresses this gap by proposing a generative XAI framework, INTERACTION (explain aNd predicT thEn queRy with contextuAl CondiTional varIational autO-eNcoder). Our novel framework presents explanations in two steps: (step one) Explanation and Label Prediction; and (step two) Diverse Evidence Generation. We conduct extensive experiments with the Transformer architecture on a benchmark dataset, e-SNLI [1]. Our method achieves competitive or better performance than state-of-the-art baseline models on explanation generation (up to 4.7% gain in BLEU) and prediction (up to 4.4% gain in accuracy) in step one; it can also generate multiple diverse explanations in step two.
Authored by Jialin Yu, Alexandra Cristea, Anoushka Harit, Zhongtian Sun, Olanrewaju Aduragba, Lei Shi, Noura Moubayed
Artificial intelligence (AI) is used in decision support systems that learn and perceive features as a function of the number of layers and the weights computed during training. Because of the inherent black-box nature of these systems, accuracy, precision and recall alone are insufficient metrics for evaluating a model's performance. Domain knowledge is also essential to identify the features that the model treats as significant in arriving at its decision. In this paper, we consider face mask recognition as a use case to explain the application and benefits of XAI. Eight models used to solve the face mask recognition problem were selected, and GradCAM is used to explain these state-of-the-art models. Models that selected incorrect features were eliminated even though they had high accuracy. Domain knowledge relevant to face mask recognition, viz. facial feature importance, is applied to identify the model that picked the most appropriate features to arrive at its decision. We demonstrate that models with high accuracy do not necessarily select the right features. In applications requiring rapid deployment, this method can act as a deciding factor in shortlisting models, with a guarantee that the shortlisted models are looking at the right features when arriving at a classification. Furthermore, the model's outcomes can be explained to users, enhancing their confidence in the AI model deployed in the field.
Authored by K Srikanth, T Ramesh, Suja Palaniswamy, Ranganathan Srinivasan
The security of energy data collection is the basis for a reliable, secure and intelligent smart grid. The newest approach to securing data-collection communication is Zero Trust communication, whose strategy is to trust no device, whether outside or inside the network. Only when a device authenticates successfully and its software and hardware are sufficiently secure does the intelligent energy power system allow the device to enroll in the network; otherwise, the device is denied. While the device is communicating with the energy system, Zero Trust continues to check its security and vulnerabilities, and if the device has any security or vulnerability issue, Zero Trust removes it from the network system. This ensures the absolute security of the energy power system and lays a foundation for the security analysis of intelligent power units.
Authored by Yan Chen, Xingchen Zhou, Jian Zhu, Hongbin Ji
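The enrol-then-keep-checking rule described in the energy-data abstract can be reduced to a small policy check that is re-run on every evaluation rather than cached. The sketch below is a hypothetical simplification: a device is admitted only if it is authenticated and has no known open vulnerabilities, and it is removed as soon as a later check fails. Real deployments would rely on attestation and vulnerability scanning mechanisms that the abstract does not detail; all names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Device:
    device_id: str
    authenticated: bool
    open_vulnerabilities: Set[str] = field(default_factory=set)


@dataclass
class ZeroTrustGate:
    admitted: Set[str] = field(default_factory=set)

    def evaluate(self, device: Device) -> bool:
        """Re-check the device on every call; no earlier decision is trusted."""
        allowed = device.authenticated and not device.open_vulnerabilities
        if allowed:
            self.admitted.add(device.device_id)
        else:
            self.admitted.discard(device.device_id)  # evict on any failed check
        return allowed


gate = ZeroTrustGate()
meter = Device("meter-01", authenticated=True)  # hypothetical data-collection terminal
print(gate.evaluate(meter))  # True: enrolled
meter.open_vulnerabilities.add("outdated-firmware")
print(gate.evaluate(meter))  # False: removed from the network
```

Because the gate keeps no memory of past approvals other than the current admitted set, a device that was trusted yesterday is judged only on its present state, which is the "trust no device, inside or outside" principle in miniature.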
How can high-level directives concerning risk, cybersecurity and compliance be operationalized in the central nervous system of any organization above a certain complexity? How can the effectiveness of technological security solutions be proven and measured, and how can this technology be aligned with governance and financial goals at the board level? These are the essential questions for any CEO, CIO or CISO who is concerned with the wellbeing of the firm. The concept of Zero Trust (ZT) approaches information security and cybersecurity from the perspective of the asset to be protected and the value that asset represents. Zero Trust has been around for quite some time. Most professionals associate Zero Trust with a particular architectural approach to cybersecurity, involving concepts such as segments, resources that are accessed in a secure manner, and the maxim “always verify, never trust”. This paper describes the current state of the art in Zero Trust usage. We investigate the limitations of current approaches and how these are addressed in the form of Critical Success Factors in the Zero Trust Framework developed by ON2IT ‘Zero Trust Innovators’ (1). Furthermore, this paper describes the design and engineering of a Zero Trust artefact that addresses the problems at hand (2), according to Design Science Research (DSR). The last part of this paper outlines the setup of an empirical validation through practitioner-oriented research, in order to gain broader acceptance and implementation of Zero Trust strategies (3). The final result is a proposed framework and associated technology which, via Zero Trust principles, addresses multiple layers of the organization to grasp and align cybersecurity risks and to understand the readiness and fitness of the organization and its measures to counter those risks.
Authored by Yuri Bobbert, Jeroen Scheerder
Under regular epidemic prevention and control, teleworking has gradually become a normal working mode. With the development of modern information technologies such as big data, cloud computing and the mobile Internet, a key problem has emerged: how to build an effective security defense system that ensures the information security of teleworking in complex network environments while preserving its availability, collaboration and efficiency. One solution is the Zero Trust Network (ZTN); most enterprise infrastructures will operate in a hybrid zero trust/perimeter-based mode while continuing to invest in IT modernization initiatives and improve organizational business processes. In this paper, we systematically study zero trust principles, the logical components of zero trust architecture and the key technologies of zero trust networks. Based on the abstract model of zero trust architecture and on information security technologies, a prototype has been implemented that allows iOS terminals to access enterprise resources securely in teleworking mode.
Authored by Wengao Fang, Xiaojuan Guan