Publications | Science of Security Virtual Organization

Malicious attack detection based on traffic-flow information fusion

While vehicle-to-everything communication technology enables information sharing and cooperative control for vehicles, it also poses a significant threat to the vehicles' driving security owing to cyber-attacks. In particular, Sybil malicious attacks hidden in the vehicle broadcast information flow are challenging to detect, thereby becoming an urgent issue requiring attention. Several researchers have considered this problem and proposed different detection schemes. However, the detection performance of existing schemes based on plausibility checks and neighboring observers is affected by the traffic and attacker densities. In this study, we propose a malicious attack detection scheme based on traffic-flow information fusion, which enables the detection of Sybil attacks without neighboring observer nodes. Our solution is based on the basic safety message, which is broadcast by vehicles periodically. It first constructs the basic features of traffic flow to reflect the traffic state, subsequently fuses it with the road detector information to add the road fusion features, and then classifies them using machine learning algorithms to identify malicious attacks. The experimental results demonstrate that our scheme achieves the detection of Sybil attacks with an accuracy greater than 90 % at different traffic and attacker densities. Our solutions provide security for achieving a usable vehicle communication network.

Authored by Ye Chen, Yingxu Lai, Zhaoyi Zhang, Hanmei Li, Yuhang Wang

A machine learning based approach for the detection of sybil attacks in C-ITS

Intrusion detection systems are vital for the sustainability of Cooperative Intelligent Transportation Systems (C-ITS), and the detection of sybil attacks is particularly challenging. In this work, we propose a novel approach for the detection of sybil attacks in C-ITS environments. We provide an evaluation of our approach using extensive simulations that rely on real traces, showing our detection approach's effectiveness.

Authored by Badis Hammi, Mohamed Idir, Rida Khatoun

Sybil Attack Detection in VANETs using an AdaBoost Classifier

Smart cities encompass a wide range of projects designed to ease the problems of everyday life and ensure security. Our interest focuses only on the Intelligent Transport System (ITS), which addresses transportation issues using the Vehicular Ad-Hoc Network (VANET) paradigm as its base. VANET is a promising technology for autonomous driving that benefits users by improving road safety and driving comfort. The problem with such rapid development is the continuously increasing digital threats. Among all these threats, we target the Sybil attack since it has been shown to be one of the most dangerous attacks in VANETs. It allows the attacker to generate multiple forged identities to disseminate numerous false messages, disrupt safety-related services, or misuse the systems. In addition, Machine Learning (ML) is showing a significant influence on classification problems; thus, we propose a behavior-based classification algorithm that is tested on the VeReMi dataset coupled with various machine learning techniques for comparison. The simulation results demonstrate the ability of our proposed mechanism to detect the Sybil attack in VANETs.

Authored by Dhia Laouiti, Marwane Ayaida, Nadhir Messai, Sameh Najeh, Leila Najjar, Ferdaous Chaabane

Short-Term Time Series Forecasting based on Edge Machine Learning Techniques for IoT devices

As the effects of climate change are becoming more and more evident, the importance of improved situation awareness is also gaining more attention, both in the context of preventive environmental monitoring and in the context of acute crisis response. One important aspect of situation awareness is the correct and thorough monitoring of air pollutants. The monitoring is threatened by sensor faults, power or network failures, or other hazards leading to missing or incorrect data transmission. For this reason, in this work we propose two complementary approaches for predicting missing sensor data and a combined technique for detecting outliers. The proposed solution can enhance the performance of low-cost sensor systems, closing the gap of missing measurements due to network unavailability, detecting drift and outliers thus paving the way to its use as an alert system for reportable events. The techniques have been deployed and tested also in a low power microcontroller environment, verifying the suitability of such a computing power to perform the inference locally, leading the way to an edge implementation of a virtual sensor digital twin.

Authored by Martina Rasch, Antonio Martino, Mario Drobics, Massimo Merenda

Facial Privacy Preservation using FGSM and Universal Perturbation attacks

Research done in Facial Privacy so far has entrenched the scope of gleaning race, age, and gender from a human’s facial image that are classifiable and compliant biometric attributes. Noticeable distortions, morphing, and face-swapping are some of the techniques that have been researched to restore consumers’ privacy. By fooling face recognition models, these techniques cater superficially to the needs of user privacy; however, the presence of visible manipulations negatively affects the aesthetic of the image. The objective of this work is to highlight common adversarial techniques that can be used to introduce granular pixel distortions using white-box and black-box perturbation algorithms that ensure the privacy of users’ sensitive or personal data in face images, fooling AI facial recognition models while maintaining the aesthetics and visual integrity of the image.

Authored by Nishchal Jagadeesha

PPIoV: A Privacy Preserving-Based Framework for IoV- Fog Environment Using Federated Learning and Blockchain

The integration of the Internet-of-Vehicles (IoV) and fog computing benefits from cooperative computing and analysis of environmental data while avoiding network congestion and latency. However, when private data is shared across fog nodes or the cloud, there exist privacy issues that limit the effectiveness of IoV systems, putting drivers' safety at risk. To address this problem, we propose a framework called PPIoV, which is based on Federated Learning (FL) and Blockchain technologies to preserve the privacy of vehicles in IoV. Typical machine learning methods are not well suited for distributed and highly dynamic systems like IoV since they train on data with local features. Therefore, we use FL to train the global model while preserving privacy. Also, our approach is built on a scheme that evaluates the reliability of vehicles participating in the FL training process. Moreover, PPIoV is built on blockchain to establish trust across multiple communication nodes. For example, when the local learned model updates from the vehicles and fog nodes are communicated with the cloud to update the global learned model, all transactions take place on the blockchain. The outcome of our experimental study shows that the proposed method improves the global model's accuracy as a result of allowing reputed vehicles to update the global model.

Authored by Jamal Alotaibi, Lubna Alazzawi

Enhancing Cyber Security in IoT Systems using FL-based IDS with Differential Privacy

Nowadays, IoT networks and devices exist in our everyday life, capturing and carrying unlimited data. However, increasing penetration of connected systems and devices implies rising threats for cybersecurity with IoT systems suffering from network attacks. Artificial Intelligence (AI) and Machine Learning take advantage of huge volumes of IoT network logs to enhance their cybersecurity in IoT. However, these data are often desired to remain private. Federated Learning (FL) provides a potential solution which enables collaborative training of attack detection model among a set of federated nodes, while preserving privacy as data remain local and are never disclosed or processed on central servers. While FL is resilient and resolves, up to a point, data governance and ownership issues, it does not guarantee security and privacy by design. Adversaries could interfere with the communication process, expose network vulnerabilities, and manipulate the training process, thus affecting the performance of the trained model. In this paper, we present a federated learning model which can successfully detect network attacks in IoT systems. Moreover, we evaluate its performance under various settings of differential privacy as a privacy preserving technique and configurations of the participating nodes. We prove that the proposed model protects the privacy without actually compromising performance. Our model realizes a limited performance impact of only ∼ 7% less testing accuracy compared to the baseline while simultaneously guaranteeing security and applicability.

Authored by Zacharias Anastasakis, Konstantinos Psychogyios, Terpsi Velivassaki, Stavroula Bourou, Artemis Voulkidis, Dimitrios Skias, Antonis Gonos, Theodore Zahariadis
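
As a rough illustration of the differential privacy step mentioned in the abstract above, the sketch below clips each node's model update and adds Gaussian noise before federated averaging. The abstract does not say which mechanism or parameters the authors used, so the Gaussian mechanism, the function names, and the `clip_norm` and `noise_multiplier` parameters are all assumptions made for illustration.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(w * w for w in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [w * scale for w in update]


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise (assumed mechanism)."""
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    return [w + rng.gauss(0.0, sigma) for w in clipped]


def federated_average(updates):
    """Coordinate-wise mean of the updates sent by the federated nodes."""
    n = len(updates)
    return [sum(ws) / n for ws in zip(*updates)]


# Hypothetical updates from three nodes; the raw data never leaves a node.
rng = random.Random(42)
local_updates = [[3.0, 4.0], [0.3, -0.1], [-1.0, 2.0]]
noisy = [privatize_update(u, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
         for u in local_updates]
global_update = federated_average(noisy)
```

A larger `noise_multiplier` gives stronger privacy at the cost of a noisier global update, which is the kind of accuracy trade-off the abstract reports.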

Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers

Graph-based Semi-Supervised Learning (GSSL) is a practical solution to learn from a limited amount of labelled data together with a vast amount of unlabelled data. However, due to their reliance on the known labels to infer the unknown labels, these algorithms are sensitive to data quality. It is therefore essential to study the potential threats related to the labelled data, more specifically, label poisoning. In this paper, we propose a novel data poisoning method which efficiently approximates the result of label inference to identify the inputs which, if poisoned, would produce the highest number of incorrectly inferred labels. We extensively evaluate our approach on three classification problems under 24 different experimental settings each. Compared to the state of the art, our influence-driven attack produces an average increase of error rate 50% higher, while being faster by multiple orders of magnitude. Moreover, our method can inform engineers of inputs that deserve investigation (relabelling them) before training the learning model. We show that relabelling one-third of the poisoned inputs (selected based on their influence) reduces the poisoning effect by 50%. ACM Reference Format: Adriano Franci, Maxime Cordy, Martin Gubri, Mike Papadakis, and Yves Le Traon. 2022. Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers. In 1st Conference on AI Engineering - Software Engineering for AI (CAIN’22), May 16–24, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3522664.3528606

Authored by Adriano Franci, Maxime Cordy, Martin Gubri, Mike Papadakis, Yves Le Traon

A Survey on Data Poisoning Attacks and Defenses

With the widespread deployment of data-driven services, the demand for data volumes continues to grow. At present, many applications lack reliable human supervision in the process of data collection, so the collected data may contain low-quality or even malicious data. This low-quality or malicious data exposes AI systems to many security challenges. One of the main security threats in the training phase of machine learning is data poisoning attacks, which compromise model integrity by contaminating training data to make the resulting model skewed or unusable. This paper reviews the relevant research on data poisoning attacks in various task environments: first, the classification of attacks is summarized; then, the defense methods against data poisoning attacks are sorted out; and finally, possible future research directions are discussed.

Authored by Jiaxin Fan, Qi Yan, Mohan Li, Guanqun Qu, Yang Xiao

Detection and Mitigation of Targeted Data Poisoning Attacks in Federated Learning

Federated learning (FL) has emerged as a promising paradigm for distributed training of machine learning models. In FL, several participants train a global model collaboratively by only sharing model parameter updates while keeping their training data local. However, FL was recently shown to be vulnerable to data poisoning attacks, in which malicious participants send parameter updates derived from poisoned training data. In this paper, we focus on defending against targeted data poisoning attacks, where the attacker’s goal is to make the model misbehave for a small subset of classes while the rest of the model is relatively unaffected. To defend against such attacks, we first propose a method called MAPPS for separating malicious updates from benign ones. Using MAPPS, we propose three methods for attack detection: MAPPS + X-Means, MAPPS + VAT, and their Ensemble. Then, we propose an attack mitigation approach in which a "clean" model (i.e., a model that is not negatively impacted by an attack) can be trained despite the existence of a poisoning attempt. We empirically evaluate all of our methods using popular image classification datasets. Results show that we can achieve > 95% true positive rates while incurring only < 2% false positive rate. Furthermore, the clean models that are trained using our proposed methods have accuracy comparable to models trained in an attack-free scenario.

Authored by Pinar Erbil, Emre Gursoy

Robust and Resilient Federated Learning for Securing Future Networks

Machine Learning (ML) and Artificial Intelligence (AI) techniques are widely adopted in the telecommunication industry, especially to automate beyond 5G networks. Federated Learning (FL) recently emerged as a distributed ML approach that enables localized model training to keep data decentralized to ensure data privacy. In this paper, we identify the applicability of FL for securing future networks and its limitations due to the vulnerability to poisoning attacks. First, we investigate the shortcomings of state-of-the-art security algorithms for FL and perform an attack to circumvent the FoolsGold algorithm, which is known as one of the most promising defense techniques currently available. The attack is launched by adding intelligent noise to the poisonous model updates. Then we propose a more sophisticated defense strategy, a threshold-based clustering mechanism to complement FoolsGold. Moreover, we provide a comprehensive analysis of the impact of the attack scenario and the performance of the defense mechanism.

Authored by Yushan Siriwardhana, Pawani Porambage, Madhusanka Liyanage, Mika Ylianttila

A Robust Framework for Adaptive Selection of Filter Ensembles to Detect Adversarial Inputs

Existing defense strategies against adversarial attacks (AAs) on AI/ML are primarily focused on examining the input data streams using a wide variety of filtering techniques. For instance, input filters are used to remove noisy, misleading, and out-of-class inputs along with a variety of attacks on learning systems. However, a single filter may not be able to detect all types of AAs. To address this issue, in the current work, we propose a robust, transferable, distribution-independent, and cross-domain supported framework for selecting Adaptive Filter Ensembles (AFEs) to minimize the impact of data poisoning on learning systems. The optimal filter ensembles are determined through a Multi-Objective Bi-Level Programming Problem (MOBLPP) that provides a subset of diverse filter sequences, each exhibiting fair detection accuracy. The proposed framework of AFE is trained to model the pristine data distribution to identify the corrupted inputs and converges to the optimal AFE without vanishing gradients and mode collapses irrespective of input data distributions. We presented preliminary experiments to show the proposed defense outperforms the existing defenses in terms of robustness and accuracy.

Authored by Arunava Roy, Dipankar Dasgupta

Analysis of Intrusion Detection Performance by Smoothing Factor of Gaussian NB Model Using Modified NSL-KDD Dataset

Recently, research on AI-based network intrusion detection has been actively conducted. In previous studies, machine learning models such as SVM (Support Vector Machine) and RF (Random Forest) showed consistently high performance, whereas NB (Naïve Bayes) showed widely varying performance with large deviations. In this paper, after analyzing the causes of the varying NB performance reported in several studies, we measured the performance of the Gaussian NB model according to the smoothing factor, which is closely related to these causes. Furthermore, we compared the performance of the Gaussian NB model with that of the other models as a zero-day attack detection system. In the experiment, the accuracy was 38.80% when the smoothing factor was 0 and 87.99% when it was left at its default, and the highest accuracy, 94.53%, was obtained when the smoothing factor was 1e-01. In the experiment, we used only some types of the attack data in the NSL-KDD dataset. The experiments showed the applicability of the Gaussian NB model as a zero-day attack detection system in the future. In addition, we clarify that the smoothing factor of the Gaussian NB model determines the shape of the Gaussian distribution that is related to the likelihood.

Authored by Kijung Bong, Jonghyun Kim
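
To make the role of the smoothing factor in the abstract above more concrete, here is a minimal pure-Python sketch of a Gaussian Naive Bayes classifier with variance smoothing. It assumes the smoothing factor works the way the `var_smoothing` parameter of scikit-learn's `GaussianNB` does (a fraction of the largest feature variance is added to every per-class variance); the abstract does not name the library, so this is an assumption, and the toy data is made up.

```python
import math


def _variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Return {label: (log_prior, means, smoothed_variances)}."""
    n_features = len(X[0])
    # Assumed rule: add a fraction of the largest overall feature variance.
    epsilon = var_smoothing * max(
        _variance([row[j] for row in X]) for j in range(n_features)
    )
    model = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        variances = [
            _variance([r[j] for r in rows]) + epsilon for j in range(n_features)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def joint_log_likelihood(model, x):
    """Log prior plus Gaussian log-likelihood of x, per class."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for xi, mean, var in zip(x, means, variances):
            if var == 0.0:
                # Zero variance collapses the Gaussian to a point: any value
                # other than the mean gets zero likelihood. An exact match is
                # simply skipped here (a simplification for this sketch).
                if xi != mean:
                    score = -math.inf
                continue
            score += -0.5 * math.log(2 * math.pi * var) - (xi - mean) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, x):
    """Pick the class with the highest joint log-likelihood."""
    scores = joint_log_likelihood(model, x)
    return max(scores, key=scores.get)


# Toy data: feature 0 is constant within each class, so its class variance is 0.
X = [[1.0, 0.0], [1.0, 0.2], [3.0, 0.1], [3.0, 0.3]]
y = [0, 0, 1, 1]
smoothed = fit_gaussian_nb(X, y, var_smoothing=1e-1)
unsmoothed = fit_gaussian_nb(X, y, var_smoothing=0.0)
```

With smoothing, a sample such as `[1.2, 0.1]` is still assigned to class 0; with a smoothing factor of 0, both classes score negative infinity and the model cannot separate them. This illustrates how the smoothing factor widens the Gaussian and changes the likelihood; it is not a reproduction of the paper's experiment.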

IoT DDoS Traffic Detection Using Adaptive Heuristics Assisted With Machine Learning

DDoS is a major issue in network security and a threat to service providers that renders a service inaccessible for a period of time. The number of Internet of Things (IoT) devices has developed rapidly. Nevertheless, it is proven that security on these devices is frequently disregarded. Many detection methods exist and are mostly focused on Machine Learning. However, the best method has not been defined yet. The aim of this paper is to find the optimal volumetric DDoS attack detection method by first comparing different existing machine learning methods, and second, by building an adaptive lightweight heuristics model relying on few traffic attributes and simple DDoS detection rules. With this new simple model, our goal is to decrease the classification time. Finally, we compare machine learning methods with our adaptive new heuristics method which shows promising results both on the accuracy and performance levels.

Authored by Rani Rahbani, Jawad Khalife
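
The abstract above does not spell out its heuristic rules, so the following is only a hypothetical sketch of what an adaptive, lightweight volumetric rule could look like: it tracks a moving baseline of packets per second and flags samples that exceed a multiple of that baseline. The class name, the single traffic attribute, and the `alpha`, `factor`, and `warmup` parameters are all illustrative assumptions, not the authors' model.

```python
class AdaptiveRateDetector:
    """Flags traffic samples that far exceed an adaptive baseline rate."""

    def __init__(self, alpha=0.2, factor=3.0, warmup=3):
        self.alpha = alpha        # weight of the newest sample in the baseline
        self.factor = factor      # how many times the baseline counts as an attack
        self.warmup = warmup      # samples used only to learn the baseline
        self.baseline = None
        self.samples = 0

    def _update_baseline(self, rate):
        if self.baseline is None:
            self.baseline = float(rate)
        else:
            # Exponentially weighted moving average of benign traffic.
            self.baseline = (1 - self.alpha) * self.baseline + self.alpha * rate

    def observe(self, packets_per_second):
        """Return True if the sample looks like volumetric DDoS traffic."""
        if self.samples < self.warmup:
            self._update_baseline(packets_per_second)
            self.samples += 1
            return False
        is_attack = packets_per_second > self.factor * self.baseline
        if not is_attack:
            # Only benign samples adapt the baseline, so an attack
            # cannot drag the threshold upward.
            self._update_baseline(packets_per_second)
        return is_attack


detector = AdaptiveRateDetector()
flags = [detector.observe(rate) for rate in [100, 110, 90, 105, 1000, 100]]
```

A rule like this needs one comparison and one multiply-add per sample, which is the kind of low classification cost the abstract aims for, at the price of ignoring everything except traffic volume.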

Hybridization of Deep Learning & Machine Learning For IoT Based Intrusion Classification

With the rise of IoT applications, about 20.4 billion devices will be online in 2020, and that number will rise to 75 billion a month by 2025. Different sensors in IoT devices let them get and process data remotely and in real time. Sensors give them information that helps them make smart decisions and manage IoT environments well. IoT Security is one of the most important things to think about when you're developing, implementing, and deploying IoT platforms. People who use the Internet of Things (IoT) say that it allows people to communicate, monitor, and control automated devices from afar. This paper shows how to use deep learning and machine learning to make an IDS that can be used on IoT platforms as a service. In the proposed method, a CNN maps the features, and a random forest classifies normal and attack classes. In the end, the proposed method made a big difference in all performance parameters. Its average performance metrics have gone up 5% to 6%.

Authored by Mehul Kapoor, Puneet Kaur

On the Security of Python Virtual Machines: An Empirical Study

Python continues to be one of the most popular programming languages and has been used in many safety-critical fields such as medical treatment, autonomous driving systems, and data science. These fields put forward higher security requirements to Python ecosystems. However, existing studies on machine learning systems in Python concentrate on data security, model security and model privacy, and just assume the underlying Python virtual machines (PVMs) are secure and trustworthy. Unfortunately, whether such an assumption really holds is still unknown. This paper presents, to the best of our knowledge, the first and most comprehensive empirical study on the security of CPython, the official and most deployed Python virtual machine. To this end, we first designed and implemented a software prototype dubbed PVMSCAN, then used it to scan the source code of the latest CPython (version 3.10) and 10 other versions (3.0 to 3.9), which consist of 3,838,606 lines of source code. Empirical results give relevant findings and insights towards the security of Python virtual machines, such as: 1) CPython virtual machines are still vulnerable, for example, PVMSCAN detected 239 vulnerabilities in version 3.10, including 55 null dereferences, 86 uninitialized variables and 98 dead stores; 2) Python/C API-related vulnerabilities are very common and have become one of the most severe threats to the security of PVMs: for example, 70 Python/C API-related vulnerabilities are identified in CPython 3.10; 3) the overall quality of the code remained stable during the evolution of Python VMs with vulnerabilities per thousand lines (VPTL) to be 0.50; and 4) automatic vulnerability rectification is effective: 166 out of 239 (69.46%) vulnerabilities can be rectified by simple yet effective syntax-directed heuristics. We have reported our empirical results to the developers of CPython, and they have acknowledged us and already confirmed and fixed 2 bugs (as of this writing) while others are still being analyzed. This study not only demonstrates the effectiveness of our approach, but also highlights the need to improve the reliability of infrastructures like Python virtual machines by leveraging state-of-the-art security techniques and tools.

Authored by Xinrong Lin, Baojian Hua, Qiliang Fan

A Review on The Concerns of Security Audit Using Machine Learning Techniques

Successful information and communication technology (ICT) may propel administrative procedures forward quickly. In order to achieve efficient usage of ICT in their businesses, ICT strategies and plans should be examined to ensure that they align with the organization's visions and missions. Efficient software and hardware work together to provide relevant data that aids in the improvement of how we do business, learn, communicate, entertain, and work. This exposes them to a risky environment that is prone to both internal and outside threats. The term “security” refers to a level of protection or resistance to damage. Security can also be thought of as a barrier between assets and threats. Important terms must be understood in order to have a comprehensive understanding of security. This research paper discusses key terms, concerns, and challenges related to information systems and security auditing. Exploratory research is utilised in this study to find an explanation for the observed occurrences, problems, or behaviour. The study's findings include a list of various security risks that must be seriously addressed in any Information System and Security Audit.

Authored by Saloni, Dilpreet Arora

NiNSRAPM: An Ensemble Learning Based Non-intrusive Network Security Risk Assessment Prediction Model

Cybersecurity insurance is one of the important means of cybersecurity risk management, and the development of cyber insurance is inseparable from the support of cyber risk assessment technology. Cyber risk assessment can not only help governments and organizations to better protect themselves from related risks, but also serve as a basis for cybersecurity insurance underwriting, pricing, and formulating policy content. Cybersecurity insurance companies cannot conduct cybersecurity risk assessments on policyholders before the policy is signed without the policyholder's authorization or a legal basis, yet they want to obtain network security vulnerability risk profiles of policyholders conveniently, quickly and at low cost before the policy is signed. To address this problem, this study proposes a non-intrusive network security vulnerability risk assessment method based on ensemble machine learning. Our model uses only open source intelligence and publicly available network information data to rate the cyber vulnerability risk of an organization, achieving an accuracy of 70.6% compared to a rating based on comprehensive information by cybersecurity experts.

Authored by Jun-Zheng Yang, Feng Liu, Yuan-Jie Zhao, Lu-Lu Liang, Jia-Yin Qi

Strong PUF Security Metrics: Response Sensitivity to Small Challenge Perturbations

This paper belongs to a sequence of manuscripts that discuss generic and easy-to-apply security metrics for Strong PUFs. These metrics cannot and shall not fully replace in-depth machine learning (ML) studies in the security assessment of Strong PUF candidates. But they can complement the latter, serve in initial PUF complexity analyses, and are much easier and more efficient to apply: They do not require detailed knowledge of various ML methods, substantial computation times, or the availability of an internal parametric model of the studied PUF. Our metrics also can be standardized particularly easily. This avoids the sometimes inconclusive or contradictory findings of existing ML-based security tests, which may result from the usage of different or non-optimized ML algorithms and hyperparameters, differing hardware resources, or varying numbers of challenge-response pairs in the training phase. This first manuscript within the abovementioned sequence treats one of the conceptually most straightforward security metrics on that path: It investigates the effects that small perturbations in the PUF-challenges have on the resulting PUF-responses. We first develop and implement several sub-metrics that realize this approach in practice. We then empirically show that these metrics have surprising predictive power, and compare our obtained test scores with the known real-world security of several popular Strong PUF designs. The latter include (XOR) Arbiter PUFs, Feed-Forward Arbiter PUFs, and (XOR) Bistable Ring PUFs. Along the way, our manuscript also suggests techniques for representing the results of our metrics graphically, and for interpreting them in a meaningful manner.

Authored by Fynn Kappelhoff, Rasmus Rasche, Debdeep Mukhopadhyay, Ulrich Rührmair

Cluster, Cloud, Grid Computing via Network Communication Using Control Communication and Monitoring of Smart Grid

Traditional power consumption management systems are not showing enough reliability, and thus smart grid technology has been introduced to reduce excess power wastage. In the context of smart grid systems, network communication is another term that is used for developing the network between the users and the load profiles. Cloud computing and clustering are also executed for efficient power management. Based on these facts, this research identifies wireless network communication systems to monitor and control smart grid power consumption. Primary survey-based research has been carried out with 62 individuals who worked in the smart grid system and tracked, monitored and controlled power consumption using WSN technology. The survey was conducted online, where the respondents provided their opinions via a Google survey form. The responses were collected and analyzed in Microsoft Excel. Results show that hybrid computing that combines cloud and edge computing technology is more advantageous than individual computing. Respondents agreed that deep learning techniques will be more beneficial to analyze load profiles than machine learning techniques. Lastly, the study has explained the advantages and challenges of using smart grid network communication systems. Apart from the findings from primary research, secondary journal articles were also observed to emphasize the research findings.

Authored by Santosh Kumar, N Kumar, B.T. Geetha, M. Sangeetha, Kalyan Chakravarthi, Vikas Tripathi

Anomaly Detection in Smart Grids: A Survey From Cybersecurity Perspective

The smart grid is the next generation of power generation, consumption and distribution. However, with the introduction of smart communication in such sensitive components, major risks from a cybersecurity perspective quickly emerged. This survey reviews and reports on the state-of-the-art techniques for detecting cyber attacks in smart grids, mainly through machine learning techniques.

Authored by Ahmad Alkuwari, Saif Al-Kuwari, Marwa Qaraqe

Enhancing Performance of Compressive Sensing-based State Estimators using Dictionary Learning

Smart grids integrate computing and communication infrastructure with conventional power grids to improve situational awareness, control, and safety. Several technologies such as automatic fault detection, automated reconfiguration, and outage management require close network monitoring. Therefore, utilities utilize sensing equipment such as PMUs (phasor measurement units), smart meters, and bellwether meters to obtain grid measurements. However, the expansion in sensing equipment results in an increased strain on existing communication infrastructure. Prior works overcome this problem by exploiting the sparsity of power consumption data in the Haar, Hankel, and Toeplitz transformation bases to achieve sub-Nyquist compression. However, data-driven dictionaries enable superior compression ratios and reconstruction accuracy by learning the sparsifying basis. Therefore, this work proposes using dictionary learning to learn the sparsifying basis of smart meter data. The smart meter data sent to the data centers are compressed using a random projection matrix prior to transmission. These measurements are aggregated to obtain the compressed measurements at the primary nodes. Compressive sensing-based estimators are then utilized to estimate the system states. This approach was validated on the IEEE 33-node distribution system and showed superior reconstruction accuracy over conventional transformation bases and over-complete dictionaries. Voltage magnitude and angle estimation error less than 0.3% mean absolute percentage error and 0.04 degree mean absolute error, respectively, were achieved at compression ratios as high as eight.

Authored by Rahul Madbhavi, Babji Srinivasan
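
The compression and aggregation steps in the abstract above rely on random projection being linear: summing the compressed readings of several meters gives the same result as compressing their summed load. The sketch below illustrates only that property, assuming a Gaussian projection matrix shared by all meters; the function names, the matrix distribution, and the toy sizes are assumptions, and the dictionary learning and state estimation stages are not shown.

```python
import random


def random_projection_matrix(m, n, rng):
    """m x n matrix with Gaussian entries (an assumed choice of distribution)."""
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Compressed measurement y = phi @ x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def aggregate(vectors):
    """Element-wise sum, as done when readings are combined at a primary node."""
    return [sum(vals) for vals in zip(*vectors)]


rng = random.Random(0)
n, m = 16, 2  # 16 samples compressed to 2 values: a compression ratio of eight
phi = random_projection_matrix(m, n, rng)

# Hypothetical load profiles of two smart meters behind the same primary node.
meter_a = [rng.uniform(0.0, 5.0) for _ in range(n)]
meter_b = [rng.uniform(0.0, 5.0) for _ in range(n)]

sum_of_compressed = aggregate([compress(phi, meter_a), compress(phi, meter_b)])
compressed_of_sum = compress(phi, aggregate([meter_a, meter_b]))
```

Because the two results agree up to floating-point error, a primary node can work with aggregated compressed measurements directly, without first reconstructing each meter's raw data.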

Varangian: A Git Bot for Augmented Static Analysis

The complexity and scale of modern software programs often lead to overlooked programming errors and security vulnerabilities. Developers often rely on automatic tools, like static analysis tools, to look for bugs and vulnerabilities. Static analysis tools are widely used because they can understand nontrivial program behaviors, scale to millions of lines of code, and detect subtle bugs. However, they are known to generate an excess of false alarms which hinder their utilization as it is counterproductive for developers to go through a long list of reported issues, only to find a few true positives. One of the ways proposed to suppress false positives is to use machine learning to identify them. However, training machine learning models requires good quality labeled datasets. For this purpose, we developed D2A [3], a differential analysis based approach that uses the commit history of a code repository to create a labeled dataset of Infer [2] static analysis output.

Authored by Saurabh Pujar, Yunhui Zheng, Luca Buratti, Burn Lewis, Alessandro Morari, Jim Laredo, Kevin Postlethwait, Christoph Görn

Generative Data Augmentation for Non-IID Problem in Decentralized Clinical Machine Learning

Swarm learning (SL) is an emerging and promising decentralized machine learning paradigm that has achieved high performance in clinical applications. SL solves the problem of a central structure in federated learning by combining edge computing and a blockchain-based peer-to-peer network. While there are promising results under the assumption of independent and identically distributed (IID) data across participants, SL suffers from performance degradation as the degree of non-IID data increases. To address this problem, we propose a generative augmentation framework in swarm learning called SL-GAN, which augments the non-IID data by generating synthetic data from participants. SL-GAN trains generators and discriminators locally, and periodically aggregates them via a randomly elected coordinator in the SL network. Under the standard assumptions, we theoretically prove the convergence of SL-GAN using stochastic approximations. Experimental results demonstrate that SL-GAN outperforms state-of-the-art methods on three real-world clinical datasets including Tuberculosis, Leukemia, and COVID-19.

Authored by Zirui Wang, Shaoming Duan, Chengyue Wu, Wenhao Lin, Xinyu Zha, Peiyi Han, Chuanyi Liu

Predicting Confidentiality, Integrity, and Availability from SQL Injection Payload

SQL Injection has been around as a harmful and prolific threat on web applications for more than 20 years, yet it still poses a huge threat to the World Wide Web. Rapidly evolving web technology has not eradicated this threat; in 2017, 51% of web application attacks were SQL injection attacks. Most conventional practices to prevent SQL injection attacks revolve around secure web and database programming and administration techniques. Because of developer ignorance, a large number of online applications remain susceptible to SQL injection attacks. There is a need for a more effective method to detect and prevent SQL Injection attacks. In this research, we offer a unique machine learning-based strategy for identifying potential SQL injection attack threats. Application of the proposed method in a Security Information and Event Management (SIEM) system will be discussed. SIEM can aggregate and normalize event information from multiple sources, and detect malicious events from analysis of this information. The result of this work shows that a machine learning based SQL injection attack detector which uses a SIEM approach possesses high accuracy in detecting malicious SQL queries.

Authored by Yohan Muliono, Mohamad Darus, Chrisando Pardomuan, Muhammad Ariffin, Aditya Kurniawan