For a long time, SQL injection has been considered one of the most serious security threats. NoSQL databases are becoming increasingly popular as big data and cloud computing technologies progress. NoSQL injection attacks are designed to take advantage of applications that employ NoSQL databases. NoSQL injections can be particularly harmful because they allow unrestricted code execution. In this paper we use supervised learning and natural language processing to construct a model to detect NoSQL injections. Our model is designed to work with MongoDB, CouchDB, CassandraDB, and Couchbase queries. Our model has achieved an F1 score of 0.95 as established by 10-fold cross validation.
Authored by Sivakami Praveen, Alysha Dcouth, A Mahesh
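The evaluation described in the abstract above (an F1 score established by 10-fold cross-validation over query text) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the tokenizer pattern and the helper names `tokenize`, `f1_score`, and `k_fold_indices` are assumptions, and the supervised model itself is left out.

```python
import re

def tokenize(query):
    """Split a query string into $-operators, words, and punctuation tokens.

    The token pattern is an assumption made for illustration only.
    """
    return re.findall(r"\$\w+|\w+|[^\w\s]", query)

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n samples."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size
```

In a full evaluation, a classifier would be fitted on each training fold and `f1_score` computed on the matching test fold, with the ten fold scores averaged.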
Ethical bias in machine learning models has become a matter of concern in the software engineering community. Most prior software engineering work concentrated on finding ethical bias in models rather than fixing it. After finding bias, the next step is mitigation. Prior researchers mainly tried to use supervised approaches to achieve fairness. However, in the real world, getting data with trustworthy ground truth is challenging, and the ground truth can itself contain human bias. Semi-supervised learning is a technique where, incrementally, labeled data is used to generate pseudo-labels for the rest of the data (and then all that data is used for model training). In this work, we apply four popular semi-supervised techniques as pseudo-labelers to create fair classification models. Our framework, Fair-SSL, takes a very small amount (10%) of labeled data as input and generates pseudo-labels for the unlabeled data. We then synthetically generate new data points to balance the training data based on class and protected attribute, as proposed by Chakraborty et al. in FSE 2021. Finally, a classification model is trained on the balanced pseudo-labeled data and validated on test data. After experimenting on ten datasets and three learners, we find that Fair-SSL achieves performance similar to that of three state-of-the-art bias mitigation algorithms. That said, the clear advantage of Fair-SSL is that it requires only 10% of the labeled training data. To the best of our knowledge, this is the first SE work where semi-supervised techniques are used to fight against ethical bias in SE ML models. To facilitate open science and replication, all our source code and datasets are publicly available at https://github.com/joymallyac/FairSSL.
CCS Concepts: Software and its engineering → Software creation and management; Computing methodologies → Machine learning.
ACM Reference Format: Joymallya Chakraborty, Suvodeep Majumder, and Huy Tu. 2022. Fair-SSL: Building fair ML Software with less data. In International Workshop on Equitable Data and Technology (FairWare ’22), May 9, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3524491.3527305
Authored by Joymallya Chakraborty, Suvodeep Majumder, Huy Tu
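The pseudo-labeling step described in the Fair-SSL abstract can be illustrated with a small self-training loop. This is only a sketch under stated assumptions: self-training is used here as one common semi-supervised technique (the abstract does not name its four techniques), the nearest-centroid learner and the names `self_train`, `centroid`, and `sq_dist` are hypothetical, and the class and protected-attribute balancing step is omitted.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def self_train(labeled, unlabeled, rounds=5, batch=2):
    """Grow a labeled set by pseudo-labeling the most confident points.

    labeled:   list of (features, label) pairs with labels 0 or 1;
               both classes must be present (assumption).
    unlabeled: list of feature vectors.
    Returns the labeled list extended with pseudo-labeled pairs.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        c0 = centroid([x for x, y in labeled if y == 0])
        c1 = centroid([x for x, y in labeled if y == 1])
        # Confidence is the gap between the distances to the two centroids.
        ranked = sorted(pool, key=lambda x: -abs(sq_dist(x, c0) - sq_dist(x, c1)))
        for x in ranked[:batch]:
            label = 0 if sq_dist(x, c0) < sq_dist(x, c1) else 1
            labeled.append((x, label))
            pool.remove(x)
    return labeled
```

A final classifier would then be trained on the returned set, which is the point at which a fairness-aware balancing step could be applied.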
In recent times, the occurrence of malware attacks has been increasing at an unprecedented rate. In particular, image-based malware attacks are spreading worldwide, and many people receive harmful malware-bearing images created through the technique called steganography. The existing system can identify only open malware and files from the internet; image-based malware cannot be identified or detected. As a result, many phishers use this technique to exploit their targets, which makes social media platforms harmful to their users. To avoid these difficulties, machine learning can be applied to find steganographic malware images (contents). Steganography is used to hide messages in apparently innocuous media (e.g., images), and steganalysis is the approach used for detecting such hidden content. This research work proposes a machine learning (ML) approach to steganalysis that automatically detects malware and steganographic content. Further, the steganalysis method has been developed using logistic classification. With it, users can avoid sharing malware images on social media platforms such as WhatsApp and Facebook without downloading them. It can also be used on photo-sharing sites such as Google Photos.
Authored by Henry Samuel, Santhanam Kumar, R. Aishwarya, G. Mathivanan
This research evaluates the accuracy of two methods of authorship prediction, syntactical analysis and n-gram analysis, and explores their potential usage. The proposed algorithm measures n-grams and counts adjectives, adverbs, verbs, nouns, punctuation, and sentence length from the training data, and normalizes each metric. It then compares the metrics of training samples to testing samples and predicts authorship based on the correlation they share for each metric. The strength of the correlation between the testing and training data carries significant weight in the decision-making process. For example, if analysis of one metric approximates 100% positive correlation, the weight in the decision is assigned a maximum value for that metric. Conversely, a 100% negative correlation receives the minimum value. This new method of authorship validation holds promise for future innovation in fraud protection, the study of historical documents, and maintaining integrity within academia.
Authored by Jared Nelson, Mohammad Shekaramiz
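The correlation-to-weight idea in the abstract above can be sketched as follows. The details are assumptions for illustration: each metric is taken to be a list of normalized values, Pearson correlation is used as the correlation measure, and the linear mapping from [-1, 1] onto [0, 1] as well as the names `pearson`, `weight`, `score_author`, and `predict` are hypothetical, not taken from the paper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def weight(r):
    """Map a correlation in [-1, 1] linearly to a weight in [0, 1].

    +1 (100% positive correlation) gives the maximum, -1 the minimum.
    """
    return (r + 1) / 2

def score_author(train_profile, test_profile):
    """Average per-metric weight; profiles map metric name -> list of values."""
    weights = [weight(pearson(train_profile[m], test_profile[m]))
               for m in train_profile]
    return sum(weights) / len(weights)

def predict(train_profiles, test_profile):
    """Return the candidate author whose profile scores highest."""
    return max(train_profiles,
               key=lambda author: score_author(train_profiles[author], test_profile))
```

Averaging the weights treats every metric equally; a real system might weight n-gram and part-of-speech metrics differently.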
Network Intrusion Detection Systems (IDSs) have been used to increase the level of network security for many years. The main purpose of such systems is to detect and block malicious activity in the network traffic. Researchers have been improving the performance of IDS technology for decades by applying various machine-learning techniques. From the perspective of academia, obtaining a quality dataset (i.e. a sufficient amount of captured network packets that contain both malicious and normal traffic) to support machine learning approaches has always been a challenge. There are many datasets publicly available for research purposes, including NSL-KDD, KDDCUP 99, CICIDS 2017 and UNSWNB15. However, these datasets are becoming obsolete over time and may no longer be adequate or valid to model and validate IDSs against state-of-the-art attack techniques. As attack techniques are continuously evolving, datasets used to develop and test IDSs also need to be kept up to date. Proven performance of an IDS tested on old attack patterns does not necessarily mean it will perform well against new patterns. Moreover, existing datasets may lack certain data fields or attributes necessary to analyse some of the new attack techniques. In this paper, we argue that academia needs up-to-date high-quality datasets. We compare publicly available datasets and suggest a way to provide up-to-date high-quality datasets for researchers and the security industry. The proposed solution is to utilize the network traffic captured from the Locked Shields exercise, one of the world’s largest live-fire international cyber defence exercises held annually by the NATO CCDCOE. During this three-day exercise, red team members consisting of dozens of white hackers selected by the governments of over 20 participating countries attempt to infiltrate the networks of over 20 blue teams, who are tasked to defend a fictional country called Berylia. 
After the exercise, network packets captured from each blue team’s network are handed over to each team. However, the countries are not willing to disclose the packet capture (PCAP) files to the public since these files contain specific information that could reveal how a particular nation might react to certain types of cyberattacks. To overcome this problem, we propose to create a dedicated virtual team, capture all the traffic from this team’s network, and disclose it to the public so that academia can use it for unclassified research and studies. In this way, the organizers of Locked Shields can effectively contribute to the advancement of future artificial intelligence (AI) enabled security solutions by providing annual datasets of up-to-date attack patterns.
Authored by Maj. Halisdemir, Hacer Karacan, Mauno Pihelgas, Toomas Lepik, Sungbaek Cho
State-of-the-art approaches in gait analysis usually rely on one isolated tracking system, generating insufficient data for complex use cases such as sports, rehabilitation, and MedTech. We address the opportunity to comprehensively understand human motion by a novel data model combining several motion-tracking methods. The model aggregates pose estimation by captured videos and EMG and EIT sensor data synchronously to gain insights into muscle activities. Our demonstration with biceps curl and sitting/standing pose generates time-synchronous data and delivers insights into our experiment’s usability, advantages, and challenges.
Authored by Sebastian Rettlinger, Bastian Knaus, Florian Wieczorek, Nikolas Ivakko, Simon Hanisch, Giang Nguyen, Thorsten Strufe, Frank Fitzek
The increasing data generation rate and the proliferation of deep learning applications have led to the development of machine learning-as-a-service (MLaaS) platforms by major Cloud providers. The existing MLaaS platforms, however, fall short in protecting the clients’ private data. Recent distributed MLaaS architectures such as federated learning have also been shown to be vulnerable to a range of privacy attacks. Such vulnerabilities motivated the development of privacy-preserving MLaaS techniques, which often use complex cryptographic primitives. Such approaches, however, demand abundant computing resources, which undermines the low-latency nature of evolving applications such as autonomous driving. To address these challenges, we propose SCLERA, an efficient MLaaS framework that utilizes a trusted execution environment for secure execution of clients’ workloads. SCLERA features a set of optimization techniques to reduce the computational complexity of the offloaded services and achieve low-latency inference. We assessed SCLERA’s efficacy using image/video analytic use cases such as scene detection. Our results show that SCLERA achieves up to 23× speed-up when compared to the baseline secure model execution.
Authored by Abhinav Kumar, Reza Tourani, Mona Vij, Srikathyayani Srikanteswara
Phishing is a method of online fraud in which attackers seek access to computer systems for monetary benefit or personal gain. The attackers pose as legitimate entities to obtain users' sensitive information. Phishing has been a significant concern over the past few years. Firms are recording an increase in phishing attacks aimed primarily at the firm's intellectual property and the employees' sensitive data. As a result, these attacks force firms to spend more on information security, in both technology-centric and human-centric approaches. With the advancements in cyber-security over the last ten years, many techniques have evolved to detect phishing-related activities through websites and emails. This study focuses on the latest techniques used for detecting phishing attacks, including the use of visual selection features, Machine Learning (ML), and Artificial Intelligence (AI). New strategies for identifying phishing attacks are evolving, but limited standardized knowledge on phishing identification and mitigation is accessible from user awareness training. This study therefore also focuses on the role of security-awareness movements in minimizing the impact of phishing attacks. There are many approaches to training users about these attacks and to preventing them, such as persona-centred training, anti-phishing techniques, visual discrimination training, spam filters, robust firewalls and infrastructure, dynamic technical defense mechanisms, and third-party certified software. The purpose of this paper is therefore to carry out a systematic analysis of the literature to assess the state of knowledge in prominent scientific journals on the identification and prevention of phishing. Forty-three journal articles published from 2011 to 2020 that address phishing detection and prevention through awareness training were reviewed.
This timely systematic review also focuses on the gaps identified in the selected primary studies and future research directions in this area.
Authored by Kanchan Patil, Sai Arra
Phishing activity is undertaken by hackers to compromise computer networks and financial systems. A compromised computer system or network provides data and/or processing resources to the world of cybercrime. Cybercrimes are projected to cost the world $6 trillion by 2021; in this context, phishing is expected to remain a growing challenge. Statistics on phishing growth over the last decade support this view, as phishing numbers show almost exponential growth over the period. Recent reports on the complexity of phishing show that the fight against phishing URLs as a means of building a more resilient cyberspace is an evolving challenge. Compounding the problem is the lack of cyber security expertise to handle the expected rise in incidents. Previous research has proposed different methods to detect phishing websites, including neural networks, data mining techniques, heuristic-based phishing detection techniques, and machine learning. However, phishers have recently started to use more sophisticated techniques to attack internet users, such as VoIP phishing and spear phishing. For these modern methods, the traditional ways of phishing detection provide low accuracy. Hence, modern tools and techniques need to be developed and applied as a countermeasure against such phishing attacks. In view of the nature of recent phishing attacks, it is imperative to develop a state-of-the-art anti-phishing tool that can predict phishing attacks before actual phishing incidents occur. We have designed such a tool, which works efficiently to detect phishing websites so that users can easily understand the risk to their personal and financial data.
Authored by Rajeev Shah, Mohammad Hasan, Shayla Islam, Asif Khan, Taher Ghazal, Ahmad Khan
Global cybersecurity threats have grown as a result of the evolving digital transformation. Cybercriminals have more opportunities as a result of digitization. Initially, cyberthreats take the form of phishing in order to gain confidential user credentials. As cyber-attacks become more and more sophisticated, the cybersecurity industry is faced with the problem of utilising cutting-edge technology and techniques to combat the ever-present hostile threats. Hackers use phishing to persuade customers to grant them access to a company’s digital assets and networks. As technology progressed, phishing attempts became more sophisticated, necessitating the development of tools to detect phishing. Machine learning is one of the most powerful weapons in the fight against such threats. The features used for phishing detection, as well as the approaches employed with machine learning, are discussed in this study. In this light, the study’s major goal is to propose a unique, robust ensemble machine learning model architecture that gives the highest prediction accuracy with the lowest error rate, while also recommending a few alternative robust machine learning models. Finally, the Random Forest algorithm attained a maximum accuracy of 96.454 percent. However, a hybrid model combining three classifiers (Decision Tree, Random Forest, and Gradient Boosting) increases the accuracy to 98.4 percent.
Authored by Josna Philomina, K Fathima, S Gayathri, Glory Elias, Abhinaya Menon
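One common way to combine three classifiers into a hybrid model is hard (majority) voting over their predicted labels. The abstract above does not say how its three classifiers are combined, so the sketch below is an assumption; `majority_vote` is a hypothetical helper and the prediction lists in the usage note are made-up values.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    predictions: list of equal-length label lists, one per classifier.
    With an odd number of binary classifiers there are no ties.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

For example, with hypothetical outputs `tree = [1, 0, 1, 0]`, `forest = [1, 1, 0, 0]`, and `boosting = [0, 1, 1, 0]`, calling `majority_vote([tree, forest, boosting])` labels a sample as phishing (1) whenever at least two of the three models agree.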
During the COVID-19 pandemic, the number of cyber-attacks, including phishing activities, has increased tremendously. Many technical solutions for phishing detection have been developed; however, these approaches were either unsuccessful or unable to identify phishing pages and detect malicious code efficiently. One downside is poor detection accuracy and low adaptability to new phishing connections. Another reason behind unsuccessful anti-phishing solutions is the arbitrary selection of URL-based classification features, which may produce false detection results. Therefore, in this work, an intelligent phishing detection and prevention model is designed. The proposed model employs a self-destruct detection algorithm in which machine learning, specifically a supervised learning algorithm, is used. All rules employed in the algorithm focus on URL-based web characteristics, which attackers rely upon to redirect victims to the simulated sites. Datasets from various sources, such as PhishTank and the UCI Machine Learning Repository, were used, and testing was conducted in a controlled lab environment. As a result, a Chrome extension for phishing detection was developed based on the proposed model to help prevent phishing attacks with an appropriate countermeasure and to keep users aware of phishing while visiting illegitimate websites. It is believed that this smart phishing detection and prevention model is able to prevent fraud and spam websites and to lessen the cyber-crime and cyber-crisis that arise from year to year.
Authored by Amir Rose, Nurlida Basir, Nur Heng, Nurzi Zaizi, Madihah Saudi
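URL-based characteristics of the kind mentioned in the abstract above can be extracted with the Python standard library. The feature set below is an assumption chosen for illustration (IP-address hosts, the "@" symbol, hyphens in the host name, and missing HTTPS are commonly cited phishing indicators); it is not the rule set used in the paper, and `url_features` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

def url_features(url):
    """Return a dict of simple URL-based indicators for a single URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "length": len(url),
        "uses_https": parsed.scheme == "https",
        # A dotted-quad host instead of a domain name.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_symbol": "@" in url,
        "dots_in_host": host.count("."),
        "has_hyphen_in_host": "-" in host,
    }
```

Feature dictionaries like this one could then be fed to a supervised classifier trained on labeled phishing and legitimate URLs.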
People are increasingly sharing their details online as internet usage grows. As a result, fraudsters have access to a massive amount of information and financial activity. Attackers create web pages that resemble reputable sites and transmit malicious content to victims to get them to provide sensitive information. Prevailing phishing security measures are inadequate for detecting new phishing attacks. One objective of this research is therefore to analyze and compare phishing and legitimate websites by analyzing data collected from open-source platforms through a survey. Another objective is to propose a method to detect fake sites using Decision Tree and Random Forest approaches. Microsoft Forms was used to carry out the survey with 30 participants. The majority of the participants have poor awareness of phishing attacks and do not observe the features of the interface before accessing the search browser. Together with the collected data, the survey supports the goal of identifying the best phishing website detection method, for which Decision Tree and Random Forest models were trained and tested. In terms of both the number of important features detected and the accuracy rate, the results demonstrate that Random Forest outperforms Decision Tree in phishing website detection.
Authored by Mohammed Alkawaz, Stephanie Steven, Omar Mohammad, Md Johar
Phishing has become a prominent method of data theft among hackers, and it continues to develop. In recent years, many strategies have been developed to identify phishing website attempts using machine learning particularly. However, the algorithms and classification criteria that have been used are highly different from the real issues and need to be compared. This paper provides a detailed comparison and evaluation of the performance of several machine learning algorithms across multiple datasets. Two phishing website datasets were used for the experiments: the Phishing Websites Dataset from UCI (2016) and the Phishing Websites Dataset from Mendeley (2018). Because these datasets include different types of class labels, the comparison algorithms can be applied in a variety of situations. The tests showed that Random Forest was better than other classification methods, with an accuracy of 88.92% for the UCI dataset and 97.50% for the Mendeley dataset.
Authored by Wendy Sarasjati, Supriadi Rustad, Purwanto, Heru Santoso, Muljono, Abdul Syukur, Fauzi Rafrastara, De Setiadi
Side-channel attacks have been a constant threat to computing systems. In recent times, vulnerabilities in the architecture were discovered and exploited to mount and execute state-of-the-art attacks such as Spectre. The Spectre attack exploits a vulnerability in Intel-based processors to leak confidential data through a covert channel. There exist some defenses to mitigate the Spectre attack. Among these defenses, hardware-assisted attack/intrusion detection (HID) systems have received an overwhelming response due to their low overhead and efficient attack detection. HID systems deploy machine learning (ML) classifiers to perform anomaly detection and determine whether the system is under attack. For this purpose, a performance monitoring tool profiles the applications to record hardware performance counters (HPCs), which are used for anomaly detection. Previous HID systems assume that Spectre is executed as a standalone application. In contrast, we propose an attack that dynamically generates variations in the injected code to evade detection. The attack is injected into a benign application. In this manner, the attack conceals itself as a benign application and generates perturbations to avoid detection. For the attack injection, we exploit a return-oriented programming (ROP)-based code-injection technique that reuses the code, called gadgets, present in the exploited victim's (host) memory to execute the attack, which, in our case, is the CR-Spectre attack to steal sensitive data from a target victim (target) application. Our work focuses on proposing a dynamic attack that can evade HID detection by injecting perturbations, and dynamically generated variations thereof, under the cloak of a benign application. We evaluate the proposed attack on the MiBench suite as the host. In our experiments, the HID performance degrades from 90% to 16%, indicating that our CR-Spectre attack avoids detection successfully.
Authored by Abhijitt Dhavlle, Setareh Rafatirad, Houman Homayoun, Sai Dinakarrao
Smart Security Solutions are in high demand with the ever-increasing vulnerabilities within the IT domain. Adjusting to a Work-From-Home (WFH) culture has become mandatory by maintaining required core security principles. Therefore, implementing and maintaining a secure Smart Home System has become even more challenging. ARGUS provides an overall network security coverage for both incoming and outgoing traffic, a firewall and an adaptive bandwidth management system and a sophisticated CCTV surveillance capability. ARGUS is such a system that is implemented into an existing router incorporating cloud and Machine Learning (ML) technology to ensure seamless connectivity across multiple devices, including IoT devices at a low migration cost for the customer. The aggregation of the above features makes ARGUS an ideal solution for existing Smart Home System service providers and users where hardware and infrastructure is also allocated. ARGUS was tested on a small-scale smart home environment with a Raspberry Pi 4 Model B controller. Its intrusion detection system identified an intrusion with 96% accuracy while the physical surveillance system predicts the user with 81% accuracy.
Authored by R.M. Ratnayake, G.D.N.D.K. Abeysiriwardhena, G.A.J. Perera, Amila Senarathne, R. Ponnamperuma, B.A. Ganegoda
To guard against malicious environments, more and more applications use anti-sandbox technology to detect the running environment. Malware often uses this technology to resist analysis, which makes the analysis of applications very difficult. Research on anti-sandbox countermeasure technology based on application virtualization can solve such problems, but there is no good solution for sensor simulation. To avoid being detected, most analysis systems can only use real device sensors, which poses serious hidden risks to users’ privacy. To address this problem, this paper proposes and implements a sensor anti-sandbox countermeasure technology for the Android system. The technology uses a CNN-LSTM model to recognize the activity in real-device sensor data, classifies and stores the real-device sensor data according to the recognition results, designs an automatic data simulation algorithm based on the stored data, and finally sends the simulated data back to the application under test using Hook technology. The experimental results show that the method can effectively simulate the data characteristics of the acceleration sensor and prevent the triggering of anti-sandbox behaviors.
Authored by Jin Yang, Yunqing Liu
Supervisory control and data acquisition (SCADA) systems play a pivotal role in the operation of modern critical infrastructures (CIs). Technological advancements, innovations, economic trends, etc. have continued to improve SCADA systems’ effectiveness and overall CI throughput. However, these trends have also continued to expose SCADA systems to security threats. Intrusions and attacks on SCADA systems can cause service disruptions, equipment damage, or even fatalities. Conventional intrusion detection models have shown signs of ineffectiveness due to the complexity and sophistication of modern-day SCADA attacks and intrusions. Also, SCADA characteristics and requirements necessitate exceptional security considerations with regard to the mitigation of intrusive events. This paper explores the viability of supervised learning algorithms in detecting intrusions specific to SCADA systems and their communication protocols. Specifically, we examine four supervised learning algorithms: Random Forest, Naïve Bayes, J48 Decision Tree and Sequential Minimal Optimization-Support Vector Machines (SMO-SVM) for evaluating SCADA datasets. Two SCADA datasets were used for evaluating the performance of our approach. To improve classification performance, feature selection using principal component analysis was used to preprocess the datasets. Using prominent classification metrics, SMO-SVM presented the best overall results across the two datasets. In summary, results showed that supervised learning algorithms were able to classify intrusions targeted against SCADA systems with satisfactory performance.
Authored by Oyeniyi Alimi, Khmaies Ouahada, Adnan Abu-Mahfouz, Suvendi Rimer, Kuburat Alimi
Cooperative secure computing based on the relationship between a numerical value and a numerical interval is not only a basic problem of secure multiparty computing but also a core problem of cooperative secure computing. It is of substantial theoretical and practical significance for information security in relation to scientific computing to continuously investigate and construct solutions to such problems. Based on the Goldwasser-Micali homomorphic encryption scheme, this paper proposes the Morton rule. According to the characteristics of the interval, a double-length vector is constructed to participate in the exclusive-or operation, and an efficient cooperative decision-making solution for integer and integer-interval security is designed. This solution can solve more basic problems in cooperative security computation after suitable transformations. A theoretical analysis shows that this solution is safe and efficient. Finally, applications that are based on these protocols are presented.
Authored by Shaofeng Lu, Chengzhe Lv, Wei Wang, Changqing Xu, Huadan Fan, Yuefeng Lu, Yulong Hu, Wenxi Li
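The exclusive-or operation mentioned in the abstract above relies on a known property of Goldwasser-Micali encryption: multiplying two ciphertexts yields an encryption of the XOR of the two plaintext bits. The toy sketch below demonstrates only that property; it does not implement the paper's Morton rule or its double-length vector construction. The primes `P` and `Q` are assumptions chosen for readability and are far too small for real use.

```python
import math
import random

P, Q = 499, 547      # toy secret primes (assumption; insecure sizes)
N = P * Q            # public modulus

def is_residue(a, prime):
    """Euler's criterion: True if a is a nonzero quadratic residue mod an odd prime."""
    return pow(a, (prime - 1) // 2, prime) == 1

# Public value X: a quadratic non-residue modulo both P and Q.
X = next(a for a in range(2, N)
         if a % P and a % Q
         and not is_residue(a, P) and not is_residue(a, Q))

def encrypt(bit):
    """Encrypt one bit as r^2 * X^bit mod N for a random r coprime to N."""
    while True:
        r = random.randrange(2, N)
        if math.gcd(r, N) == 1:
            break
    return (r * r * pow(X, bit, N)) % N

def decrypt(ciphertext):
    """Bit is 0 if the ciphertext is a quadratic residue mod P, else 1."""
    return 0 if is_residue(ciphertext % P, P) else 1

def xor_ciphertexts(c1, c2):
    """Homomorphic XOR: the product of two ciphertexts encrypts b1 XOR b2."""
    return (c1 * c2) % N
```

Because the XOR is computed on ciphertexts, a party holding only `N` and `X` can combine encrypted bits without learning them; only the holder of `P` can decrypt the result.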
Since the advent of Software Defined Networking (SDN) in 2011 and the formation of the Open Networking Foundation (ONF), SDN-inspired projects have emerged in various fields of computer networks. Almost all networking organizations are working to make their products support SDN concepts, e.g., OpenFlow. SDN has brought great flexibility and agility to networks through application-specific control functions with a centralized controller, but it does not provide security guarantees against vulnerabilities inside applications, the data plane, and the controller platform. As SDN can also use third-party applications, an infected application can be distributed in the network, and SDN-based systems may easily collapse. In this paper, a security threats assessment model is presented which highlights the critical areas with security requirements in SDN. Based on the threat assessment model, a Security Threats Assessment and Diagnostic System (STADS) is proposed for establishing a reliable SDN framework. The proposed STADS detects and diagnoses various threats based on a specified policy mechanism when different components of SDN communicate with the controller to fulfil network requirements. The Mininet network emulator with the Ryu controller has been used for implementation and analysis.
Authored by Pradeep Sharma, Brijesh Kumar, S.S Tyagi
The dynamic state of networks presents a challenge for the deployment of distributed applications and protocols. Ad-hoc schedules in the updating phase might lead to a lot of ambiguity and issues. By separating the control and data planes and centralizing control, Software Defined Networking (SDN) offers novel opportunities and remedies for these issues. However, software-based centralized architecture for distributed environments introduces significant challenges. Security is a main and crucial issue in SDN. This paper presents a deep study of the state-of-the-art of security challenges and solutions for the SDN paradigm. The conducted study helped us to propose a dynamic approach to efficiently detect different security violations and incidents caused by network updates including forwarding loop, forwarding black hole, link congestion, network policy violation, etc. Our solution relies on an intelligent approach based on the use of Machine Learning and Artificial Intelligence Algorithms.
Authored by Amina SAHBI, Faouzi JAIDI, Adel BOUHOULA
The development of autonomous agents has gained renewed interest, largely due to the recent successes of machine learning. Social robots can be considered a special class of autonomous agents that are often intended to be integrated into sensitive environments. We present experiences from our work with two specific humanoid social service robots, and highlight how eschewing privacy and security by design principles leads to implementations with serious privacy and security flaws. The paper introduces the robots as platforms and their associated features, ecosystems and cloud platforms that are required for certain use cases or tasks. The paper encourages design aims for privacy and security, and then in this light studies the implementations from two different manufacturers. The results show a worrisome lack of design focus in handling privacy and security. The paper does not aim to cover all the security flaws and possible mitigations, but does look closer into the use of the WebSocket protocol and its challenges when used for operational control. The conclusions of the paper provide insights on how manufacturers can rectify the discovered security flaws and present key policies like accountability when it comes to implementing technical features of autonomous agents.
Authored by Dennis Biström, Magnus Westerlund, Bob Duncan, Martin Jaatun
The new architecture of transformer networks proposed in the work can be used to create an intelligent chat bot that can learn the process of communication and immediately model responses based on what has been said. The essence of the new mechanism is to divide the information flow into two branches containing the history of the dialogue with different levels of granularity. Such a mechanism makes it possible to build and develop the personality of a dialogue agent in the process of dialogue, that is, to accurately imitate the natural behavior of a person. This gives the interlocutor (client) the feeling of talking to a real person. In addition, making modifications to the structure of such a network makes it possible to identify a likely attack using social engineering methods. The results obtained after training the created system showed the fundamental possibility of using a neural network of a new architecture to generate responses close to natural ones. Possible options for using such neural network dialogue agents in various fields, and, in particular, in information security systems, are considered. The new technology can be used in social engineering attack detection systems, which is a big problem at present. The novelty and prospects of the proposed architecture of the neural network also lie in the possibility of creating on its basis dialogue systems with a high level of biological plausibility.
Authored by V. Ryndyuk, Y. Varakin, E. Pisarenko
The volume of SMS messages sent daily around the world has continued to grow significantly over the past years. Hence, mobile phones are becoming increasingly vulnerable to SMS spam messages, thereby exposing users to the risk of fraud and theft of personal data. Filtering of messages to detect and eliminate SMS spam is now a critical functionality for which different types of machine learning approaches are still being explored. In this paper, we propose a system for detecting SMS spam using a semi-supervised novelty detection approach based on a one-class SVM classifier. The system is built as an anomaly detector that learns only from normal SMS messages, thus enabling detection models to be implemented in the absence of labelled SMS spam training examples. We evaluated our proposed system using a benchmark dataset consisting of 747 SMS spam and 4827 non-spam messages. The results show that our proposed method outperformed the traditional supervised machine learning approaches based on binary, frequency or TF-IDF bag-of-words. The overall accuracy was 98% with 100% SMS spam detection rate and only around 3% false positive rate.
Authored by Suleiman Yerima, Abul Bashar
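The idea of training only on normal messages can be illustrated with a minimal sketch. This is not the authors' system: to stay dependency-free it swaps the one-class SVM for a simple centroid-similarity detector, and the class name `CentroidNoveltyDetector`, the tokenizer, the toy messages and the threshold rule are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Return a unit-length term-frequency vector as {token: weight}."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}

class CentroidNoveltyDetector:
    """Stand-in for a one-class SVM: it is fitted on normal (ham) messages only.

    Assumes at least one non-empty training message.
    """

    def fit(self, normal_messages):
        vectors = [vectorize(m) for m in normal_messages]
        centroid = Counter()
        for vec in vectors:
            for token, weight in vec.items():
                centroid[token] += weight / len(vectors)
        norm = math.sqrt(sum(w * w for w in centroid.values()))
        self.centroid = {t: w / norm for t, w in centroid.items()}
        # The lowest similarity seen among normal messages becomes the boundary.
        self.threshold = min(self._similarity(vec) for vec in vectors)
        return self

    def _similarity(self, vec):
        # Cosine similarity, since both vectors have unit length.
        return sum(w * self.centroid.get(t, 0.0) for t, w in vec.items())

    def is_novel(self, text):
        """True when the message is less similar to ham than any training message."""
        return self._similarity(vectorize(text)) < self.threshold

# Hypothetical ham-only training set: no spam labels are needed.
HAM = [
    "are we still meeting for lunch today",
    "call me when you get home",
    "see you at lunch today",
    "can you call me after the meeting",
]
detector = CentroidNoveltyDetector().fit(HAM)
```

With this toy data, `detector.is_novel("win a free prize now claim cash")` flags the message because it shares no vocabulary with the ham messages, while a message taken from the training set is not flagged. A real one-class SVM learns a more flexible boundary, but the training setup is the same: only normal examples are supplied.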
In today’s digital world, mobile SMS (short message service) communication has become a part of almost every person’s life. Meanwhile, mobile users suffer from the harassment of spam SMS, which constitutes a veritable nuisance to mobile subscribers. As hackers and spammers try to intrude into mobile computing devices, SMS support makes these devices more vulnerable, because an attacker can try to intrude into the system by sending unsolicited messages. An attacker can gain remote access to mobile devices. We propose a novel approach that analyzes message content and extracts features using TF-IDF techniques to efficiently distinguish spam messages from ham messages using different machine learning classifiers. The classifiers to be used in the proposed work can be evaluated with metrics such as accuracy, precision and recall. In our proposed approach, the accuracy rate will be increased by using a voting classifier.
Authored by Ganesh Ubale, Siddharth Gaikwad
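As a rough illustration of the pipeline described above (TF-IDF features feeding several classifiers whose predictions are combined by a vote), the following sketch uses only the Python standard library. The three base classifiers, the smoothed IDF formula, the toy training messages and all function names are assumptions chosen for the example; the abstract does not say which classifiers the authors combine.

```python
import math
from collections import Counter

# Hypothetical labelled training messages.
TRAIN = [
    ("win free cash prize now", "spam"),
    ("free prize claim now", "spam"),
    ("see you at lunch", "ham"),
    ("call me after lunch", "ham"),
]
LABELS = sorted({label for _, label in TRAIN})

def tokens(text):
    return text.lower().split()

# Smoothed inverse document frequency learned from the training messages.
N_DOCS = len(TRAIN)
DF = Counter(t for text, _ in TRAIN for t in set(tokens(text)))
IDF = {t: math.log((1 + N_DOCS) / (1 + df)) + 1 for t, df in DF.items()}

def tfidf(text):
    """Unit-length TF-IDF vector; out-of-vocabulary tokens are ignored."""
    tf = Counter(t for t in tokens(text) if t in IDF)
    vec = {t: c * IDF[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def dot(u, v):
    return sum(w * v.get(t, 0.0) for t, w in u.items())

VECS = [(tfidf(text), label) for text, label in TRAIN]

def nearest_neighbour(text):
    """Label of the single most similar training message."""
    x = tfidf(text)
    return max(VECS, key=lambda pair: dot(x, pair[0]))[1]

def nearest_centroid(text):
    """Label whose mean training vector has the largest dot product with the message."""
    x = tfidf(text)
    def score(label):
        members = [v for v, l in VECS if l == label]
        return sum(dot(x, v) for v in members) / len(members)
    return max(LABELS, key=score)

def naive_bayes(text):
    """Multinomial naive Bayes on raw token counts with add-one smoothing."""
    def log_prob(label):
        counts = Counter(t for txt, l in TRAIN if l == label for t in tokens(txt))
        total = sum(counts.values())
        prior = sum(1 for _, l in TRAIN if l == label) / N_DOCS
        return math.log(prior) + sum(
            math.log((counts[t] + 1) / (total + len(IDF)))
            for t in tokens(text) if t in IDF
        )
    return max(LABELS, key=log_prob)

def vote(text, classifiers=(nearest_neighbour, nearest_centroid, naive_bayes)):
    """Hard voting: return the label predicted by most classifiers."""
    ballots = Counter(clf(text) for clf in classifiers)
    return ballots.most_common(1)[0][0]
```

For example, `vote("claim your free cash")` returns `"spam"` and `vote("lunch after call")` returns `"ham"` on this toy data. An odd number of voters avoids ties in the two-class case. Accuracy, precision and recall would then be computed by comparing `vote` predictions against held-out labels.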
Community question answering (CQA) websites have become very popular platforms, attracting numerous participants to share and acquire knowledge and information on the Internet. However, with the rapid growth of crowdsourcing systems, many malicious users organize collusive attacks against CQA platforms to promote a target (product or service) by posting suggestive questions and deceptive answers. This manipulated, deceptive content, aggregated into multiple collusive question and answer (Q&A) spam groups, can fully control the sentiment toward a target and distort users’ decisions, which pollutes the CQA environment and makes it less credible. In this paper, we propose a Pattern and Burstiness based Collusive Q&A Spam Detection method (PBCSD) to identify the deceptive questions and answers. Specifically, we intensively study the campaign process of crowdsourcing tasks and summarize the clues at the vocabulary usage level of Q&As when collusive attacks are launched. Based on these clues, we extract the Q&A groups using frequent pattern mining and further purify them by the burstiness of the posting times of the Q&As. By designing several discriminative features at the Q&A group level, multiple machine learning based classifiers can be used to judge the groups as deceptive or ordinary, and the Q&As in deceptive groups are finally identified as collusive Q&A spam. We evaluate the proposed PBCSD method on a real-world dataset collected from Baidu Zhidao, a famous CQA platform in China, and the experimental results demonstrate that PBCSD is effective for collusive Q&A spam detection and outperforms a number of state-of-the-art methods.
Authored by Mingming Xu, Lu Zhang, Haiting Zhu
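The two grouping steps named in the abstract, frequent pattern mining followed by a burstiness filter on posting time, can be sketched as follows. This is a simplified illustration and not the PBCSD implementation: patterns are restricted to term pairs, burstiness is approximated by the time span of a group, and the post records, thresholds and function names are all hypothetical. The group-level features and the final classification step are omitted.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical Q&A posts: (post id, set of terms, posting time in hours).
POSTS = [
    ("q1", {"brandx", "cream", "effective"}, 0),
    ("q2", {"brandx", "cream", "recommend"}, 1),
    ("q3", {"brandx", "cream", "cheap"}, 2),
    ("q4", {"python", "list", "sort"}, 5),
    ("q5", {"python", "list", "slice"}, 400),
    ("q6", {"python", "list", "append"}, 900),
]

def frequent_pair_groups(posts, min_support):
    """Group posts by every term pair they share; keep pairs with enough support."""
    groups = defaultdict(list)
    for post_id, terms, _ in posts:
        for pair in combinations(sorted(terms), 2):
            groups[pair].append(post_id)
    return {pair: ids for pair, ids in groups.items() if len(ids) >= min_support}

def is_bursty(post_ids, times, max_span_hours):
    """Treat a group as bursty when all of its posts fall within a short window."""
    stamps = [times[i] for i in post_ids]
    return max(stamps) - min(stamps) <= max_span_hours

def candidate_spam_groups(posts, min_support=3, max_span_hours=24):
    """Frequent-pattern groups that survive the burstiness filter."""
    times = {post_id: hour for post_id, _, hour in posts}
    groups = frequent_pair_groups(posts, min_support)
    return {
        pair: ids
        for pair, ids in groups.items()
        if is_bursty(ids, times, max_span_hours)
    }

candidates = candidate_spam_groups(POSTS)
```

On this toy data both `("brandx", "cream")` and `("list", "python")` are frequent pairs, but only the first group is posted within 24 hours, so it is the only candidate kept. The surviving groups would then be described by group-level features and passed to a classifier.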