Security is a critical aspect in the process of designing, developing, and testing software systems. Due to the increasing need for security-related skills within software systems, there is a growing demand for these skills to be taught in computer science. A series of security modules was developed not only to meet the demand but also to assess the impact of these modules on teaching critical cyber security topics in computer science courses. This full paper in the innovative practice category presents the outcomes of six security modules in a freshman-level course at two institutions. The study adopts a Model-Eliciting Activity (MEA) as a project for students to demonstrate an understanding of the security concepts. Two experimental studies were conducted: 1) Teaching effectiveness of implementing cyber security modules and MEA project, 2) Students’ experiences in conceptual modeling tasks in problem-solving. In measuring the effectiveness of teaching security concepts with the MEA project, students’ performance, attitudes, and interests as well as the instructor’s effectiveness were assessed. For the conceptual modeling tasks in problem-solving, the results of student outcomes were analyzed. After implementing the security modules with the MEA project, students showed a great understanding of cyber security concepts and an increased interest in broader computer science concepts. The instructor’s beliefs about teaching, learning, and assessment shifted from teacher-centered to student-centered during their experience with the security modules and MEA project. Although 64.29% of students’ solutions do not seem suitable for real-world implementation, 76.9% of the developed solutions showed a sufficient degree of creativity.
Authored by Jeong Yang, Young Kim, Brandon Earwood
This study explores the Critical Success Factors (CSFs) for Security Education, Training and Awareness (SETA) programmes. Data is gathered from 20 key informants (using semi-structured interviews) from various geographic locations including the Gulf nations, Middle East, USA, UK, and Ireland. The analysis of these key informant interviews produces eleven CSFs for SETA programmes. These CSFs are mapped along the phases of a SETA programme lifecycle (design, development, implementation, and evaluation).
Authored by Areej Alyami, David Sammon, Karen Neville, Carolanne Mahony
The Internet of Things is an emerging technology for recent marketplace. In IoT, the heterogeneous devices are connected through the medium of the Internet for seamless communication. The devices used in IoT are resource-constrained in terms of memory, power and processing. Due to that, IoT system is unable to implement hi-end security for malicious cyber-attacks. The recent era is all about connecting IoT devices in various domains like medical, agriculture, transport, power, manufacturing, supply chain, education, etc. and thus need to be prevented from attacks and analyzed after attacks for legal action. The legal analysis of IoT data, devices and communication is called IoT forensics which is highly indispensable for various types of attacks on IoT system. This paper will review types of IoT attacks and its preventive measures in cyber security. It will also help in ascertaining IoT forensics and its challenges in detail. This paper will conclude with the high requirement of cyber security in IoT domains with implementation of standard rules for IoT forensics.
Authored by Madhavi Dave
Unsupervised cross-domain NER task aims to solve the issues when data in a new domain are fully-unlabeled. It leverages labeled data from source domain to predict entities in unlabeled target domain. Since training models on large domain corpus is time-consuming, in this paper, we consider an alternative way by introducing syntactic dependency structure. Such information is more accessible and can be shared between sentences from different domains. We propose a novel framework with dependency-aware GNN (DGNN) to learn these common structures from source domain and adapt them to target domain, alleviating the data scarcity issue and bridging the domain gap. Experimental results show that our method outperforms state-of-the-art methods.
Authored by Luchen Liu, Xixun Lin, Peng Zhang, Lei Zhang, Bin Wang
Guidelines, directives, and policy statements are usually presented in “linear” text form - word after word, page after page. However necessary, this practice impedes full understanding, obscures feedback dynamics, hides mutual dependencies and cascading effects and the like-even when augmented with tables and diagrams. The net result is often a checklist response as an end in itself. All this creates barriers to intended realization of guidelines and undermines potential effectiveness. We present a solution strategy using text as “data”, transforming text into a structured model, and generate network views of the text(s), that we then can use for vulnerability mapping, risk assessments and note control point analysis. For proof of concept we draw on NIST conceptual model and analysis of guidelines for smart grid cybersecurity, more than 600 pages of text.
Authored by Nazli Choucri, Gaurav Agarwal
Deep learning models rely on single word features and location features of text to achieve good results in text relation extraction tasks. However, previous studies have failed to make full use of semantic information contained in sentence dependency syntax trees, and data sparseness and noise propagation still affect classification models. The BERT(Bidirectional Encoder Representations from Transformers) pretrained language model provides a better representation of natural language processing tasks. And entity enhancement methods have been proved to be effective in relation extraction tasks. Therefore, this paper proposes a combination of the shortest dependency path and entity-enhanced BERT pre-training language model for model construction to reduce the impact of noise terms on the classification model and obtain more semantically expressive feature representation. The algorithm is tested on SemEval-2010 Task 8 English relation extraction dataset, and the F1 value of the final experiment can reach 0. 881.
Authored by Zeyu Sun, Chi Zhang
Cyber Attack is the most challenging issue all over the world. Nowadays, Cyber-attacks are increasing on digital systems and organizations. Innovation and utilization of new digital technology, infrastructure, connectivity, and dependency on digital strategies are transforming day by day. The cyber threat scope has extended significantly. Currently, attackers are becoming more sophisticated, well-organized, and professional in generating malware programs in Python, C Programming, C++ Programming, Java, SQL, PHP, JavaScript, Ruby etc. Accurate attack modeling techniques provide cyber-attack planning, which can be applied quickly during a different ongoing cyber-attack. This paper aims to create a new cyber-attack model that will extend the existing model, which provides a better understanding of the network’s vulnerabilities.Moreover, It helps protect the company or private network infrastructure from future cyber-attacks. The final goal is to handle cyber-attacks efficacious manner using attack modeling techniques. Nowadays, many organizations, companies, authorities, industries, and individuals have faced cybercrime. To execute attacks using our model where honeypot, the firewall, DMZ and any other security are available in any environment.
Authored by Mostafa Al-Amin, Mirza Khatun, Mohammed Uddin
To solve the problem of an excessive number of component vulnerabilities and limited defense resources in industrial cyber physical systems, a method for analyzing security critical components of system is proposed. Firstly, the components and vulnerability information in the system are modeled based on SysML block definition diagram. Secondly, as SysML block definition diagram is challenging to support direct analysis, a block security dependency graph model is proposed. On this basis, the transformation rules from SysML block definition graph to block security dependency graph are established according to the structure of block definition graph and its vulnerability information. Then, the calculation method of component security importance is proposed, and a security critical component analysis tool is designed and implemented. Finally, an example of a Drone system is given to illustrate the effectiveness of the proposed method. The application of this method can provide theoretical and technical support for selecting key defense components in the industrial cyber physical system.
Authored by Junjie Zhao, Bingfeng Xu, Xinkai Chen, Bo Wang, Gaofeng He
Modern vehicles have multiple electronic control units (ECUs) that are connected together as part of a complex distributed cyber-physical system (CPS). The ever-increasing communication between ECUs and external electronic systems has made these vehicles particularly susceptible to a variety of cyber-attacks. In this work, we present a novel anomaly detection framework called TENET to detect anomalies induced by cyber-attacks on vehicles. TENET uses temporal convolutional neural networks with an integrated attention mechanism to learn the dependency between messages traversing the in-vehicle network. Post deployment in a vehicle, TENET employs a robust quantitative metric and classifier, together with the learned dependencies, to detect anomalous patterns. TENET is able to achieve an improvement of 32.70% in False Negative Rate, 19.14% in the Mathews Correlation Coefficient, and 17.25% in the ROC-AUC metric, with 94.62% fewer model parameters, and 48.14% lower inference time compared to the best performing prior works on automotive anomaly detection.
Authored by Sooryaa Thiruloga, Vipin Kukkala, Sudeep Pasricha
Steady advancement in Artificial Intelligence (AI) development over recent years has caused AI systems to become more readily adopted across industry and military use-cases globally. As powerful as these algorithms are, there are still gaping questions regarding their security and reliability. Beyond adversarial machine learning, software supply chain vulnerabilities and model backdoor injection exploits are emerging as potential threats to the physical safety of AI reliant CPS such as autonomous vehicles. In this work in progress paper, we introduce the concept of AI supply chain vulnerabilities with a provided proof of concept autonomous exploitation framework. We investigate the viability of algorithm backdoors and software third party library dependencies for applicability into modern AI attack kill chains. We leverage an autonomous vehicle case study for demonstrating the applicability of our offensive methodologies within a realistic AI CPS operating environment.
Authored by Daniel Williams, Chelece Clark, Rachel McGahan, Bradley Potteiger, Daniel Cohen, Patrick Musau
Third-party libraries with rich functionalities facilitate the fast development of JavaScript software, leading to the explosive growth of the NPM ecosystem. However, it also brings new security threats that vulnerabilities could be introduced through dependencies from third-party libraries. In particular, the threats could be excessively amplified by transitive dependencies. Existing research only considers direct dependencies or reasoning transitive dependencies based on reachability analysis, which neglects the NPM-specific dependency resolution rules as adapted during real installation, resulting in wrongly resolved dependencies. Consequently, further fine-grained analysis, such as precise vulnerability propagation and their evolution over time in dependencies, cannot be carried out precisely at a large scale, as well as deriving ecosystem-wide solutions for vulnerabilities in dependencies. To fill this gap, we propose a knowledge graph-based dependency resolution, which resolves the inner dependency relations of dependencies as trees (i.e., dependency trees), and investigates the security threats from vulnerabilities in dependency trees at a large scale. Specifically, we first construct a complete dependency-vulnerability knowledge graph (DVGraph) that captures the whole NPM ecosystem (over 10 million library versions and 60 million well-resolved dependency relations). Based on it, we propose a novel algorithm (DTResolver) to statically and precisely resolve dependency trees, as well as transitive vulnerability propagation paths, for each package by taking the official dependency resolution rules into account. Based on that, we carry out an ecosystem-wide empirical study on vulnerability propagation and its evolution in dependency trees. Our study unveils lots of useful findings, and we further discuss the lessons learned and solutions for different stakeholders to mitigate the vulnerability impact in NPM based on our findings. For example, we implement a dependency tree based vulnerability remediation method (DTReme) for NPM packages, and receive much better performance than the official tool (npm audit fix).
Authored by Chengwei Liu, Sen Chen, Lingling Fan, Bihuan Chen, Yang Liu, Xin Peng
Legacy programs are normally monolithic (that is, all code runs in a single process and is not partitioned), and a bug in a program may result in the entire program being vulnerable and therefore untrusted. Program partitioning can be used to separate a program into multiple partitions, so as to isolate sensitive data or privileged operations. Manual program partitioning requires programmers to rewrite the entire source code, which is cumbersome, error-prone, and not generic. Automatic program partitioning tools can separate programs according to the dependency graph constructed based on data or programs. However, programmers still need to manually implement remote service interfaces for inter-partition communication. Therefore, in this paper, we propose AutoSlicer, whose purpose is to partition a program more automatically, so that the programmer is only required to annotate sensitive data. AutoSlicer constructs accurate data dependency graphs (DDGs) by enabling execution flow graphs, and the DDG-based partitioning algorithm can compute partition information based on sensitive annotations. In addition, the code refactoring toolchain can automatically transform the source code into sensitive and insensitive partitions that can be deployed on the remote procedure call framework. The experimental evaluation shows that AutoSlicer can effectively improve the accuracy (13%-27%) of program partitioning by enabling EFG, and separate real-world programs with a relatively smaller performance overhead (0.26%-9.42%).
Authored by Weizhong Qiang, Hao Luo
Cyber-Physical Systems (CPS) are systems that contain digital embedded devices while depending on environmental influences or external configurations. Identifying relevant influences of a CPS as well as modeling dependencies on external influences is difficult. We propose to learn these dependencies with decision trees in combination with clustering. The approach allows to automatically identify relevant influences and receive a data-related explanation of system behavior involving the system's use-case. Our paper presents a case study of our method for a Real-Time Localization System (RTLS) proving the usefulness of our approach, and discusses further applications of a learned decision tree.
Authored by Swantje Plambeck, Görschwin Fey, Jakob Schyga, Johannes Hinckeldeyn, Jochen Kreutzfeldt
With the rapid development of cloud storage technology, an increasing number of enterprises and users choose to store data in the cloud, which can reduce the local overhead and ensure safe storage, sharing, and deletion. In cloud storage, safe data deletion is a critical and challenging problem. This paper proposes an assured data deletion scheme based on multi-authoritative users in the semi-trusted cloud storage scenario (MAU-AD), which aims to realize the secure management of the key without introducing any trusted third party and achieve assured deletion of cloud data. MAU-AD uses access policy graphs to achieve fine-grained access control and data sharing. Besides, the data security is guaranteed by mutual restriction between authoritative users, and the system robustness is improved by multiple authoritative users jointly managing keys. In addition, the traceability of misconduct in the system can be realized by blockchain technology. Through simulation experiments and comparison with related schemes, MAU-AD is proven safe and effective, and it provides a novel application scenario for the assured deletion of cloud storage data.
Authored by Junfeng Tian, Ruxin Bai, Tianfeng Zhang
Data secure deletion operation in storage media is an important function of data security management. The internal physical properties of SSDs are different from hard disks, and data secure deletion of disks can not apply to SSDs directly. Copyback operation is used to improve the data migration performance of SSDs but is rarely used due to error accumulation issue. We propose a data securely deletion algorithm based on copyback operation, which improves the efficiency of data secure deletion without affecting the reliability of data. First, this paper proves that the data secure delete operation takes a long time on the channel bus, increasing the I/O overhead, and reducing the performance of the SSDs. Secondly, this paper designs an efficient data deletion algorithm, which can process read requests quickly. The experimental results show that the proposed algorithm can reduce the response time of read requests by 21% and the response time of delete requests by 18.7% over the existing algorithm.
Authored by Rongzhen Zhu, Yuchen Wang, Pengpeng Bai, Zhiming Liang, Weiguo Wu, Lei Tang
Documents are a common method of storing infor-mation and one of the most conventional forms of expression of ideas. Cloud servers store a user's documents with thousands of other users in place of physical storage devices. Indexes corresponding to the documents are also stored at the cloud server to enable the users to retrieve documents of their interest. The index includes keywords, document identities in which the keywords appear, along with Term Frequency-Inverse Document Frequency (TF-IDF) values which reflect the keywords' relevance scores of the dataset. Currently, there are no efficient methods to delete keywords from millions of documents over cloud servers while avoiding any compromise to the user's privacy. Most of the existing approaches use algorithms that divide a bigger problem into sub-problems and then combine them like divide and conquer problems. These approaches don't focus entirely on fine-grained deletion. This work is focused on achieving fine-grained deletion of keywords by keeping the size of the TF-IDF matrix constant after processing the deletion query, which comprises of keywords to be deleted. The experimental results of the proposed approach confirm that the precision of ranked search still remains very high after deletion without recalculation of the TF-IDF matrix.
Authored by Kushagra Lavania, Gaurang Gupta, D.V.N. Kumar
At present, cloud service providers control the direct management rights of cloud data, and cloud data cannot be effectively and assured deleted, which may easily lead to security problems such as data residue and user privacy leakage. This paper analyzes the related research work of cloud data assured deletion in recent years from three aspects: encryption key deletion, multi-replica association deletion, and verifiable deletion. The advantages and disadvantages of various deletion schemes are analysed in detail, and finally the prospect of future research on assured deletion of cloud data is given.
Authored by Bin Li, Yu Fu, Kun Wang
With the rapid development of general cloud services, more and more individuals or collectives use cloud platforms to store data. Assured data deletion deserves investigation in cloud storage. In time-sensitive data storage scenarios, it is necessary for cloud platforms to automatically destroy data after the data owner-specified expiration time. Therefore, assured time-sensitive data deletion should be sought. In this paper, a fine-grained assured time-sensitive data deletion (ATDD) scheme in cloud storage is proposed by embedding the time trapdoor in Ciphertext-Policy Attribute-Based Encryption (CP-ABE). Time-sensitive data is self-destructed after the data owner-specified expiration time so that the authorized users cannot get access to the related data. In addition, a credential is returned to the data owner for data deletion verification. This proposed scheme provides solutions for fine-grained access control and verifiable data self-destruction. Detailed security and performance analysis demonstrate the security and the practicability of the proposed scheme.
Authored by Zhengyu Yue, Yuanzhi Yao, Weihai Li, Nenghai Yu
Functional dependencies (FDs) are widely applied in data management tasks. Since FDs on data are usually unknown, FD discovery techniques are studied for automatically finding hidden FDs from data. In this paper, we develop techniques to dynamically discover FDs in response to changes on data. Formally, given the complete set Σ of minimal and valid FDs on a relational instance r, we aim to find the complete set Σ$^\textrm\textbackslashprime$ of minimal and valid FDs on røplus\textbackslashDelta r, where \textbackslashDelta r is a set of tuple insertions and deletions. Different from the batch approaches that compute Σ$^\textrm\textbackslashprime$ on røplus\textbackslashDelta r from scratch, our dynamic method computes Σ$^\textrm\textbackslashprime$ in response to \textbackslashtriangle\textbackslashuparrow. by leveraging the known Σ on r, and avoids processing the whole of r for each update from \textbackslashDelta r. We tackle dynamic FD discovery on røplus\textbackslashDelta r by dynamic hitting set enumeration on the difference-set of røplus\textbackslashDelta r. Specifically, (1) leveraging auxiliary structures built on r, we first present an efficient algorithm to update the difference-set of r to that of røplus\textbackslashDelta r. (2) We then compute Σ$^\textrm\textbackslashprime$, by recasting dynamic FD discovery as dynamic hitting set enumeration on the difference-set of røplus\textbackslashDelta r and developing novel techniques for dynamic hitting set enumeration. (3) We finally experimentally verify the effectiveness and efficiency of our approaches, using real-life and synthetic data. The results show that our dynamic FD discovery method outperforms the batch counterparts on most tested data, even when \textbackslashDelta r is up to 30 % of r.
Authored by Renjie Xiao, Yong'an Yuan, Zijing Tan, Shuai Ma, Wei Wang
Data is a collection of information from the activities of the real world. The file in which such data is stored after transforming into a form that machines can process is generally known as data set. In the real world, many data sets are not complete, and they contain various types of noise. Missing values is of one such kind. Thus, imputing data of these missing values is one of the significant task of data pre-processing. This paper deals with two real time health care data sets namely life expectancy (LE) dataset and chronic kidney disease (CKD) dataset, which are very different in their nature. This paper provides insights on various data imputation techniques to fill missing values by analyzing them. When coming to Data imputation, it is very common to impute the missing values with measure of central tendencies like mean, median, mode Which can represent the central value of distribution but choosing the apt choice is real challenge. In accordance with best of our knowledge this is the first and foremost paper which provides the complete analysis of impact of basic data imputation techniques on various data distributions which can be classified based on the size of data set, number of missing values, type of data (categorical/numerical), etc. This paper compared and analyzed the original data distribution with the data distribution after each imputation in terms of their skewness, outliers and by various descriptive statistic parameters.
Authored by Sainath Sankepally, Nishoak Kosaraju, Mallikharjuna Rao
Missing values are an unavoidable problem for classification tasks of machine learning in medical data. With the rapid development of the medical system, large scale medical data is increasing. Missing values increase the difficulty of mining hidden but useful information in these medical datasets. Deletion and imputation methods are the most popular methods for dealing with missing values. Existing studies ignored to compare and discuss the deletion and imputation methods of missing values under the row missing rate and the total missing rate. Meanwhile, they rarely used experiment data sets that are mixed-type and large scale. In this work, medical data sets of various sizes and mixed-type are used. At the same time, performance differences of deletion and imputation methods are compared under the MCAR (Missing Completely At Random) mechanism in the baseline task using LR (Linear Regression) and SVM (Support Vector Machine) classifiers for classification with the same row and total missing rates. Experimental results show that under the MCAR missing mechanism, the performance of two types of processing methods is related to the size of datasets and missing rates. As the increasing of missing rate, the performance of two types for processing missing values decreases, but the deletion method decreases faster, and the imputation methods based on machine learning have more stable and better classification performance on average. In addition, small data sets are easily affected by processing methods of missing values.
Authored by Lijuan Ren, Tao Wang, Aicha Seklouli, Haiqing Zhang, Abdelaziz Bouras
This article proposes a health monitoring system platform for cross-river bridges based on big data. The system can realize regionalized bridge operation and maintenance management. The system has functions such as registration modification and deletion of sensor equipment, user registration modification and deletion, real-time display and storage of sensor monitoring data, and evaluation and early warning of bridge structure safety. The sensor is connected to the lower computer through the serial port, analog signal, fiber grating signal, etc. The lower computer converts a variety of signals into digital signals through the single-chip A/D sampling and demodulator, etc., and transmits it to the upper computer through the serial port. The upper computer uses ARMCortex-A9 Run the main program to realize multi-threaded network communication. The system platform is to test the validity of the model, and a variety of model verification methods are used for evaluation to ensure the reliability of the big data analysis method.
Authored by Di Yang, Lianfa Wang, Yufeng Zhang
With the advent of the era of big data, the files that need to be stored in the storage system will increase exponentially. Cloud storage has become the most popular data storage method due to its powerful convenience and storage capacity. However, in order to save costs, some cloud service providers, Malicious deletion of the user's infrequently accessed data causes the user to suffer losses. Aiming at data integrity and privacy issues, a blockchain-based cloud storage integrity verification scheme for recoverable data is proposed. The scheme uses the Merkle tree properties, anonymity, immutability and smart contracts of the blockchain to effectively solve the problems of cloud storage integrity verification and data damage recovery, and has been tested and analyzed that the scheme is safe and effective.
Authored by Ma Haifeng, Zhang Ji
In the present era of the internet, image watermarking schemes are used to provide content authentication, security and reliability of various multimedia contents. In this paper image watermarking scheme which utilizes the properties of Integer Wavelet Transform (IWT), Schur decomposition and Singular value decomposition (SVD) based is proposed. In the suggested method, the cover image is subjected to a 3-level Integer wavelet transform (IWT), and the HH3 subband is subjected to Schur decomposition. In order to retrieve its singular values, the upper triangular matrix from the HH3 subband’s Schur decomposition is then subjected to SVD. The watermark image is first encrypted using a chaotic map, followed by the application of a 3-level IWT to the encrypted watermark and the usage of singular values of the LL-subband to embed by manipulating the singular values of the processed cover image. The proposed scheme is tested under various attacks like filtering (median, average, Gaussian) checkmark (histogram equalization, rotation, horizontal and vertical flipping) and noise (Gaussian, Salt & Pepper Noise). The suggested scheme provides strong robustness against numerous attacks and chaotic encryption provides security to watermark.
Authored by Anurag Tiwari, Vinay Srivastava
Model compression is one of the most preferred techniques for efficiently deploying deep neural networks (DNNs) on resource- constrained Internet of Things (IoT) platforms. However, the simply compressed model is often vulnerable to adversarial attacks, leading to a conflict between robustness and efficiency, especially for IoT devices exposed to complex real-world scenarios. We, for the first time, address this problem by developing a novel framework dubbed Magical-Decomposition to simultaneously enhance both robustness and efficiency for hardware. By leveraging a hardware-friendly model compression method called singular value decomposition, the defending algorithm can be supported by most of the existing DNN hardware accelerators. To step further, by using a recently developed DNN interpretation tool, the underlying scheme of how the adversarial accuracy can be increased in the compressed model is highlighted clearly. Ablation studies and extensive experiments under various attacks/models/datasets consistently validate the effectiveness and scalability of the proposed framework.
Authored by Xin Cheng, Mei-Qi Wang, Yu-Bo Shi, Jun Lin, Zhong-Feng Wang