Measurement and Metrics Testing - In software regression testing, newly added test cases are more likely to fail and should therefore be prioritized for execution. In regression testing for continuous integration, reinforcement learning-based approaches are promising, and the RETECS (Reinforced Test Case Prioritization and Selection) framework is a successful application. RETECS uses an agent composed of a neural network to predict the priority of test cases, and the agent must learn from historical information to improve. However, newly added test cases have no historical execution information, so RETECS predicts their priority essentially at random. In this paper, we focus on new test cases in continuous integration testing. Building on the RETECS framework, we first propose a priority assignment method for new test cases that ensures they are executed first. Second, because continuous integration is a fast, iterative integration process in which new test cases have strong fault detection capability during the latest periods, we further propose an additional reward method for new test cases. Finally, following full lifecycle management, the 'new' additional reward must be terminated after a certain period, and we conduct an empirical study of this. We ran 30 iterations of the experiment on 12 datasets, and our best results were 19.24%, 10.67%, and 34.05 positions better than the best parameter combination in RETECS on the NAPFD (Normalized Average Percentage of Faults Detected), RECALL, and TTF (Test to Fail) metrics, respectively.
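The two ideas above (forcing new test cases to the front of the queue, and granting them a bonus reward that expires after a fixed number of CI cycles) could be sketched as follows; all names and constants are hypothetical, not the paper's actual implementation:

```python
# Sketch of the described idea (names and values hypothetical): new
# test cases get the highest priority so they run first, and earn an
# extra reward that is terminated after a fixed number of CI cycles.
MAX_PRIORITY = 1.0
NEW_REWARD_BONUS = 0.5
BONUS_LIFETIME = 3  # CI cycles after which the 'new' bonus expires

def assign_priority(test, agent_predict):
    """New tests (no execution history) are forced to the front."""
    if not test["history"]:
        return MAX_PRIORITY
    return agent_predict(test)

def reward(test, detected_fault, cycle):
    """Base reward for fault detection, plus a decaying 'new' bonus."""
    base = 1.0 if detected_fault else 0.0
    age = cycle - test["added_cycle"]
    if detected_fault and age < BONUS_LIFETIME:
        base += NEW_REWARD_BONUS
    return base
```

For example, a test added at cycle 5 that detects a fault at cycle 6 would earn 1.5, but only 1.0 at cycle 9 once the bonus has lapsed.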
Authored by Fanliang Chen, Zheng Li, Ying Shang, Yang Yang
Measurement and Metrics Testing - The increasing number of smartphone users in Indonesia has prompted various sectors to improve their services through mobile applications, including the healthcare sector. The healthcare sector is considered critical because it stores various confidential health data about its users. This motivates a security analysis of the mobile health applications widely used in Indonesia. The MobSF (Mobile Security Framework) and MARA (Mobile Application Reverse Engineering and Analysis) Framework are mobile application security analysis methods capable of assessing security levels based on the OWASP (Open Web Application Security Project) Mobile Top 10 2016 classification, CVSS (Common Vulnerability Scoring System), and CWE (Common Weakness Enumeration). It is expected that the test results from MobSF and MARA can provide a security metric for mobile health applications, serving as security information for users and application developers.
Authored by Dimas Priambodo, Guntur Ajie, Hendy Rahman, Aldi Nugraha, Aulia Rachmawati, Marcella Avianti
Measurement and Metrics Testing - FIPS 140-3 is the main standard defining security requirements for cryptographic modules in the U.S. and Canada; commercially viable hardware modules generally need to be compliant with it. The scope of FIPS 140-3 will also expand to the new NIST Post-Quantum Cryptography (PQC) standards when migration from older RSA and Elliptic Curve cryptography begins. FIPS 140-3 mandates testing the effectiveness of “non-invasive attack mitigations”, or side-channel attack countermeasures. At the higher security levels 3 and 4, the FIPS 140-3 side-channel testing methods and metrics are expected to be those of ISO 17825, which is based on the older Test Vector Leakage Assessment (TVLA) methodology. We discuss how to apply ISO 17825 to hardware modules that implement lattice-based PQC standards for public-key cryptography – Key Encapsulation Mechanisms (KEMs) and Digital Signatures. We find that simple “random key” vs. “fixed key” tests are unsatisfactory due to the close linkage between the public and private components of PQC keypairs. While the general statistical testing approach and requirements can remain consistent with older public-key algorithms, a non-trivial challenge in creating ISO 17825 testing procedures for PQC is the careful design of test vector inputs so that only relevant Critical Security Parameter (CSP) leakage is captured in power, electromagnetic, and timing measurements.
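At its core, the TVLA methodology referenced above applies Welch's t-test between two sets of measurement traces, flagging leakage when |t| exceeds the standard 4.5 threshold at any sample point. A minimal, simplified sketch (function names assumed, not from the standard):

```python
# Simplified sketch of the TVLA-style check underlying ISO 17825:
# Welch's t-test between two sets of side-channel traces; |t| > 4.5
# at any sample point is treated as evidence of leakage.
def welch_t(a, b):
    """Welch's t-statistic between two lists of measurements."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / ((va / na + vb / nb) ** 0.5)

def leaks(traces_a, traces_b, threshold=4.5):
    """Compare trace sets point by point; True if any point exceeds |t|."""
    return any(abs(welch_t([t[i] for t in traces_a],
                           [t[i] for t in traces_b])) > threshold
               for i in range(len(traces_a[0])))
```

The paper's point is that for PQC the hard part is not this statistic but choosing the two input classes so that only CSP-dependent leakage separates them.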
Authored by Markku-Juhani Saarinen
Measurement and Metrics Testing - This paper belongs to a sequence of manuscripts that discuss generic and easy-to-apply security metrics for Strong PUFs. These metrics cannot and shall not fully replace in-depth machine learning (ML) studies in the security assessment of Strong PUF candidates. But they can complement the latter, serve in initial PUF complexity analyses, and are much easier and more efficient to apply: They do not require detailed knowledge of various ML methods, substantial computation times, or the availability of an internal parametric model of the studied PUF. Our metrics can also be standardized particularly easily. This avoids the sometimes inconclusive or contradictory findings of existing ML-based security tests, which may result from the usage of different or non-optimized ML algorithms and hyperparameters, differing hardware resources, or varying numbers of challenge-response pairs in the training phase.
Authored by Fynn Kappelhoff, Rasmus Rasche, Debdeep Mukhopadhyay, Ulrich Rührmair
Measurement and Metrics Testing - Fuzz testing is an indispensable test-generation tool in software security. Fuzz testing uses automated directed randomness to explore a variety of execution paths in software, trying to expose defects such as buffer overflows. Since cyber-physical systems (CPS) are often safety-critical, testing models of CPS can also expose faults. However, while existing coverage-guided fuzz testing methods are effective for software, results can be disappointing when applied to CPS, where systems have continuous states and inputs are applied at different points in time.
Authored by Sanaz Sheikhi, Edward Kim, Parasara Duggirala, Stanley Bak
Measurement and Metrics Testing - Nowadays, attackers are increasingly using Use-After-Free (UAF) vulnerabilities to create threats against software security. Existing static approaches for UAF detection are capable of finding potential bugs in large code bases. In most cases, analysts perform manual inspections to verify whether the warnings reported by static analysis are real vulnerabilities. However, due to the complex constraints involved in constructing a UAF vulnerability, screening all warnings is very time- and cost-intensive. In fact, many warnings should be discarded before the manual inspection phase because they are almost impossible to trigger in the real world, a fact often overlooked by current static analysis techniques.
Authored by Haolai Wei, Liwei Chen, Xiaofan Nie, Zhijie Zhang, Yuantong Zhang, Gang Shi
Measurement and Metrics Testing - Software testing is one of the most critical and essential processes in the software development life cycle. It is the most significant aspect affecting product quality. Quality and service are critical success factors, particularly in the software business development market. As a result, enterprises must execute software testing and invest resources in it to ensure that their software products meet the needs and expectations of end-users. Test prioritization and evaluation are the key factors determining the success of software testing. Test suite coverage metrics are commonly used to evaluate the testing process. Soft computing techniques like Genetic Algorithms and Particle Swarm Optimization have gained prominence in various aspects of testing. This paper proposes an automated Genetic Algorithm approach to prioritizing test cases, evaluated through code coverage metrics with the Coverlet tool. Coverlet is a .NET code coverage tool that works across platforms and supports line, branch, and method coverage. Coverlet gathers data from Cobertura coverage test runs, which are then used to generate reports. The resulting test suites were validated and analyzed and showed significant improvement over the generations.
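The approach described can be illustrated with a tiny genetic algorithm that evolves test-case orderings to maximize how quickly line coverage accumulates, the kind of quantity a Coverlet-style coverage report would feed into the fitness function. This is a hedged sketch under assumed details, not the paper's implementation:

```python
import random

# Illustrative GA sketch (details assumed): evolve test orderings so
# that coverage accumulates as early as possible in the execution.
def fitness(order, coverage):
    """Reward orderings that cover many new lines early."""
    covered, score = set(), 0.0
    for rank, test in enumerate(order):
        gained = coverage[test] - covered
        score += len(gained) / (rank + 1)  # earlier gains weigh more
        covered |= coverage[test]
    return score

def evolve(coverage, pop_size=20, generations=50, seed=0):
    """Elitist GA with swap mutation over test-case permutations."""
    rng = random.Random(seed)
    tests = list(coverage)
    pop = [rng.sample(tests, len(tests)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda o: fitness(o, coverage), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.randrange(len(child)), rng.randrange(len(child))
            child[i], child[j] = child[j], child[i]  # swap mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda o: fitness(o, coverage))
```

A real implementation would add crossover and read per-test line sets from Cobertura reports; the sketch keeps only the prioritization skeleton.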
Authored by Baswaraju Swathi
Measurement and Metrics Testing - Due to the increasing complexity of modern heterogeneous System-on-Chips (SoCs) and their growing vulnerabilities, security risk assessment and quantification are required to measure the trustworthiness of a SoC. This paper describes a systematic approach to modeling the security risk of a system under malicious hardware attacks. The proposed method uses graph analysis to assess the impact of an attack, and the Common Vulnerability Scoring System (CVSS) is used to quantify the security level of the system. To demonstrate the applicability of the proposed metric, we consider two open-source SoC benchmarks with different architectures. The overall risk is calculated using the proposed metric by computing the exploitability and impact of attacks on critical components of a SoC.
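The kind of metric described, combining CVSS-style exploitability and impact sub-scores per critical component into an overall figure, might look like the following sketch; the aggregation rule and all names are assumptions, since the abstract does not give the formula:

```python
# Hypothetical sketch: each critical SoC component carries CVSS-style
# exploitability and impact sub-scores; component risk is their
# product, and the overall risk is dominated by the worst component.
def component_risk(exploitability, impact):
    """Risk grows with both ease of attack and severity of damage."""
    return exploitability * impact

def soc_risk(components):
    """Overall SoC risk: worst-case critical component dominates."""
    return max(component_risk(c["exploitability"], c["impact"])
               for c in components.values())
```

The paper's graph analysis would supply the impact values here; the sketch only shows how the two quantities could combine.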
Authored by Sujan Saha, Joel Mbongue, Christophe Bobda
Measurement and Metrics Testing - We continue to tackle the problem of poorly defined security metrics by building on and improving our previous work on designing sound security metrics. We reformulate the previous method into a set of conditions that are clearer and more widely applicable for deriving sound security metrics. We also modify and enhance some concepts that led to an unforeseen weakness in the previous method that was subsequently found by users, thereby eliminating this weakness from the conditions. We present examples showing how the conditions can be used to obtain sound security metrics. To demonstrate the conditions’ versatility, we apply them to show that an aggregate security metric made up of sound security metrics is also sound. This is useful where the use of an aggregate measure may be preferred, to more easily understand the security of a system.
Authored by George Yee
Measurement and Metrics Testing - Any type of engineered design requires metrics for trading off both desirable and undesirable properties. For integrated circuits, typical properties include circuit size, performance, power, etc., where for example, performance is a desirable property and power consumption is not. Security metrics, on the other hand, are extremely difficult to develop because there are active adversaries that intend to compromise the protected circuitry. This implies metric values may not be static quantities, but instead are measures that degrade depending on attack effectiveness. In order to deal with this dynamic aspect of a security metric, a general attack model is proposed that enables the effectiveness of various security approaches to be directly compared in the context of an attack. Here, we describe, define and demonstrate that the metrics presented are both meaningful and measurable.
Authored by Ruben Purdy, Danielle Duvalsaint, R. Blanton
Autonomous vehicles (AVs) are capable of making driving decisions autonomously using multiple sensors and complex autonomous driving (AD) software. However, AVs introduce numerous unique security challenges with the potential to create safety consequences on the road. Security mechanisms require a benchmark suite and an evaluation framework to generate comparable results. Unfortunately, AVs lack a proper benchmarking framework to evaluate attack and defense mechanisms and quantify safety measures. This paper introduces BenchAV – a security benchmark suite and evaluation framework for AVs that addresses the current limitations and pressing challenges of AD security. The benchmark suite contains 12 security and performance metrics, and the evaluation framework automates the metric collection process using the Carla simulator and the Robot Operating System (ROS).
Authored by Mohammad Hoque, Mahmud Hossain, Ragib Hasan
This research evaluates the accuracy of two methods of authorship prediction, syntactical analysis and n-gram analysis, and explores their potential usage. The proposed algorithm measures n-grams and counts adjectives, adverbs, verbs, nouns, punctuation, and sentence length from the training data, and normalizes each metric. It then compares the metrics of training samples to those of testing samples and predicts authorship based on the correlation they share for each metric. The strength of the correlation between the testing and training data carries significant weight in the decision-making process. For example, if analysis of one metric approximates a 100% positive correlation, that metric is assigned the maximum weight in the decision; conversely, a 100% negative correlation receives the minimum weight. This new method of authorship validation holds promise for future innovation in fraud protection, the study of historical documents, and maintaining integrity within academia.
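The correlation-weighted decision described can be sketched as below; the metric profiles and function names are illustrative assumptions, not the study's actual code:

```python
# Illustrative sketch (details assumed): normalize simple stylometric
# metric counts, then attribute a test sample to the training author
# whose metric profile correlates with it most strongly.
def normalize(vec):
    """Scale a metric-count vector to proportions."""
    total = sum(vec) or 1
    return [v / total for v in vec]

def correlation(a, b):
    """Pearson correlation between two metric profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0

def predict_author(test_profile, training_profiles):
    """Stronger correlation carries more weight in the decision."""
    scores = {author: correlation(normalize(test_profile),
                                  normalize(profile))
              for author, profile in training_profiles.items()}
    return max(scores, key=scores.get)
```

Each position in a profile vector would hold one metric (adjective count, adverb count, average sentence length, and so on) computed over a document.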
Authored by Jared Nelson, Mohammad Shekaramiz
Code optimization is an essential feature of compilers, and almost all software products are built with compiler optimizations enabled. Consequently, bugs in code optimization inevitably have a significant impact on the correctness of software systems. Locating optimization bugs in compilers is challenging because compilers typically support a large number of optimization configurations. Although prior studies have proposed locating compiler bugs by generating witness test programs, they are still time-consuming and not effective enough. To address these limitations, we propose an automatic bug localization approach, ODFL, which locates compiler optimization bugs by differentiating finer-grained options. Specifically, we first independently disable the fine-grained options that are enabled by default under the bug-triggering optimization levels to identify bug-free and bug-related fine-grained options. We then configure several effective passing and failing optimization sequences based on these fine-grained options to obtain coverage from multiple failing and passing compiler runs. Finally, the generated coverage information is fed to Spectrum-Based Fault Localization formulae to rank the suspicious compiler files. We ran ODFL on 60 buggy GCC compilers from an existing benchmark. The experimental results show that ODFL significantly outperforms the state-of-the-art compiler bug isolation approach RecBi on all the evaluated metrics, demonstrating the effectiveness of ODFL. In addition, ODFL is much more efficient than RecBi, saving more than 88% of bug-localization time on average.
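The final ranking step can be illustrated with the Ochiai formula, one common Spectrum-Based Fault Localization choice (the abstract does not specify which formulae ODFL uses, and all names here are hypothetical):

```python
# Sketch of the SBFL ranking step: given which compiler files each
# passing/failing run covered, rank files by Ochiai suspiciousness
# (one common SBFL formula; ODFL's exact formulae are not specified).
def ochiai(failed_cov, passed_cov, total_failed):
    """failed_cov/passed_cov: runs of each kind that covered the file."""
    denom = (total_failed * (failed_cov + passed_cov)) ** 0.5
    return failed_cov / denom if denom else 0.0

def rank_files(failing_runs, passing_runs):
    """Return files sorted from most to least suspicious."""
    files = set().union(*failing_runs, *passing_runs)
    scores = {}
    for f in files:
        ef = sum(f in run for run in failing_runs)
        ep = sum(f in run for run in passing_runs)
        scores[f] = ochiai(ef, ep, len(failing_runs))
    return sorted(scores, key=scores.get, reverse=True)
```

A file covered by every failing compilation but no passing one scores 1.0 and tops the list, which is exactly the signal the differentiated option sequences are designed to produce.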
Authored by Jing Yang, Yibiao Yang, Maolin Sun, Ming Wen, Yuming Zhou, Hai Jin