Explainable AI is an emerging field that aims to address how the black-box decisions of AI systems are made, by attempting to understand the steps and models involved in this decision-making. In manufacturing, Explainable AI is expected to deliver predictability, agility, and resiliency across targeted manufacturing applications. In this context, large amounts of data, which can be highly sensitive and come in various formats, need to be handled securely and efficiently. To tackle this challenge, this paper proposes an Asset Management and Secure Sharing solution tailored to the Explainable AI and Manufacturing context. The proposed asset management architecture enables an extensive data management and secure sharing solution for industrial data assets. Using this design, industrial data can be pulled, imported, managed, shared, and tracked with a high level of security. The paper describes the solution's overall architectural design and gives an overview of the functionalities and incorporated technologies of the involved components, which are responsible for data collection, management, provenance, and sharing, as well as for overall security.
Authored by Sangeetha Reji, Jonas Hetterich, Stamatis Pitsios, Vasilis Gkolemi, Sergi Perez-Castanos, Minas Pertselakis
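The abstract describes the asset layer's capabilities (import, management, sharing, provenance tracking, security) at the architectural level only. As a purely hypothetical illustration of what one record in such a layer might carry, consider the following Python sketch; every field name and the sharing rule are assumptions, not the paper's schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IndustrialAsset:
    """Illustrative metadata record for one industrial data asset. The fields
    mirror capabilities named in the abstract (formats, sensitivity, sharing,
    tracking); they are not the paper's actual schema."""
    asset_id: str
    fmt: str                    # e.g. "timeseries/csv" or "image/png"
    sensitivity: str            # e.g. "public", "internal", "confidential"
    owner: str
    shared_with: set = field(default_factory=set)
    provenance: list = field(default_factory=list)  # audit trail of events

    def share(self, partner: str):
        """Record a sharing event; highly sensitive assets are refused."""
        if self.sensitivity == "confidential":
            raise PermissionError("confidential assets need an explicit agreement")
        self.shared_with.add(partner)
        self.provenance.append((datetime.now(timezone.utc).isoformat(),
                                "shared with " + partner))
```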
Nowadays, companies, critical infrastructure, and governments face cyber attacks every day, ranging from simple denial-of-service and password-guessing attacks to complex nation-state attack campaigns, so-called advanced persistent threats (APTs). Defenders employ intrusion detection systems (IDSs), among other tools, to detect malicious activity and protect network assets. With the evolution of threats, detection techniques have followed, and modern systems usually rely on some form of artificial intelligence (AI) or anomaly detection as part of their defense portfolio. While these systems are able to achieve higher accuracy in detecting APT activity, they cannot provide much context about the attack, as the underlying models are often too complex to interpret. This paper presents an approach to explain single predictions (i.e., detected attacks) of any graph-based anomaly detection system. By systematically modifying the input graph of an anomaly and observing the output, we leverage a variation of permutation importance to identify parts of the graph that are likely responsible for the detected anomaly. Our approach treats the anomaly detection function as a black box and is thus applicable to any whole-graph explanation problem. Our results on two established datasets for APT detection (StreamSpot & DARPA TC Engagement Three) indicate that our approach can identify nodes that are likely part of the anomaly. We quantify this through our area under baseline (AuB) metric and show that the AuB is higher for anomalous graphs. Further analysis via the Wilcoxon rank-sum test confirms that these results are statistically significant, with a p-value of 0.0041%.
Authored by Felix Welter, Florian Wilkens, Mathias Fischer
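Since the approach treats the detector as a black box, the core loop can be sketched independently of any particular model. Below is a minimal Python sketch of the idea described above: perturb one node at a time, re-score the graph, and rank nodes by the average score drop. The adjacency format and the `score_graph` stub are assumptions, and the random-rewiring perturbation stands in for whatever systematic modification the paper actually uses:

```python
import random

def explain_anomaly(graph, score_graph, trials=5, seed=0):
    """Rank nodes of a flagged graph by a permutation-importance-style score.

    graph: dict node -> set of successor nodes (assumed adjacency format).
    score_graph: the detector's anomaly scorer, treated as a black box.
    """
    rng = random.Random(seed)
    nodes = list(graph)
    baseline = score_graph(graph)
    importance = {}
    for node in nodes:
        drops = []
        for _ in range(trials):
            perturbed = {n: set(nbrs) for n, nbrs in graph.items()}
            # Rewire this node to random endpoints, destroying its local
            # structure while keeping the graph size constant.
            others = [n for n in nodes if n != node]
            k = min(len(graph[node]), len(others))
            perturbed[node] = set(rng.sample(others, k))
            drops.append(baseline - score_graph(perturbed))
        # A large average drop suggests this node drove the anomaly score.
        importance[node] = sum(drops) / trials
    return sorted(nodes, key=importance.get, reverse=True)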
Fog computing moves computation from the cloud to edge devices to support IoT applications with faster response times and lower bandwidth utilization. IoT users and their connected devices are at risk of security and privacy breaches because of the high volume of interactions that occur in IoT environments. These characteristics make it very challenging to maintain and quickly share dynamic IoT data. In this approach, the cloud-fog model offers dependable computing for data sharing in a constantly changing IoT system. The extended IoT cloud, which initially offers vertical and horizontal computing architectures, then combines IoT devices, edge, fog, and cloud into a layered infrastructure. After these issues have been taken into account, the framework and supporting mechanisms are designed to handle trusted computing by utilizing a vertical IoT cloud architecture to protect the IoT cloud. To protect data integrity and information flow for different computing models in the IoT cloud, an integrated data provenance and information management method is selected. The effectiveness of the dynamic scaling mechanism is then compared with that of static serving instances.
Authored by Bommi Prasanthi, Dharavath Veeraswamy, Sravan Abhilash, Kesham Ganesh
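The abstract couples data integrity with provenance across device, edge, fog, and cloud layers but does not spell out a mechanism. One common way to make such cross-layer provenance tamper-evident is a hash chain over the log entries; the sketch below illustrates that general technique under assumed field names, and is not the paper's actual method:

```python
import hashlib, json, time

def provenance_record(prev_hash, layer, actor, action, data_digest):
    """One link of a hash-chained provenance log (field names are illustrative).

    layer: where the event happened, e.g. "device", "edge", "fog", or "cloud".
    """
    record = {"prev": prev_hash, "layer": layer, "actor": actor,
              "action": action, "data": data_digest, "ts": time.time()}
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    return record

def verify_chain(records):
    """Recompute every link; tampering with any record breaks the chain."""
    for i, rec in enumerate(records):
        body = {k: v for k, v in rec.items() if k != "hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != rec["hash"] or (i and rec["prev"] != records[i - 1]["hash"]):
            return False
    return True
```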
The demo presents recent work on social robots that provide information from knowledge graphs in online graph databases. Sometimes more cooperative responses can be generated by using taxonomies and other semantic metadata that has been added to the knowledge graphs. Sometimes metadata about data provenance suggests higher or lower trustworthiness of the data. This raises the question of whether robots should indicate trustworthiness when providing the information, and whether this should be done explicitly through meta-level comments or implicitly, for example by modulating the robots' tone of voice and generating positive or negative affect in the robots' facial expressions.
Authored by Graham Wilcock, Kristiina Jokinen
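The explicit-versus-implicit signalling choice the abstract raises can be pictured as a small decision rule. The sketch below is purely hypothetical; the demo's actual system, thresholds, and robot API are not described in the abstract:

```python
def frame_answer(fact: str, trust: float):
    """Choose explicit or implicit trust signalling for a robot's reply.

    trust: a 0..1 score assumed to be derived from provenance metadata.
    Returns the utterance plus a prosody/affect hint for the robot controller.
    """
    if trust >= 0.8:
        return fact, {"tone": "confident", "affect": "positive"}
    if trust >= 0.5:
        # Implicit signalling: hedged wording, neutral prosody.
        return "As far as I know, " + fact, {"tone": "neutral", "affect": "neutral"}
    # Explicit signalling: a meta-level comment about the data's provenance.
    return (fact + " However, the source of this information is not well verified.",
            {"tone": "tentative", "affect": "negative"})
```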
Provenance 2022 - The food market has been changing dramatically over the last century as the world population grows at an unprecedented pace. The wine industry is recognized as part of both agriculture and the food industry, but also as a commodity. Developments in information technology and digitalization are playing a major role in the introduction of new solutions in agriculture and food production. The idea is to improve the productivity of farms and vineyards and to improve the quality of agricultural products by optimizing irrigation, pesticide usage, and the overall efficiency of the process. Furthermore, consumer awareness of food products, their quality, and their origin is constantly rising. The information about the product throughout the whole “farm-to-fork”, or in this case “vineyard-to-glass”, value chain needs to be collected and utilized by all the participating stakeholders in order to get a better, healthier, and more affordable product. This paper addresses the considerations related to the implementation of blockchain-based transparency and data provenance in the food value chain, with a specific focus on the wine industry.
Authored by Tomo Popovic, Srdjan Krco, Nemanja Misic, Aleksandra Martinovic, Ivan Jovovic
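To make the "vineyard-to-glass" idea concrete, the sketch below models value-chain events appended to a hash-linked, append-only ledger that consumers can query by batch. All field names and stages are invented for illustration; the paper's actual blockchain platform and data model are not shown here:

```python
import hashlib, json
from dataclasses import dataclass, asdict

@dataclass
class VineyardEvent:
    """One step in the "vineyard-to-glass" chain (fields are illustrative)."""
    batch_id: str   # a lot of grapes or a bottled wine batch
    stage: str      # "harvest", "fermentation", "bottling", "shipping", ...
    actor: str      # grower, winery, distributor, or retailer
    details: dict   # e.g. sensor readings, pesticide usage, storage temperature

class Ledger:
    """Append-only log chained by hashes, mimicking the tamper-evidence a
    blockchain provides (consensus and distribution are omitted here)."""
    def __init__(self):
        self.blocks = []

    def append(self, event: VineyardEvent):
        prev = self.blocks[-1]["hash"] if self.blocks else "genesis"
        body = {"prev": prev, "event": asdict(event)}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.blocks.append({**body, "hash": digest})

    def trace(self, batch_id: str):
        """Consumer-facing view: all recorded stages for one batch."""
        return [b["event"] for b in self.blocks if b["event"]["batch_id"] == batch_id]
```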
Provenance 2022 - Scientific Workflow Management Systems (SWfMS) systematically capture and store diverse provenance information at various phases, and scientists compose a multitude of queries on this information. The support for integrated query composition and visualization in existing SWfMS is limited, and most systems do not support any custom query composition. VisTrails and Taverna introduced the custom query languages vtPQL and TriQL to support limited workflow monitoring, while Galaxy only tracks histories of operations and displays them in lists. No SWfMS offers a scientist-friendly user interface for provenance query composition and visualization. In this paper, we propose a domain-specific composition environment for provenance queries over scientific workflows. As a proof of concept, we developed a provenance system for a bioinformatics workflow management system and evaluated it along multiple dimensions: one measuring participants' subjective perception of its usability using the NASA-TLX and SUS survey instruments, and the other measuring its flexibility through plugin integration, again using NASA-TLX.
Authored by Muhammad Hossain, Banani Roy, Chanchal Roy, Kevin Schneider
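The abstract's composition environment is visual, but the underlying operations (filter provenance records, walk lineage links, combine the results) can be illustrated in a few lines. The toy record format and the `q` and `derived_from` helpers below are assumptions standing in for the system's actual query blocks:

```python
# A toy provenance store: each record links a workflow step's inputs/outputs.
RECORDS = [
    {"run": "r1", "step": "align", "tool": "bwa",  "inputs": ["reads.fq"], "outputs": ["aln.bam"]},
    {"run": "r1", "step": "call",  "tool": "gatk", "inputs": ["aln.bam"],  "outputs": ["vars.vcf"]},
]

def q(records, **criteria):
    """A composable filter, standing in for one visual query-builder block."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

def derived_from(records, artifact):
    """Walk input/output links backwards to collect an artifact's lineage."""
    lineage, frontier = [], [artifact]
    while frontier:
        a = frontier.pop()
        for r in records:
            if a in r["outputs"]:
                lineage.append(r)
                frontier.extend(r["inputs"])
    return lineage

# Example composition: which tools produced the variants in run r1?
print([r["tool"] for r in q(derived_from(RECORDS, "vars.vcf"), run="r1")])
```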
Provenance 2022 - The Function-as-a-Service cloud computing paradigm has made large-scale application development convenient and efficient as developers no longer need to deploy or manage the necessary infrastructure themselves. However, as a consequence of this abstraction, developers lose insight into how their code is executed and data is processed. Cloud providers currently offer little to no assurance of the integrity of customer data. One approach to robust data integrity verification is the analysis of data provenance—logs that describe the causal history of data, applications, users, and non-person entities. This paper introduces ProProv, a new domain-specific language and graphical user interface for specifying policies over provenance metadata to automate provenance analyses.
Authored by Kevin Dennis, Shamaria Engram, Tyler Kaczmarek, Jay Ligatti
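ProProv's concrete syntax is not reproduced in the abstract, but the kind of check such a policy compiles down to can be sketched: walk the provenance log behind an artifact and require every generating activity to satisfy a predicate. The log format and the trusted-agent rule below are illustrative assumptions, not ProProv itself:

```python
# Hypothetical provenance log entries in W3C-PROV-like terms.
LOG = [
    {"activity": "transform", "agent": "etl-function",
     "used": "s3://raw/orders.csv", "generated": "s3://clean/orders.parquet"},
]

def policy_integrity(log, artifact, trusted_agents):
    """Accept an artifact only if every activity that generated it, directly
    or transitively, was run by a trusted agent. A plain-Python stand-in for
    a policy over provenance metadata; the real DSL syntax is not shown."""
    for entry in log:
        if entry["generated"] == artifact:
            if entry["agent"] not in trusted_agents:
                return False
            if not policy_integrity(log, entry["used"], trusted_agents):
                return False
    return True

print(policy_integrity(LOG, "s3://clean/orders.parquet", {"etl-function"}))  # True
```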
A Conceptual Framework for Automated Rule Generation in Provenance-based Intrusion Detection Systems
Provenance 2022 - Traditional Intrusion Detection Systems (IDS) have struggled to keep up with the increase in sophisticated cyberattacks, such as Advanced Persistent Threats (APT), over the past years. Provenance-based Intrusion Detection Systems (PIDS) utilize data provenance concepts to enable fine-grained event correlation, and the results show increased detection accuracy and reduced false-alarm rates compared to traditional IDS. In particular, rule-based approaches to PIDS have demonstrated high detection accuracy, low false-alarm rates, and fast detection times. However, rules are created manually by security experts, which is time-consuming and does not guarantee consistently high-quality rules. To address this issue, we propose an automated rule generation framework that generates robust rules describing malicious files. As a result, high-quality rules can be used in PIDS to promptly identify similar attacks and other affected systems.
Authored by Michael Zipperle, Florian Gottwalt, Yu Zhang, Omar Hussain, Elizabeth Chang, Tharam Dillon
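One way to picture "robust rules that describe malicious files" is generalization: collapse the run-specific parts of observed malicious paths into wildcards so a single rule matches the whole family. The regex-based sketch below is a toy stand-in for the proposed framework, whose actual algorithm the abstract does not detail:

```python
import re

HEX_RUN = re.compile(r"[0-9a-f]{8,}")

def generalize(paths):
    """Collapse run-specific hex identifiers in observed malicious file paths
    into a wildcard, yielding one reusable detection rule."""
    patterns = set()
    for p in paths:
        marked = HEX_RUN.sub("HEXRUN", p)           # mark variable segments
        escaped = re.escape(marked).replace("HEXRUN", "[0-9a-f]{8,}")
        patterns.add(escaped)
    return re.compile("|".join(sorted(patterns)))

rule = generalize(["/tmp/payload_1a2b3c4d5e6f7788.bin",
                   "/tmp/payload_99ffee0011223344.bin"])
# Matches a previously unseen member of the same family:
print(bool(rule.fullmatch("/tmp/payload_deadbeefcafe0123.bin")))  # True
```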
Provenance 2022 - Connected vehicles (CVs) have facilitated the development of intelligent transportation systems that support critical safety information sharing with minimum latency. However, CVs are vulnerable to different external and internal attacks. Though cryptographic techniques can mitigate external attacks, preventing internal attacks poses challenges, since they come from authorized but malicious entities. Thwarting internal attacks requires identifying the trustworthiness of the participating vehicles. This paper proposes a trust management framework for CVs based on interaction provenance that ensures privacy, considers both in-vehicle and vehicular-network security incidents, and supports flexible security policies. For this purpose, we present an interaction provenance recording and trust management protocol. Different events are extracted from the interaction provenance, and trustworthiness is calculated using fuzzy policies based on those events.
Authored by Mohammad Hoque, Ragib Hasan
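The event-to-trust step can be illustrated with a simple fuzzy aggregation: each event type extracted from interaction provenance gets a suspicion membership, and the memberships are folded into one trust score. The event names, weights, and aggregation rule below are invented for illustration and are not the paper's policy set:

```python
def fuzzy_trust(events):
    """Fold security events extracted from interaction provenance into a
    trust score in [0, 1] using simple fuzzy memberships."""
    # Degree to which each event type indicates misbehaviour (illustrative).
    suspicion = {"bogus_safety_msg": 0.9, "sensor_tamper": 0.7,
                 "late_beacon": 0.3, "normal_beacon": 0.0}
    if not events:
        return 0.5                       # unknown vehicle: neutral trust
    memberships = [suspicion.get(e, 0.5) for e in events]
    # Trust is the complement of the strongest suspicion, softened by the
    # average so that one-off glitches are not fatal.
    worst, avg = max(memberships), sum(memberships) / len(memberships)
    return round(1.0 - (0.6 * worst + 0.4 * avg), 3)

print(fuzzy_trust(["normal_beacon", "late_beacon"]))       # high trust (0.76)
print(fuzzy_trust(["bogus_safety_msg", "sensor_tamper"]))  # low trust (0.14)
```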
Provenance 2022 - Advanced Persistent Threats (APTs) are typically sophisticated, stealthy, long-term attacks that are difficult to detect and investigate. The recently proposed provenance graph, built from system audit logs, has become an important basis for APT detection and investigation. However, existing provenance-based approaches, which either require rules based on expert knowledge or cannot pinpoint attack events in a provenance graph, still cannot effectively mitigate APT attacks. In this paper, we present Deepro, a provenance-based APT campaign detection approach that not only effectively detects attack-relevant entities in a provenance graph but also precisely recovers APT campaigns based on the detected entities. Specifically, Deepro first customizes a general-purpose GNN (Graph Neural Network) model to represent and detect process nodes in a provenance graph by automatically learning the different patterns of attack behaviors and benign behaviors from the rich contextual information in the graph. Deepro then detects attack-relevant file and network entities according to their data dependencies with the detected process nodes. Finally, Deepro recovers APT campaigns by correlating the detected entities based on their causality relationships in the provenance graph. We evaluated Deepro with ten real-world APT attacks. The evaluation results show that Deepro can effectively detect attack events with an average F1-score of 98.81% and thus produces precise provenance sub-graphs of APT attacks.
Authored by Na Yan, Yu Wen, Luyao Chen, Yanna Wu, Boyang Zhang, Zhaoyang Wang, Dan Meng
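The three stages described above (classify process nodes with a GNN, join dependent file/network entities, correlate into a campaign subgraph) can be outlined in plain Python. The sketch below replaces the trained GNN with a single mean-neighbour aggregation step plus a black-box `classify` callback, so it shows the pipeline's shape rather than the actual model:

```python
def detect_campaign(graph, feats, classify):
    """Sketch of a Deepro-style pipeline (not the authors' code):
    1. one round of mean-neighbour aggregation, the core of a GNN layer;
    2. a black-box classifier flags attack-relevant *process* nodes;
    3. file/network entities joined via edges to flagged processes;
    4. the induced subgraph approximates the recovered campaign.

    graph: dict node -> {"type": ..., "nbrs": set()}; feats: node -> list[float].
    """
    agg = {}
    for n, info in graph.items():
        nbr_feats = [feats[m] for m in info["nbrs"]] or [feats[n]]
        mean = [sum(col) / len(nbr_feats) for col in zip(*nbr_feats)]
        agg[n] = [a + b for a, b in zip(feats[n], mean)]  # self + context
    procs = {n for n, info in graph.items()
             if info["type"] == "process" and classify(agg[n])}
    # Pull in non-process entities that share an edge with a flagged process.
    entities = {m for p in procs for m in graph[p]["nbrs"]
                if graph[m]["type"] != "process"}
    campaign = procs | entities
    # Correlate: keep only causality edges among the detected entities.
    return {n: graph[n]["nbrs"] & campaign for n in campaign}
```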
Provenance 2022 - Data provenance is meta-information about the origin and processing history of data. We demonstrate the provenance analysis of SQL queries and use it for query debugging. How-provenance determines which query expressions have been relevant for evaluating selected pieces of output data. Likewise, Where- and Why-provenance determine relevant pieces of input data. The combined provenance notions can be explored visually and interactively. We support a feature-rich SQL dialect with correlated subqueries and focus on bag semantics. Our fine-grained provenance analysis derives individual data provenance for table cells and SQL expressions.
Authored by Tobias Muller, Pascal Engel
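A tiny worked example helps separate the provenance notions named above. For the query sketched in the comment, Why-provenance collects the sets of input rows that witness each output row; under bag semantics, each join witness yields its own derivation. The tables and column names are invented for illustration:

```python
# Toy tables (column names are illustrative).
customers = [{"cid": 1, "name": "Ada"}, {"cid": 2, "name": "Ben"}]
orders    = [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 1}, {"oid": 12, "cid": 2}]

# Query under analysis:
#   SELECT c.name FROM customers c JOIN orders o
#   ON c.cid = o.cid WHERE c.name = 'Ada'
def why_provenance():
    """Map each output row to the input-row combinations that witness it."""
    out = {}
    for c in customers:
        for o in orders:
            if c["cid"] == o["cid"] and c["name"] == "Ada":
                out.setdefault(c["name"], []).append(
                    ({"customers": c}, {"orders": o}))
    return out

for row, witnesses in why_provenance().items():
    print(row, "has", len(witnesses), "witnesses")  # Ada has 2 witnesses
```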
Privacy Policies and Measurement - The Function-as-a-Service cloud computing paradigm has made large-scale application development convenient and efficient as developers no longer need to deploy or manage the necessary infrastructure themselves. However, as a consequence of this abstraction, developers lose insight into how their code is executed and data is processed. Cloud providers currently offer little to no assurance of the integrity of customer data. One approach to robust data integrity verification is the analysis of data provenance—logs that describe the causal history of data, applications, users, and non-person entities. This paper introduces ProProv, a new domain-specific language and graphical user interface for specifying policies over provenance metadata to automate provenance analyses.
Authored by Kevin Dennis, Shamaria Engram, Tyler Kaczmarek, Jay Ligatti
Binary analysis is pervasively utilized to assess software security and test for vulnerabilities without access to source code. The validity of such analysis is heavily influenced by the ability to infer information about how the code was compiled. Among this compilation information, the compiler type and optimization level, the key factors determining what binaries look like, are still difficult to infer efficiently with existing tools. In this paper, we conduct a thorough empirical study of a binary's appearance under various compilation settings and propose DIComP, a lightweight binary analysis tool based on the simplest machine learning methods, which infers the compiler and optimization level from the most relevant features identified in our observations. Our comprehensive evaluation demonstrates that DIComP can fully recognize the compiler provenance, and it is effective in inferring optimization levels with up to 90% accuracy. It is also efficient, inferring thousands of binaries at the millisecond level with our lightweight (1 MB) machine learning model.
Authored by Ligeng Chen, Zhongling He, Hao Wu, Fengyuan Xu, Yi Qian, Bing Mao
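The general recipe behind lightweight compiler-provenance inference (compact features extracted from raw code bytes plus a small classifier) can be sketched without the paper's actual feature set. Below, hashed byte-bigram histograms feed a nearest-centroid classifier; this is an illustrative stand-in, not DIComP's algorithm:

```python
import zlib
from collections import defaultdict

def features(code: bytes, dims: int = 256):
    """Hash byte bigrams of a code section into a fixed-size histogram."""
    hist = [0] * dims
    for i in range(len(code) - 1):
        hist[zlib.crc32(code[i:i + 2]) % dims] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]

def train_centroids(samples):
    """samples: iterable of (label, code_bytes); a label could be a
    (compiler, optimization-level) pair such as ("gcc", "O2")."""
    acc = defaultdict(lambda: ([0.0] * 256, 0))
    for label, code in samples:
        vec = features(code)
        sums, count = acc[label]
        acc[label] = ([s + v for s, v in zip(sums, vec)], count + 1)
    return {lab: [s / n for s in sums] for lab, (sums, n) in acc.items()}

def predict(centroids, code: bytes):
    """Assign the label whose centroid is closest in squared distance."""
    vec = features(code)
    return min(centroids,
               key=lambda lab: sum((c - v) ** 2
                                   for c, v in zip(centroids[lab], vec)))
```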