Publications | Science of Security Virtual Organization

Cloud-Fog Trustworthy Computing for Information Sharing in Dynamic IoT System

Fog computing moves computation from the cloud to edge devices to support IoT applications with faster response times and lower bandwidth utilization. IoT users and connected devices are at risk of security and privacy breaches because of the high volume of interactions that occur in IoT environments. These features make it very challenging to maintain and quickly share dynamic IoT data. The proposed method uses cloud-fog computing to provide dependable computing for data sharing in a constantly changing IoT system. The extended IoT cloud first offers vertical and horizontal computing architectures and then combines IoT devices, edge, fog, and cloud into a layered infrastructure. With these issues taken into account, the framework and supporting mechanisms are designed to handle trusted computing, using a vertical IoT cloud architecture to protect the IoT cloud. To protect data integrity and information flow for different computing models in the IoT cloud, an integrated data provenance and information management method is selected. The effectiveness of the dynamic scaling mechanism is then compared with that of static serving instances.

Authored by Bommi Prasanthi, Dharavath Veeraswamy, Sravan Abhilash, Kesham Ganesh

Should robots indicate the trustworthiness of information from knowledge graphs?

The demo presents recent work on social robots that provide information from knowledge graphs in online graph databases. Sometimes more cooperative responses can be generated by using taxonomies and other semantic metadata that has been added to the knowledge graphs. Sometimes metadata about data provenance suggests higher or lower trustworthiness of the data. This raises the question of whether robots should indicate trustworthiness when providing the information, and whether this should be done explicitly by meta-level comments or implicitly, for example by modulating the robots’ tone of voice and generating positive and negative affect in the robots’ facial expressions.

Authored by Graham Wilcock, Kristiina Jokinen

Blockchain-Based Transparency and Data Provenance in the Wine Value Chain

Provenance 2022 - The food market has changed dramatically in the last century as the world population has grown at an unprecedented pace. The wine industry is recognized as part of both the agriculture and food industries, but wine is also a commodity. Developments in information technology and digitalization are playing a major role in the introduction of new solutions in agriculture and food production. The idea is to improve the productivity of farms and vineyards and the quality of agricultural products by optimizing irrigation, pesticide usage, and the overall efficiency of the process. Furthermore, consumer awareness of food products, their quality and origin, is constantly rising. Information about the product throughout the whole “farm-to-fork”, or in this case “vineyard-to-glass”, value chain needs to be collected and utilized by all the participating stakeholders in order to get a better, healthier, and more affordable product. This paper addresses the considerations related to the implementation of blockchain-based transparency and data provenance in the food value chain, with a specific focus on the wine industry.

Authored by Tomo Popovic, Srdjan Krco, Nemanja Misic, Aleksandra Martinovic, Ivan Jovovic

A Domain-Specific Composition Environment for Provenance Query of Scientific Workflows

Provenance 2022 - Scientific Workflow Management Systems (SWfMS) systematically capture and store diverse provenance information at various phases. Scientists compose a multitude of queries on this information. The support for integrated query composition and visualization in existing SWfMS is limited. Most systems do not support any custom query composition. VisTrails and Taverna introduced the custom query languages vtPQL and TriQL to support limited workflow monitoring. Galaxy only tracks histories of operations and displays them in lists. No SWfMS supports a scientist-friendly user interface for provenance query composition and visualization. In this paper, we propose a domain-specific composition environment for provenance query of scientific workflows. As a proof of concept, we developed a provenance system for a bioinformatics workflow management system and evaluated it in multiple dimensions: one measuring participants' subjective perception of its usability using the NASA-TLX and SUS survey instruments, and the other measuring its flexibility through plugin integration using NASA-TLX.

Authored by Muhammad Hossain, Banani Roy, Chanchal Roy, Kevin Schneider

ProProv: A Language and Graphical Tool for Specifying Data Provenance Policies

Provenance 2022 - The Function-as-a-Service cloud computing paradigm has made large-scale application development convenient and efficient as developers no longer need to deploy or manage the necessary infrastructure themselves. However, as a consequence of this abstraction, developers lose insight into how their code is executed and data is processed. Cloud providers currently offer little to no assurance of the integrity of customer data. One approach to robust data integrity verification is the analysis of data provenance—logs that describe the causal history of data, applications, users, and non-person entities. This paper introduces ProProv, a new domain-specific language and graphical user interface for specifying policies over provenance metadata to automate provenance analyses.

Authored by Kevin Dennis, Shamaria Engram, Tyler Kaczmarek, Jay Ligatti

A Conceptual Framework for Automated Rule Generation in Provenance-based Intrusion Detection Systems

Provenance 2022 - Traditional Intrusion Detection Systems (IDS) are struggling to keep up with the increase in sophisticated cyberattacks such as Advanced Persistent Threats (APT) over the past years. Provenance-based Intrusion Detection Systems (PIDS) utilize data provenance concepts to enable fine-grained event correlation, and the results show increased detection accuracy and reduced false-alarm rates compared to traditional IDS. In particular, rule-based approaches for PIDS have demonstrated high detection accuracy, low false-alarm rates, and fast detection times. However, rules are manually created by security experts, which is time-consuming and does not ensure high-quality rule standards. To address this issue, we propose an automated rule generation framework that automatically generates robust rules to describe malicious files. As a result, high-quality rules can be used in PIDS to identify similar attacks and other affected systems promptly.

Authored by Michael Zipperle, Florian Gottwalt, Yu Zhang, Omar Hussain, Elizabeth Chang, Tharam Dillon

An Interaction Provenance-based Trust Management Scheme For Connected Vehicles

Provenance 2022 - Connected vehicles (CVs) have facilitated the development of intelligent transportation systems that support critical safety information sharing with minimum latency. However, CVs are vulnerable to different external and internal attacks. Though cryptographic techniques can mitigate external attacks, preventing internal attacks poses challenges because the attackers are authorized but malicious entities. Thwarting internal attacks requires identifying the trustworthiness of the participating vehicles. This paper proposes a trust management framework for CVs using interaction provenance that ensures privacy, considers both in-vehicle and vehicular network security incidents, and supports flexible security policies. For this purpose, we present an interaction provenance recording and trust management protocol. Different events are extracted from interaction provenance, and trustworthiness is calculated using fuzzy policies based on the events.

Authored by Mohammad Hoque, Ragib Hasan

Deepro: Provenance-based APT Campaigns Detection via GNN

Provenance 2022 - Advanced Persistent Threats (APTs) are typically sophisticated, stealthy, and long-term attacks that are difficult to detect and investigate. The recently proposed provenance graph, built from system audit logs, has become an important approach for APT detection and investigation. However, existing provenance-based approaches either require rules based on expert knowledge or cannot pinpoint attack events in a provenance graph, and so still cannot effectively mitigate APT attacks. In this paper, we present Deepro, a provenance-based APT campaign detection approach that not only effectively detects attack-relevant entities in a provenance graph but also precisely recovers APT campaigns based on the detected entities. Specifically, Deepro first customizes a general-purpose GNN (Graph Neural Network) model to represent and detect process nodes in a provenance graph by automatically learning different patterns of attack behaviors and benign behaviors from the rich contextual information in the provenance graph. Then, Deepro further detects attack-relevant file and network entities according to their data dependencies with the detected process nodes. Finally, Deepro recovers APT campaigns by correlating detected entities based on their causality relationships in the provenance graph. We evaluated Deepro with ten real-world APT attacks. The evaluation results show that Deepro can effectively detect attack events with an average F1-score of 98.81% and thus produces precise provenance sub-graphs of APT attacks.

Authored by Na Yan, Yu Wen, Luyao Chen, Yanna Wu, Boyang Zhang, Zhaoyang Wang, Dan Meng

How, Where, and Why Data Provenance Improves Query Debugging: A Visual Demonstration of Fine-Grained Provenance Analysis for SQL

Provenance 2022 - Data provenance is meta-information about the origin and processing history of data. We demonstrate the provenance analysis of SQL queries and use it for query debugging. How-provenance determines which query expressions have been relevant for evaluating selected pieces of output data. Likewise, Where- and Why-provenance determine relevant pieces of input data. The combined provenance notions can be explored visually and interactively. We support a feature-rich SQL dialect with correlated subqueries and focus on bag semantics. Our fine-grained provenance analysis derives individual data provenance for table cells and SQL expressions.

Authored by Tobias Muller, Pascal Engel

DIComP: Lightweight Data-Driven Inference of Binary Compiler Provenance with High Accuracy

Binary analysis is widely used to assess software security and test for vulnerabilities without access to source code. The validity of the analysis depends heavily on the ability to infer information about how the code was compiled. Among the compilation information, compiler type and optimization level, the key factors determining what binaries look like, are still difficult to infer efficiently with existing tools. In this paper, we conduct a thorough empirical study of binaries' appearance under various compilation settings and propose DIComP, a lightweight binary analysis tool based on the simplest machine learning method, to infer the compiler and optimization level from the most relevant features identified in our observations. Our comprehensive evaluations demonstrate that DIComP can fully recognize the compiler provenance, and it is effective in inferring the optimization levels with up to 90% accuracy. It is also efficient, running inference on thousands of binaries at the millisecond level with our lightweight machine learning model (1MB).

Authored by Ligeng Chen, Zhongling He, Hao Wu, Fengyuan Xu, Yi Qian, Bing Mao