Spotlight on Lablet Research #10 - Model-Based Explanation for Human-in-the-Loop Security

Lablet: Carnegie Mellon University

An effective response to security attacks often requires a combination of automated and human-mediated actions. Currently, we lack adequate methods to reason about such human-system coordination, including ways to determine when to allocate tasks to each party and how to gain assurance that automated mechanisms are appropriately aligned with organizational needs and policies. This project focuses on combining human and automated actions in response to security attacks, and shows how probabilistic models and model checkers can be used both to synthesize complex plans that combine human and automated actions and to provide human-understandable explanations of mitigation plans proposed or carried out by the system.

Models that support attack-resiliency in systems need to address the allocation of tasks to humans and systems, and how the mechanisms align with organizational policies. Such models must capture, for example, when and how systems and humans should cooperate, how to provide self-explanation to support human hand-offs, and how to assess the overall effectiveness of coordinated human-system approaches for mitigating sophisticated threats. In this project, the research team, led by Principal Investigator (PI) David Garlan, developed a model-based approach to: 1) reason about when and how systems and humans should cooperate with each other; 2) improve human understanding of and trust in automated behavior through self-explanation; and 3) provide mechanisms for humans to correct a system's automated behavior when it is inappropriate. The team explored the effectiveness of the techniques in the context of coordinated system-human approaches for mitigating Advanced Persistent Threats (APTs).

As systems become increasingly automated in securing themselves, it becomes harder for the humans who interact with them to understand their behavior. In particular, when an automated planner optimizes for multiple quality objectives and acts under uncertainty, the behavior it generates can be difficult for humans to follow. The CMU researchers developed an approach, with tool support, that clarifies system behavior through interactive explanation: end-users can ask Why and Why-Not questions about specific behaviors of the system and receive answers in the form of contrastive explanations. They further designed and piloted a human study to understand the effectiveness of these explanations for human operators.
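To make the question-and-answer interaction concrete, the minimal Python sketch below illustrates one way a Why-Not question could be answered contrastively; the candidate plans, objective names, and weights are illustrative assumptions, and the code is a sketch of the idea rather than the project's actual tool.

```python
# Toy sketch of a contrastive "Why-Not" explanation for a multi-objective
# planner. Plans, objectives, and weights are illustrative assumptions.

# Candidate navigation plans and their expected values on three quality
# objectives (lower is better for all three in this toy example).
CANDIDATES = {
    ("go_hallway", "go_lab"):    {"time": 80,  "safety_risk": 0.10, "intrusiveness": 0.3},
    ("go_hallway", "go_office"): {"time": 70,  "safety_risk": 0.40, "intrusiveness": 0.2},
    ("go_courtyard", "go_lab"):  {"time": 120, "safety_risk": 0.05, "intrusiveness": 0.1},
}

# Weights encoding the planner's single cost trade-off across the objectives.
WEIGHTS = {"time": 1.0, "safety_risk": 200.0, "intrusiveness": 100.0}

def cost(qualities):
    return sum(WEIGHTS[q] * v for q, v in qualities.items())

def best_plan(must_include=None):
    """Cheapest candidate, optionally restricted to plans containing
    the user-questioned action."""
    feasible = {p: q for p, q in CANDIDATES.items()
                if must_include is None or must_include in p}
    return min(feasible.items(), key=lambda item: cost(item[1]))

def explain_why_not(questioned_action):
    """Answer 'Why not <questioned_action>?' by contrasting the chosen plan
    with the best plan that does use the questioned action."""
    chosen_plan, chosen_q = best_plan()
    alt_plan, alt_q = best_plan(must_include=questioned_action)
    diffs = {q: round(alt_q[q] - chosen_q[q], 3) for q in chosen_q}
    return (f"Chosen plan {chosen_plan} has cost {cost(chosen_q):.0f}. "
            f"Using '{questioned_action}' gives {alt_plan} with cost "
            f"{cost(alt_q):.0f}; per-objective differences: {diffs}.")

print(explain_why_not("go_courtyard"))
```

The contrast is the point of the explanation: the answer reports not only what the system chose, but what the questioned alternative would have cost on each quality objective.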

The research team conducted a human-subject experiment to evaluate the effectiveness of their explainable planning approach. The experimental design is based on one of the envisioned use cases of the generated explanations: enabling the end-user of a planning-based system to identify alignment or misalignment between the user's objectives and the planning agent's reward function, particularly in multi-objective planning settings. The main hypothesis of the study is that the explainable planning approach improves the end-user's ability to identify such preference (mis)alignment. This was divided into two sub-hypotheses. H1: Participants who receive the explanations are significantly more likely to correctly determine whether the robot's proposed plan is in line with their (given) preference. H2: Participants who receive the explanations have significantly more reliable confidence in their determination (i.e., have higher confidence-weighted scores).

In the study, each participant was prescribed a fixed preference over the three objectives (navigation time, safety, and intrusiveness) and was told that the robot may or may not be optimizing a reward function aligned with that preference. When presented with a navigation plan from the robot, the participant was asked to indicate whether they thought the robot's plan was the best available option with respect to the prescribed preference. Participants were divided into two groups: the control group received no explanation from the robot beyond predictions of how long the navigation would take and how safe and intrusive it would be, while the experimental group received a full explanation. The team measured each participant's accuracy in identifying preference alignment and misalignment, their confidence in their answers, and the time they took to answer. The experiment was conducted on Amazon's Mechanical Turk, and the results were compelling:

  • H1: The experiment showed that the team's explanation has a significant effect on the participants' correctness. Overall, the odds of a participant in the treatment group being correct were on average 3.8 times higher than those of a participant in the control group, with 95% confidence interval [2.04, 7.07]. (The fixed-effect logistic regression coefficient estimate is 1.335 and the standard error is 0.317.) Overall, H1 is supported.
  • H2: The team's explanation also had a significant effect on the participants' reliable confidence. Overall, the confidence-weighted score of the participants in the treatment group was on average 1.73 higher than that of the participants in the control group, on a scale of -4 to +4, with 95% confidence interval [1.04, 2.42]. (The fixed-effect linear regression coefficient estimate is 1.727 and the standard error is 0.351.) Overall, H2 is supported. (The arithmetic behind these intervals is sketched below.)
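For readers who want to check the arithmetic, the short sketch below reproduces the reported effect sizes from the regression estimates, assuming standard Wald 95% confidence intervals (exp(coef ± 1.96·SE) for the logistic model, coef ± 1.96·SE for the linear model). It is a back-of-the-envelope check, not the team's analysis script.

```python
# Back-of-the-envelope reproduction of the reported effect sizes, assuming
# standard Wald 95% confidence intervals.
import math

# H1: fixed-effect logistic regression -> odds ratio = exp(coefficient).
coef, se = 1.335, 0.317
odds_ratio = math.exp(coef)                        # ~3.8
ci = (math.exp(coef - 1.96 * se),                  # ~2.04
      math.exp(coef + 1.96 * se))                  # ~7.07
print(f"H1 odds ratio {odds_ratio:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")

# H2: fixed-effect linear regression -> the coefficient is itself the
# estimated difference in confidence-weighted score.
coef, se = 1.727, 0.351
ci = (coef - 1.96 * se, coef + 1.96 * se)          # ~[1.04, 2.42], up to rounding
print(f"H2 score difference {coef:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```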

The researchers are also continuing to work on how to use graceful degradation to respond to security attacks, using Datalog as a reasoning engine. They improved the performance of the code by a factor of roughly 100-500, by adding richness and realism to the architectural styles as well as by rewriting much of the attack trace generation code. Attack traces now overlay data flows across networks, so that the effects of attacks are evaluated on data flows (and their associated confidentiality, integrity, and availability security attributes) rather than simply on network components, bringing the modeling much closer to the proper way to understand the impact of an attack. The researchers also added richness to a new Functional Perspective, refining and improving on aspects of the DoD Architecture Framework (DoDAF).
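As a rough illustration of the data-flow overlay idea, the sketch below (written in Python as a stand-in for the Datalog rules; the components, flows, and attack trace are all hypothetical) propagates a component compromise to every data flow routed through that component and to the confidentiality, integrity, and availability (CIA) attributes the attack affects.

```python
# Toy illustration of overlaying an attack trace on data flows: a compromised
# component threatens the CIA attributes of every data flow routed through it.
# All names and the attack trace are hypothetical.

# Each data flow is routed through a sequence of network components.
DATA_FLOWS = {
    "sensor_to_controller": ["sensor", "router1", "controller"],
    "controller_to_hmi":    ["controller", "router2", "hmi"],
}

# A hypothetical attack trace: compromised components and the CIA attributes
# the attack affects on each.
ATTACK_TRACE = {
    "router1": {"integrity", "availability"},
    "hmi":     {"confidentiality"},
}

def affected_flows(flows, trace):
    """Map each data flow to the CIA attributes threatened by the trace."""
    impact = {}
    for flow, path in flows.items():
        hit = set()
        for component in path:
            hit |= trace.get(component, set())
        if hit:
            impact[flow] = sorted(hit)
    return impact

print(affected_flows(DATA_FLOWS, ATTACK_TRACE))
# {'sensor_to_controller': ['availability', 'integrity'],
#  'controller_to_hmi': ['confidentiality']}
```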

In the context of Markov Decision Process (MDP) planning, manually inspecting the solution policy and its value function to understand the planner's decisions is infeasible, because they lack the domain semantics and concepts in which end-users are interested. End-users also lack information about which, if any, of the objectives conflict in a given problem instance and what compromises had to be made. The team investigated an approach for generating an automated explanation of an MDP policy that is based on: (i) describing the expected consequences of the policy in terms of domain-specific, human-concept values and relating those values to the overall expected cost of the policy, and (ii) explaining any tradeoff by contrasting the policy with counterfactual solutions (i.e., alternative policies that were not generated as the solution) on the basis of their human-concept values and the corresponding costs. The team demonstrated the approach on MDP problems with two different cost criteria, namely the expected total-cost and average-cost criteria. This approach enhances resilient architectures by helping stakeholders understand and explore the decision-making that goes into automated planning for maintaining system resilience.
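The following is a minimal sketch of the two ingredients, with illustrative human concepts, numbers, and weights rather than the team's actual models: it describes a chosen policy's expected consequences in human-concept values, relates them to a single expected total cost, and contrasts the policy with one counterfactual alternative.

```python
# Hypothetical sketch of explaining an MDP policy's tradeoffs via
# human-concept values (expected total-cost criterion). Concepts, numbers,
# and weights are illustrative assumptions.

# Expected human-concept values of the chosen policy and one counterfactual
# alternative policy that was not selected.
CHOSEN      = {"travel_time_s": 180, "collision_risk": 0.02, "intrusive_rooms": 1}
ALTERNATIVE = {"travel_time_s": 140, "collision_risk": 0.02, "intrusive_rooms": 3}

# Weights relating each human concept to the planner's single cost function.
WEIGHTS = {"travel_time_s": 1.0, "collision_risk": 2000.0, "intrusive_rooms": 25.0}

def total_cost(values):
    return sum(WEIGHTS[c] * v for c, v in values.items())

def contrastive_explanation(chosen, alternative):
    lines = [f"The chosen policy has expected cost {total_cost(chosen):.0f}: "
             + ", ".join(f"{c}={v}" for c, v in chosen.items()) + "."]
    gains  = [c for c in chosen if alternative[c] < chosen[c]]
    losses = [c for c in chosen if alternative[c] > chosen[c]]
    lines.append(
        f"An alternative policy would improve {gains} but worsen {losses}, "
        f"for a higher expected cost of {total_cost(alternative):.0f}, "
        f"so the tradeoff favors the chosen policy.")
    return "\n".join(lines)

print(contrastive_explanation(CHOSEN, ALTERNATIVE))
```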

Additional detail on this project can be found here.
